SRE / DevOps / Kubernetes Weekly Collection #56 (Week 8, 2021)

  • In this blog post series, I collect articles from the following three weekly mailing lists I subscribe to, and leave some comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a useful reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #529 February 14th, 2021
SRE Weekly Issue #257 February 14th, 2021
KubeWeekly #251 February 19th, 2021

DEVOPS WEEKLY ISSUE #530 February 21st, 2021


What to expect from devops in 2021? This post pin-points some evergreen topics like security, a focus on the business, automation, but also the growth of low code and pipeline analytics and monitoring.

  • The title is “What to Expect from DevOps This Year: The Experts Weigh In”.
  • As a special edition of the Software Delivery Leadership Delivery Forum, industry analysts were invited to weigh in on the topics above.

An interesting post from someone involved with accessing RubyGems after the recent dependency confusion attack interest.

  • The title is “RubyGems dependency confusion attack side of things”.
  • Written in response to the article “Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies”, this post presents the RubyGems side of the story to reassure users. To avoid misunderstanding, the author states the intention clearly in a “Note” at the beginning:
    ○ Note: This article is not to deprecate any of the findings and achievements of Alex Birsan. He did great work exploiting specific vulnerabilities and patterns. It is to present the RubyGems side of the story and to reassure you. We actively work to provide a healthy and safe ecosystem for our users.
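For readers new to the attack: the core idea in Alex Birsan's article is that a resolver which consults both a private and a public index may install whichever copy has the higher version number, so an attacker who publishes an internal package name publicly with a huge version wins. The sketch below is a toy model of that idea only, not the resolution logic of RubyGems, Bundler, or any other real package manager; the package name, the index contents, and the `resolve` / `resolve_pinned` functions are all hypothetical.

```python
# Toy model of dependency confusion (hypothetical package and indexes).
# A naive resolver merges candidates from every configured index and
# picks the highest version, regardless of which index it came from.
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
Indexes = Dict[str, Dict[str, List[str]]]  # index name -> package -> versions


def parse_version(text: str) -> Version:
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(name: str, indexes: Indexes) -> Tuple[str, str]:
    """Return (index_name, version) of the highest version across all indexes."""
    candidates = [
        (parse_version(v), index_name, v)
        for index_name, packages in indexes.items()
        for v in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any index")
    _, index_name, version = max(candidates)
    return index_name, version


def resolve_pinned(name: str, indexes: Indexes, source: str) -> Tuple[str, str]:
    """Safer variant: only consider the index the package is pinned to."""
    return resolve(name, {source: indexes[source]})
```

With a private index holding `acme-internal-utils` 1.4.0 and a public index holding an attacker's 99.0.0, `resolve` picks the public copy, while `resolve_pinned(..., "private")` does not. Pinning each package to its source is one mitigation; which mechanisms a given tool offers is for its own documentation to say.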

Python is both a great programming language to get started with, and increasingly a useful extra tool to have in your toolbox for data and analytics work. This university course is a good starting place.

  • The title is “Learn to Code in Python, with Hany Farid”.
  • As the editor notes above, this is a university introductory Python course whose video lectures are freely available on YouTube, along with the accompanying code.
    ○ Part 1: Introduction to Programming and Computation
    ○ Part 2: Introduction to Data Structures and Analytics

Another set of public course materials, this time on Systems Administration fundamentals. Storage, file systems, networking, common protocols, system security, configuration management and ethics for systems administrators among the topics.

  • The title is “CS615 — System Administration”. As the editor notes, it covers the topics above, and it provides slides, video lectures, and transcripts, so you can work through the material systematically.

A nice post on why lots of people enjoy working in IT. The people, the constant learning, the problem solving and more.

  • The title is “What Do You Love Most About Working in IT?”.
  • To mark Valentine’s Day, the author asked people who work in IT and IT Service Management (ITSM) roles what they love most about working in IT.
    ○ The People
    ○ The Constant Change
    ○ The Learning
    ○ The Opportunity to Solve Problems
    ○ The Varied Challenges
    ○ Making a Difference/Helping Others
    ○ Improving Things at an Industry Level
    ○ Working with the Technology
    ○ It Pays Well!

The recent changes around the CentOS project, with the new CentOS Streams approach, are triggering some interesting conversations. The new Hyperscale special interest group is just forming to discuss large scale deployment challenges and solutions.

  • The title is “Hyperscale SIG”.
  • As the editor notes above, this is the CentOS project’s web page for the Hyperscale SIG (special interest group) around CentOS Stream.


Rclone (rsync for cloud storage) is a command line program to sync files and directories to and from different cloud storage providers like Google Drive, S3, Dropbox, Backblaze B2, One Drive, Google Cloud Storage and more.

A handy web service that makes it easy to check which cloud provider a website is using for its public services.

  • As mentioned above, this is the web page of “”, a convenient online tool that lets you easily check which cloud provider a target website uses for its public services.

The above service is based on a set of handy open source libraries that also have accompanying CLI tools that do the same.

  • The GitHub pages of the AWS / Azure / GCP CLI tools, one per cloud vendor, from the open source libraries that power “” above.
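One plausible way such a check can work (an assumption, not something taken from the libraries’ source) is to resolve the website’s hostname to an IP address and test it against the IP ranges each cloud vendor publishes. The sketch below uses only Python’s standard `ipaddress` module; `detect_provider` is a hypothetical name, and the ranges are RFC 5737 documentation-only placeholders, not real vendor ranges.

```python
import ipaddress
from typing import Dict, List, Optional

# Placeholder ranges (RFC 5737 documentation blocks), NOT real vendor ranges.
# A real tool would load each vendor's published IP range list instead.
PROVIDER_RANGES: Dict[str, List[str]] = {
    "aws": ["192.0.2.0/24"],
    "azure": ["198.51.100.0/24"],
    "gcp": ["203.0.113.0/24"],
}


def detect_provider(ip: str,
                    ranges: Dict[str, List[str]] = PROVIDER_RANGES) -> Optional[str]:
    """Return the provider whose published ranges contain `ip`, or None."""
    address = ipaddress.ip_address(ip)
    for provider, cidrs in ranges.items():
        # `in` is False (not an error) when IPv4/IPv6 versions differ.
        if any(address in ipaddress.ip_network(cidr) for cidr in cidrs):
            return provider
    return None
```

In practice you would first resolve the hostname, for example with `socket.gethostbyname("example.com")`, and pass the result in. Note that a site behind a CDN may resolve to the CDN’s addresses rather than to its origin’s cloud provider.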

SRE Weekly Issue #258 February 21st, 2021


Practiced Humility in Retrospectives

When acting as a retrospective facilitator, there’s a huge potential to color the discussion with our words and actions.

You’re there to position other folks to learn, not wear the badge.

Will Gallego

  • The author raises the following questions, which lead to the title.
    ○ Why wouldn’t we be able to simply apply the calculus to our knowledge and change things for the better?
    ○ This all speaks to a distinct lack of humility in what we do as a practice.
  • The post is organized into the following sections; “Humility In Practice” covers the attitudes and behaviors to put into practice.
    ○ Hubris as Facilitator
    ○ Top Down Misunderstanding of Retrospectives
    ○ Humility In Practice

GitHub Repo: upgundecha/howtheysre

upgundecha/howtheysre: A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

A huge thanks to the curator for the many awesome links in this repo! Some have been featured here in previous issues, and some are new to me. As I go through those, I’ll share my favorites here and tell you why I think you should read them.

Unmesh Gundecha

  • This is very good: a GitHub page with links to resources on SRE practices published by tech-savvy companies around the world.
  • It includes LinkedIn’s “School of SRE”, which I introduced the other day, and, from Japan, Mercari’s resources.

Engineering dependability and fault tolerance in a distributed system

In this article, we discuss the concepts of dependability and fault tolerance in detail and explain how the Ably platform is designed with fault tolerant approaches to uphold its dependability guarantees.

Paddy Byers — Ably

  • It describes the concepts of dependability and fault tolerance, and how Ably’s platform is designed with fault-tolerant approaches to uphold its dependability guarantees, covering the following:
    ○ Architectural approaches to achieve reliability
    ○ Stateful role placement
    ○ Detect, hash, resume
    ○ Channel persistence layer
    ○ Implementation considerations
    ○ Consensus formation in globally-distributed systems
    ○ Health is not binary
    ○ Resource availability issues
    ○ Resource scalability issues
    ○ Conclusion
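Items such as “Stateful role placement” and “Detect, hash, resume” concern deciding which node owns a given piece of state and moving it when a node fails. A common generic technique for that is a consistent hash ring. The sketch below shows the technique in isolation and assumes, rather than confirms, that it resembles what Ably does; the node names, channel names, and the `vnodes` parameter are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes to spread load more evenly."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []        # sorted hash positions
        self._owner: Dict[int, str] = {}  # hash position -> node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._ring, pos)

    def remove(self, node: str) -> None:
        """Call once a node is detected as failed; its keys move to neighbours."""
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            del self._owner[pos]
            self._ring.remove(pos)

    def owner(self, key: str) -> str:
        """First node clockwise from the key's hash position."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

The property that matters for failover is that removing a failed node only reassigns the keys that node owned. Everything else stays put, so only the affected state has to be resumed elsewhere.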

Phishing complaints cause Notion outage

More details on the Notion outage mentioned here last week. Complaints of phishing by a Notion user resulted in their registrar pulling their domain name out of DNS.

Peter Judge — Datacenter Dynamics

  • As the title suggests, an article explaining that last week’s hours-long outage of the collaboration app “Notion” was caused by a phishing complaint.

What Is True Resilience? (Hint: It’s Not About Managing Risk)

Google has three guiding principles for improving resiliency:

* Create maximum observability of the overall system
* Design for effectiveness, not perfection
* Learn and iterate as you go

Will Grannis — Google

  • A Forbes article that explains the title’s idea through the following three points.
    ○ Create maximum observability of the overall system
    ○ Design for effectiveness, not perfection
    ○ Learn and iterate as you go
  • The title’s message is summed up in the following words:
    ○ True resilience isn’t about managing a particular instance of risk, but being ready for anything through the way you operate.

4 Things you Need to Know about Writing Better Production Readiness Checklists

This is an awesome guide to writing a production-ready checklist — and why you’d want one.

Emily Arnott — Blameless

  • It explains the following points in line with the title. Having guidelines like these can improve how such documents are written.
    ○ How to make a production checklist
    ○ Why production checklists are helpful
    ○ Keeping your checklist up to date
    ○ How Blameless can help integrate your checklists

Fix Fast for finding and fixing regressions

Facebook found that as a regression is discovered later, it will take much longer to deploy a fix. With a combination of heuristics and machine learning, they’re detecting regressions earlier and bringing them to the attention of folks that can fix them.

Jian Zhang and Brian Keller — Facebook

  • It describes these challenges at Facebook and “Fix Fast”, a cross-functional effort launched in 2019 to address them.


KubeWeekly #252 February 26th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

CNCF Provides Insights into Secrets Management Tools with Latest End User Technology Radar


This week, CNCF announced the findings of the fourth CNCF Technology Radar, a guide to a set of emerging technologies based on the experience of the CNCF End User Community. The theme of this edition was secrets management, which was identified by the consumers of cloud native technologies as an essential technology to consider in cloud distributions.

  • It announces that CNCF has released its fourth End User Technology Radar, on “Secrets Management”. A YouTube video of the Radar team’s discussion is also embedded, so check it out if you like.
  • The key themes of this Radar are the following four:
  1. Vault has the broadest adoption across many companies and industries.
  2. After Vault, groups tend to use the native solutions provided by their public cloud provider.
  3. Certificate manager has become a popular choice in the Kubernetes ecosystem.
  4. Other solutions in the space are fragmented across various levels of maturity and complexity.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Kubernetes admission controllers in 5 minutes

Kaizhe Huang, Sysdig
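For a concrete picture of what a validating admission webhook does: the API server sends it an `AdmissionReview` object, and the webhook answers with an `AdmissionReview` whose `response` echoes the request’s `uid` and sets `allowed` (plus an optional `status.message`). The sketch below shows only that request/response shape as plain dicts, without the HTTPS server a real webhook needs. The policy (reject Pods whose images are untagged or use `:latest`) and the `review` function are hypothetical examples, not taken from the Sysdig article.

```python
from typing import Any, Dict, List


def review(admission_review: Dict[str, Any]) -> Dict[str, Any]:
    """Build an admission.k8s.io/v1 AdmissionReview response for a Pod request.

    Hypothetical policy: reject containers whose image has no tag or uses ':latest'.
    """
    request = admission_review["request"]
    pod = request.get("object") or {}
    containers = pod.get("spec", {}).get("containers", [])

    bad_images: List[str] = []
    for container in containers:
        image = container.get("image", "")
        if "@" in image:
            continue  # pinned by digest, accept
        # Look for the tag only after the last '/', since registry ports also use ':'.
        name_part = image.rsplit("/", 1)[-1]
        tag = name_part.split(":", 1)[1] if ":" in name_part else "latest"
        if tag == "latest":
            bad_images.append(image)

    response: Dict[str, Any] = {"uid": request["uid"], "allowed": not bad_images}
    if bad_images:
        response["status"] = {"message": "images must be pinned, got: " + ", ".join(bad_images)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

Keeping the decision logic as a pure function like this makes it easy to unit-test separately from the web server and TLS setup a real webhook deployment requires.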

How we minimized the overhead of Kubernetes in our job system

Lally Singh and Ashwin Venkatesan, Datadog

  • As the title suggests, it describes how Datadog solved the performance degradation caused by overhead when migrating an existing job system to Kubernetes.

How we use Kubernetes at Asana

Tony Liang, Asana

  • It explains the problems Asana wanted to solve with Kubernetes, the issues they ran into when first adopting it, and how they built the “KubeApp” framework to standardize the creation and maintenance of Kubernetes applications.

Cutting build time in half with Docker’s Buildx Kubernetes Driver

Jeremy Kreutzbender, Release

  • In line with the title, it first explains what the original infrastructure looked like and how long builds took for a sample project, then the changes made to use “buildx” and the speed improvements observed.

Comparing Kubernetes operators for PostgreSQL

Nikolay Bogdanov, Flant

  • It explains the results of their evaluation of the most popular PostgreSQL operators (Stolon, Crunchy Data, Zalando, KubeDB, and StackGres), carried out to meet client needs and implement an RDS-like managed solution on Kubernetes.

Horizontal pod autoscaling

Puja Lower

  • To explain HPA (Horizontal Pod Autoscaling), it starts by defining the various types of scaling available via the API before diving into how Kubernetes autoscales.
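The heart of HPA’s scaling decision, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below shows only that calculation plus clamping to minReplicas / maxReplicas for a single metric. It deliberately ignores stabilization windows, scaling policies, missing metrics, and not-ready Pods, which the real controller also handles. The function name and default bounds are illustrative choices.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified single-metric HPA replica calculation."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target give a ratio of 1.5, so the sketch asks for 6 replicas. At 63% the ratio of 1.05 falls inside the tolerance and the count stays at 4.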

Sysdig contributes Falco’s kernel module, eBPF probe, and libraries to the CNCF

Loris Degioanni, Sysdig

  • It announces and explains Sysdig’s contribution of Falco’s kernel module, eBPF probe, and libraries to the CNCF.
  • From now on, all core components of the Falco stack will be part of the CNCF.

Manage Envoy Proxy using go-control-plane

Mahendra Bagul, Infracloud

  • It explains how to manage Envoy using go-control-plane, along with the background you need to understand how it works.

The road to adopting Kubernetes in development

Call Delnat

  • A guide for macOS users taking the first step towards developing on a local Kubernetes cluster. It doesn’t explain how to write Kubernetes manifests, but it will help you clear the first hurdle.

Cloud development environments: Using Skaffold and Telepresence on Kubernetes for fast dev loops (link correction)

Peter O’Neill, Ambassador Labs

  • This is a corrected link for an article featured in KubeWeekly #250 two weeks ago. It would be good to fix the original page as well: when I checked the link on the #250 page, I still got a 404 error.
  • I also noticed that the previous KubeWeekly editions below have now been uploaded to the web page. Thanks to Saiyam Pathak, a CNCF Ambassador and one of the KubeWeekly editors, who checked this for me.
    ○ 02/05/2021 — KubeWeekly #249
    ○ 01/22/2021 — KubeWeekly #247
    ○ 12/11/2020 — KubeWeekly #243

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

CNCF End User Technology Radar, February 2021 — Secrets Management

  • Information on the webinar about the Radar mentioned in “The Headlines”. This is the webinar video on “Secrets Management”; I am including the link again because it is difficult to reach the video from the original link destination.

The Container Security checklist

Liz Rice @Aqua Security

  • It outlines the checklist included in Liz Rice’s new Container Security book and details some potential weaknesses that you really need to avoid.

This Week in Cloud Native: Fluent Bit updates and Stream Processing

Fluent Bit

  • I couldn’t find a path from the link above to the video, but found it uploaded on YouTube.
  • It explains the performance updates in the latest Fluent Bit release and long-awaited features such as multi-workers, new crypto libraries, and GeoIP.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

The smallest Kubernetes Cluster: scaling down to the edge

Sascha Haase, Kubermatic

  • It introduces the open source Kubermatic Kubernetes Platform built by Kubermatic.
  • It considers use cases where edge computing spawns millions of clusters, and raises questions about how to manage them and keep them in sync, even when the control plane and worker nodes run in different locations.

Our take on internet access for virtual machines

Christian Bianchi, Giant Swarm

  • It considers several ways to provide Internet access to Azure virtual machines and explains why the author believes NAT gateways strike the best balance between cost, flexibility, ease of deployment, and customer friendliness.

Introducing GKE Autopilot: a revolution in managed Kubernetes

Drew Bradstock, Group Product Manager, Google Cloud

  • An article introducing GKE Autopilot, now generally available. It looks good.
  • The following keywords from the video embedded in the page made a strong impression on me:
    ○ Fully automated Kubernetes platform
    ○ Google is your node SRE

VIDEO: Is Kubernetes right for us?

Keith Townsend and Alex Ellis

  • Approximately one hour of discussion of the title content by Alex Ellis (Founder of OpenFaaS, CNCF Ambassador) and Keith Townsend (Co-Founder of The CTO Advisor).
  • It was interesting to hear the different perspectives of an open source creator and a CTO advisor, touching on topics such as hobby vs. business endeavor, demand spikes caused by COVID-19, and regional resource exhaustion at cloud providers.

Upcoming CNCF Online Programs

Rethinking your company’s Cloud Security in the shadow of the SolarWinds attack
Amir Kaushansky and Leonid Sandler @ARMO
March 4, 2021

This Week in Cloud Native (Livestream): Demystifying Kubernetes network policy
Thomas Graf @Isovalent
March 3, 2021 at 12:00 pm PT

CNCF Online Programs Playlist on YouTube
Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.

What did you think of these articles? Did any of them catch your interest?

There is some content I haven’t been able to digest yet, so I’ll use this aide-memoire and these links to catch up myself, too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #CKA, #CKAD, #Certified AWS SAP
