SRE / DevOps / Kubernetes Weekly Collection #34 (Week 39)

  • In this blog post series, I collect the following three weekly mailing lists I subscribe to and leave some comments as an aide-memoire, along with useful links.

DEVOPS WEEKLY ISSUE #508 September 20th, 2020
SRE Weekly Issue #236 September 20th, 2020
KubeWeekly #234 September 25th, 2020

DEVOPS WEEKLY ISSUE #508 September 20th, 2020


Describing policy (or in fact configuration in general) in machine-readable form quickly gets into a conversation over whether you should prefer data, a general programming language or a DSL. This post does a good job of explaining why.

  • The title is “Anatomy of a Rule”.
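To make the trade-off concrete, here is a minimal Python sketch (not taken from the post) of the same hypothetical rule, "deny containers running as root", written once as plain data interpreted by a tiny evaluator and once as general-purpose code. The rule format and field names are illustrative assumptions.

```python
from typing import Any

# Option 1: the policy as plain data. Easy to store, diff and validate,
# but it can only express what the evaluator below understands.
RULE_AS_DATA = {"field": "run_as_user", "op": "eq", "value": 0, "effect": "deny"}

OPS = {
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
}

def evaluate_data_rule(rule: dict[str, Any], resource: dict[str, Any]) -> str:
    """Interpret a data rule against a resource; return 'deny' or 'allow'."""
    matched = OPS[rule["op"]](resource.get(rule["field"]), rule["value"])
    return rule["effect"] if matched else "allow"

# Option 2: the policy as general-purpose code. Arbitrarily expressive,
# but much harder for other tools to analyse, diff or evaluate safely.
def code_rule(resource: dict[str, Any]) -> str:
    return "deny" if resource.get("run_as_user") == 0 else "allow"

pod = {"name": "web", "run_as_user": 0}
print(evaluate_data_rule(RULE_AS_DATA, pod), code_rule(pod))  # deny deny
```

A DSL such as OPA's Rego sits between the two: more expressive than the data format, but more constrained and analysable than arbitrary code.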

An excellent post on moving to alerts based on service-level objectives, SLOs. Covers the why and how, based on documents used internally to make the case for the change.

  • The title is “Alerting on SLOs”.
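As a rough illustration of SLO-based alerting, the sketch below implements the multi-window burn-rate check popularised by Google's SRE Workbook; whether the post uses exactly this scheme is an assumption. The burn rate is the observed error ratio divided by the error budget (1 - SLO), and the 14.4 threshold corresponds to spending 2% of a 30-day budget in one hour.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being consumed (1.0 = exactly on budget)."""
    error_budget = 1.0 - slo
    return error_ratio / error_budget

def should_page(long_window_errors: float, short_window_errors: float,
                slo: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only if both the long (e.g. 1h) and short (e.g. 5m) windows burn fast.

    The short window lets the alert reset quickly once the problem is fixed.
    """
    return (burn_rate(long_window_errors, slo) >= threshold
            and burn_rate(short_window_errors, slo) >= threshold)

# 2% of requests failing against a 99.9% SLO burns the budget 20x too fast.
print(round(burn_rate(0.02, 0.999), 1))  # 20.0
print(should_page(0.02, 0.03))           # True
print(should_page(0.02, 0.0005))         # False: the spike has already ended
```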

A discussion of the need to test in production and an introduction to the dark canary pattern for doing so safely.

  • The title is “Production testing with dark canaries”.
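A minimal sketch of the idea, under the assumption that a dark canary receives a mirrored copy of production requests whose responses are compared but never returned to users. The `primary` and `canary` handlers and the mismatch log are hypothetical stand-ins, not the post's implementation.

```python
from typing import Callable

Handler = Callable[[str], str]

def serve_with_dark_canary(request: str, primary: Handler, canary: Handler,
                           mismatches: list[tuple[str, str, str]]) -> str:
    """Serve from primary; exercise the canary on the same request in the dark."""
    response = primary(request)
    try:
        shadow = canary(request)
        if shadow != response:
            mismatches.append((request, response, shadow))
    except Exception as exc:  # a failing canary must never affect the user
        mismatches.append((request, response, f"error: {exc!r}"))
    return response  # users only ever see the primary's answer

def primary(req: str) -> str:
    return req.upper()

def canary(req: str) -> str:  # a hypothetical "new version" that breaks on empty input
    if not req:
        raise ValueError("empty request")
    return req.upper()

log: list[tuple[str, str, str]] = []
print(serve_with_dark_canary("hello", primary, canary, log))  # HELLO
serve_with_dark_canary("", primary, canary, log)
print(len(log))  # 1
```

In a real system the mirroring usually happens asynchronously (for example in a proxy) so the canary adds no user-facing latency, and the canary must not perform side effects such as writes to production data.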

A look at a tool-agnostic architectural framework for building resilient systems, focused around predictability, observability, recoverability and keeping things simple.

  • The title is “PORK: A Technology Resilience Framework”.

A look at a range of Kubernetes local clients/dashboards including Octant, Kubenav, Lens and more.


Even with all the talk of cloud native, it’s still super useful for lots of roles to have a solid grounding in UNIX programming. This Advanced Programming in the UNIX Environment course is now available completely online.

  • It is a programming course named “Advanced Programming in the UNIX Environment”.

Have you ever wanted to write Python inside your SQL queries? Well, now you can with Postgres using PL/Python. These posts act as an introduction and show off some interesting demos with embedded NumPy.

  • The titles are “Getting Started with Postgres Functions in PL/Python” (link above) and “Exploring PL/Python: Turn Postgres Table Data Into a NumPy Array”.

An introduction to Open Policy Agent Gatekeeper, specifically looking at addressing issues with the built-in pod security policies feature in Kubernetes.

  • The title is “Using Gatekeeper as a drop-in Pod Security Policy replacement in Amazon EKS”.
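Gatekeeper policies are actually written in Rego inside ConstraintTemplates, but the core check behind a typical PodSecurityPolicy rule can be sketched in Python. The snippet below is an illustration only (not Gatekeeper's API): it rejects Pods that request privileged containers, including init containers, in the same spirit as PSP's `privileged: false`.

```python
from typing import Any

def privileged_violations(pod: dict[str, Any]) -> list[str]:
    """Return one message per privileged container in a Pod manifest."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    violations = []
    for container in containers:
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged", False):
            violations.append(f"privileged container is not allowed: {container['name']}")
    return violations

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "debug"},
    "spec": {
        "containers": [
            {"name": "app", "image": "nginx"},
            {"name": "tools", "image": "busybox", "securityContext": {"privileged": True}},
        ]
    },
}
print(privileged_violations(pod))  # ['privileged container is not allowed: tools']
```

An admission controller would deny the request whenever this list is non-empty; Gatekeeper can additionally audit existing resources against the same constraint.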

GitHub Actions is still relatively new, but there is already a huge amount of content available for it. This post looks at various actions for analyzing code for security problems.

  • The title is “GitHub Actions for Security Code Analysis”.


Terratag is a new CLI tool that enables users of Terraform to automatically create and maintain tags across their entire set of AWS, Azure, and GCP resources.

  • The GitHub page of the OSS CLI tool Terratag, which allows you to apply tags or labels to the entire set of Terraform files.

SRE Weekly Issue #236 September 20th, 2020


My first outage

A nice juicy post-incident report from the archives. Remember the first time you took down production?

Mads Hartmann — Glitch

  • A retrospective on the first time the author caused a failure in a production environment with his own hands.

Fault during testing of NordLink

While testing a new power transmission link, it was accidentally overloaded by a factor of ~14x, with far-reaching but ultimately well-managed effects.

Thanks to Jesper Lundkvist for this one.

  • An article about a failure that occurred during a test run of the NordLink project.

Throughput autoscaling: Dynamic sizing for

As Facebook moved from a static to an auto-scaled web pool, they had to try to predict their expected demand as accurately as possible.

Daniel Boeve, Kiryong Ha, and Anca Agape — Facebook

  • A Facebook article explaining throughput autoscaling for one of its main services, the “Web Tier”, which handles the HTTP requests people send each time they interact with Facebook.
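The core arithmetic of throughput-based sizing can be sketched as follows, assuming you already have a demand forecast (requests per second) and a measured per-server throughput at a target utilization. The headroom factor and the numbers are illustrative assumptions, not Facebook's values.

```python
import math

def required_servers(predicted_rps: float, per_server_rps: float,
                     target_utilization: float = 0.8,
                     headroom: float = 1.1,
                     min_servers: int = 1) -> int:
    """Servers needed to serve the forecast demand at the target utilization.

    headroom inflates the forecast to absorb prediction error.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    effective_capacity = per_server_rps * target_utilization
    needed = math.ceil(predicted_rps * headroom / effective_capacity)
    return max(min_servers, needed)

# A forecast of 50,000 RPS, 400 RPS per server at 80% utilization, 10% headroom:
print(required_servers(50_000, 400))  # 172
```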

Database migrations lessons learned

The key lesson involves ensuring that your migrations avoid using parts of the production code, which could cause their action to change down the line inadvertently.

Frank Lin — Octopus Deploy

  • It introduces database migrations and several common frameworks, and shares the following five lessons learned from the author’s nearly a decade of experience.
  1. Keep your migration scripts away from your production code
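To illustrate the first lesson, here is a small sqlite3-based sketch (table and column names are hypothetical). The migration embeds the SQL it needs as it was when it was written, instead of importing a production model or helper whose behaviour may change later and silently alter what an old migration does when it is replayed.

```python
import sqlite3

# Migration 0002: frozen at the time it was written. It deliberately does
# NOT import application code (models, constants, helpers), because later
# changes there would change what this migration does on a fresh database.
MIGRATION_0002_UP = [
    "ALTER TABLE users ADD COLUMN status TEXT NOT NULL DEFAULT 'active'",
    "UPDATE users SET status = 'inactive' WHERE last_login IS NULL",
]

def apply_migration(conn: sqlite3.Connection, statements: list[str]) -> None:
    """Run the statements of one migration, then commit."""
    with conn:  # commits on success, rolls back the open transaction on error
        for statement in statements:
            conn.execute(statement)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_login TEXT)")
conn.executemany("INSERT INTO users (last_login) VALUES (?)",
                 [("2020-09-01",), (None,)])
apply_migration(conn, MIGRATION_0002_UP)
print(conn.execute("SELECT id, status FROM users ORDER BY id").fetchall())
# prints [(1, 'active'), (2, 'inactive')]
```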

Moobot vs. Gatebot: Cloudflare Automatically Blocks Botnet DDoS Attack Topping At 654 Gbps

Cloudflare uses an interesting multi-layered approach to mitigating attacks.

Omer Yoachimik — Cloudflare

  • It explains that the DDoS attack that occurred on July 3 was automatically detected and mitigated by Cloudflare’s global DDoS protection system Gatebot.

Availability, Maintainability, Reliability: What’s the Difference?

The availability/reliability distinction in this article is thought-provoking.

Emily Arnott — Blameless

  • “What does reliability mean?” To answer this question, it examines “reliability” from the perspectives of “availability” and “maintainability”, two other SRE indicators.
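One common way to relate the three terms is the steady-state availability formula, in which mean time between failures (MTBF) reflects reliability and mean time to repair (MTTR) reflects maintainability: A = MTBF / (MTBF + MTTR). The article may frame them differently; this Python sketch only works through the arithmetic.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from reliability (MTBF) and maintainability (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Same availability, very different systems:
frequent_but_fast = availability(mtbf_hours=99, mttr_hours=1)   # fails often, heals fast
rare_but_slow = availability(mtbf_hours=990, mttr_hours=10)     # fails rarely, heals slowly
print(frequent_but_fast, rare_but_slow)  # 0.99 0.99
```

This is why availability alone does not tell you how reliable a system is: shortening MTTR raises availability without making failures any rarer.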

Troubled Times: Episode 3

2020 has shown the value of adaptive capacity. 2021 will show whether or not adaptive capacity can be sustained.

This article (not a video or podcast despite the name) also focuses on the increasing importance of learning from incidents.

Dr. Richard Cook — Adaptive Capacity Labs

  • It lists and explains the following four crises related to the current situation, focusing on how the four interact and on the resilience of society.
  1. Covid-19 pandemic

Building and revising adaptive capacity sharing for technical incident response: A case of resilience engineering

What is resilience engineering? What does a resilience engineer do? Are there principles of resilience engineering? If so, what are they? What makes it possible to engineer resilience?

This academic paper uses a case study to show how a company engineered the resilience of their system in response to a series of incidents.

Richard I. Cook and Beth Adele Long — Applied Ergonomics

  • It describes some of the candidate features and conditions observed in certain cases of resilience engineering. Reading papers like this makes me think, “As an engineer, I would like to clarify and dig deeper into my own specialty and themes like this.”


  • Google Drive
    This is a post-incident analysis of two outages, one from this past week and the other from the week before.

KubeWeekly #234 September 25th, 2020

The Headlines

Editor’s pick of the highlights from the past week.

KubeCon + CloudNativeCon Europe 2020 — Virtual Conference Transparency Report: A very successful first virtual event!

CNCF staff

The shift to a virtual KubeCon + CloudNativeCon EU wasn’t easy or even expected, but the community came together to share knowledge, learn about new projects, and play drag queen bingo. The KubeCon + CloudNativeCon transparency reports provide insight into event attendance and diversity and inclusion, and drill into the talk selection process for the events, which is run by the event co-chairs and their program selection committee.

YAML Templating Solutions: Helm & Kustomize


Writing config files by hand is like coding with Notepad instead of an IDE. There are ways to automate most of it, and this usually starts with either Helm or Kustomize. This article presents a 101-level overview of both, and helps in choosing which one’s the better fit for your use case.

  • An explanatory YouTube video is embedded, along with its transcript.
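The difference between the two approaches can be sketched in a few lines of Python. Helm renders templates whose placeholders are filled from a values file (real Helm uses Go templates), whereas Kustomize keeps a plain base manifest and merges environment-specific patches over it (real Kustomize uses strategic merge patches, which handle lists more cleverly than this naive deep merge). Both functions are simplified illustrations only.

```python
import copy
import json
from string import Template
from typing import Any

# Helm-style: a template with placeholders plus a values file.
DEPLOYMENT_TEMPLATE = Template(
    '{"kind": "Deployment", "metadata": {"name": "$name"}, '
    '"spec": {"replicas": $replicas}}'
)

def render(values: dict[str, Any]) -> dict[str, Any]:
    return json.loads(DEPLOYMENT_TEMPLATE.substitute(values))

# Kustomize-style: a complete, valid base manifest plus an overlay patch.
BASE = {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 1}}

def overlay(base: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge patch into a copy of base (maps only, no list logic)."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = value
    return merged

prod_from_helm = render({"name": "web", "replicas": 3})
prod_from_kustomize = overlay(BASE, {"spec": {"replicas": 3}})
print(prod_from_helm == prod_from_kustomize)  # True
```

The trade-off shows up even here: the template is not a valid manifest until it is rendered, while the Kustomize base is always usable on its own but can only be changed in the ways the patch mechanism supports.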

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member webinar: Critical DevSecOps considerations for multi-cloud Kubernetes

Sylvain Huguet, Sr. Product Manager — Karbon/Kubernetes @Nutanix & Loris Degioanni, CTO & Founder @Sysdig

  • Two cloud-native experts in infrastructure and security provide valuable insights on critical DevSecOps considerations for multi-cloud Kubernetes.

CNCF Member webinar: Mitigating Kubernetes attacks

Wei Lien Dang, Head of Strategy @StackRox

  • It provides recommendations for protecting your cloud, on-premises, and hybrid Kubernetes deployments.

CNCF Member webinar: Using KubeVirt in telcos

Abhinivesh Jain, Distinguished Member of Technical Staff @Wipro

  • It describes the relevance of KubeVirt to telcos, focusing on current limitations and challenges from a telco adoption perspective.

CNCF Member webinar: AWS controllers for Kubernetes — AWS services, now kubified!

Jay Pipes, Principal Open Source Engineer @Amazon Web Services

  • A video in which one of ACK’s creators at AWS explains the design and usage of ACK (AWS Controllers for Kubernetes).
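ACK follows the usual Kubernetes controller pattern: a custom resource declares the desired state of an AWS resource, and a controller repeatedly reconciles the real resource towards it. The sketch below shows only that pattern; `FakeBucketAPI`, `reconcile` and the field names are hypothetical stand-ins, not ACK's code or the AWS SDK.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FakeBucketAPI:
    """Hypothetical stand-in for a cloud service API (not the AWS SDK)."""
    buckets: dict[str, dict[str, str]] = field(default_factory=dict)

    def describe(self, name: str) -> Optional[dict[str, str]]:
        return self.buckets.get(name)

    def create(self, name: str, tags: dict[str, str]) -> None:
        self.buckets[name] = dict(tags)

    def update_tags(self, name: str, tags: dict[str, str]) -> None:
        self.buckets[name] = dict(tags)

def reconcile(desired: dict, api: FakeBucketAPI) -> str:
    """One reconcile pass: compare the desired spec with reality and converge."""
    name, tags = desired["spec"]["name"], desired["spec"]["tags"]
    actual = api.describe(name)
    if actual is None:
        api.create(name, tags)
        return "created"
    if actual != tags:
        api.update_tags(name, tags)
        return "updated"
    return "in sync"  # idempotent: repeated passes change nothing

api = FakeBucketAPI()
bucket = {"kind": "Bucket", "spec": {"name": "logs", "tags": {"team": "sre"}}}
print(reconcile(bucket, api))  # created
print(reconcile(bucket, api))  # in sync
bucket["spec"]["tags"]["team"] = "platform"
print(reconcile(bucket, api))  # updated
```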

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Ingress for Anthos — Multi-cluster ingress and global service load balancing

Gokul Chandra

  • An article explaining “Ingress for Anthos”, Google Cloud’s hosted multi-cluster ingress controller for Anthos GKE clusters. It is impressive how the author explains and illustrates everything from the reader’s point of view, so that readers can follow it carefully and try it hands-on themselves.

Installing Kubernetes Metrics Server securely

Neil Wilson, Brightbox

  • It explains several methods and points to keep in mind for installing Kubernetes Metrics Server securely.

How we moved to Github-based Kubernetes config management

Benjamin Yolken, Segment

  • An article introducing the history of Segment’s GitHub-based Kubernetes config management, published alongside the release of “kubeapply”, a lightweight tool for git-based management of Kubernetes configurations.

GSoC 2020 — Building operators for cluster addons

Somtochi Onyekwere

  • The story of the author participating in the Google Summer of Code and contributing to the cluster addons of Kubernetes.

Detecting CVE-2020-14386 with Falco and mitigating potential container escapes

Kaizhe Huang, Sysdig

  • It explains CVE-2020-14386, which was reported with “high” severity on 2020/09/04, and how to detect it with Falco and Sysdig Secure.

Containing a real vulnerability

Fabricio Voznika, gVisor

  • Following the announcement of the above vulnerability (CVE-2020-14386), the post explains that gVisor is not vulnerable to this particular issue, but that it still provides an interesting case study for continuing gVisor’s security investigation.

Yes, you can run VMs on Kubernetes with KubeVirt

Bryant Son, Red Hat correspondent

  • It explains how to use KubeVirt via the locally runnable open source Kubernetes platform “Minikube”.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Grafana, with Torkel Ödegaard

Craig Box and Adam Glick, Kubernetes Podcast from Google

CommunityBridge Spotlight: Get the most out of the CommunityBridge program

Sonia Singla

  • The author, who completed the Linux Foundation’s CommunityBridge program with the Thanos community, describes the experience and, with plenty of emojis, offers suggestions to help future CommunityBridge interns make the most of their internships.

Cloud native ecosystem feels COVID-19 crunch

Dan Meyer, SDxCentral

  • Using the Kubernetes 1.19 release as an example, it shows that the ongoing COVID-19 pandemic is also affecting software developers and the cloud-native community, who tend to be seen as working in isolated environments unaffected by the outside world. It cites examples such as the release being “postponed to the end of August” and the term “forcing function” used by Kelsey Hightower.

DevOps 049: DevOps, Open Source, and OpenShift with Chris Short

Adventures in DevOps Podcast

  • DevOps-themed podcast.

Ask the Product Manager Office Hours: Top 5 problems with Kubernetes and how we are fixing them

Mike Barrett and Chris Short, Red Hat

  • A YouTube video from Red Hat’s OpenShift team in which Chris Short, editor of KubeWeekly, and Mike Barrett (Senior Director of Product Management) discuss the topic in the title. The story of Mike’s career path, told at the beginning, was also interesting.

Air Force to demo updating software on a jet in flight, official says

Mila Jasper, Nextgov

  • Nextgov article from Government Executive Media Group.

Upcoming CNCF webinars

You can check the recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: VanillaStack as a platform for a truly vendor-agnostic open-source ecosystem
Karsten Samaschke, CEO @Cloudical
Sept 29, 2020 10:00 AM Pacific Time

Member Webinar: Effective disaster recovery strategies for Kubernetes
Rasheed Amir, CEO @Stakater AB
Sept 30, 2020 7:00 AM Pacific Time

Member Webinar: Self service Kubernetes for enterprises
Jim Bugwadia, Founder and CEO @Nirmata
Sept 30, 2020 10:00 AM Pacific Time

Member Webinar: Dapr, Lego for microservices
Mark Chmarny, Principal Program Manager @Microsoft
Oct 1, 2020 10:00 AM Pacific Time

Member Webinar: Transactional microservices — The final frontier
Daniel Kozlowski, Minister of Engineering @PlanetScale
Oct 2, 2020 10:00 AM Pacific Time

Member Webinar: Multi-Cluster & multi-cloud service mesh with CNCF’s Kuma and Envoy
Marco Palladino, CTO & Co-Founder @Kong
Oct 6, 2020 10:00 AM Pacific Time

Member Webinar: The evolution of cloud orchestration systems from ephemeral to persistent storage
Boyan Krosnov, CPO @StorPool
Oct 7, 2020 8:00 AM Pacific Time

Member Webinar: Kubernetes native two-level resource management for AI/ML workloads
Diana Arroyo, Software Engineer @IBM Research
Alaa Youssef, Manager, Container Cloud Platform @IBM Research
Oct 7, 2020 10:00 AM Pacific Time

Member Webinar: Building dynamic machine learning pipelines with KubeDirector
Tom Phelan, Fellow, Software Organization @Hewlett Packard Enterprise
Oct 8, 2020 10:00 AM Pacific Time

Member Webinar: A full application environment for every PR–before you merge to master!
Vishal Biyani, CTO @InfraCloud
Jono Spiro, Staff Software Engineer, Engineering Operations @OpenGov
Oct 14, 2020 10:00 AM Pacific Time

Member Webinar: S&P experience report: multi-cloud serverless on Knative
Evan Anderson, Software Engineer @VMware
Mark Wang, Head of Cloud Engineering @S&P Global Ratings
Oct 15, 2020 10:00 AM Pacific Time

Member Webinar: How to migrate NF or VNF to CNF without vendor lock-in
Grzegorz Sikora, VP Business Development @OVOO
Oct 20, 2020 10:00 AM Pacific Time

Member Webinar: Deploying Kubernetes to bare metal using cluster API
Seán McCord, Principal Senior Software Engineer @Talos Systems, Inc.
Oct 21, 2020 1:00 PM Pacific Time

Member Webinar: K8s audit logging deep dive
Randy Abernethy, Managing Partner @RX-M
Oct 22, 2020 10:00 AM Pacific Time

Member Webinar: Building 12 factor streaming data apps on Kubernetes
Stelios Charmpalis, Frontend Engineer
Francisco Perez, Senior Backend Engineer
Oct 23, 2020 10:00 AM Pacific Time

Member Webinar: Developer-friendly platforms with Kubernetes and infrastructure as code
Lee Briggs, Staff Software Engineer @Pulumi
Nov 6, 2020 10:00 AM Pacific Time

How about those articles? Are any of them of interest to you?

Actually, there are some articles I can’t digest at this stage, so I’ll use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #GCP, #Certified AWS SAP
