SRE / DevOps / Kubernetes Weekly Collection #35 (Week 40)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
  • Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #509 September 27th, 2020
SRE Weekly Issue #237 September 27th, 2020
KubeWeekly #235 October 2nd, 2020

DEVOPS WEEKLY ISSUE #509 September 27th, 2020


A post on one open source project’s search for a new CI system. Lots of useful research for anyone investigating CI and build systems.

  • The title is “Rebuilding Linkerd’s continuous integration (CI) with Kubernetes in Docker (kind) and GitHub Actions”.
  • An article transcribing the author’s presentation at KubeCon EU 2020.
  • Two demo videos are embedded, which is especially helpful when you want to assemble CI / CD with OSS.

A good post on the evolution of systems administration, the embrace of devops, shifting responsibility and changing job titles.

  • The title is “The evolution of DevOps and why we are here”.
  • An article that explains the role of system administrators, how their responsibilities changed over time into those of what are now called DevOps engineers, and what the role looks like today.

The idea of commercial off-the-shelf software is fine, but the reality often differs. This post explains the frustration and suggests it’s only COTS if you can get something up and running in a day.

  • The title is “Fake COTS and the one-day rule”.
  • When a government agency procures commercial off-the-shelf (COTS) software, it expects to be able to use it right away. A product that does not live up to the name and cannot be up and running within one day is called “Fake COTS”. The post names specific products as examples.
  • The following disclaimer is prominently stated at the beginning:
  • Extra-prominent disclaimer: The views expressed here are my own. Products mentioned in the examples below are not endorsements.

Dependency management is an unfortunate consequence of a library ecosystem. This post, reviewing a recent paper looking at the Python ecosystem, has some interesting ideas and tooling demonstrations.

  • The title is “Watchman: monitoring dependency conflicts for Python library ecosystem”. It is a post from a blog that reviews recent papers.
  • It explains the prevalence of dependency conflicts in Python projects and their causes, touching on “Dependency Hell”.
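
To make the idea of a dependency conflict concrete, here is a minimal sketch in Python. It is not the Watchman tool from the paper: the package names, the version ranges, and the `find_conflicts` helper are all hypothetical, and versions are simplified to tuples of integers with an inclusive lower bound and an exclusive upper bound.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
# A requirement is a half-open range: lower <= version < upper.
Range = Tuple[Version, Version]


def intersect(a: Range, b: Range) -> Range:
    """Return the overlap of two half-open version ranges (may be empty)."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def find_conflicts(requirements: Dict[str, Dict[str, Range]]) -> List[str]:
    """List shared dependencies whose required ranges have no common version.

    `requirements` maps each (hypothetical) package to the version ranges
    it requires for its own dependencies.
    """
    combined: Dict[str, Range] = {}
    conflicts: List[str] = []
    for package in sorted(requirements):
        for dep, wanted in requirements[package].items():
            current = combined.get(dep, wanted)
            low, high = intersect(current, wanted)
            combined[dep] = (low, high)
            # An empty range means no single version satisfies everyone.
            if low >= high and dep not in conflicts:
                conflicts.append(dep)
    return conflicts


# Hypothetical example: two packages need incompatible ranges of "shared-lib".
example = {
    "app-a": {"shared-lib": ((1, 0), (2, 0))},  # needs >=1.0,<2.0
    "app-b": {"shared-lib": ((2, 0), (3, 0))},  # needs >=2.0,<3.0
    "app-c": {"other-lib": ((0, 5), (1, 0))},   # unrelated, no conflict
}

conflicts = find_conflicts(example)
print(conflicts)
```

In this toy setup only `shared-lib` is reported, because no version can be both below 2.0 and at least 2.0. Real resolvers also have to handle transitive dependencies and richer version specifiers, which this sketch leaves out.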

A look at Terraspace, a deployment tool for Terraform that provides some interesting high-level features and visualisation tools.

  • The title is “Terraspace All: Deploy Multiple Stacks or Terraform Modules At Once”.
  • It introduces Terraspace, a framework for Terraform. It provides an organized structure and conventions for settings, keeps the code DRY, and adds useful tooling.
  • A video is also embedded.

AWS has grown to have several different overlapping approaches to managing users and accounts. This post looks at some of the nuance and makes some recommendations.

  • The title is “AWS Account Structure: Think twice before using AWS Organizations”.
  • It explains how AWS account management approaches have evolved and the key points of each. A podcast episode is also embedded.

Krew, the plugin manager for kubectl, now supports custom indexes. So you can distribute kubectl plugins for your own projects or for internal company usage.

  • The title is “Using Custom Plugin Indexes”.
  • It explains how to use the “Custom Plugin Index” with some kubectl krew commands.


version-checker is a Kubernetes utility for observing the current versions of images running in the cluster, as well as the latest available upstream. These checks get exposed as Prometheus metrics to be viewed on a dashboard.

  • The GitHub page of the Kubernetes utility “version-checker”, which monitors the current versions of images running in a cluster and the latest versions available upstream.
  • These checks are exposed as Prometheus metrics that can be viewed on a dashboard or used to soft-alert cluster operators.
  • This tool is currently experimental.

Portus is an open source authorization service that sits atop a Docker container registry. It provides user management and a number of other useful features.

  • The project page of “Portus”, an open source authorization service and user interface for the next generation Docker container registry.
  • An on-premises application that allows users to manage and protect their Docker container registry.

SRE Weekly Issue #237 September 27th, 2020


Postmortem — why Allegro went down

They fully expected their deep-discount sale to drive traffic, but they didn’t expect their system to handle the increase in the way that it did.

Michał Kosmulski — Allegro

  • An article dated August 31, 2018. Allegro’s post-mortem on an outage where the website went down for 20 minutes at noon on July 18, 2018.
  • Based on an internal post-mortem, it explains to customers and the tech community how the failure occurred and what technical steps were taken to prevent similar events in the future.

Zero-Downtime Kubernetes Deployments

Pre-stop hooks, liveness probes, and readiness probes were key to smoothly transitioning their services from a home-grown container system to Kubernetes.

Oliver Leaver-Smith — Sky Betting & Gaming

  • Sky Betting & Gaming describes the work that it has done over the past few months to move the OIDC / OAuth2 ID service from a tactical container platform to an on-premises Kubernetes cluster.

Feelings during incident response

The experience of responding to an incident can evoke emotions that run the gamut.

Mads Hartmann

  • It excerpts and expands on the part of the Glitch podcast “Shift Shift Forward” where the author was asked about “Feelings during incident response”.

Join SRE Classroom NALSD workshops

Google has released course materials for the first of a series of classes on NALSD (“non-abstract large systems design”). This first one is about a distributed Pub-Sub system.

Jenny Liao and Salim Virji — Google

Why you should write up your own incident

Usually, doing a post-analysis on an incident you were in is an anti-pattern because you’re likely to introduce bias. But sometimes, it can lead you to learn more than you would have otherwise.

Lorin Hochstein

  • The author explains “Why you should write up your own incident” based on his own recent experience.
  • He states that a person who responded to an incident should generally avoid conducting its post-analysis, although sometimes it is unavoidable. Even so, he recommends that responders try it, because he made new discoveries while analyzing an incident he had recently responded to.
  • You shouldn’t write up your own incident if you can avoid it. To write up an incident well, you need to be able to capture the perspectives of the different people who were involved. If the write-up author was also one of the responders, then the writeup will be biased towards their perspective, at the expense of capturing the perspectives of the other engineers who were engaged.


KubeWeekly #235 October 2nd, 2020

The Headlines

Editor’s pick of the highlights from the past week.

KubeCon + CloudNativeCon North America 2020 Virtual — schedule now available!

We’re so excited to announce that the schedule for KubeCon + CloudNativeCon North America 2020 Virtual is live! The fourth virtual event from CNCF this year will host ~200 maintainer sessions, tutorials, keynotes, and breakout sessions, including insights from end users on cloud native technology in production. This educational event will arm attendees — from beginner to advanced — with the insights they need to successfully implement and manage cloud native architectures within their organization. Don’t forget that you can save $25 when you register by the end of October!

  • It announces the release of the KubeCon + CloudNativeCon North America 2020 Virtual schedule and reminds us to register. Paid registration is $75 through October (a $25 discount) and $100 for registrations in November.
  • It’s already next month. I will choose the sessions to watch in advance.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Project webinar: Kubernetes 1.19

Kubernetes release team

  • The changes in Kubernetes 1.19 are explained for each SIG according to their status (“Alpha”, “Beta”, “Stable”, etc.). Starting with 1.19, the support period is one year.

CNCF Member webinar: VanillaStack as a platform for a truly vendor-agnostic open-source ecosystem

Karsten Samaschke, CEO @Cloudical

  • It introduces the open source version of VanillaStack, explains the ideas behind the platform, and presents a roadmap for the integration and deployment of open source projects.

CNCF Member webinar: Effective disaster recovery strategies for Kubernetes

Rasheed Amir, CEO @Stakater AB

  • It describes how companies are leveraging Kubernetes through DevOps for mission-critical cloud-native apps with the following points.
    ○ Some concepts and terms to consider for disaster recovery business needs
    ○ Kubernetes architecture for ensuring fault tolerance and high availability
    ○ Factors to consider while creating a Disaster recovery plan
    ○ The components for which to implement backup and restore

CNCF Member webinar: Self service Kubernetes for enterprises

Jim Bugwadia, Founder and CEO @Nirmata

  • It describes best practices and new patterns that can help you achieve a self-service Kubernetes cluster across your enterprise.
  • It is aimed at platform teams that need visibility and governance while enabling enterprise-wide business agility and driving adoption of cloud-native tools.

CNCF Member webinar: Dapr, Lego for microservices

Mark Chmarny, Principal Program Manager @Microsoft

  • It introduces how to use the distributed application runtime Dapr to efficiently build cloud-native apps deployed on Kubernetes and other hosting platforms.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

The Level Up Hour (podman play kube)

Langdon White and Chris Short, Red Hat

  • An episode of the Twitch video program “The Level Up Hour” about Podman, by Chris Short of Red Hat, the editor of KubeWeekly, with Langdon White of the same company as a guest.

Our online analytical processing journey with ClickHouse on Kubernetes

Sudeep Kumar, Mohan Garadi, Xiancheng Li, Amber Vaidya and Liangfei Su, eBay

  • It describes the latest evolution of eBay’s online analytical processing (OLAP), which runs ClickHouse (a column-oriented database) on Kubernetes.

A Linux sysadmin’s introduction to cgroups

Steve Ovens, Red Hat

  • The first article in a four-part series. It defines cgroups and describes how they can help with resource management and performance tuning.

RabbitMQ monitoring on Kubernetes

Piotr Minkowski

  • It explains how to run a monitoring stack for RabbitMQ on Kubernetes.
  • You can use the RabbitMQ monitoring tool to see general metrics of the nodes and detailed logs of all messages.
  • Spring Boot AMQP provides application-specific metrics that interact with RabbitMQ.

Build a data streaming pipeline using Kafka Streams and Quarkus

Kapil Shukla, Red Hat

  • It walks through building Quarkus applications that use Kafka Streams to stream and process data in real time.

Chaos Mesh 1.0: Chaos Engineering on Kubernetes made easier

Chaos Mesh Maintainers

  • The announcement of the v1.0 GA release of “Chaos Mesh®”, which joined CNCF as a sandbox project in July 2020. It also gives an overview of the project.

Rootless containers with Podman: The basics

Prakhar Sethi, Red Hat

  • It covers the benefits of using containers and Podman, what rootless containers are and why they are important, and examples of how to use rootless containers with Podman.

couler-proj / couler

Unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow

  • The GitHub page of “Couler”, a unified interface for building and managing workflows on various workflow engines such as Argo Workflows, Tekton Pipelines, and Apache Airflow. It provides the following.
    ○ Simplicity: Unified interface and imperative programming style for defining workflows with automatic construction of directed acyclic graph (DAG).
    ○ Extensibility: Extensible to support various workflow engines.
    ○ Reusability: Reusable steps for tasks such as distributed training of machine learning models.
    ○ Efficiency: Automatic workflow and resource optimizations under the hood.


Kubernetes utility for exposing image versions in use, compared to latest available upstream, as metrics

  • I will skip it because it is covered in DEVOPS WEEKLY ISSUE #509 above.

TiDB Operator: Your TiDB operations expert in Kubernetes

Aylei Wu, PingCAP

  • It explores how TiDB Operator can run TiDB smoothly on Kubernetes and ensure data security, and explains how companies use TiDB Operator in production, along with best practices.

GitHub actions demystified

Pooja Dhoot

  • It shares a GitHub Actions workflow that builds a pipeline for deploying Fission on a GKE Kubernetes cluster, along with some code validation actions and, finally, some monitoring actions.

Use Terraform to create and manage a HA AKS Kubernetes cluster in Azure

Kentaro Wakayama, Coder Society

  • It explains how to use Terraform to manage a highly available Azure AKS Kubernetes cluster with Azure AD integration and Calico network policies enabled.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Contributing to the Development Guide

Erik L. Arneson

  • A new contributor describes his experience writing and submitting changes to the Kubernetes Development Guide.
  • I’m also starting to get involved in the localization of web page documentation, but I’d like to expand the scope.

Anthos in depth: Easy load balancing for your on-prem workloads

Mahesh Narayanan, Product Manager, GKE and Yuan Liu, Software Engineer, GKE

  • It introduces three different options that Anthos offers to deploy an external load balancer and details the load balancer bundled with Anthos.

Kubernetes: When to use, and when to avoid, the operator pattern

Mary Branscombe, The New Stack

  • It quotes a tweet by Rancher Labs Chief Technology Officer Darren Shepherd and explores when and how to use the operator pattern.

Leader Election, with Mike Danese

Adam Glick and Craig Box, Kubernetes Podcast from Google

Security in all its forms — detection of undesirable behavior thanks to Falco with Thomas Labarussias

Electro Monkeys podcast (in French)

  • The French podcast “Electro Monkeys podcast” seems to cover the mechanism by which Falco detects unwanted behavior.

The Cloud Native Landscape: The runtime layer explained

Catherine Paganini and Jason Morgan

  • A series of articles explaining each category of CNCF’s “Cloud Native Landscape”. It focuses on a runtime layer that covers everything a container needs to run in a cloud-native environment.

With an eye toward standardization and security for its media brands, Verizon Media turned to cloud native

CNCF Case Study

Upcoming CNCF webinars

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: Multi-Cluster & multi-cloud service mesh with CNCF’s Kuma and Envoy
Marco Palladino, CTO & Co-Founder @Kong
Oct 6, 2020 10:00 AM Pacific Time

Member Webinar: The evolution of cloud orchestration systems from ephemeral to persistent storage
Boyan Krosnov, CPO @StorPool
Oct 7, 2020 8:00 AM Pacific Time

Member Webinar: Kubernetes native two-level resource management for AI/ML workloads
Diana Arroyo, Software Engineer @IBM Research
Alaa Youssef, Manager, Container Cloud Platform @IBM Research
Oct 7, 2020 10:00 AM Pacific Time

Member Webinar: Building dynamic machine learning pipelines with KubeDirector
Tom Phelan, Fellow, Software Organization @Hewlett Packard Enterprise
Oct 8, 2020 10:00 AM Pacific Time

Member Webinar: You can be a Kubernetes contributor too!
Jeremy L. Morris, Software Engineer @DigitalOcean
Oct 13, 2020 10:00 AM Pacific Time

Member Webinar: A full application environment for every PR–before you merge to master!
Vishal Biyani, CTO @InfraCloud
Jono Spiro, Staff Software Engineer, Engineering Operations @OpenGov
Oct 14, 2020 10:00 AM Pacific Time

Member Webinar: GitOps at scale for a multicloud, multi-region stateful application
Rick Spencer, Head of Platform @InfluxData
Oct 14, 2020 1:00 PM Pacific Time

Member Webinar: S&P experience report: multi-cloud serverless on Knative
Evan Anderson, Software Engineer @VMware
Mark Wang, Head of Cloud Engineering @S&P Global Ratings
Oct 15, 2020 10:00 AM Pacific Time

Member Webinar: Delivering cloud native apps to Kubernetes using werf
Dmitry Stolyarov, CTO @Flant
Oct 16, 2020 10:00 AM Pacific Time

Member Webinar: How to migrate NF or VNF to CNF without vendor lock-in
Grzegorz Sikora, VP Business Development @OVOO
Oct 20, 2020 10:00 AM Pacific Time

Member Webinar: Deploying Kubernetes to bare metal using cluster API
Seán McCord, Principal Senior Software Engineer @Talos Systems, Inc.
Oct 21, 2020 1:00 PM Pacific Time

Member Webinar: K8s audit logging deep dive
Randy Abernethy, Managing Partner @RX-M
Oct 22, 2020 10:00 AM Pacific Time

Member Webinar: Building 12 factor streaming data apps on Kubernetes
Stelios Charmpalis, Frontend Engineer
Francisco Perez, Senior Backend Engineer
Oct 23, 2020 10:00 AM Pacific Time

Member Webinar: Admission controllers: one part of your Kubernetes security and governance toolkit
Gunjan Patel, Cloud Architect @Palo Alto Networks
Robert Haynes, Cloud Security Evangelist @Palo Alto Networks
Oct 28, 2020 7:00 AM Pacific Time

Member Webinar: Developer-friendly platforms with Kubernetes and infrastructure as code
Lee Briggs, Staff Software Engineer @Pulumi
Nov 6, 2020 10:00 AM Pacific Time

Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time

How about those articles? Did any of them interest you?

Actually, there is some content I cannot digest at this stage, so I will use this aide-memoire and these links to catch up myself, too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #CKA, #CKAD, #Certified AWS SAP
