SRE / DevOps / Kubernetes Weekly Collection #53 (Week 5, 2021)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments and useful links as an aide-memoire.

DEVOPS WEEKLY ISSUE #527 January 31st, 2021
SRE Weekly Issue #255 January 31st, 2021
KubeWeekly #249 February 5th, 2021

DEVOPS WEEKLY ISSUE #527 January 31st, 2021


A detailed writeup and timeline of a security incident. It’s important to learn from events like this, and the post finishes up with some concrete recommendations.

  • The title is “A deeper dive into our May 2019 security incident”.

A discussion of various facets of building reliable systems. From SLOs to runbooks, service catalogues to feature flags and more.

  • The title is “2021 is the Year of Reliability”.

A post describing the high-level security architecture for a web site. What makes it interesting is the focus on proportionality: being as secure as needed rather than as secure as possible.

  • The title is “Securing the NCSC’s web platform”.

A good post on the risks associated with permissive permissions and privilege escalation with Kubernetes pods.

  • The title is “Bad Pods: Kubernetes Pod Privilege Escalation”.

Why does it take so long to build software? Lots of observations in this post, about accidental complexity, about increasing demands vs 10 or 20 years ago on several fronts, on the rise of frameworks and tools that solve problems you might not have and more.

  • The title is “Why does it take so long to build software?”.

Hex is the package manager for Erlang and Elixir. A new feature, called Hex Preview, allows for checking the contents of source files from specific versions of packages. An important use case that’s easy to miss with the growth of supply chain attacks.

  • The title is “Introducing Hex Preview”.

An example of writing unit tests for Helm charts using Go.

  • The title is “How to unit-test your helm charts with Golang”.


An event all about the low-level bits of containers. Container runtimes, image building, image scanning, container security and isolation, virtualization inside containers, etc. Taking place March 9th/10th, with the CFP submissions due by the 10th of February.

  • It introduces “The Container Plumbing Days”, a two-day ‘lower-level’ open source container technologies event.


Simdjson is a JSON parsing library that aims to make parsing gigabytes of JSON per second trivial. Interesting design and API, and bindings available in lots of languages.

  • The GitHub page for “simdjson”, a library that parses JSON at high speed.

Etok provides a nice user interface for running Terraform on Kubernetes. Avoid needing local credentials or access to APIs. Some handy integration with GCP as well.

  • The GitHub page for the tool “Etok”, which stands for Execute Terraform On Kubernetes.

Litestream is a standalone streaming replication tool for SQLite. It runs as a background process and safely replicates changes incrementally to another file or S3.

  • The GitHub page for “Litestream”, a standalone streaming replication tool for SQLite.

SRE Weekly Issue #255 January 31st, 2021


Why It Should Be Service, Not Site Reliability

It really should! Even Google is much more accurately described as a “service” than a “site”.

Chris Riley — Splunk

  • The S in SRE stands for “site”, but the article argues that it should stand for “service”, which is more consistent with what developers offer today. It makes the case with the following points:
    ○ Subscription-based business models
    ○ Application architectures
    ○ Modern delivery chain
    ○ Cross-platform
    ○ Customer-centric

Migrations: the sole scalable fix to tech debt.

There are migrations, and then there’s the time between migrations.

Will Larson

  • The article tells the story of Uber moving from Puppet-managed services to a fully self-service provisioning model.

2021 is the Year of Reliability

2020 was the year mainstream folks realized how important reliability is. Will overall reliability improve in 2021?

Robert Ross — FireHydrant

  • I will skip this one because it is covered in DEVOPS WEEKLY ISSUE #527 above.

This SRE attempted to roll out an HAProxy config change. You won’t believe what happened next…

I love this for the click-bait title and the content. An HAProxy feature designed for HA had a surprising and unexpected behavior.

Andre Newman — GitLab

  • It details what the author discovered while investigating strange behavior from HAProxy.

Tyler Wells on building a culture of reliability at Twilio

Twilio builds customer trust through a reliability culture, customer empathy, and accountability.

Andre Newman — Gremlin

  • The following points are excerpted from a talk by Tyler Wells, Senior Director of Engineering at Twilio.
    ○ Reliability is built on customer trust
    ○ Culture
    ○ Customer empathy
    ○ Accountability
    ○ Reliability is a journey

WTF is SRE WTFinar

This WTFinar tackles the beginning of understanding SRE. It focuses on service level indicators (SLIs) and service level objectives (SLOs) — components of error budgets.

Container Solutions

  • It features the Container Solutions webinar “WTF is SRE?”.
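
As a rough illustration of how an SLI, an SLO, and an error budget relate, here is a minimal sketch. It is not taken from the webinar: the function name `error_budget`, the request counts, and the 99.9% target are all assumptions chosen for the example. The SLI is measured as the fraction of successful requests in a window, and the error budget is the number of failures the SLO tolerates in that same window.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise one measurement window against an availability SLO.

    slo_target is a fraction such as 0.999 (hypothetical value, i.e. "99.9%").
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: the measured fraction of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: how many failures the SLO allows in this window.
    allowed_failures = (1 - slo_target) * total_requests

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 of which failed.
    print(error_budget(0.999, 1_000_000, 250))
```

With these made-up numbers the SLO allows about 1,000 failed requests, so roughly 750 are still left in the budget. A negative `remaining_failures` would mean the budget is exhausted.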


KubeWeekly #249 February 5th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Welcome to our 5 new TOC members!

Chris Aniszczyk, CNCF

Help us give a warm welcome to the newest members of the TOC:

* Erin Boyd, Apple
* Cornelia Davis, Weaveworks
* Lei Zhang, Alibaba
* Dave Zolotusky, Spotify
* Ricardo Rocha, CERN

Learn more about the TOC and newest members in the latest blog post.

  • An article announcing the selection of the above five new members of CNCF’s TOC (Technical Oversight Committee).

Cloud Native Computing Foundation Announces Open Policy Agent Graduation

CNCF blog

Congratulations to Open Policy Agent (OPA) for hitting graduated status! OPA has demonstrated widespread adoption, an open governance process, feature maturity, and a strong commitment to community, sustainability, and inclusivity to graduate.

  • As the title suggests, the article announces that OPA has reached the Graduated maturity level at CNCF.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Killing Containers at Scale

Connor Brewster, Replit

  • Starting from the problem that “Docker takes more than 30 seconds to forcibly terminate all containers on the VM”, it explains, based on the team’s investigation, how they forcibly terminate the containers themselves and what effect that had.

Kubernetes — How to Debug CrashLoopBackOff in a Container

David Giffin, Release App

  • It doesn’t explain how to properly configure k8s; instead it focuses on debugging your own and other people’s code when a “CrashLoopBackOff” error occurs in a container.

Hunting for Malware with Falco

Dan Lorenc

  • It explains how to build a platform to look for malicious behavior hidden behind the scenes.

Deliver your applications to edge and IoT devices in rootless containers

Ilkka Tengvall, Red Hat

  • It explains how to use systemd, Podman, and Red Hat Ansible Automation to automate software and push it as a container to small edge and Internet of Things (IoT) gateway devices.

Building a Kubernetes CI/CD Pipeline with GitLab and Helm

Dan Slapelis, Nextthink Labs

  • It treats a CI/CD pipeline on Kubernetes as a puzzle and explains how to fit the continuous delivery (CD) pieces together, build the CI/CD pipeline, and deploy an app to Kubernetes. It starts with an explanation of Helm, which is an important piece of the puzzle.

Kubernetes vs Docker: Understanding Containers in 2021

Tomas Fernandez, Semaphore

  • A few weeks ago, the Kubernetes development team announced that they would deprecate dockershim. The article answers the most common questions that followed, starting from the basics of containers, Docker, and Kubernetes.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

CNCF On-demand webinar: Policy as Code to manage security risk in K8s before & after deployment

Cesar Rodriguez @Accurics

  • It introduces the on-demand webinar with the above title. If you are interested, please register and watch. It is available only to registrants, from February 4, 2021 0:00 to February 10, 2021 23:59 (PST).

This Week in Cloud Native (Livestream): Kubernetes Policies-as-Code

Jim Bugwadia @Nirmata

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Backstage, with Lee Mills and Matt Clarke

Craig Box, Kubernetes Podcast from Google

  • An episode of the Kubernetes Podcast, hosted by Google employees. The current co-host is Craig Box; Adam Glick has moved on to greener pastures, so past guests will be invited as guest hosts for several weeks.

Release Orchestration Introduces Research Report The 2021 State of Cloud-Native

  • It introduces the report named in the title, which describes challenges, trends, and opportunities for improvement in software release and verification in production as of 2021.

Upcoming CNCF Online Programs

CNCF Live webinar: How to Manage Kubernetes Application Life Cycle Using Carvel
presented by VMware
February 9, 2021 at 10:00 am PT
Register Now

CNCF On-demand webinar: Debugging Kubernetes On The Fly
presented by Rookout
February 11, 2021
Register Now

CNCF On-demand webinar: Otomi Container Platform Open Source Announcement
presented by Red Kubes RV
February 11, 2021
Register Now

For more information, please visit our updated Online Programs page.

How about those articles? Are you interested in any of them?

Actually, there is some content here that I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #GCP, #Certified AWS SAP
