SRE / DevOps / Kubernetes Weekly Collection #84 (Week 36, 2021)

Yoshiki Fujiwara
Sep 11, 2021
  • In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments and useful links as an aide-memoire.
  • I have already published the same content on my Japanese blog, and I am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #558 September 5th, 2021
SRE Weekly Issue #286 September 5th, 2021
KubeWeekly #276 September 10th, 2021

DEVOPS WEEKLY ISSUE #558 September 5th, 2021


A good introduction to error budgets and using them to make trade-offs between risk and stability.

  • The title is “Data-driven negotiation with SLIs, SLOs and Error Budgets (2/2)”. It describes how to use error budgets to negotiate, in a data-driven way, the trade-offs between innovation and reliability and between risk and stability.
  • The previous post (1/2) explains that, with the end goal of providing the optimum level of software reliability that maximizes user happiness, SLIs and SLOs are used to capture user happiness in metric form.
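
As a rough illustration of the arithmetic behind an error budget (not taken from the linked article), the sketch below assumes a request-based SLI in which every request counts as either good or bad. The function name `error_budget` and the example numbers are hypothetical.

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Summarize an error budget for a request-based SLO.

    slo_target   -- target fraction of good events, e.g. 0.999 for "99.9%"
    total_events -- events observed in the SLO window
    bad_events   -- events that violated the SLI in that window
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events < 0 or bad_events < 0:
        raise ValueError("event counts must be non-negative")

    # The budget is the share of events that is allowed to fail.
    allowed = (1.0 - slo_target) * total_events
    remaining = allowed - bad_events
    # Fraction of the budget already spent (can exceed 1.0 when overspent).
    consumed = bad_events / allowed if allowed > 0 else float("inf")
    return {"allowed": allowed, "remaining": remaining, "consumed": consumed}


# Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO.
report = error_budget(0.999, 1_000_000, 250)
print(f"budget consumed: {report['consumed']:.0%}")
```

With these made-up numbers, roughly a quarter of the budget is spent. A team could treat that as room for riskier releases, while a value near or above 100% would argue for prioritizing stability.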

An interesting post on modern ransomware and malware attacks and how to mitigate and deal with the fallout.

  • A set of two articles on the above topic. The title of the first link is “The rise of ransomware”. Drawing on trends observed while helping organizations respond to ransomware attacks, it describes two ways in which ransomware has evolved:
    ○ hybrid business models for monetisation
    ○ increasingly sophisticated (and targeted) methods of deployment
  • The second article is “Mitigating malware and ransomware attacks”. To help private and public sector organizations deal with the effects of malware (including ransomware), it lists actions that help prevent malware infections and steps to take if you are already infected.

A handy pattern for when you need to create new repositories based on a template and some variables, using GitHub repository templates, Actions and the python cookiecutter tool.

A look at the security profile operator for Kubernetes. This exposes a first-class interface for configuring seccomp profiles amongst other useful features including exposing metrics and enriching logs.

  • The title is “Managing Kubernetes seccomp profiles with security profiles operator”.
  • Since it was covered in KubeWeekly #275 last week, I will skip it.

The first two parts of a series on building an analytics platform based on Druid. Background on technology choice and lots of technical details about the implementation.

  • As mentioned above, a series of articles that describe Pinterest’s “Analytics as a Platform” built on Druid and share the lessons learned from using Druid.
  • The title of the first link above is “Pinterest’s Analytics as a Platform on Druid (Part 1 of 3)”. It is organized into the following sections.
    ○ A Short History on Switching to Druid
    ○ Architecture
    ○ Learnings on Optimizing Host Types for Mmap
    ○ Memory Optimized Host Types
    ○ IO Optimized Host Types
    ○ Future work
    ○ Acknowledgements
  • The second article is “Pinterest’s Analytics as a Platform on Druid (Part 2 of 3)”. It is organized into the following sections.
    ○ Learnings on Optimizing Druid for Batch Use Cases
    ○ Future work
    ○ Acknowledgements

A post looking at the role of an SRE team in adopting observability tooling. A lot of this depends, in my experience, on the reality on the ground of roles vs the titles.

  • The title is “The Role of SREs in Observability”.
  • It outlines the role that SREs play in observability in the following sections.
    ○ A brief history of SREs and observability
    ○ SREs and the observability revolution
    ○ Observability requires expertise with disparate data sources and systems
    ○ Observability and reliability go hand-in-hand
    ○ SREs excel at incident response
    ○ Observability beyond SREs


Kubernetes Community Days UK is coming up on the 15th and 16th September. A virtual event over 2 days, with talks on supply chain security, secrets, scaling, getting started with Kubernetes and lots more.

  • As mentioned above, an announcement of “Kubernetes Community Days UK 2021”, to be held on 2021/09/15–16.
  • At the time I applied, swag was available “UK Only”, so if you live in another country, be careful which option you choose. Registration is simple and only takes a few seconds.

SRE Weekly Issue #286 September 5th, 2021


Kill It With Fire

This is a review of Marianne Bellotti’s Kill It With Fire, a book about modernizing legacy systems. It focuses heavily on operational concepts and “the system around the system”, with a heavy SRE influence.

Laura Nolan — ;login:

  • As in the Editor’s comment above, the reviewer introduces the book with the key phrase “the system around the system”, as follows:
    ○ Kill it With Fire is a useful and highly readable guide to solving these problems by leveraging the organisation — the system around the system.

Why every software engineering interview should include ops questions

Originally drafted in 2016, this blog post is even more relevant now. Beyond just the “why”, it has several ideas for interview questions to get you started.

Charity Majors

  • The post argues the point made in its title and, as the Editor notes, suggests several interview questions.

The power of framing a problem

Tell a good story, and you can make things happen.

As SREs, we often know what needs to be done, but convincing others is a hard-won skill.

Lorin Hochstein

  • This post also quotes “Kill It With Fire”, excerpting a passage the author agrees with and emphasizing the importance of storytelling.

Easyjet A320 tells United Boeing 787 to GO AROUND!

In this video report of a commercial aviation accident, there’s a neat discussion of resiliency toward the end. There were several other layers of protection that (probably) would have caught and prevented this incident if the A320 captain hadn’t intervened. And even though no accident occurred, there was still a “near miss” investigation.

Mentor Pilot

  • As the Editor comments above, this is a 19-minute YouTube video that carefully explains the aircraft incident.

The Role of SREs in Observability

Although conversation about observability often ignores SREs, SREs have a central role to play in observability success.

Quentin Rousseau — Rootly

  • Since it is covered in DEVOPS WEEKLY ISSUE #558 above, I will skip it.

Cascading retries and the sulky applications

In a microservice architecture, having retries several levels deep can be a recipe for nastiness.

Oren Eini — RavenDB

  • As the title and the Editor’s comment indicate, the post uses code to describe the problems that can occur when retries are implemented at multiple levels of a microservices architecture.
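
To make the amplification concrete, here is a minimal sketch (my own illustration, not code from the linked post). It assumes each calling layer independently retries a failing downstream call up to a fixed number of attempts; the function name `worst_case_attempts` is hypothetical.

```python
from typing import Iterable


def worst_case_attempts(attempts_per_layer: Iterable[int]) -> int:
    """Worst-case number of calls that reach the bottom dependency.

    attempts_per_layer lists, from the outermost caller inward, how many
    times each layer will try its downstream call (1 = no retries).
    When the bottom dependency keeps failing, every attempt at one layer
    triggers a full round of attempts at the layer below, so the counts
    multiply.
    """
    total = 1
    for attempts in attempts_per_layer:
        if attempts < 1:
            raise ValueError("each layer must make at least one attempt")
        total *= attempts
    return total


# Three layers that each try up to 3 times (1 call + 2 retries).
print(worst_case_attempts([3, 3, 3]))
```

Under these assumptions, a single user request can turn into 27 calls against a dependency that is already struggling, which is one reason to retry at only one layer or to cap retries with a shared budget.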

GitHub Availability Report: August 2021

This report has some detail on two major incidents experienced by GitHub last month.

Scott Sanders — GitHub

  • August edition of GitHub’s monthly “Availability Report”. It explains the events, responses, and countermeasures for the two incidents that occurred in August.
  • I recently found that the “Subscribe” setting on the status page sends notifications of status updates (when an incident is created, updated, or resolved), so I set it up immediately.

KubeWeekly #276 September 10th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

How Seagate runs real-time analytics at the Edge

With global data creation predicted to hit 180 zettabytes by 2025, leading data storage provider Seagate needed to introduce greater automation at immense scale to its operations, to ensure it could keep pace with growing demand. Learn more by reading the full case study.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Kubernetes clusters need persistent data

James Spurin, StorageOS

  • A 57-minute session that explains the following points in line with the title. The material is organized in an easy-to-understand manner and delivered calmly, which makes it pleasant to watch.
    ○ The benefits and opportunities for significantly improving Kubernetes usage across your organisation via the use of an effective data plane.
    ○ Opportunities including multi-tenancy, high availability, compliance with encryption at rest
    ○ The ease of use with GitOps and the transition of traditional and legacy workloads, dependent on persistent data.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Kubernetes CI/CD pipelines: What, why, and how

Alex Chalkias, Ubuntu

  • It provides useful information on how to set up a Kubernetes CI/CD workflow using state-of-the-art open source DevOps tools for the following readers:
    ○ A developer at the start of your journey with enterprise software
    ○ An experienced software engineer working on your company’s applications, or
    ○ An engineering lead trying to improve your team’s productivity

Prometheus definitive guide part III — Prometheus Operator

Ninad Desai, InfraCloud Technologies

  • It focuses on how to easily install and manage Prometheus on a Kubernetes cluster using Prometheus Operator and Helm, and provides a detailed explanation with code/diagrams/Web UI /terminal screens.
  • If you are new to Prometheus, the author highly recommends reading the first two parts of this “Prometheus Definitive Guide” series.

Sqlcommenter merges with OpenTelemetry

Nimesh Bhagat, Google Cloud

  • It announces the integration of the open source ORM (object-relational mapping) auto-instrumentation library “Sqlcommenter” into the open source observability framework “OpenTelemetry”. It includes comments from partner companies (Datadog / Dynatrace / Splunk) and an example of how Cloud SQL Insights uses Sqlcommenter to simplify observability for developers.
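
For readers who have not seen the idea behind Sqlcommenter, the sketch below shows the general technique of appending application context to a SQL statement as a trailing comment, so that database-side tooling can tie a query back to the code that issued it. This is a simplified illustration and not the Sqlcommenter library's API: the function `add_sql_comment` and the tag names are hypothetical, and the real libraries attach the comment automatically through ORM integrations.

```python
from urllib.parse import quote


def add_sql_comment(sql: str, **tags: str) -> str:
    """Append key='value' tags to a SQL statement as a trailing comment.

    Keys are sorted and both keys and values are URL-encoded so that the
    comment stays stable and cannot break out of the /* ... */ wrapper.
    """
    if not tags:
        return sql
    pairs = ",".join(
        f"{quote(key, safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(tags.items())
    )
    return f"{sql} /*{pairs}*/"


# Hypothetical tags describing where the query came from.
print(add_sql_comment("SELECT * FROM polls", framework="django", route="/polls"))
```

Because the context travels inside the statement itself, it shows up in the database's query logs without any extra plumbing on the database side.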

Gracefully handling Kubernetes API deprecations: The tale of two ingresses

Lucas Roesler, Contiamo

  • It explains how OpenFaaS uses the Kubernetes Discovery API to provide backward compatibility with Ingress on all Kubernetes versions.
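
The general idea can be sketched as follows. This is my own simplified illustration in Python, not the OpenFaaS implementation: it assumes you already have the list of group/versions the cluster serves (for example, the output of `kubectl api-versions`), and the function name `pick_ingress_api` is hypothetical.

```python
from typing import Iterable

# Ingress API versions from newest to oldest. networking.k8s.io/v1 has been
# available since Kubernetes 1.19; the two v1beta1 variants were removed in 1.22.
INGRESS_API_PREFERENCE = (
    "networking.k8s.io/v1",
    "networking.k8s.io/v1beta1",
    "extensions/v1beta1",
)


def pick_ingress_api(served_group_versions: Iterable[str]) -> str:
    """Return the newest Ingress apiVersion that the cluster serves."""
    served = set(served_group_versions)
    for group_version in INGRESS_API_PREFERENCE:
        if group_version in served:
            return group_version
    raise LookupError("cluster does not serve any known Ingress API")


# Hypothetical discovery result from an older cluster.
print(pick_ingress_api(["apps/v1", "extensions/v1beta1", "networking.k8s.io/v1beta1"]))
```

Choosing the version at runtime from what the cluster reports, rather than hard-coding it, is what lets one build of a tool work on both old and new clusters.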

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Ingress-nginx, with Alejandro de Brito Fontes and Ricardo Katz

Craig Box, Kubernetes Podcast from Google

How Docker broke in half

Scott Carey, InfoWorld

  • Based on interviews with more than a dozen former and current Docker employees, open source contributors, customers, and industry analysts, it explains how Docker broke into pieces, eventually leading to the sale of its enterprise business to Mirantis.
  • It was interesting to read stories about Docker that I didn’t know. For example:
    ○ Craig McLuckie, Kubernetes cofounder and now vice president at VMware, says he offered to donate Kubernetes to Docker, but the two sides couldn’t come to an agreement.

Infrastructure management going extinct with serverless

Jakub Lewkowicz, SD Times

  • The article develops the idea in its title through the following points.
    ○ Abstracting away Kubernetes
    ○ Vendors are defining serverless
    ○ All eyes are on serverless at the edge
    ○ Serverless is the architecture for volatility
    ○ A serverless future: A tale of two companies

Service Mesh 102: Envoy configuration

Scott Lowe, Kong

Upcoming CNCF Online Programs

*edited as the Kubernetes 1.22 release webinar has been rescheduled

Live Webinar

Cloud Native Live


Looking for more great curated content? Visit our Online Programs playlist on YouTube.

Learn more about CNCF Online Programs

What did you think of these articles? Did any of them catch your interest?

To be honest, there is some content I cannot fully digest at this stage, so I will use this aide-memoire and its links to catch up myself.

Bye now!!

Yoshiki Fujiwara


・Cloud Solutions Architect - AWS@NetApp in Tokyo, Japan. #AWS Certified Solution Architect&DevOps Professional, #Kubernetes, ・Opinions are my own.