SRE / DevOps / Kubernetes Weekly Collection #4 (Week 09)

  • In this blog post series, I collect items from the following 3 weekly mailing lists I subscribe to, and leave some comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #478 February 23rd, 2020
SRE Weekly Issue #208 February 23rd, 2020
KubeWeekly #205: February 28th, 2020

DEVOPS WEEKLY ISSUE #478 February 23rd, 2020

An in-depth look at the recent report into the recent UK bank TSB IT migration failure. Lots of details and some great anecdotes for any enterprise IT or project management folks to learn from.

  • The title is “Lessons from the TSB failure: a perfect storm of waterfall failures”.
  • The article walks through some of the key points in the executive summary of the independent report, carried out by Slaughter and May, into the TSB migration failure, which is likely to have prompted this outsourcing.

This post describes the role game days, and practice in general, play in improving incident management processes.

  • The title is “Got Game? Secrets of Great Incident Management”.
  • The on-call page came in at 2 o’clock one morning.
  • The author handled the outage in the following order: escalation -> convening the incident response team -> dividing up roles -> investigating the cause -> solving the problem -> closing the case.
  • The incident was closed at 2:16. You would expect everyone to go back to bed and follow up in the morning, but it was actually mid-afternoon: the whole thing was an exercise.

Devops conversations often turn to how organisational structure impacts the work we do. This post cleverly looks at organisational structure not through the org chart, but through how people actually work and influence others. When we say we ship the org chart, we need to ask which one.

  • The title is “The Shadow Organizational Chart”.
  • It is a post on Carta’s blog. The CEO of Carta has long felt there is a shadow org chart, much like a shadow economy, where employees trade ideas, give direction, offer help, and spread culture.
  • He wanted to map this shadow org chart and find employees who have disproportionate levels of influence relative to their hierarchical position. He also wanted to see the influence centers and decision makers, and the directional current between them and the rest of the company.
  • He is using Innovisor to create the company’s internal (human relations) network graph.

A nice long post on building a culture of operational excellence. The importance of measurement, training and education and how tools and culture support each other.

  • The title is “Towards Operational Excellence: Part 2-On the importance of tools”.
  • The author is Adrian Hornsby, Principal Evangelist, Architecture.
  • Part 2 of a series on one of AWS’s best practices, “Operational Excellence.”
  • Click here for Part 1.
  • Of the three interconnected elements (culture, great tools, and process) that enable the successful operation of the technology you build, this part focuses on great tools.

With the ever-present need to manage lots of YAML files, various tools have been emerging to help. This post looks at some of the problems with text-based templating, and explores yq, kustomize and using native Javascript bindings for Kubernetes.

  • The title is “Templating YAML in Kubernetes with real code”.
  • In the article, the author suggests using yq or kustomize to template YAML, instead of relying on tools that interpolate strings, such as Helm.
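As a note to myself, here is a minimal sketch of the general idea of "templating with real code": the manifest is built as a data structure and serialized at the end, rather than pasted together from strings. This is not the author's method (the article looks at yq, kustomize, and JavaScript bindings); it is written in Python only for illustration, and the function names `deployment` and `render` and the sample values are hypothetical.

```python
import json


def deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Return a Kubernetes Deployment manifest as a plain data structure."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def render(manifest: dict) -> str:
    # JSON is a subset of YAML 1.2, so this output is also valid YAML
    # and needs no third-party YAML library.
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    # Hypothetical sample values.
    print(render(deployment("web", "nginx:1.17", replicas=3)))
```

Because the values are typed fields instead of interpolated text, mistakes such as broken indentation or unquoted strings cannot occur in the output.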

Lots of details on how logging in Kubernetes works, from the cluster components to the applications you’re running on top.

  • The title is “The Complete Guide to Kubernetes Logging ~How is Logging in Kubernetes different, how it works, how to use it: use cases and best practices.~”.
  • The goal of the article is to cover the topics in the title, introduce tools for managing logs, and enable readers to aggregate logs from their Kubernetes cluster.

An example of using Lambda to bridge two other AWS services, in this case AWS Kinesis Firehose and AWS Elasticsearch.

  • The title is “AWS Kinesis Firehose throttling with transformation Lambda”.
  • The author previously wrote an article, “Terraform AWS Kinesis Firehose + Elasticsearch module”, about a Terraform module which can be used to set up a logging pipeline with AWS Kinesis Firehose and AWS Elasticsearch.
  • This article shows how the author used a Lambda transformation to coordinate the flow between AWS Kinesis Firehose and AWS Elasticsearch.
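For reference, a Firehose transformation Lambda receives a batch of base64-encoded records and must return each one with its `recordId`, a `result`, and re-encoded `data`. The sketch below shows only that contract. It assumes the records carry JSON, and the `DELAY_SECONDS` pause is a hypothetical knob I added to illustrate slowing the flow; it is not taken from the article, so how the author actually throttles may differ.

```python
import base64
import json
import time

# Hypothetical knob: seconds to pause per invocation to slow the flow downstream.
DELAY_SECONDS = 0.0


def handler(event, context):
    """Firehose data-transformation handler that passes records through."""
    output = []
    for record in event["records"]:
        # Firehose delivers each record's payload base64-encoded.
        # Assumption: the payload is a JSON document.
        payload = json.loads(base64.b64decode(record["data"]))
        # A real transformation would modify `payload` here.
        data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
    if DELAY_SECONDS > 0:
        time.sleep(DELAY_SECONDS)
    return {"records": output}
```

Every incoming `recordId` has to appear in the response; `result` can also be `Dropped` or `ProcessingFailed` instead of `Ok`.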

Another post on alternatives to authoring Kubernetes configuration in YAML. This presentation looks at using Kotlin and the Kotlin Kubernetes DSL for authoring configuration.

  • The title is “Kotlin Conf 2019: Unlock Power of Kotlin DSL for Kubernetes by Fedor Korotkov”.
  • YouTube video of the presentation at KotlinConf 2019.
  • The linked site “” has popular tags for programming languages and technologies (JavaScript/Python/React/GraphQL) and tags for each event, so you can browse videos for developers in one place.


Gops is a handy tool for listing and diagnosing Go processes running on a machine. List the processes, which version of Go was used to compile the binary, network connections and more.

  • A link to the GitHub repository for “gops”, a tool that lists and diagnoses currently running Go processes.

SRE Weekly Issue #208 February 23rd, 2020

Anatomy of Cascading Failure

There’s so much in this article:

  • how to recognize when your system may be susceptible to cascading failure
  • how to prevent it
  • how to deal with it when it happens (and how hard that can be)

Laura Nolan — Slack

  • Laura Nolan, author of Chapter 23 “Managing Critical State” in the SRE book and a contributor to “Seeking SRE”, gives a detailed analysis of cascading failure, describes 6 anti-patterns, and proposes ways to reduce the risk of experiencing it.
  • It’s a great read, so it’s my personal favorite this week and I have bookmarked it.

Catchpoint’s SRE Survey 2020 Is Here

It’s time for this year’s SRE Survey. Don’t forget that with each completed survey, Catchpoint donates $5 to charity.

This growing demand [for SREs] is not without growing pains as a skills gap problem has emerged due to the fact that SRE training requires a hands-on, interactive learning environment.

Peter Murray — Catchpoint

  • Information on the SRE Survey 2020, which ran until February 28. It takes about 20 to 25 minutes, and a $500 gift card was offered to respondents.
  • The survey results are scheduled to be released on March 23, in time for SREcon Americas West, to be held in Santa Clara, California, USA on 3/24–3/26.

Resilience Roundup — Above the Line, Below the Line

Both the summary and the original article are well worth reading. This stood out to me:

As much as we may think of incidents as taking place in all those technical parts of the system below the line, incidents actually take place above it

Thai Wood (summary)

Dr. Richard Cook (original article)

The Jellyfish-Inspired Database Under AWS Block Storage

The EBS control plane data store resembles a “jellyfish” (actually a Physalia, a.k.a. Portuguese man-of-war).

Timothy Prickett Morgan — The Next Platform

  • The author suggests that if you want inspiration for a hyperscale, resilient distributed block storage service, a jellyfish is apparently a good place to start looking for architectural features.
  • It seems interesting, so I will dig deeper later.

The Problem with Microservices: ‘Deep Systems’

Ideal: each team manages their microservice(s) in isolation.

Reality: microservices interact in unexpected ways and a broader system emerges that has remarkable similarities to running a monolith.

Ben Sigelman — LightStep

  • It discusses the phenomenon of “deep systems”, which is newly emerging because of microservices.
  • Because only a limited number of developers can work on a single app at the same time, the architecture ends up with four or more independently operated layers.
  • He says that developers should be given new tools that ensure observability, so that they can spend their time on their real job, such as improving software quality, rather than on troubleshooting.

SRE for single-tiered software applications

This one discusses how to handle SRE for a monolith, and some examples of what often goes wrong.

Eric Harvieux — Google

  • The title is “Making your monolith more reliable”.
  • He touches on some of the most common problems with monolithic architectures, and on treating and scaling a monolith as a platform while keeping SRE principles in mind.

Trying to sneak in a sketchy .so over the weekend

The author blocked an unexpected Sunday deploy of untested code, and it turned out to be a good thing they did.


  • This is one story from the author’s long history of battles with bad rollouts.
  • The story begins at about 3:30 pm local time on a Sunday. It is written from the admin’s point of view and shows her anger at the “recklessness” of the engineers in the company.
  • It was interesting to read scenes such as engineers trying to touch the production environment by pulling rank and pushing ahead without considering reliability, at a time when the usual support was unavailable because it was a day off.

KubeWeekly #205: February 28, 2020

Editor’s pick of the highlights from the past week.

The countdown to KubeCon + CloudNativeCon Europe is on!

Day-0 co-located events are a huge part of the event. This year, CNCF is hosting three co-located events in Amsterdam on Monday, March 30, providing the opportunity for attendees to deep-dive into these technology topics. We’re excited to share that the schedules are now available for these Day 0 events. Please find the details below.

  • KubeCon + CloudNativeCon Europe was finally approaching, at the end of that month, and at that point (as of the early morning of 3/1) there had been no major changes, such as to the schedule, due to COVID-19 (it was later rescheduled).

Schedules Announced for Cloud Native Security Day, Serverless Practitioners Summit, ServiceMeshCon

Kim McMahon, CNCF

  • Cloud Native Security Day, Serverless Practitioners Summit and ServiceMeshCon were scheduled to be held on March 30th (at that time).

Contributor Summit Amsterdam Schedule Announced

Jeffrey Sica, Red Hat and Amanda Katona, VMware

  • The Kubernetes Contributor Summit schedule was announced. It was to be held on March 29 and March 30 (but it was in fact postponed too).

Tutorials, tools, and more that take you on a deep dive into the code.

New Application Manager brings GitOps to Google Kubernetes Engine

Palak Bhatia, Product Manager and Janet Kuo, Software Engineer, Google Cloud

  • Introduces Application Manager, a new GKE feature on GCP (in beta at that time).
  • Declarative configuration management according to GitOps principles.
  • There are demo videos and tutorials. I want to get my hands dirty with these.

Kafka disaster recovery on Kubernetes with CSI

Toader Sebastian, Banzai Cloud

  • Introduces the disaster recovery capabilities of Apache Kafka and Banzai Cloud’s own product, Supertubes, which fills in the missing parts.
  • Supertubes is a deployment tool that utilizes a cloud-native technology stack to set up and operate production-ready Kafka clusters on Kubernetes.
  • Supertubes includes Zookeeper, Banzai Cloud Kafka operator, Envoy, Istio and many other components to operate the above environment.

Pangolin: an experimental Kubernetes autoscaler

An enhanced Horizontal Pod Autoscaler for Kubernetes

Damian Peckett

  • A link to the GitHub page of “Pangolin”, an enhanced Horizontal Pod Autoscaler for Kubernetes.
  • It is written in Rust. The “Why Rust?” note gives a sense of the author’s preferences.

CNCF Tools Overview: Fluentd — Unified Logging Layer

Ran Ribenzaft, Epsagon

  • A guest article on the CNCF site, based on an article Ran Ribenzaft of Epsagon wrote for his company’s site.
  • It contrasts “logging in the good old days”, when administrators used ssh and tail on individual bare-metal machines or VMs, with what came afterwards: containers, easily disposable VMs, and PaaS environments that promise availability far beyond that of a single machine. By asking, from the administrator’s point of view, how to access the logs when you don’t know which machine a particular service is running on, it helped me picture the differences in day-to-day work and the issues involved.

Weathervane 2.0: An Application-Level Performance Benchmark for Kubernetes

Harold Rosenberg, VMware

  • A post on the VMware blog introducing version 2.0 of “Weathervane”, a tool for benchmarking application-level performance on Kubernetes.

How to Optimize I/O Intensive Containers on Kubernetes

Jay Huang, NeuVector

  • The subtitle is “Understanding the Real-time Characteristics of Linux Containers.”
  • To build containers that are well optimized for I/O-intensive work, you need a deep understanding of CFS (the Completely Fair Scheduler).
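One concrete place where CFS touches containers is CPU limits: with CFS bandwidth control, a limit is enforced as a quota of CPU time per scheduling period. The arithmetic below is a small sketch of that relationship, assuming the common default period of 100 ms. It is not taken from the article, and the function name `cfs_quota_us` is hypothetical.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may use in each period."""
    return cpu_limit_millicores * period_us // 1000


if __name__ == "__main__":
    for limit in (250, 500, 2000):
        quota = cfs_quota_us(limit)
        print(f"{limit}m -> {quota} us of CPU per {CFS_PERIOD_US} us period")
```

A container that uses up its quota early in a period is throttled until the next period starts, which is one reason latency-sensitive workloads can stall even when the node looks idle.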

What are Open Source Security Approaches? With Examples

Connor Craven, SDxCentral

  • Introduces 7 points and products covering the advantages and security of using OSS.

Different Approaches for Building Stateful Kubernetes Applications

Janakiram MSV

  • One article in The New Stack’s three-month series examining the challenges facing Kubernetes in 2020.
  • It examines the challenges of running stateful workloads on Kubernetes, touching on StatefulSet and CSI.

Weekly recap of CNCF member and project webinars that you might have missed.

CNCF Member Webinar: Managing Observability in Modern Applications

Ran Ribenzaft, Chief Technology Officer, Epsagon

CNCF Member Webinar: Helm Security — A Look Below Deck

Matt Farina, Helm Maintainer, Samsung SDS
Hayley Denbraver, Developer Advocate, Snyk
Raghavan “Rags” Srinivas, Lead Container Developer Advocate, Snyk

  • Webinar video about Helm’s security.

CNCF Member Webinar: From Notebook to Kubeflow Pipelines with MiniKF & Kale


  • The de facto standard for running machine learning workflows on Kubernetes is Kubeflow.
  • A webinar video on seamlessly converting data scientists’ Jupyter Notebooks, where they write and visualize machine learning code, experiments, and results, into Kubeflow Pipelines.

Articles, announcements, and more that give you a high-level overview of challenges and features.

5 predictions for Kubernetes in 2020

Scott McCarty, Technical Product Manager, Red Hat

  • An article by a Red Hat Technical Product Manager who, on New Year’s Day 2020, looks back at 2019 and anticipates five things that will happen in the Kubernetes ecosystem in 2020.
  • It is wise to keep in mind possible bias from the site it is posted on and from his position.

Distributions were for Linux, not for Kubernetes

  • A Forbes article covering trends around Kubernetes, what each company is doing, and the general direction.
  • I think it’s an article that an engineer can read quickly.

State of Container and Kubernetes Security Report, Winter 2020


  • It’s the source of the article “Security concerns hampering the adoption of containers and Kubernetes”, which I covered here last week.
  • I will skip the details because the contents have already been covered, but it is a report summarizing the current state of container and Kubernetes security. Almost all of the respondents, 94%, had a security incident in a container environment in the last 12 months.

Enterprise Kubernetes with OpenShift (Part one)

Jaafar Chraibi, Red Hat

  • The article starts from the author’s view that the question “What’s the difference between Kubernetes and OpenShift?” is similar to asking “What’s the difference between an engine and a car?”.
  • It made the difference between Kubernetes and OpenShift, as seen from Red Hat, and the current situation easy to understand, and some of the numbers made me think “I did not have that perspective.” It reminded me that strategy and marketing are important.
  • It’s Part 1 of the series, so I’m looking forward to the rest.

Accelerators and GPUs at NVIDIA, with Pramod Ramarao

Craig Box and Adam Glick, Kubernetes Podcast from Google

  • The guest is Pramod Ramarao, Product Manager at NVIDIA.
  • “News of the week” overlaps a lot with KubeWeekly, including last week’s issue, but there are still many other items, a lot of them GCP-related.

Q&A: Kubernetes Storage SIG Chair on the State of State in Kubernetes

Emily Omier, The New Stack

  • Like “Different Approaches for Building Stateful Kubernetes Applications” above, this is one article in The New Stack’s three-month series examining the challenges facing Kubernetes in 2020.
  • It is an interview with Saad Ali, a Google software engineer and chair of CNCF’s Kubernetes Storage Special Interest Group, about how to manage stateful workloads with Kubernetes, the key issues, what the Storage SIG is working on now, and the future.

Summing Up: Container Image Building

Puja Abbassi, Giant Swarm

  • Unlike the early days, when Docker dominated, there are now many tools other than Docker for building container images. As long as the built image conforms to the OCI specification it will work, so you don’t have to worry about specification differences or fragmentation.

Why Those Gaps in Kubernetes Are Really a Good Thing

Arvind Gupta, The New Stack

  • To support a variety of use cases, the early Kubernetes developers deliberately left gaps in the platform to give users flexibility. In other words, it is designed so that the environment can be extended with CRD, CSI, and CNI. This gives flexibility to both the infrastructure and app layers.
  • When adopting Kubernetes in your organization, it’s important to consider infrastructure and app management that meet your overall requirements in a way that minimizes the time, effort, and cost required.

Docker Images : Part II — Details Specific To Different Languages

Jérôme Petazzoni, Ardan Labs

  • The previous Part 1 article introduced multi-stage builds, static and dynamic linking, and briefly mentioned Alpine.
  • This Part 2 article covers Go-specific details, and then Alpine. Finally, it shows how this works with other languages such as Java, Node, Python, Ruby, and Rust.

On-Demand Container Scanning API

Jerry Gamblin, Kenna Security

  • Last summer, he launched a project to reveal the number of vulnerabilities in the 1,000 most popular Docker Hub containers.
  • Shortly after launching the project, several people asked whether he could scan other public containers.
  • He wanted to provide this feature, so he gave up sleep for two weeks and built the first version of the API, which he published that day.
  • The service is an open Python API built using Trivy, Flask, Gunicorn, and Nginx, and currently has two public endpoints (more endpoints and tools will be provided). From the beginning, it was designed to be easy to use in a browser or from a CLI for integration with CI/CD.
  • It was an early beta version, so it was not meant to be used in production yet.
  • He made it without sleeping, so the typo in the heading “Notice Something Boken?” is rather endearing; he would love to hear feedback from anyone who is interested.

Catch the CNCF next week at SCaLE 18x

Kim McMahon, CNCF

  • CNCF was to participate as a sponsor and exhibitor at SCaLE 18x, the 18th annual edition of the event, which was held from 3/5 to 3/8 in Pasadena, California.
  • Kim McMahon was to represent the CNCF and was looking forward to meeting community members.
  • Kubernetes socks were to be given away for free at booth #311! Community members were also being recruited as booth volunteers.

You can check some Recorded Webinars and Upcoming Webinars here. The following were posted as upcoming CNCF webinars at that time.

Kubernetes Security Best Practices for DevOps
Frédéric Harper, Senior Developer Advocate @DigitalOcean
Member webinar
March 3, 2020 10:00 AM Pacific Time

Service Mess to #ServiceMesh
Member webinar
March 4, 2020 10:00 AM Pacific Time

What’s New in Linkerd 2.7
Linkerd team
Project webinar
March 6, 2020 10:00 AM Pacific Time

Kubernetes Security Best Practices for DevOps
Connor Gorman, Principal Engineer @StackRox
Member webinar
March 11, 2020 10:00 AM Pacific Time

Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
Ambassador webinar
March 13, 2020 10:00 AM Pacific Time

How to migrate a MySQL Database to Vitess
Liz van Dijk, @PlanetScale
Project webinar
March 20, 2020 10:00 AM Pacific Time

Argo CD, Flux CD and the GitOps Revolution
Jay Pipes, Principal Open Source Engineer @Amazon Web Services
Member webinar
March 24, 2020 10:00 AM Pacific Time

Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
Member webinar
April 8, 2020 10:00 AM Pacific Time

Kubernetes 1.18
Kubernetes team
Project webinar
April 23, 2020 9:00 AM Pacific Time

Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
Member webinar
June 30, 2020 10:00 AM Pacific Time

How about those articles? Did any of them catch your interest?

Actually, there is some content here that I have not been able to digest yet, so I’ll use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

Written by

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
