SRE / DevOps / Kubernetes Weekly Collection #6 (Week 11)

  • In this blog post series, I collect the following three weekly mailing lists I subscribe to, and leave some comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog, and I am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #480 March 8th, 2020
SRE Weekly Issue #210 March 8th, 2020
KubeWeekly #207: March 13th, 2020

DEVOPS WEEKLY ISSUE #480 March 8th, 2020

A post on optimising a CI build pipeline based on Bazel, looking at the challenge introduced by immutable infrastructure and the benefits of persistent caching.

  • The title is “Bazel Performance in a CI Environment”.
  • The story of improving Bazel performance in a GitLab CI environment running on AWS instances.
  • The build time was reduced from 20 minutes to 1–2 minutes, and the cache is now used properly.
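
As an aide-memoire on why a persistent cache helps so much: Bazel-style caching keys each build step on a hash of its command and inputs, so an unchanged step can be skipped as long as the cache survives between CI runs. Below is a minimal Python sketch of that idea only. `action_key` and `BuildCache` are hypothetical names of mine, not Bazel APIs, and the in-memory dict stands in for whatever persistent storage a real setup would use.

```python
import hashlib
from typing import Callable, Dict, Tuple


def action_key(command: str, inputs: Dict[str, bytes]) -> str:
    """Hash the command plus every input path and content into one key."""
    h = hashlib.sha256()
    h.update(command.encode("utf-8") + b"\0")
    for path in sorted(inputs):  # sort so dict ordering never changes the key
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class BuildCache:
    """Toy cache (assumption: a real CI setup keeps this on disk or remotely)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def run(self, command: str, inputs: Dict[str, bytes],
            build: Callable[[], bytes]) -> Tuple[bytes, bool]:
        """Return (output, cache_hit); call build() only on a cache miss."""
        key = action_key(command, inputs)
        if key in self._store:
            return self._store[key], True
        output = build()
        self._store[key] = output
        return output, False
```

If the CI runner is thrown away after every job, `_store` starts empty each time and every step is a miss, which is the immutable-infrastructure problem the post describes.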

A useful presentation on what we mean by platform, and how platforms and platform teams can reduce the cognitive burden on development teams.

  • The title is “Kubernetes Is Not Your Platform, It’s Just the Foundation @ QCon London, March 2020”.
  • Presentation material from QCon London, held March 2–6.
  • The discussion is framed around the key phrase “Getting Started with team-centric Kubernetes adoption”.

An interesting set of posts on alternatives to writing Kubernetes configuration in YAML files. The same simple example in JSON, C#, F#, Terraform, Java and Python so far.

  • A collection of blog posts showing how to write Kubernetes configs using the above programming languages and tools instead of YAML.
  • Links are provided for each language, so pick the one you are interested in. C# and F# share one page, grouped under .NET.
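
The common idea behind these posts can be sketched in a few lines of Python: build the manifest as ordinary data with a function, then serialize it. This is a minimal sketch under my own assumptions, not code from any of the linked posts. The `deployment` helper and the example name and image are hypothetical. JSON output is enough here because kubectl accepts JSON manifests as well as YAML.

```python
import json


def deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Build an apps/v1 Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical example values; pipe the output to `kubectl apply -f -`.
    print(json.dumps(deployment("hello", "nginx:1.17", replicas=2), indent=2))
```

Because the labels are defined once and reused, the selector and template cannot drift apart, which is one of the practical arguments for generating configs from code.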

Part of a series on building more operable software, this post looks at trailing and leading indicators of operability.

  • The title is “Build Operability In — Measures”.
  • This article is Part 2 of a “Build Operability In”-themed series.
  • Later parts, up to Part 7, cover Architecture, Telemetry, Operational Preparation, Building, Execution, and Learning.
  • It quotes Douglas Hubbard’s “How to Measure Anything”: organizations suffer from a measurement inversion and spend their time measuring variables with low information value, which is certainly true in IT. It explains that you need to measure with effective indicators, and applies that idea to the theme of “operability”.

A neat visualisation of some of the things that someone wanting to move into system administration would likely want to learn. It does however focus squarely on the tools rather than wider devops issues.

  • The title is “Dev Ops Roadmap”.
  • A step-by-step roadmap diagram for anyone aiming at a DevOps or other operations role.
  • It lays out multiple languages, products, protocols, and so on as examples of what to learn.
  • The last point is “Keep Learning”. And the road continues.

A two part migration story, moving from EC2 on AWS, to Kubernetes on Google Cloud. Details of data, databases, moving from AWS ALB to Istio and more.

Part 1
Part 2

  • The story of the migration from AWS to GCP, which was also covered in last week’s article (KubeWeekly #206: March 6th, 2020).

Many organisations are just starting to adopt more automated approaches. This post highlights a few areas centered around test automation which might make doing so more difficult.

  • The title is “5 ways to drive your automation engineers away.”
  • Talented engineers with both development and testing skills who can do “test automation”, an essential element of the CI/CD pipeline, are in such short supply that positions can stay unfilled for months.
  • Since you want to keep as many good automation engineers as possible, the post lists the top five factors that drive them to leave a company, as a warning not to go down the same path.

Iter8 is a toolset of analytics-driven canary releases and A/B testing, built atop Kubernetes and Istio.

  • The page for iter8, a tool for automated canary releases and A/B tests on Kubernetes and Istio for cloud native development. The roughly one-minute demo video is easy to understand.
  • Click here for the GitHub page.
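
To make “analytics-driven canary release” concrete for myself, here is a minimal Python sketch of the kind of decision such a tool automates. It is not iter8’s API or algorithm. `canary_decision`, its parameters, and the 1% tolerance are all hypothetical, and a real tool would use proper statistics rather than a single threshold.

```python
def canary_decision(baseline_errors: int, baseline_total: int,
                    canary_errors: int, canary_total: int,
                    tolerance: float = 0.01) -> str:
    """Compare error rates and return 'promote', 'rollback', or 'wait'."""
    if baseline_total <= 0 or canary_total <= 0:
        # Not enough traffic on one side to compare yet.
        return "wait"
    baseline_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    # Promote only if the canary is no worse than baseline plus the tolerance.
    if canary_rate <= baseline_rate + tolerance:
        return "promote"
    return "rollback"
```

In practice the request counts would come from telemetry, for example metrics collected through the service mesh, and the check would be repeated as traffic is shifted step by step.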

Another tool for writing Kubernetes configuration in anything-but-YAML. cdk8s extends the AWS CDK tool to add Kubernetes support, allowing you to write your configuration in TypeScript.

  • The GitHub page of “cdk8s”, a tool for writing Kubernetes configs using programming languages and tools other than YAML.
  • It extends the AWS CDK tooling to support Kubernetes, and configs can now be written in TypeScript or Python.
  • It is still an experimental project. Trying it out and giving feedback seem to be welcomed.

StatusBay is a new dashboard for Kubernetes, focused on deployment. It subscribes to the various events occurring in your cluster to present a nice real-time view. It also supports multiple clusters.

  • The GitHub page of “StatusBay”, which gives you visibility into the Kubernetes deployment process.

SRE Weekly Issue #210 March 8th, 2020

Introducing Dispatch

Netflix open sourced their incident management system.

Put simply, Dispatch is:

All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!

Kevin Glisson, Marc Vilanova, Forest Monsen — Netflix

  • An introduction to the open-sourcing and features of Netflix’s in-house tool “Dispatch”, which was covered in last week’s article (DEVOPS WEEKLY ISSUE #479 March 1st, 2020).

Reading /proc/pid/cmdline can hang forever

I wasn’t aware of this little pitfall of memory cgroups.


  • The response to “fork() can fail: this is important”, covered in last week’s article (SRE Weekly Issue #209 March 2nd, 2020), was bigger than the author expected, so she wrote about another unexpected failure.
  • It covers how and why a read of /proc/pid/cmdline can hang indefinitely.

In space, no one can hear you kernel panic

Your failover DB instance is cute. Try 4x+ redundancy. That’s the kind of engineering required when designing systems to operate in space.

Glenn Fleishman — Increment

  • The past and future of redundancy design in NASA’s space and planetary exploration missions.

A single person on-call “rotation” is a critical vulnerability

This post enumerates some of the risks introduced when a single person carries 100% of the on-call duties of a team, and shows why those risks are not simply eliminated by increasing the number of people in the rotation.

Daniel Condomitti — FireHydrant

  • It discusses how a single-person on-call rotation is a vulnerability with critical implications for both immediate incident response and the long-term sustainability of growth.
  • It covers the risks of having a single on-call person, the risks of a rotation with only one person on duty at a time, the “bystander effect” when alerts go out to a whole group, and more. These points to consider when operating a system that needs care 24/365 are summarized simply and clearly, including the example of the company’s own three-person setup.

Experimental study on the effect of procedure under unexpected situations

This is a pretty nifty experiment showing the importance of letting folks use their judgement to handle unexpected situations rather than relying on adherence to procedures.

Thai Wood — Resilience Roundup (summary)

Makoto Takahashi, Daisuke Karikawa, Genta Sawasato and Yoshitaka Hoshii — Tohoku University (original paper)

  • Issue 69 of “Resilience Roundup”, which publishes a weekly article on the theme of resilience, covers the research of Makoto Takahashi, Daisuke Karikawa, Genta Sawasato, and Yoshitaka Hoshii, who presented at the REA symposium last year.
  • If you would like to participate in the discussion of this group, you can register here.

Coronavirus/COVID-19 and USENIX Conferences

FYI: SRECon Americas West has been rescheduled to June 2–4.

  • Because of COVID-19, five events sponsored by USENIX had been rescheduled (as of that moment). The event names, new dates, and venues are as follows.
  1. SREcon20 Americas West: June 2–4 at the Hyatt Regency Santa Clara and the Santa Clara Convention Center in Santa Clara, CA, USA
  2. 3rd USENIX Workshop on Hot Topics in Edge Computing (HotEdge ’20): July 14 at the Sheraton Boston in Boston, MA, USA, now co-located with USENIX ATC ’20
  3. 2020 USENIX Conference on Operational Machine Learning (OpML ’20): July 30 at the Hyatt Regency Santa Clara in Santa Clara, CA, USA
  4. SREcon20 Asia/Pacific: September 7–9 at the Sheraton Grand Sydney Hyde Park in Sydney, Australia
  5. 2020 USENIX Conference on Privacy Engineering Practice and Respect (PEPR ’20): October 15–16 at the Hyatt Regency Santa Clara in Santa Clara, CA, USA

Millions of tiny databases

This week, we have another summary of the Physalia paper. I especially like the bit about poison pills.

Adrian Colyer — The Morning Paper (summary)
Brooker et al. — NSDI’20 (original paper)

  • From Adrian Colyer’s series summarizing a selection of computer science papers.
  • The paper, by Marc Brooker, Tao Chen and Fan Ping of AWS, was presented at NSDI ’20 (Santa Clara, CA, February 25–27), hosted by USENIX.
  • Slides, a PDF, and other materials are available, so you can pick the visual or the text format, or check both.
  • He is clearly fond of the processes and engineering practices behind the design of Physalia, which stores configuration information for the AWS EBS control plane.
  • To reduce the blast radius (the area affected by a database failure), the database is split into a myriad of small cells, and clients whose data is stored in one cell are designed not to be affected by the failure of another cell.
  • Whenever I have to deal with databases, not only on AWS, I definitely want to come back and reread this.
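
To remind myself what a cell-based design buys you, here is a small Python sketch. It is not how Physalia places data; the paper describes its own placement and consensus machinery. Hashing a hypothetical volume ID to a cell number is an assumption made purely for illustration. The point is only that a failed cell affects the keys stored in it and nothing else.

```python
import hashlib
from typing import Iterable, List


def cell_for(volume_id: str, num_cells: int) -> int:
    """Map a key to one of num_cells cells with a stable hash (illustrative)."""
    digest = hashlib.sha256(volume_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


def affected_volumes(volume_ids: Iterable[str], failed_cell: int,
                     num_cells: int) -> List[str]:
    """Return only the keys that live in the failed cell: the blast radius."""
    return [v for v in volume_ids if cell_for(v, num_cells) == failed_cell]


if __name__ == "__main__":
    volumes = [f"vol-{i}" for i in range(1000)]  # hypothetical volume IDs
    hit = affected_volumes(volumes, failed_cell=7, num_cells=50)
    print(f"{len(hit)} of {len(volumes)} volumes affected by one cell failure")
```

With a single shared database, the same failure would touch every volume, so the number of cells directly bounds how much one failure can hurt.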

How did software get so reliable without proof?

In this case, “proof” means “formal proof”.

It’s not that software got so reliable without proof: it’s that systems that include software got so reliable without proof.

Lorin Hochstein

  • Quoting “How did software get so reliable without proof?”, written in 1996 by Turing Award-winning scientist C. A. R. Hoare, the author sympathizes with many of its explanations and points, but argues that Hoare’s question, which stays within software, is the wrong question when viewed from the larger perspective of the overall system that includes software.

KubeWeekly #207: March 13th, 2020

Editor’s pick of the highlights from the past week.

Redefining extensibility in proxies — introducing WebAssembly to Envoy and Istio

Craig Box, Mandar Jog, John Plevyak, Louis Ryan, Piotr Sikora (Google), Yuval Kohavi, Scott Weiss (

Google has added dynamic extensibility to Envoy using WebAssembly, and developed an ABI called Proxy-Wasm to ensure that extensions compiled for one version can be used in another. This ABI can be adopted by other proxies, allowing Wasm extensions, initially written for Envoy, to work anywhere. The first use of this extensibility is in the new, lower-latency Istio telemetry system. An SDK (in three languages, with more to come) and an extension hub, built by, rounds out the release.

Weekly recap of CNCF member and project webinars that you might have missed.

CNCF Member Webinar: Immutable Infrastructure in the Age of Kubernetes

Timothy Gerla, CEO @Talos Systems

  • A webinar video by Talos Systems CEO Timothy Gerla, titled “Immutable Infrastructure in the Age of Kubernetes”.
  • Previously, he was a co-founder of Ansible and was its CTO when it was acquired by Red Hat.
  • I can’t help noticing that the Talos Systems logo looks like a differently colored version of the Anthos logo.

CNCF Member Webinar: Kubernetes Security Best Practices for DevOps

Connor Gorman, Principal Engineer @StackRox

  • A webinar video on Kubernetes security best practices for DevOps by StackRox Principal Engineer Connor Gorman.
  • It proceeds as a Q&A with the moderator, and explains with demonstrations how to make use of RBAC, Namespaces, Network Policies, and so on.

CNCF Member Webinar: Use Open Source, Bare Metal, & 5G to Achieve Autonomous Drone Delivery!

Cody Hill, Field CTO @Packet

  • A webinar video about using OSS, bare metal, and 5G to achieve autonomous drone delivery, by Packet’s Field CTO, Cody Hill.
  • Tools such as Kubernetes, Emitter, OpenFaaS, Prometheus, Grafana, PostgreSQL, Mapbox and Metabase are introduced and demonstrated.
  • This was the first time I saw an architecture diagram that includes a drone. It is interesting how technology changes our view of the world.

CNCF Project Webinar: What’s New in Linkerd 2.7

Oliver Gould, Lead Creator of Linkerd and CTO @Buoyant

  • A webinar video in which Oliver Gould, CTO of Buoyant and Lead Creator of Linkerd, introduces Linkerd and explains the updates in version 2.7 and the future roadmap.

Tutorials, tools, and more that take you on a deep dive into the code.

What is a Service Mesh?

Mohamed Ahmed, Magalix

  • It explains the background behind the need for a service mesh in the context of microservices, then covers typical Envoy usage and a hands-on with Envoy.
  • It is interesting to compare the hand-drawn diagrams with the editor’s colorful diagrams.

Managing service meshes with Meshery

Peter Jausovec, Learn cloud-native

  • An article on Meshery, the multi-service-mesh management plane. It provides lifecycle, configuration and performance management for service meshes and the apps running on them.
  • It is also expected to serve as a vendor- and project-neutral tool for benchmarking the performance of different service meshes.

The Complete Guide to Kubernetes Monitoring


  • The subtitle is “Learn what Kubernetes metrics to monitor, how to do it & what are the best open-source and commercial tools to help ensure peak performance of your cluster.”
  • This article introduces the importance of monitoring Kubernetes, the critical metrics to follow, and monitoring tools that make your job easier.

statusbay: Kubernetes deployment visibility

Kubernetes deployment visibility like a pro.

  • Skipped here, since it is covered under DEVOPS WEEKLY ISSUE #480 March 8th, 2020 above.

What makes a good Operator?

Daniel Messer & Chris Short, Red Hat

  • It notes that the term Operator was coined by the CoreOS team in 2016, touches on how the pattern has rapidly become popular over the last two years, and introduces the best practices published by the Operator Framework community, highlighting the key points.

Connecting AWS managed services to your Argo CD pipeline with open source Crossplane

Adrian Cockcroft, AWS

  • A hands-on introduction to the OSS tool Crossplane and to building a pipeline with Argo CD and AWS managed services. Crossplane supports AWS, GCP and Azure.
  • It explains why GitOps and tools such as Argo CD and Flux CD are being adopted as the features and complexity of cloud-native infrastructure increase.

NTP in a Kubernetes cluster

Balkrishna Pandey, goGlides

  • The author saw a blog post titled “RUNNING NTP IN A CONTAINER”, which sets up NTP (Network Time Protocol) on Docker, and so tried running OpenNTPD on a Kubernetes cluster.

Articles, announcements, and more that give you a high-level overview of challenges and features.

Docker announces new roadmap for developer experience: “Helping You and Your Development Team Build and Ship Faster”

Justin Graham, Docker

  • A blog post by Justin Graham, Vice President of Products at Docker, on the company’s site.
  • Touching on the complexity of development environments, it explains that Docker will focus on Docker Hub, Docker Desktop, and OSS, set up projects in the community, contribute to other projects, and publish its first-ever roadmap.

Announcing Istio 1.5

Istio Team

  • Release information for Istio 1.5. The Istio control plane is simplified by consolidating it into a single binary, istiod.
  • Since it is also covered in The Headlines, other details are omitted.
  • To understand the background and the key points, I highly recommend the YouTube video of GCPUG Istio 1.5 Day, streamed on Thursday, March 12.

VMware Tanzu now Generally Available

Ray O’Farrell, VMware

  • VMware Tanzu, announced at VMworld US six months ago, is now GA.
  • Rather than “using Kubernetes in a VMware environment”, I see it as one of the moves to bring declarative management as a platform to the entire infrastructure.

gRPC, with Richard Belleville

Adam Glick and Craig Box, Kubernetes Podcast from Google

How does Monzo keep 1,600 microservices spinning? Go, clean code, and a strong team

Tim Anderson, The Register

  • A write-up of what Monzo presented at QCon London, held March 2–6.

Open Policy Agent’s Mission to Secure the Cloud

Jevon MacDonald, manifold via The New Stack

  • An article on The New Stack introducing Open Policy Agent (OPA), an OSS for policy control.
  • It includes video and commentary on the OPA Summit 2019 presentations by software engineers William Fu of Pinterest and Luke Massa of TripAdvisor.
  • It notes that security vendors are taking an interest in OPA, which is expected to unify disparate policy controls across different systems.

How Visa built its own container security solution

Lucian Constantin, CSO

  • A story about Visa developing in-house container security using OSS while moving from a legacy monolithic application to a microservice application.
  • Vendor products could not solve their problems, so they decided to build it in-house.
  • Once they decided to build in-house, the operations, development, and security teams grew closer and more supportive of each other, moving toward DevSecOps.

Containers and Kubernetes: 3 transformational success stories

Bob Violino, CIO

  • Three case studies (Expedia Group, Primerica, Clemson University) of successful migration to the cloud with containers and Kubernetes.

Top Six Open Source Tools for Monitoring Kubernetes and Docker

Ran Ribenzaft,

  • It introduces six OSS tools (Prometheus, Grafana, Elastic Stack, Sensu Go, Sysdig Inspect, Jaeger) for monitoring and analyzing Kubernetes and containers. Personally, Sensu Go had not been on my radar.
  • The author has published a comparison table, so you can weigh the tools against your own use case. Curiously, Epsagon, which is not covered in the text, appears at the far right as a seventh entry marked “All good!”

Essential things to know about container networking

John Edwards, NetworkWorld

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at that moment.

Calico networking with eBPF
Chris Hoge, Developer Advocate @Tigera
Shaun Crampton, Principal Engineer @Tigera
Member webinar
March 17, 2020 10:00 AM Pacific Time

Democratizing analytics with cloud native data warehouses on Kubernetes
Robert Hodges, CEO @Altinity
Vladislav Klimenko, Senior Software Engineer @Altinity
Member webinar
March 18, 2020 10:00 AM Pacific Time

Small Is Not Always Beautiful — Moving Enterprise Applications to the Cloud
Paul Jenkins, Product Manager @Oracle Cloud Infrastructure (OCI) Cloud Native Services
Tony Vertenten, co-founder and CTO @Intris
Member webinar
March 19, 2020 9:00 AM Pacific Time

How to migrate a MySQL Database to Vitess
Liz van Dijk, @PlanetScale
Project webinar
March 20, 2020 10:00 AM Pacific Time

Argo CD, Flux CD and the GitOps Revolution
Jay Pipes, Principal Open Source Engineer @Amazon Web Services
Member webinar
March 24, 2020 10:00 AM Pacific Time

Lowering the Barrier to Kubernetes Proficiency — Navigating the Stormy Seas of Information
Chris Black, Sr. Solutions Engineer @CircleCI
Member webinar
March 25, 2020 10:00 AM Pacific Time

Continuous profiling Go application running in Kubernetes
Gianluca Arbezzano, Site reliability engineer @InfluxData
Ambassador webinar
March 27, 2020 10:00 AM Pacific Time

Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
Ambassador webinar
April 3, 2020 10:00 AM Pacific Time

Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
Member webinar
April 8, 2020 10:00 AM Pacific Time

Kubernetes 1.18
Kubernetes team
Project webinar
April 23, 2020 9:00 AM Pacific Time

Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
Member webinar
June 30, 2020 10:00 AM Pacific Time

How about those articles? Did any of them catch your interest?

Actually, there is some content I cannot fully digest at this stage, so I will use this aide-memoire and these links to catch up myself too.

Bye now!!

Written by Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
