SRE / DevOps / Kubernetes Weekly Collection #37 (Week 42)

  • In this blog post series, I go through the following three weekly mailing lists I subscribe to and leave some comments and useful links as an aide-memoire.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #511 October 11th, 2020
SRE Weekly Issue #239 October 11th, 2020
KubeWeekly #237 October 16th, 2020

DEVOPS WEEKLY ISSUE #511 October 11th, 2020


Designing on-call often lands on managers, so understanding the difference between good and bad on-call is critically important if you want to be a good engineering manager. This post is a great introduction.

A look at applying approaches from domain driven design and team topologies to identify improvements in how teams build reliable co-operating systems.

  • The title is “Using Team Topologies to discover and improve reliability qualities”.
  • The post opens with the outline in the TL;DR below, and the detailed explanation follows it.
    ○ TL;DR Combining Team Topologies from the DevOps movement, with Context Mapping from the Domain-Driven Design community, can give insights about the potential friction contact points between software engineering teams. Below you can find how it can be combined and how to generate ideas to drive your organisations to new performance levels, creating a safe and healthy working environment.

Five tips for implementing observability, looking at black box monitoring, service metrics, tracing and more.

  • The title is “5 tips on implementing Observability”.
  • It quotes the “hierarchy of needs to make your product reliable” from the SRE book. The reasoning is that “Monitoring” supports the pyramid and “Observability” in turn supports Monitoring. It then explains the importance of Observability through the following five tips.
    Tip 1. Productionize your programming languages
    Tip 2. Alert on most important service metrics
    Tip 3. Add some blackbox monitoring into the mix
    Tip 4. Learn querying your metric database
    Tip 5. Invest in tracing

An interesting new Lambda feature, extensions open up lots of opportunities for monitoring and security use cases that have been hard to implement up to now.

  • It describes what the newly released AWS “Lambda Extensions” are, the problems they solve, their various use cases, and how to work with them.
  • I am also curious about suggestions from other vendors such as Datadog.

Anyone who has worked with logs will be familiar with the concept of log levels. This post has a bit of history and discusses how log levels are commonly used.

  • The title is “Understanding Logging Levels: What They Are & How To Use Them”.
  • Starting from the history of log levels, it explains what a log level is, how to use it, and why setting the correct log level matters.
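As a minimal sketch of the idea (not taken from the linked post), the snippet below uses Python's standard `logging` module to show how a configured level acts as a threshold: messages below it are dropped, messages at or above it are emitted. The logger name and the messages are hypothetical.

```python
import io
import logging


def emit_at_threshold(threshold):
    """Log one message per standard level and return the level names that got through."""
    stream = io.StringIO()
    # Hypothetical logger name, one per threshold so repeated calls do not interfere.
    logger = logging.getLogger(f"demo.level.{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False  # keep output out of the root logger

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.handlers = [handler]

    # One illustrative message per level, from least to most severe.
    logger.debug("cache miss for key user:42")
    logger.info("request served")
    logger.warning("retrying upstream call")
    logger.error("upstream call failed")
    logger.critical("service cannot start")

    return stream.getvalue().split()


if __name__ == "__main__":
    print(emit_at_threshold(logging.DEBUG))
    print(emit_at_threshold(logging.WARNING))
```

With the threshold at `WARNING`, only the warning, error, and critical messages are written, which is why choosing the level per environment matters.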

Terraform is building up an ecosystem of developer tools around it, but which to try first? This video playlist has several tool reviews which might be of interest.

  • It is a link to the YouTube video playlist “Terraform tools reviews”. It introduces some tools from the Terraform ecosystem (tflint, checkov, etc.).

Many large organisations end up using multiple cloud providers, whether by inertia or design. This post covers some common issues and misconceptions.

  • The title is “The Biggest Myths of Multi-Cloud”.
  • The author says, “Cloud native has become the standard, but the IT industry is still in a state of confusion when it comes to multi-cloud.”, and explains a few of the most common myths surrounding multi-cloud that they compiled.
  1. Any organization that has containerized its applications is cloud native by default and therefore prepared to multi-cloud.
  2. A multi-cloud approach is an active one that requires load-balancing traffic across different clouds.
  3. Multi-cloud solves vendor lock-in.
  4. Multi-cloud isn’t fully portable.
  5. Containers are simple, and it’s therefore totally fine to create complexity wherever convenient.
  6. Most organizations aren’t already multi-cloud.
  7. Multi-cloud is less secure.
  8. Only large enterprises benefit from multi-cloud.
  9. Multi-cloud is the same thing as hybrid-cloud.
  10. Multi-cloud is always best suited to open source.


Troubleshoot is a set of tools for supporting applications deployed to Kubernetes. Preflight provides pre-installation cluster conformance testing and validation and support-bundle provides post-installation troubleshooting and diagnostics.

  • It is the GitHub page of “Replicated Troubleshoot”, a framework for collecting, redacting, and analyzing highly customizable diagnostic information about Kubernetes clusters.

SRE Weekly Issue #239 October 11th, 2020


Respect your natural scaling limits

Don’t scale up farther than you need to! If you won’t ever see more than 100 RPS, don’t architect for 100,000.

Let go of Rahien

  • In line with the theme, it uses the scenario of “providing a platform to set up parties for newborns in a particular country” as an analogy and works through the load and processing with concrete numbers to make them tangible. The author explains that a simple architecture can actually provide value and reduce time to market.

The Many Shapes of Site Reliability Engineering

This one covers several common patterns of SRE practice and then offers insight on what to look for as you design your own SRE team.

Rob Cummings — Slalom Build

  • The author, who has talked with “a wide range of organizations, from smaller mid-market companies all the way to astoundingly large and complex enterprises, all from an equally wide range of industries”, explains SRE with the following in mind.
    ○ Please note- This post is not about gatekeeping and declaring there is only one true way to approach SRE.
  • Under the following three major headings, each item and point is broken down and explained.
  1. SRE Implementation Patterns
  2. Consider This
  3. How to move to SRE?

Abstractions and implicit preconditions

Abstractions make us more productive, and, indeed, we humans can’t build complex systems without them. But we need to be able to peel away the abstraction layers when things go wrong, so we can discover the implicit precondition that’s been violated.

Lorin Hochstein

  • Quoting the following phrase from Joel Spolsky’s essay “The Law of Leaky Abstractions”, a favorite of the author, it explains that we need to peel away the abstraction layers when things go wrong so that we can discover the implicit precondition that has been violated, because one of the challenges with abstractions is that they depend upon preconditions.
    ○ All non-trivial abstractions, to some degree, are leaky.

Keeping CALM: When Distributed Consistency Is Easy

Coordination between nodes in a distributed system can kill performance. What kinds of problems require coordination? The CALM theorem can tell us.

Joseph M. Hellerstein and Peter Alvaro — Communications of the ACM

  • It covers the following topics about distributed systems, noting that “this issue should concern us because nearly all of the software we use today is part of a distributed system.”
    ○ The high cost of coordination.
    ○ The bigger picture: Program consistency.
    ○ Distributed deadlock detection.
    ○ Distributed garbage collection.
    ○ The crux of consistency: Monotonicity.
    ○ Program consistency: Confluence.
    ○ Confluent shopping carts.
    ○ A sketch of the proof.
    ○ CAP and CALM: Going positive.
    ○ Distributed design patterns.
    ○ The Bloom programming language.
    ○ Coordination in its place.
  • Terms I don’t usually come across, such as the CALM theorem, the CAP theorem, and CRDTs (conflict-free replicated data types), made me uneasy. If translation technology were not so well developed, I would have put the article off within seconds. Since Google now makes it easy to compare the source text and its translation side by side, I managed to read it. I still have to dig into each topic one by one.
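To make the monotonicity idea a little more concrete for myself, here is a minimal sketch, assuming a grow-only set (the simplest CRDT) as the monotonic example and a plain "last write wins" overwrite as the non-monotonic one. It is not code from the article; the cart items and function names are hypothetical. The point is only that a union-based merge reaches the same result in every delivery order, while an overwrite does not.

```python
from itertools import permutations


def merge_union(state, update):
    """Monotonic merge: the state only ever grows (grow-only set)."""
    return state | update


def merge_overwrite(state, update):
    """Non-monotonic merge: the latest update replaces whatever was there."""
    return update


def final_states(merge, updates):
    """Apply the updates in every possible delivery order and collect the outcomes."""
    outcomes = set()
    for order in permutations(updates):
        state = frozenset()
        for update in order:
            state = merge(state, update)
        outcomes.add(state)
    return outcomes


# Hypothetical cart additions arriving at a replica in an unknown order.
UPDATES = [
    frozenset({"book"}),
    frozenset({"pen"}),
    frozenset({"book", "lamp"}),
]

if __name__ == "__main__":
    print(len(final_states(merge_union, UPDATES)))      # one outcome regardless of order
    print(len(final_states(merge_overwrite, UPDATES)))  # outcome depends on which update came last
```

The union merge needs no coordination between replicas to agree on the final cart, whereas the overwrite would need some agreement on ordering to be consistent.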

The Ultimate, Free Incident Retrospective Template

Here’s another good post-incident analysis document template that you can use as inspiration for your own.

Hannah Culver — Blameless

  • It touches on the current situation, in which “many teams find themselves unable to complete incident retrospectives on a regular basis”. “One common reason for this is that day-to-day tasks such as fixing bugs, managing fire drills, and deploying new features take precedence, making it hard to invest in a process to streamline post-incident report completion.” It then provides an example of what a comprehensive, narrative incident retrospective could look like.
  • I agree with the author’s analysis and framing of the problem: “Teams need a solid post-incident template that can help minimize cognitive load during the analysis process.”

4 Signs Software Reliability Should be Your Top Priority

As your product ages, it transitions from “cool new thing” to “tool everyone uses and expects to Just Work”. Your reliability needs will change accordingly.

Lyon Wong — Blameless

  • The author says, “In a software company’s lifetime, there comes a moment when it’s mission critical to shift priority from shipping new features to protecting the reliability of all features shipped so far.” Here are the four signs that they have identified. (Is it a typo? The text says five in one place.)
  1. Your Product is Becoming a Utility
  2. Your Users are Demanding Reliability over New Features
  3. New Contracts have Tighter SLAs (B2B) / Customers are Getting Less Patient (B2C)
  4. Spaghetti Code is Now Easier to Refactor than to Fix


KubeWeekly #237 October 16th, 2020

The Headlines

Editor’s pick of the highlights from the past week.

CNCF Cloud Native Survey China

CNCF regularly surveys the community to better understand the adoption of open source and cloud native technologies. For the third time, CNCF conducted the Cloud Native Survey China in Mandarin to gain deeper insights into the pace of cloud native adoption in China, and how that’s empowering developers and transforming development in this large and growing community. A key stat of the report is that Kubernetes is used in production by 72% of the respondents. Dive into the report for more details and key stats.

  • It builds on the two earlier China reports, both published in 2018.
    ○ This report builds on the first two China reports, published in March 2018 and November 2018.
  • In the original text, “Dive into the report” carried the same link as the title above, which was confusing, so I removed it.
  • China ranks third by number of contributors and committers, after the United States and Germany. There are nearly 50 CNCF members. Several case studies from major companies are available, so I expect China’s presence to grow.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member webinar: You can be a Kubernetes contributor too!

Jeremy L. Morris, Software Engineer @DigitalOcean

  • It addresses the curiosity and fear that many people feel when starting out as new contributors, and gives an example of a realistic path for beginners to become Kubernetes contributors. It is easy to follow because the specific steps and screens are shown along the way.

CNCF Member webinar: GitOps at scale for a multi-cloud, multi-region stateful application

Rick Spencer, Head of Platform @InfluxData

  • Based on the company case of InfluxData, which runs a database as a Kubernetes service in multiple regions of multiple clouds with GitOps, the following points are explained.
    ○ What is “GitOps”, why the team adopted it, and what benefits they accrued.
    ○ Requirements and preconditions necessary for successful GitOps.
    ○ Real-world examples of the metrics and observability used to monitor both deployments and the running application.
    ○ Lessons learned along the way.

CNCF Member webinar: A full application environment for every PR–before you merge to master!

Vishal Biyani, CTO @InfraCloud & Jono Spiro, Staff Software Engineer of Engineering Operations @OpenGov

  • OpenGov and InfraCloud have introduced the “” system. It provides an ephemeral environment where a large team of developers can conduct seamless tests.
  • The demo shows steroids providing an ephemeral environment built with the functionality of Skaffold, Helm, and GitHub Actions. It was the first time in a while that I had seen BotKube.

CNCF Member webinar: S&P experience report: multi-cloud serverless on Knative

Evan Anderson, Software Engineer @VMware and Mark Wang, Head of Cloud Engineering @S&P Global Ratings

  • It explains how to use Knative to build a serverless developer experience.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

The infrastructure triumvirate: Continuous service and infrastructure delivery with Argo, Kustomize, and Google Config Connector

Shray Kumar, Bluecore

  • It explains how to use the three tools (ArgoCD, Kustomize, ConfigConnector) together, like a triumvirate, and how this configuration can solve a company’s technical problems.

Step by step: Datastax Cassandra with Istio and SNI routing

Christian Posta,

  • It takes a detailed look at how to get a deployment of Datastax Cassandra running on Kubernetes using TLS and SNI via Istio and how to route it from outside the cluster.
  • The components themselves are very complex, so the guide takes a step-by-step approach. It builds up the architecture and takes the space needed to explain the approach and its benefits. There are many considerations and trade-offs, so the author invites readers to reach out with questions.
    ○ As we are following a specific path in this blog, and there are many considerations and tradeoffs at each step, please do reach out to us if you have questions or need help.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Announcing Google Cloud buildpacks — container images made easy

Matthew Soldo and Steren Giannini, Google Cloud

  • It introduces GCP managed buildpacks.
  • The embedded video explains “What are buildpacks?”, “Why are buildpacks used?”, and “How to use buildpacks on GCP?” with demos. I have not used GCP much lately, but I will set aside some time to try it hands-on.

Okteto, with Ramiro Berrelleza

Craig Box and Adam Glick, Kubernetes Podcast from Google

The first authoritative guide on Harbor is now available in Chinese

Harbor maintainers

  • It announces that the Harbor management guide book has been published in Chinese. Harbor is the first project developed in China to enter the CNCF and achieve graduation.
  • As you can see from the above report from China, the adoption of open source and cloud native technologies is progressing and the number of contributors is increasing.
  • I hope that there will be more opportunities for each engineer and organization to contribute to communities and receive benefit from them globally.

Minecraft as a K8s admin tool

Eric Jadi

  • It shows Kubernetes being managed from within “Minecraft”. “DOOM”, which I covered last week, also made an impression, and this is another interesting idea.
  • If a life simulation video game were made into VR, I might play it endlessly.

Minimum Viable Changes coming for KubeCon + CloudNativeCon North America — Virtual!

Priyanka Sharma, CNCF

  • CNCF GM Priyanka Sharma has announced that CNCF took the feedback from the last virtual event and is committing to some Minimum Viable Changes (MVCs) in running the upcoming KubeCon + CloudNativeCon North America virtual event. MVC is a concept advocated by GitLab, her previous employer.
  • It lists four specific initiatives.
    ○ An easy to use schedule builder.
    ○ Everybody gets swag or the option to donate! If you’re registered for the Full Access pass, you’ll have the choice to receive a KubeCon + CloudNativeCon North America Swag Box or have us donate to charity!
    ○ We’ve adjusted the length of the event.
    ○ The 101 Track is back — and even bigger!

Upcoming CNCF webinars

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: How to migrate NF or VNF to CNF without vendor lock-in
Grzegorz Sikora, VP Business Development @OVOO
Oct 20, 2020 10:00 AM Pacific Time

Member Webinar: The ABCs of Kubernetes security
Roger Klorese, Senior Product Manager @SUSE
Danny Sauer, Senior Software Engineer @SUSE
Oct 21, 2020 7:00 AM Pacific Time

Member Webinar: Deploying Kubernetes to bare metal using cluster API
Seán McCord, Principal Senior Software Engineer @Talos Systems, Inc.
Oct 21, 2020 1:00 PM Pacific Time

Member Webinar: K8s audit logging deep dive
Randy Abernethy, Managing Partner @RX-M
Oct 22, 2020 10:00 AM Pacific Time

Member Webinar: Building 12 factor streaming data apps on Kubernetes
Stelios Charmpalis, Frontend Engineer
Francisco Perez, Senior Backend Engineer
Oct 23, 2020 10:00 AM Pacific Time

Member Webinar: Admission controllers: one part of your Kubernetes security and governance toolkit
Gunjan Patelm, Cloud Architect @Palo Alto Networks
Robert Haynes, Cloud Security Evangelist @Palo Alto Networks
Oct 28, 2020 7:00 AM Pacific Time

Member Webinar: Event-driven architecture with Knative events
Nicolás López, Senior Software Engineer @Google
Bryan Zimmerman, Product Manager @Google
Oct 29, 2020 10:00 AM Pacific Time

Member Webinar: Security in the world of service meshes
John A. Joyce, Principal Engineer @Cisco
Nov 4, 2020 1:00 PM Pacific Time

Member Webinar: Building edge as a service
Dr. Bin Ni, CTO @Wangsu Science & Technology / CDNetworks
Nov 5, 2020 10:00 AM Pacific Time

Member Webinar: Developer-friendly platforms with Kubernetes and infrastructure as code
Lee Briggs, Staff Software Engineer @Pulumi
Nov 6, 2020 10:00 AM Pacific Time

Member Webinar: MicroK8s HA under the hood: Kubernetes with Dqlite
Konstantinos Tsakalozos, Senior Software Engineer @Canonical
Nov 11, 2020 7:00 AM Pacific Time

Member Webinar: The what and why of distributed tracing
Dave McAllister, Sr. Technical Evangelist @Splunk
Nov 13, 2020 10:00 AM Pacific Time

Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time

How about these articles? Did any of them catch your interest?

Honestly, there is some content that I cannot digest at this stage, so I will use this aide-memoire and the links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #CKA, #CKAD, #Certified AWS SAP
