SRE / DevOps / Kubernetes Weekly Collection #61 (Week 13, 2021)

  • In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments as an aide-memoire, along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #535 March 28th, 2021
SRE Weekly Issue #263 March 28th, 2021
KubeWeekly #257 April 2nd, 2021

DEVOPS WEEKLY ISSUE #535 March 28th, 2021


A great discussion on the importance of risk management and compliance in technical operations.

  • The title is “Technical Ops and Compliance — From Bootstrap to Scale with Richard Crowley.”
  • The following insights are extracted from the approximately 43-minute video interview embedded in the web page. At the beginning, Crowley looks back on his career and talks about his experience at Slack in line with the title.
    ○ Talk about risks and SLAs with your customers.
    ○ Deciding what you should buy.
    ○ From centralized to distributed operations.
    ○ Avoiding the everything bucket.

A presentation on CI/CD pipeline analytics, measuring flow and other metrics as part of software delivery.

  • The title is “Measuring DevOps Success with Pipeline Analytics”.
  • It explores pipeline analytics practices, develops strategies for measuring DevOps, and describes how to use tools such as DORA and flow metrics as KPIs for success. The “Join to learn” points are listed below.
    ○ How pipeline analytics are a necessary practice in all high-performing DevOps environments
    ○ What metrics are useful and when
    ○ How to create a pipeline analytics strategy
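As a note to myself on what “DORA metrics” means in practice, here is a minimal sketch that computes the four key metrics (deployment frequency, lead time for changes, change failure rate, time to restore) from a list of deployment records. The `Deployment` record, its field names, and the `dora_metrics` function are my own assumptions for illustration; they are not taken from the presentation or from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a failure in production?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over an observation period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    # Lead time for changes: commit -> running in production.
    lead_seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    # Time to restore: failed deployment -> service restored.
    restore_seconds = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore_hours": (
            median(restore_seconds) / 3600 if restore_seconds else None
        ),
    }


# Made-up sample data for one week.
sample = [
    Deployment(datetime(2021, 3, 22, 9), datetime(2021, 3, 22, 11)),
    Deployment(datetime(2021, 3, 23, 9), datetime(2021, 3, 23, 13),
               failed=True, restored_at=datetime(2021, 3, 23, 14)),
    Deployment(datetime(2021, 3, 24, 9), datetime(2021, 3, 24, 15)),
    Deployment(datetime(2021, 3, 25, 9), datetime(2021, 3, 25, 17)),
]
print(dora_metrics(sample, period_days=7))
```

I used medians rather than means here so that one unusually slow change does not dominate the number; that is a choice of mine, not something the presentation prescribes.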

A survey from researchers at the Technical University of Darmstadt and the University of St.Gallen on the adoption of devops practices and tooling.

  • As mentioned above, the title of the survey is “Dependencies in DevOps 2021”.

A curated collection of security resources for Kubernetes. Everything from the basics to tools, video recordings and papers.

  • This is the GitHub page “Awesome Kubernetes (K8s) Security”, which collects the resources described above.
  • I am very grateful for this kind of summary page, but at the same time I feel the pressure of homework suddenly piling up. Looking through the contents, I am relieved that I already know many of the items.

A brief introduction to the concept of service mesh, and a comparison including Linkerd, Consul Connect, Istio and Kuma.

  • The title is “A Kubernetes Service Mesh Comparison”.
  • Starting from the background of how the service mesh came about, it is easy to follow, with brief explanations and comparison tables for each Kubernetes service mesh product.

A spreadsheet for modelling savings from improvements to CI/CD.

  • As mentioned above, the link jumps to the spreadsheet “CI/CD Improvement Scenario: Sheet 1”.
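To remind myself what this kind of savings model looks like, here is a rough sketch of the arithmetic. It does not reproduce the formulas in the linked spreadsheet; the function name, the parameters, and the default of 230 working days per year are all my own assumptions.

```python
def annual_ci_savings(
    builds_per_day: float,
    minutes_saved_per_build: float,
    hourly_cost: float,
    working_days_per_year: int = 230,  # assumed value, adjust to your calendar
) -> float:
    """Estimate yearly savings from making each CI/CD run faster.

    Assumes every minute saved per build is a minute of paid engineer
    time recovered, which is a simplification.
    """
    inputs = (builds_per_day, minutes_saved_per_build, hourly_cost,
              working_days_per_year)
    if min(inputs) < 0:
        raise ValueError("inputs must be non-negative")

    hours_saved = (
        builds_per_day * working_days_per_year * minutes_saved_per_build / 60
    )
    return hours_saved * hourly_cost


# Hypothetical scenario: 50 builds a day, 5 minutes saved per build,
# and an hourly cost of 80 (in whatever currency you budget in).
print(round(annual_ci_savings(50, 5, 80), 2))
```

A real model would probably also account for how many people actually wait on each build and for the cost of the improvement work itself.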

A quick introduction to some of the software supply chain security issues mitigated by sigstore.

  • The title is “Does Sigstore Really Secure The Supply Chain?”.
  • It takes up Sigstore, the Linux Foundation’s new project that was covered in multiple articles last week, and addresses the question, “Can we prevent all supply chain attacks?”

A post on understanding some of the OAuth flows that can result in authentication and authorization security issues, and planned changes to the specification to avoid them.

  • The title is “Do You Know Your OAuth Flows?”.
  • As an important point to understand in cybersecurity, it explains why it matters to choose the “OAuth grant type” (also called “OAuth flow”) that suits the type of application being built.

SRE Weekly Issue #263 March 28th, 2021


[Increment: Reliability] Tracing a path to observability

They make a really clear case for why traditional metrics and monitoring couldn’t help them solve their problems.

Mads Hartmann

  • As the title suggests, it traces Glitch’s observability work to make its production system more reliable, in diary format and chronological order.

Glynn Lunney — SRE Leadership

This article commemorates the death of NASA flight director Glynn Lunney by showing the SRE lessons we can learn from him.

Robert Barron

  • As mentioned above, this is a memorial article for Glynn Lunney. A NASA memorial video is embedded near the end of the article.

7 top Site Reliability Engineer (SRE) job interview questions

I like that this focuses on human factors.

Kevin Casey

  • After touching on the role of an SRE and the kind of person required, it explains the following seven questions, as the title suggests.
    ○ Question 1: How do you decide if the team should work on new features or paying down technical debt?
    ○ Question 2: How do you go about setting SLOs and SLIs and how do you make adjustments when necessary?
    ○ Question 3: Which of the three pillars of observability is most important to you? Which one do you feel you need to get more exposure in?
    ○ Question 4: How have you implemented process improvements and other changes in the past?
    ○ Question 5: How do you balance the wishes and needs of different authorities on the team?
    ○ Question 6: How does customer experience and/or employee experience inform your SRE strategy?
    ○ Question 7: How do you learn and keep up with industry trends and toolchains?

How to Scale for Reliability and Trust

Dealing with both the increased expectations and challenges of reliability as you scale is difficult. You’ll need to maintain your development velocity and build customer trust through transparency.


  • On the topic in the title, it states that “You’ll need to maintain your development velocity and build customer trust through transparency.” and “It isn’t a problem that you can solve by throwing resources at it.”, and then explains the following points.
    ○ Design services that can remain reliable while scaling
    ○ Balance reliability and development velocity
    ○ Respond to incidents using best practices
    ○ Build trust when incidents occur through good communication

Engineering Failover Handling in Uber’s Mobile Networking Infrastructure

Uber’s customers are especially likely to be moving around and going in and out of tunnels, losing connectivity along the way. That means it’s difficult to tell when the client should fail over to a different server.

Sivabalan Narayanan, Rajesh Mahindra, and Christopher Francis — Uber

  • They share the challenges they faced when designing mobile failover handling for Uber’s apps and how the design has evolved as the system has been operated for users around the world.

Incident review: Service outage on 25 October 2020

Here’s one I missed from last November. Some good stuff to learn from, especially if you run Vault on kubernetes.

This outage was caused by a cascading failure stemming from our secrets management engine, which is a dependency of almost all of the production GoCardless services.

Ben Wheatley — GoCardless

  • As mentioned above, this is GoCardless’s incident retrospective article from November 2020.


KubeWeekly #257 April 2nd, 2021

The Headlines

Editor’s pick of the highlights from the past week.

etcd Project Journey Report: Individual project contributors increase by 67% after joining CNCF

Etcd has reached some impressive growth milestones since joining the community, and we can’t wait to see what the project continues to accomplish! Have a look at the full etcd Project Journey Report to learn more about these accomplishments in more detail.

  • As mentioned above, it introduces the etcd entry in CNCF’s Project Journey Reports. The numbers of contributors, PRs, and so on are rising and diversity is increasing, so the project appears to be growing steadily.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Cloud lateral movement: Breaking in through a vulnerable container

Stefano Chierici, Sysdig

  • It introduces a staged, but real-world scenario to showcase how it would be possible for an attacker to get full access to a cloud account. It also describes how to use the Sysdig Cloud Connector to detect and mitigate this type of attack.

Kubernetes Lab on Baremetal

Marco Lancini

  • It details the steps taken to deploy your own Kubernetes lab on bare metal, specifically on Intel NUCs.

The worst so-called “best practice” for Docker

Itamar Turner-Trauring

  • In line with the title, it explains the following four arguments, which are wrongly circulated on the Internet as Docker best practices, including how to respond to them.
    ○ Bad argument #1: You can’t upgrade inside an unprivileged container
    ○ Bad argument #2: The maintainers of the base image should install security updates
    ○ Bad argument #3: If you install security updates you will get the latest version of packages
    ○ Bad argument #4: Upgrades don’t work

Get Updated With Outdated

Treva Williams, Replicated

  • It introduces “Outdated”, a simple kubectl plugin that quickly and efficiently identifies old container images on a Kubernetes cluster.

Linkerd — Service Mesh for Kubernetes (quick intro)

Saiyam Pathak, Civo

  • As the title suggests, a YouTube video that introduces Linkerd in about 6 minutes.

Benchmarking and Evaluating Your Kubernetes Storage with Kubestr

Michael Cade, Kasten by Veeam

How to replace Docker with Podman on a Mac

Dave Meurer, Red Hat

  • The author begins by describing how he replaced Docker with Podman on his local macOS machine, and documents the migration process so that readers can have a smoother migration experience.

A Kubernetes Service Mesh Comparison

Guillaume Dury, Cloud 8

  • I will skip this one because it is covered above in DEVOPS WEEKLY ISSUE #535.

Awesome Kubernetes (K8s) Security

  • I will skip this one because it is covered above in DEVOPS WEEKLY ISSUE #535.

Anthos on Bare Metal and Akri — Managing Leaf Devices on Edge Kubernetes Clusters from Cloud

Gokul Chandra

  • It carefully walks through an example of building the configuration in the title, with CLI examples, screenshots, and images.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Introducing Kubestr: a New Way to Explore your Kubernetes Storage Options

Michael Cade @Kasten by Veeam

  • A 40-minute session introducing “Kubestr”.
  • There is a brief explanation and demo around CSI and storage.

Application life-cycle orchestration with Keptn

Jürgen Etzlstorfer & Andi Grabner @Dynatrace

  • A one-hour session introducing “Keptn”.
  • The format works well: the moderator asks questions while the speakers run the demo.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

What’s Next for Sigstore?
Dan Lorenc, Google

  • It explains what is coming next for “Sigstore” under the following points.
    ○ Trust Roots
    ○ Tooling
    ○ Getting Involved

Weaveworks (part 1), with Alexis Richardson

Craig Box, Kubernetes Podcast from Google

How a memo from Jeff Bezos catalyzed Kong’s quest to connect APIs for the enterprise

Mark Albertson, SiliconAngle

  • It covers API connectivity in the enterprise, including the story of how Kong was inspired by the famous “API Mandate” issued by Amazon’s Jeff Bezos in 2002, and the story of his support for its API efforts as an investor.

Flux April 2021 update

  • It summarizes the April 2021 updates for “Flux”, which has become a CNCF Incubation project, under the following items.
    ○ Before we get started, what is GitOps?
    ○ The Road to Flux v2
    ○ Flux is a CNCF Incubation project
    ○ Flagger v1.7.0 is out
    ○ Upcoming events
    ○ In other news

Kubernetes Community days Bangalore CFP extended to 15th April!

  • As mentioned above, the CFP deadline has been extended to April 15. If you have the chance, consider submitting a proposal and contributing.

Kubernetes at Edge, FinOps and Kubernetes, and Diversity: Share your thoughts with CNCF microsurveys

Istio for Beginners: Day 33 of #100DaysOfKubernetes

Anais Urlichs, CodeFresh

  • A YouTube video that introduces Istio to beginners in about 15 minutes.

Upcoming CNCF Online Programs

Cloud Native Live: A Deep Dive into Kubestr — A New Way to Explore your Kubernetes Options
Michael Cade @Kasten
April 7, 2021 at 12pm PT
Register Now

What Is Continuous Improvement?
Pini Reznik @Container Solutions
April 8, 2021
Register Now

CNCF Online Programs Playlist on YouTube

Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.

How did you find these articles? Were any of them of interest to you?

To be honest, there is some content I cannot digest at this stage, so I will use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #CKA, #CKAD, #Certified AWS SAP
