SRE / DevOps / Kubernetes Weekly Collection #60 (Week 12, 2021)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
  • Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #534 March 21st, 2021
SRE Weekly Issue #262 March 21st, 2021
KubeWeekly #256 March 26th, 2021

DEVOPS WEEKLY ISSUE #534 March 21st, 2021


The State of DevOps report is in its 10th year. This year’s survey is now open, focusing on how teams and work are organized, interaction between teams, feedback loops, self-service and more.

A post arguing for a software bill of materials standards and implementations. Lots of context to the problem and to how similar risks are mitigated in other areas.

  • The title is “Why the World Needs a Software Bill Of Materials Now”.
  • Taking the “Sunburst” hack as a starting point, it explains software supply chain attacks, the software bill of materials (BOM), and so on.

gRPC is a general-purpose RPC layer. Addressing a range of different types of services means it’s configurable. And configuration is often a source of errors. This post explains why, along with some examples to learn from.

  • The title is “gRPC is easy to misconfigure”.
  • It describes the following two annoying edge cases that the author recently encountered.
    ○ Client keepalive is dangerous: do not use it
    ○ Servers cannot return errors larger than 7 kiB
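
As a rough illustration of the second edge case above, a server could cap the size of an error message before returning it. This is only a minimal sketch: the 7 KiB budget is taken from the bullet above, the function and constant names are hypothetical, and it does not call any gRPC API. Real limits also depend on headers and other metadata, so a safety margin would be sensible in practice.

```python
# Hypothetical budget, taken from the "7 kiB" figure quoted above.
MAX_ERROR_BYTES = 7 * 1024


def truncate_error_message(message: str, max_bytes: int = MAX_ERROR_BYTES) -> str:
    """Return message shortened so that its UTF-8 encoding fits in max_bytes."""
    encoded = message.encode("utf-8")
    if len(encoded) <= max_bytes:
        return message

    suffix = "... [truncated]"
    budget = max_bytes - len(suffix.encode("utf-8"))
    if budget <= 0:
        # No room for the suffix: just cut the message itself.
        return encoded[:max_bytes].decode("utf-8", errors="ignore")

    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded[:budget].decode("utf-8", errors="ignore") + suffix
```

The truncated text would then be used as the error detail, with the full message written to the server log instead.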

A 3 part series covering a wide range of production troubleshooting stories; performance problems, database migration, proxies, caching and more.

Are you tired of bumping the image tag manually every time you make a change in Kubernetes? This post covers how to automate deployments and updates using Argo CD.

  • The title is “Closing CI/CD loop using Argoproj”.
  • It describes the “ArgoCD Image Updater”, a tool that continuously monitors when new container images for apps are available and automatically updates image tags in Git repositories.

An interesting interview on the importance and challenges of application security.

  • The title is “The biggest challenges — and important role — of application security”.
  • They share insights into application security (AppSec), its role in security organizations, and the challenges of AppSec professionals.

Standards benefit from multiple implementations. This post on runj describes a new OCI runtime implementation targeting FreeBSD and its Jails capability.

  • The title is “runj: a new OCI Runtime for FreeBSD Jails”.
  • It introduces “runj”, a new experimental proof-of-concept (POC) OCI-compatible runtime for jails on the open-source FreeBSD operating system.

SRE Weekly Issue #262 March 21st, 2021


The Prerequisites for Chaos Engineering

Chaos Engineering isn’t adding chaos to your systems — it’s seeing the chaos that already exists in your systems.

Along with four prerequisites, this article also includes 3 myths about chaos engineering that might be making you feel hesitant about starting.

Courtney Nash — Verica

  • It explains the basics a team needs to have in place before starting chaos engineering.
  • At the beginning, it notes that the article assumes some familiarity with chaos engineering, and recommends an introductory post, as quoted below.
    ○ Ed note: This post presumes you have some familiarity with Chaos Engineering, and are considering whether you can start experimenting with it at your organization. If you’re not familiar with Chaos Engineering, here’s a great post to get you up to speed.

Managing On-Call in a Pandemic

This one’s from May of last year. Almost a year on, it’s interesting to see which of these we’ve already implemented.

Ashley Roof — Transposit

  • Eric Mayers, an on-call veteran who has managed on-call engineering teams for 20 years at Google, YouTube, YikYak, and elsewhere, offers practical advice, written in the early days of the pandemic, on building a successful remote on-call engineering organization.

Being Just Reliable Enough

An amusing parable illustrating why not to try to be too reliable.

Andrew Ford — Indeed

  • Drawing on an experience over a weekend, the author explains some of the practices Indeed uses to balance system reliability against the speed of shipping new features.

Google debunks Russian claims that fire was connected to service outage

In the Outages section of last week’s issue, you’ll find two unrelated events referenced in this article: one about Russian internet censorship gone awry and another about a major datacenter fire.

Eric Johansson — Verdict

  • It describes the statements made by Google and the Russian authorities in connection with the fire at the data center of cloud service provider OVHcloud in France.

How to Analyze Contributing Factors Blamelessly

Along with what’s in the title, this article also covers the difference between an RCA and a contributing factors analysis.

Emily Arnott — Blameless

  • In line with the title, the article covers the following points.
    ○ The feature launch schedule doesn’t account for server update timings
    ○ No policy to scale up server availability for feature launches
    ○ Server architecture could be updated to support more traffic
    ○ Incident response team could be overworked with new feature launch, delaying backup server availability

Rethinking site capacity projections with Capacity Analyzer

Lots of detail on how LinkedIn is improving their traffic forecasts. Warning/enticement: math contained within.

Deepanshu Mehndiratta — LinkedIn

  • A few years ago, an unprecedented increase in traffic broke LinkedIn’s load test model. The article details what the team did in response to the struggle to pass load tests across production data centers.

Testing in Production for Safety and Sanity

Everyone is testing in production, some organizations admit and plan for it.

How to do it right, what can happen if it goes wrong, and how to limit the blast radius.

Heidi Waterhouse — LaunchDarkly

  • As the title suggests, it explains testing in a production environment while answering questions such as “Does production testing replace other testing?” Two helpful YouTube videos are embedded at the end of the web page.
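
One common way to limit the blast radius when testing in production is a percentage rollout behind a feature flag. The sketch below illustrates that general idea only. It is not LaunchDarkly's API or algorithm, and the function name, the flag key and the hashing scheme are all assumptions made for the example.

```python
import hashlib


def in_rollout(flag_key: str, user_id: str, percent: float) -> bool:
    """Decide whether a user falls inside a percentage rollout (hypothetical helper).

    The same flag/user pair always lands in the same bucket, so a user
    does not flip between the old and new behaviour across requests.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000  # bucket in 0..9999
    return bucket < percent * 100  # percent=1.5 covers buckets 0..149


# Expose a new code path to roughly 5% of users first.
if in_rollout("new-checkout", "user-42", 5):
    pass  # new behaviour under test
else:
    pass  # existing behaviour
```

Because the bucket is derived from a stable hash, raising the percentage only adds users and never removes ones already included, which keeps a gradual rollout predictable.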

How we found and fixed a rare race condition in our session handling

Remember when GitHub logged you out? Ah, I remember it like it was last week. I mean, the week before. Here’s GitHub’s troubleshooting story about what went wrong.

Dirkjan Bussink — GitHub

  • It shares what GitHub did on March 8 in response to a security vulnerability.


KubeWeekly #256 March 26th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

KubeCon + CloudNativeCon Europe 2021 — Virtual: Co-located event schedules now available!

KubeCon + CloudNativeCon Europe 2021 — Virtual is right around the corner (May 4–7) and what better way to extend your experience than adding on registration for a co-located event? These additional educational opportunities (additional registration and fee required) will take place on May 3 or 4, and we’re excited to share the recently published schedules for CNCF-hosted events. Find the details below:

Cloud Native Rust Day (May 3)
Cloud Native Security Day Europe (May 4)
Cloud Native Wasm Day (May 4)
Crossplane Community Day Europe (May 4)
FluentCon: Cloud Native Logging day with Fluent Bit & Fluentd (May 4)
Kubernetes AI Day (May 4)
Kubernetes on Edge Day (May 4)
Magma Day (May 3)
PromCon Online 2021 (May 3)
ServiceMeshCon Europe (May 4)

  • It seems that the schedules for the KubeCon + CloudNativeCon Europe 2021 co-located events are now complete. I’m looking forward to having more choices.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Modern continuous delivery on Kubernetes for developers

Gabriel Tanner

  • An article that aims to help readers understand the most important concepts of modern continuous delivery and maintain a Kubernetes deployment with a complete continuous delivery workflow without having to write a single line of pipeline code. It includes a demo using Keptn.

Kubernetes Ingress Tutorial: Day 32 of #100DaysOfKubernetes

Anais Urlichs

  • A YouTube video explaining the process of setting up an Ingress Controller on a Kubernetes cluster on Docker-Desktop.

Scaling microservices in Kubernetes

Ashley Davis

  • In line with the title, the article covers the following points.
    ○ Vertically Scaling the Cluster
    ○ Horizontally Scaling the Cluster
    ○ Horizontally Scaling an Individual Microservice
    ○ Elastic Scaling for the Cluster
    ○ Elastic Scaling for an Individual Microservice
    ○ About the Book: Bootstrapping Microservices
    ○ Other Kubernetes Resources
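
For the “Elastic Scaling for an Individual Microservice” item above, it may help to see the arithmetic behind it. The sketch below is not taken from the article. It uses the replica formula described in the Kubernetes Horizontal Pod Autoscaler documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with a default tolerance of 0.1. The function name, parameter names and the min/max defaults are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Estimate the replica count an autoscaler would aim for (illustrative only)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Never go below the minimum or above the maximum.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU with a 60% target.
print(desired_replicas(4, 90, 60))  # prints 6
```

Clamping to a minimum and maximum is what keeps elastic scaling from either dropping to zero capacity or growing without bound during a traffic spike.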

Prometheus monitoring for Kubernetes

Saiyam Pathak, Civo

  • As the title suggests, a YouTube video that introduces Prometheus in about 5 minutes. Before I knew it, it became a series called “CNCF Minutes”.

Announcing Alpha OpenTelemetry access logging support in Envoy

Itamar Kaminski


A secure container runtime with OCI interface

  • Quark Container’s GitHub page. The features are the following three points.
  1. OCI compatible: Quark Container includes an Open Container Initiative (OCI) interface. Common Docker container images can run in Quark Container.
  2. Secure: It provides Virtual Machine level workload isolation and security.
  3. High Performance: Quark Container is built for high-performance container workload execution. It is developed in the Rust language.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Automating SRE from “Hello World” to Enterprise Scale with Keptn

Jürgen Etzlstorfer & Andi Grabner @Dynatrace

  • Keptn’s maintainers showcase the most common use cases, how to get started with your first project, and how to use Keptn to extend these practices to all the projects in your enterprise.

Flux is Incubating + the road ahead

Stefan Prodan @Weaveworks

  • It provides an overview of the “Flux” project, its evolution, the path to Flux 2, the meaning of Flux v1 maintenance mode, the best entry point to get started, and how to migrate.

Securing access to your Kubernetes applications — Using Dex for authentication and role based access control (RBAC) for authorization

Deepika Dixit & Onkar Bhat @Kasten by Veeam

  • An overview of “Dex”, an open source identity service that uses OpenID Connect to facilitate authentication of other apps, and an effective way to adopt RBAC while covering most of the use cases.

Scaling monitoring at Databricks from Prometheus to M3

YY Wan & Nick Lanham @Databricks

  • It discusses why M3 was chosen and how it was deployed, and shares lessons learned in the process.

Why your APIs should fly first class

Robert Ross @FireHydrant

  • The following points explain why giving top priority to APIs becomes a business game changer.
    ○ The benefits of building your API first and how it can pay dividends in the long haul
    ○ The different types of APIs and which choice is the right choice
    ○ The importance of hosting API documentation

Cloud Native Live: Crossplane — GitOps-based Infrastructure as Code through Kubernetes API

Viktor Farcic @CodeFresh

  • A one-hour session introducing “Crossplane”.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Replicated, with Grant Miller

Craig Box, Kubernetes Podcast from Google

10 predictions for cloud native in 2021 — Keynote, The DevOps Conference

Cheryl Hung, CNCF

  • As the title suggests, these are keynote slides from The DevOps Conference. They introduce the following content and call for participation in KubeCon + CloudNativeCon Virtual EU, May 4–7.
    🛠 Tech
    1. More Rust in Cloud Native
    2. Cross Cloud becomes (more) real
    3. Web Assembly and eBPF
    4. Kubernetes on the Edge
    👩🏻‍💻 DevOps
    5. GitOps grows significantly
    6. Chaos Engineering practices
    7. Rise of FinOps
    🌐 Ecosystem
    8. Pluggable developer and operator experience
    9. Service mesh consolidation
    10. End user driven open source

Take the CNCF Kubernetes at the Edge microsurvey

  • The first of three survey announcements from CNCF this week.

Take the FinOps (CFM) for Kubernetes CNCF microsurvey

  • The second of three survey announcements from CNCF this week.
  • Since the original link was broken, I removed the unnecessary parts of it before listing it here.

Take the CNCF Diversity microsurvey

  • The third of three survey announcements from CNCF this week.

Cloud Native Security Day: Protecting our cloud native world, one container at a time

great Logan

  • It introduces Cloud Native Security Day (CNSD), which will be held as a co-located event during KubeCon EU 2021.

Upcoming CNCF Online Programs

Cloud Native Live: Application life-cycle orchestration with Keptn
Jürgen Etzlstorfer & Andi Grabner @Dynatrace
March 31, 2021 at 12pm PT
Register Now

Introducing Kubestr: a new way to explore your Kubernetes Storage Options
Michael Cade @Kasten by Veeam
April 1, 2021
Register Now

CNCF Online Programs Playlist on YouTube

Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.

What did you think of these articles? Did any of them catch your interest?

Actually, there is some content I cannot digest at this stage, so I will also use this aide-memoire and these links to catch up myself.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #CKA, #CKAD, #Certified AWS SAP
