SRE / DevOps / Kubernetes Weekly Collection #26 (Week 31)

  • In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments and useful links as an aide-memoire.

DEVOPS WEEKLY ISSUE #500 July 26th, 2020
SRE Weekly Issue #229 July 26th, 2020
KubeWeekly #227 August 1st, 2020

DEVOPS WEEKLY ISSUE #500 July 26th, 2020

500 issues

Here are 5 posts still worth reading from Devops Weekly issues through the years.

Issue #1

But what is Devops? I know a number of people have signed up to this newsletter having only recently come across the term. It’s safe to say Devops means different things to different people at this early stage, but I’m going to start out by pointing everyone to James Turnbull’s WHAT DEVOPS MEANS TO ME

  • The title is “What DevOps Means to Me.”

Issue #100

Interesting set of blog posts, describing protocols or patterns for Devops adoption. The first two talk about the advantages of starting small and fixing a real problem quickly and about configuration management and limiting manual changes.

  • The titles are “Devops Protocols: Start Small” (linked above) and “Devops Protocol: No Manual Changes”.

Issue #200

If you’re just getting to grips with monitoring it can be difficult to know where to start. This presentation gives you a quick overview of the last 5 years. Lots of ideas for things to improve.

  • The title is “5 Years of Metrics & Monitoring”.

Issue #300

I attended the recent Operability conference in London and these two posts nicely summarise the various talks and provide lots of links to the slides.

Issue #400

Good tips for ensuring the software release process is robust, emphasising that this means clear ownership and treating deployment software like the critical production system it is.

  1. Get someone to own the deploy software


It’s that time of year again. The regular Puppet State of Devops survey is open. The focus this year is the relationship between change management, continuous delivery and self-service platforms.

  • “Puppet’s 2020 State of DevOps survey” web page by Puppet. It seems that it is being carried out at this time every year.

Documentation and design serve a critical role in building robust systems. This post looks at why design documents are useful and what sort of thing they should include.

  • The title is “Design Docs at Google”.

A new report on the state of public Terraform code security. Some useful data and some good tips for anyone using Terraform to configure services.

  • The title is “Introducing the State of Open Source Terraform Security Report”.

A look at using Azure Pipelines to validate a sysmon configuration automatically.

  • The title is “Using Azure Pipelines to validate my Sysmon configuration”.

A good story of migrating a low-level component at scale, in this case an application server. Canary rollouts, upstream contributions, performance and other interesting topics.

  • The title is “How we migrated application servers from Unicorn to Puma”.

Embracing cloud native technologies and ways of working comes with challenges, some of which this post documents, including security, lack of expertise, slow release cycles and more.

  • The title is “Top 7 Challenges to Becoming Cloud Native”.
  1. While great in theory, the problem with cloud native computing is that it isn’t always easy or straightforward to implement — especially if you’re an enterprise with long-standing, legacy applications.
  • The 7 most common issues are as follows.
  1. Slow release cycles and accelerated pace of change


kube-iptables-tailer does just what you expect. It exposes the underlying iptables data to kubectl. Handy for spotting services trying and failing to communicate with one another in Kubernetes.

  • Kubernetes GitHub page for the OSS tool “kube-iptables-tailer” that gives you more visibility into network issues in your cluster.

Kconmon is a Kubernetes connectivity monitoring tool that runs frequent tests (tcp, udp and dns), and exposes Prometheus metrics that are enriched with the node name, and the locality information (such as zone), enabling you to correlate issues between availability zones or nodes.

  • Kubernetes GitHub page of the OSS tool “Kconmon” that monitors connections between nodes.
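
As a note to myself, here is a minimal sketch of the idea behind this kind of connectivity monitoring: run a TCP probe and render the result as a labelled sample in the Prometheus text format. This is not Kconmon's code. The function names, the metric name, and the label names are all my own assumptions for illustration.

```python
import socket
import time
from typing import Dict, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Tuple[bool, float]:
    """Try one TCP connection and return (success, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def format_metric(name: str, value: float, labels: Dict[str, str]) -> str:
    """Render one sample in the Prometheus text exposition format."""
    # Labels are sorted so the output is stable between runs.
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"
```

For example, `format_metric("probe_tcp_success", 1.0, {"node": "n1", "zone": "a"})` produces one line that a Prometheus scrape could read. Labels such as node and zone are what make it possible to compare failures across availability zones.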

SRE Weekly Issue #229 July 26th, 2020


“How could they be so stupid?”

More details have emerged about the Twitter break-in last week, leading some to utter the quote above. Here’s a take on how to see it as not being about “stupidity”.

Lorin Hochstein

  • An article about a case where Twitter accounts were hijacked one after another on July 15th (US time) and abused for bitcoin remittance fraud.

Data Consistency Checks

The data in your database should be consistent… but then again, incidents shouldn’t happen, right? Slack accepts that things routinely go wrong with data at their scale, and they have framework and a set of tools to deal with it.

Paul Hammond and Samantha Stoller — Slack

  • A blog post by Slack engineers. It opens by noting that “An entire ecosystem of monitoring and administrative tools exist for operating our databases, making sure they replicate, scale and are generally performant. Similarly, a number of tools accompany the databases’ query language from linters and beautifiers to query builders and object mappers. But after our application has written data, there is very little tooling to verify that the data is as expected and remains as such.” and then mainly explains the “Consistency Check Pattern.”
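
To remind myself what a consistency check can look like, here is a minimal sketch that scans already-written data for one kind of inconsistency: rows that point at a parent row that no longer exists. It is not Slack's framework. The `channels` and `messages` tables and both function names are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3
from typing import List, Tuple


def find_orphaned_messages(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """Return (message_id, channel_id) pairs whose channel does not exist."""
    rows = conn.execute(
        """
        SELECT m.id, m.channel_id
        FROM messages AS m
        LEFT JOIN channels AS c ON c.id = m.channel_id
        WHERE c.id IS NULL
        ORDER BY m.id
        """
    ).fetchall()
    return [(row[0], row[1]) for row in rows]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database containing one deliberately orphaned row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE channels (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, channel_id INTEGER, body TEXT);
        INSERT INTO channels VALUES (1, 'general');
        INSERT INTO messages VALUES (10, 1, 'hello');
        INSERT INTO messages VALUES (11, 2, 'orphan');
        """
    )
    return conn
```

Calling `find_orphaned_messages(build_demo_db())` reports message 11, which references the missing channel 2. A check like this only reports the problem. Deciding how to repair it is a separate step.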

Obstacles to Learning from Incidents

I learned a lot from this article. My favorite obstacle is “distancing through differencing”, e.g. “we would never have responded to an incident that way”.

Thai Wood — Learning from Incidents

  • The article lists the following obstacles to learning from incidents. Like the editor of SRE Weekly, “Distancing through differencing” is the one that stuck with me.
    ○ Distancing through differencing
    ○ Overconfidence
    ○ Root cause
    ○ Only trying to learn from “bad” things
    ○ High pressure reporting requirements
    ○ Making sure this never happens again
    ○ Confusing writing, distribution, or meetings with learning

You don’t need SRE. What you need is SRE.

[…] SRE, that is SRE as defined by Google, is not applicable for most organizations.

Sanjeev Sharma

  • Here are the points the author wanted to convey in the title and the first section.
    ○ You do not need SRE. Don’t get me wrong, you need (service/system) Reliability Engineering.
    ○ You still need to automate repetitive, typical tasks in operations.
    ○ You just don’t need to, and really should not do it the Google way.
    ○ You are not Google. Very few organizations are.

Questionable Advice: “What’s the critical path?”

Expert advice on what questions to ask as you try to figure out what your critical path is (and why you would want to know what it is).

Charity Majors

  • The author was asked “Any advice/reading on how to establish a team’s critical path?” and wrote down her thoughts.

Thinking About Your Humans With J. Paul Reed

This podcast episode was kind of like a preview of J. Paul Reed and Tim Heckman’s joint talk at […]. I love how they refer to the pandemic as a months-long incident, and point out that if you’re always in an incident then you’re never in an incident.

Julie Gunderson and Mandi Walls — Page it to the Limit

Rebuilding messaging: How we bootstrapped our platform

I love a good dual-write story. Here’s how LinkedIn transitioned to a new messaging storage mechanism.

Pradhan Cadabam and Jingxuan (Rex) Zhang — LinkedIn

  • Part 2 of the “Rebuilding Messaging” series describes a major migration of existing data to a new database, commonly referred to as bootstrapping data from a legacy system into a new system.
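
For my own reference, here is a minimal sketch of the general dual-write idea: every new write goes to both stores, a backfill copies the older records, and reads switch over only afterwards. It is not LinkedIn's implementation. The `DualWriteStore` class and its in-memory dictionaries are hypothetical stand-ins for the real storage systems.

```python
from typing import Dict, Optional


class DualWriteStore:
    """Write to both stores; read from the legacy store until cutover."""

    def __init__(self) -> None:
        self.legacy: Dict[str, str] = {}
        self.new: Dict[str, str] = {}
        self.read_from_new = False  # flipped once the backfill is verified

    def put(self, key: str, value: str) -> None:
        # New writes land in both stores so they stay in step.
        self.legacy[key] = value
        self.new[key] = value

    def get(self, key: str) -> Optional[str]:
        source = self.new if self.read_from_new else self.legacy
        return source.get(key)

    def backfill(self) -> int:
        """Copy records that predate dual writes without overwriting newer data."""
        copied = 0
        for key, value in self.legacy.items():
            if key not in self.new:
                self.new[key] = value
                copied += 1
        return copied
```

The backfill skips keys that already exist in the new store, because those were written by the dual-write path and are at least as fresh as the legacy copy.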


KubeWeekly #227 August 1st, 2020

The Headlines

Editor’s pick of the highlights from the past week.

Scheduling, with David Oppenheimer

David’s work with Borg, Omega and now Kubernetes over the past 13 years, puts him, in the words of Tim Hockin, “among the world’s experts in scheduling systems”. On this week’s episode of the Kubernetes Podcast from Google, David talks about his experience with scheduling systems, how learnings from Omega became Kubernetes features, and what he thinks the biggest challenges facing the cluster management space are today.

Jaeger Project Journey Report: A 917% increase in companies contributing code

CNCF staff

This week, CNCF published a project journey report for Jaeger. This is the sixth such report compiled for CNCF graduated projects. The report assesses the state of the Jaeger project and how CNCF has impacted its progress and growth.

Jaeger is an open source, end-to-end distributed tracing platform built to help companies of all sizes monitor and troubleshoot their cloud native architectures. Contributors to Jaeger include many of the world’s largest tech companies, such as Uber, Red Hat, Ryanair, IBM, and Ticketmaster as well as fast-growing mid-size companies like Cloudbees. Read the full report.

  • CNCF released the Jaeger Project Journey Report. CNCF compiles these reports for its graduated projects.

Register by August 3, 2020 at 23:59 PDT and you will be entered into a drawing* to win one of the below gift boxes.

Keep Cloud Native Delighted Swag Box (500 available) which includes:

KubeCon + CloudNativeCon Europe t-shirt
Keep Cloud Native Connected patch
Project logo face mask
Diamond sponsor surprise
Kubernetes fidget spinner
CNCF socks

Grand Prize! Keep Cloud Native Delighted Deluxe Swag Box (10 available) which includes:

All the above items PLUS
$150 gift card to the CNCF online store

*The drawing is open to both pass types, Full Event and Keynote + Expo Hall only, whether already registered or registering between now and August 3. Winners must be registered by the August 3 deadline AND attend the conference. Drawing will be held and winners notified by email on August 24, 2020. Limit of (1) box per participant.

Not only do you have the opportunity to win swag but by registering now, time is blocked on your calendar so you won’t miss a thing. It’s a win-win!

Register now!

  • Information on the KubeCon + CloudNativeCon Europe drawing. If you register by 23:59 PDT on August 3, 2020, you are entered into the drawing. There are 500 regular swag boxes and 10 deluxe boxes, and the winners will be notified by email on August 24. It seems that both pass types are eligible: the $75 Full Event pass and the free Keynote + Expo Hall pass.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member Webinar: One large cluster or lots of small ones? Pros, cons and when to apply each approach

Flavio Castelli, Distinguished Engineer @SUSE

  • It explains the pros and cons of running Kubernetes as “one large cluster” versus “lots of small clusters”, and which solutions can alleviate some of the drawbacks of each approach.

CNCF Member Webinar: Kubernetes Policies 101

Eran Leib, Founder & VP Product Management @Apolicy and Spenser Paul, Director of Sales, North America @DoiT International

  • Kubernetes policy is explained focusing on the following topics.
    ○ What type of Policies exist?
    ○ How do we define and enforce Policies?
    ○ What best practices are available?

CNCF Member Webinar: GitOps Continuous Delivery with Argo and Codefresh

Brandon Phillips, Solutions Architect @Codefresh

  • It uses Argo and Codefresh as examples of how GitOps can make releases reliable, fast, and repeatable.

CNCF Member Webinar: Event-Driven Cloud Native Workflows Use Cases and Patterns

Sebastien Goasguen, CTO @TriggerMesh

  • The explanation focuses on the following points. The commentary also mentions the products of TriggerMesh.
    ○ Discuss a set of serverless use-cases, from LEGO to HSBC, and highlight common patterns.
    ○ Show how these patterns can be reproduced with technologies like Knative and the CloudEvents specification.
    ○ Finish by weighing the pros and cons of going serverless directly in the cloud vs. running some of the backing infrastructure yourself.

CNCF Member Webinar: Cluster API — Yesterday, Today, Tomorrow

Saad Malik, CTO & Co-Founder @Spectro Cloud and Jun Zhou, Chief Architect @Spectro Cloud

  • It describes Cluster API and common Kubernetes lifecycle management options.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Domain-Oriented Microservice Architecture

Adam Gluck, Uber

  • It introduces a general approach to microservice architecture called “DOMA (Domain-Oriented Microservice Architecture)” which Uber is working on.

12 Container image scanning best practices to adopt in production

Pawan Shankar, Sysdig

  • The author says, “In this blog, we will cover many image scanning best practices and tips that will help you adopt an effective container image scanning strategy.” and explains the 12 best practices below.
  1. Bake image scanning into your CI/CD pipelines

Certificate management on Istio

Szabolcs Berecz, Banzai Cloud

  • Following on from a recent post on the blog titled “Certificate management on Kubernetes”, it focuses on the differences that the Istio service mesh makes.

Kubernetes Secrets: A Secure Credential Store for Jenkins

Vasumathy Seenuvasan and Ravi Bukka, eBay

  • A story about using Kubernetes Secrets to manage eBay’s Jenkins credentials.

Conftest joins the Open Policy Agent project

Gareth Rushgrove, Snyk

  • I’ll skip it because it’s an article I covered last week.

GitOps Continued: Using Tekton for CI and Argo for CD

Mario Vazquez, Ryan Cook, Chris Short, Red Hat

  • A nearly 100-minute Twitch video by Red Hat members, featuring Tekton for CI and Argo for CD.

The Seccomp Notifier — New Frontiers in Unprivileged Container Development

Christian Brauner

  • It details the new seccomp notification feature they have developed in both the kernel and user space. It’s been explained in great detail, and I haven’t read it completely yet.

Introduction to instrumenting applications with Prometheus

Brian Brazil, Sysdig

  • It mainly looks at the following two points.
  1. Analyzing your service to find the most useful places to add metrics, how to add that instrumentation, getting it exposed and scraped.

Deny Rules! Fine-Grained Kubernetes Access Controls with Kyverno

Shuting Zhao

  • It introduces how to easily manage fine-grained access control as a custom policy using Kyverno.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

The Kubernetes CPU Manager: a Ghost Story

Michael Vigilante

  • CPU performance was being affected by a “ghost” living in the Kubernetes cluster, and the article tells how it was investigated and how the cause was tracked down.
  1. Set the --reserved-cpus option on the kubelet

An Introduction to the Cloud Native Landscape

Catherine Paganini, Kublr

  • It breaks down the CNCF Cloud Native Landscape map and provides an overview of the entire landscape, its layers, columns, and categories.

A Kubernetes Ghost Story

Guinevere Saenger, GitHub

  • The speaker was speaking on a dark background, so I felt “it’s creepy” along with the title in this YouTube video.

An Interview with CNCF GM & Heavybit Advisor, Priyanka Sharma

Mina Benothman, Heavybit Industries

  • An interview with Priyanka Sharma, who became the new GM of CNCF.

KUDO or how to simply create your Kubernetes operator with Denis Jannot (in French)

Electro Monkeys podcast

  • A podcast delivered in French, which this blog covers from time to time. This time, the guest is Denis Jannot (Sales Engineer at D2iQ).

How Policy Engines Make Day 2 Easier

Emily Omier, Nirmata

  • It demonstrates some specific ways to streamline Day 2 operations by automating configuration using an intelligent policy engine.

Announcing Vitess 7

Deepthi Sigireddi, Vitess maintainer

  • The release announcement for Vitess 7, written by Vitess maintainer Deepthi Sigireddi.
  1. Improved SQL Support

Upcoming CNCF webinars

You can check some Recorded Webinars and Upcoming Webinars here. The following are posted as Upcoming CNCF webinars at that moment.

Project Webinar: How We Doubled System Read Throughput with Only 26 Lines of Code
TiKV team
July 31, 2020 10:00 AM Pacific Time

Member Webinar: Comparing eBPF and Istio/Envoy for Monitoring Microservice Interactions
Roko Kruze, Solutions Engineer @Flowmill
Mike Cohen, Co-Founder and COO @Flowmill
Aug 4, 2020 10:00 AM Pacific Time

Member Webinar: Debugging your debugging tools; What to do when your service mesh goes down in production?
Neeraj Poddar, Co-founder and Chief Architect @Aspen Mesh
Aug 5, 2020 7:00 AM Pacific Time

Member Webinar: Making Data Work for Developers with Kubernetes & Cassandra
Chris Splinter, Sr. Product Manager — Developer Solutions @DataStax
Patrick McFadin, VP of Developer Relations @DataStax
Aug 5, 2020 1:00 PM Pacific Time

Member Webinar: Maximizing M3 — Pushing performance boundaries at scale in a cloud-native distributed metrics engine
Ryan Allen, Senior Software Engineer @Chronosphere
Aug 6, 2020 10:00 AM Pacific Time

Ambassador Webinar: GitOps, DSL and App Model — Getting Started Building Developer Centric Kubernetes
Lei Zhang, Staff Engineer @Alibaba
Aug 7, 2020 10:00 AM Pacific Time

Member Webinar: Hardware for Kubernetes, Peeling Back the Layers
Erik Reidel, SVP Compute & Storage Solutions @ITRenew
Aug 11, 2020 10:00 AM Pacific Time

Member Webinar: The Open-Source Observability Playbook
Hen Peretz, Head of Solutions Engineering @Epsagon
Aug 12, 2020 7:00 AM Pacific Time

Member Webinar: Migrating Real-Time Communication Applications to Kubernetes at Scale: Learnings from 8×8’s Experience
Michael Laws, Sr. Site Reliability Engineer/DevOps at 8×8
Pankaj Gupta, Sr. Director at Citrix
Aug 12, 2020 1:00 PM Pacific Time

Member Webinar: MLOps automation with Git Based CI/CD for ML
Yaron Haviv, Co-Founder and CTO, Iguazio
Aug 26, 2020 1:00 PM Pacific Time

Project Webinar: Kubernetes 1.19
Kubernetes release team
Aug 28, 2020 10:00 AM Pacific Time

Member Webinar: Getting started with container runtime security using Falco
Loris Degioanni, CTO and Founder @Sysdig
Sept 2, 2020 1:00 PM Pacific Time

What did you think of these articles? Did any of them catch your interest?

To be honest, there is some content here that I cannot fully digest at this stage, so I’ll use this aide-memoire and its links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #GCP, #Certified AWS SAP
