SRE / DevOps / Kubernetes Weekly Collection #68 (Week 20, 2021)

May 24, 2021

In this blog post series, I collect the following three weekly mailing lists I subscribe to, and add some comments and useful links as an aide-memoire.
I have already published the same content on my Japanese blog and am catching up in English in this series.
I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #542 May 16th, 2021
SRE Weekly Issue #270 May 16th, 2021
KubeWeekly #262 May 21st, 2021

DEVOPS WEEKLY ISSUE #542 May 16th, 2021

News

A post on the best way to present metrics on dashboards. Definitely relevant to anyone presenting information from monitoring or software development metrics.

The title is “Metric Display Standards”.
It describes standard object models and data presentations that provide a reliable level of utility, ease of use, and accessibility for all metrics in every use case.
It makes effective use of images and diagrams that aid understanding, so I saved it as good teaching material.

A rebuttal to a post from last week, on how Backstage hopes to address some of the traditional issues described with developer portals.

The title is “Developer portals are a super power”.
A counter-article to the article “developer portals are an anti-pattern” featured last week.
In response to “As evidence of the apparent ills of developer portals, Corey offers up the fact that he hasn’t seen Backstage deployed in any company other than Spotify.”, it points out that Expedia Group, Zalando, and American Airlines have specifically chosen Backstage for their internal developer portal. It also notes that the list of adopting organizations is published in “ADOPTERS.md” on GitHub, with “many more participants listed”.

A couple of write ups from the recent KubeCon EU event. High level themes and details of various talks and tracks.

It covers two articles together. The title of the first article is “KubeCon Europe 2021 Wrapup”.
○ The author looks back on his third consecutive virtual participation in KubeCon.
○ Last year he was able to watch most of the program live, including the keynotes, but this year the sessions started earlier in the day. He found it inconvenient because the keynotes started around 1 am in his local time, but he thought it was the right call because the schedule was aligned with the host region, Europe.
The title of the second article is “KubeCon EU 2021: Developers, Developers, Developers (and Control Planes)”.
○ It focuses on themes such as developers and developer experience in the cloud, as shown in its key takeaways below.
● Developers, and developer experience, within cloud is a big deal
● End users are making a big impact in the cloud native world right now
● Networking in the cloud (and K8s) is still evolving
● Open standards are providing key abstractions, extensibility, and innovation
● Control planes are where the most end user value is being created
● Anyone can (and should) contribute to the community: Docs are a great place to start

A post on avoiding vendor lock-in by owning your data, your front-end interfaces, your source code and avoiding long term contracts.

The title is “Rule number one: Avoid vendor lock-in”.
To make the author’s position clear, the following disclaimer appears at the beginning.
○ Extra-prominent disclaimer: Views expressed here are my own, and don’t necessarily reflect the views of the Government of Canada. Products mentioned below are examples and not endorsements.
The post explains the topic carefully under the following headings. It is helpful because it lets you picture concretely what to do and in which situations. In particular, “public money? Public code.” is a simple and convincing message.
○ What is vendor lock-in?
○ How do you avoid vendor lock-in?
○ Own your data (and make sure you can move it somewhere new)
○ Own your front-end interfaces
○ Own your software source code
○ Avoid long-term contracts
○ What should vendors do, then?
In the section “What should vendors do, then?”, it proposes that vendors provide high-quality services in order to meet such demands, and it gives specific examples of using external vendors.

A good post on the practices required to adopt continuous delivery, with examples using the Batect build system.

The title is “DevOps Practices for Continuous Deployment”.
It describes three DevOps practices and how they were applied using the OSS tool “Batect”, which runs tasks in Docker containers.

A quick overview of the metrics you should be measuring if you’re running Kubernetes clusters.

The title is “Key Kubernetes Metrics and Resources to Monitor for Peak Cluster Performance”.
It details the key metrics that Kubernetes exposes and how to interpret them, with the following structure.
○ Kubernetes Objects
○ Kubernetes Cluster & Node Metrics
○ Kubernetes Deployments & Pod Metrics
● Kubernetes Metrics
● Container Metrics
● Application Metrics
○ Monitoring Kubernetes with Sematext
○ Conclusion

Tools

vcluster creates fully functional virtual Kubernetes clusters running inside a namespace of an underlying Kubernetes cluster. It’s cheaper than creating separate clusters and it offers better multi-tenancy and isolation than regular namespaces.

The web page of the tool “vcluster”, which, as mentioned above, provides a virtual Kubernetes cluster inside a namespace of an existing Kubernetes cluster.
Click here for the GitHub page.

Lima is a brand new tool for anyone on a Mac wanting to run Linux instances that come pre-configured with containerd for running containers.

A GitHub page of “Lima (Linux-on-Mac)”.

eBPF on Windows. Lots of interesting things are being built on top of eBPF in the Linux kernel. The idea of eBPF implementations in other operating systems and even user space programmes is certainly interesting.

An article from the Microsoft Open Source Blog titled “Making eBPF work on Windows”. As mentioned above, it covers eBPF on Windows.
Click here for the GitHub page.

Do you find yourself writing bash scripts but wish you could write JavaScript instead? Zx provides some nice utilities and removes the need for users to be node or npm experts.

A GitHub page of “ZX”. It provides a convenient wrapper for child_process, escaping arguments and providing appropriate defaults.

Ahoy is a user-friendly dashboard for managing Helm-deployed applications on Kubernetes.

As mentioned above, an introductory article on “Ahoy!”, a user-friendly dashboard for Helm on Kubernetes.
Click here for the GitHub page.

SRE Weekly Issue #270 May 16th, 2021

Articles

Thundering herds, noisy neighbours, and retry storms

This is an in-progress document about the kinds of patterns we see or use when designing systems. The author warned me that it’s a work in progress and maybe not ready for prime-time, but I think this is exactly the time when I should get it in front of your eyes.

I’d love your help growing this list. If you know of a name that is missing from the list please send me a tweet with the name and a short description of it and I’ll include it in the list with a link to your tweet

Mads Hartmann

As mentioned above, this is a list of patterns that come up when designing systems, compiled by the author of the article. The term “operational patterns”, which Lex Neva, the editor of SRE Weekly, proposes in the article, seems like a good name for them.
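To make one of the named patterns concrete for myself: a commonly cited way to soften retry storms is exponential backoff with random jitter. The following is only a minimal sketch and is not taken from the linked list. The function name `backoff_delays` and the `base` and `cap` values are assumptions chosen for illustration.

```python
import random


def backoff_delays(attempts, base=0.1, cap=5.0, rng=None):
    """Return one randomized delay (in seconds) per retry attempt.

    Hypothetical helper: the name and default values are illustrative only.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The upper bound doubles with each attempt, up to the cap.
        upper = min(cap, base * (2 ** attempt))
        # "Full jitter": pick uniformly in [0, upper] so that many clients
        # retrying at once do not all hit the server at the same moment.
        delays.append(rng.uniform(0, upper))
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(5, rng=random.Random(42))):
        print(f"retry {i + 1}: wait {delay:.3f}s")
```

Without the random part, every client that failed at the same time would also retry at the same time, which is how a single failure can grow into a retry storm.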

The Downtime Project

Whoa, a podcast dedicated to picking apart public incident postings! I love this, because there’s a lot that’s left to shorthand, and a live conversation is a great way to flesh it out.

Tom Kleinpeter and Jamie Turner

As mentioned above, this introduces a podcast. Starting with its introductory episode on March 20, 2021, each weekly episode shares lessons learned from the outages and postmortems of engineering teams.

Health boss unsure how many hospital patients were overdosed due to Windows upgrade

There’s a really interesting undercurrent in this story about resilience. Nurses can catch these kinds of errors, but this is just one layered protection among many. If the system is reduced to relying on that second-layer defense, the overall resilience is diminished.

Daniel Keane — ABC News

As the title suggests, an error occurred because of an upgrade to Windows 10, and patients may have received more than 10 times the required amount of medication.

Have you ever seen a car crash test? That’s Chaos Engineering

Of course, before reaching this stage, all of the pieces are tested in isolation. But until they’re all put together, it’s almost impossible to predict the behavior of the finished product during an accident.

Mikolaj Pawlikowski

As the title suggests, it explains why chaos engineering is necessary by comparing it to car crash tests.

4 attributes of a great site reliability engineer

The attributes discussed are:

* Problem solving
* Awareness building
* Collaboration
* Empathy

Jayne Groll

The author asked DevOps Institute ambassadors and experts in the SRE field “What makes for a great SRE?” and explains the four main attributes above that they proposed.

How to hire Site Reliability Engineers (SREs): 5 top qualities

Wait, more attributes? Oh, and by the same author, too:

* “Great SREs have a passion for high-quality automation.”
* “A great SRE ensures SLOs (Service Level Objectives) are set at correct boundaries of service; […]”
* Prize Communication.
* Look for longer-term support experience.
* Look for a person that demonstrates empathy.

Jayne Groll

I had exactly the same impression as the editor above. The same author as the previous article asked “What makes for a great SRE?” and explains the five main attributes that were proposed.
Personally, I wonder what the criteria were for splitting this into two articles, and why they were not written so that they connect to each other.

Site Reliability Engineering for Native Mobile Apps

This one explores the application of SRE principles to mobile app design.

Abhijith Krishnappa

It describes how to apply the SRE principles to Mobile Apps reliability.

Choosing SLOs that users need, not the ones you want to provide

This two-part series uses a narrative case study format to show how SLOs can be misleading. You might have great numbers, but what are the numbers actually measuring?

Adam Hammond — Squadcast

It addresses the following problem with SLOs.
A lot of IT professionals tend to think that they know the best metrics, and they do; the only problem is that they are the best metrics for monitoring systems, not for improving customer satisfaction.
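As a note to myself, the gap between a system-centric metric and a user-centric one can be sketched as follows. All numbers, the `availability` helper, and the 99% target are hypothetical and are not taken from the article.

```python
def availability(good, total):
    """Fraction of good events; treated as 1.0 when there were no events."""
    return good / total if total else 1.0


# Hypothetical numbers for one day (assumptions for illustration only).
host_checks = {"ok": 1440, "total": 1440}           # one health check per minute
checkout_requests = {"ok": 9_620, "total": 10_000}  # what users actually experienced

system_sli = availability(host_checks["ok"], host_checks["total"])
user_sli = availability(checkout_requests["ok"], checkout_requests["total"])
slo_target = 0.99  # hypothetical target

print(f"host uptime SLI:   {system_sli:.2%}")
print(f"user checkout SLI: {user_sli:.2%}")
print("SLO met (system view):", system_sli >= slo_target)
print("SLO met (user view):  ", user_sli >= slo_target)
```

In this made-up case the hosts look perfect while users see failed checkouts, which is the kind of misleading number the series warns about.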

Outages

A major US oil pipeline
The pipeline was targeted by a ransomware attack.
GasBuddy
This app for finding gasoline prices seems to have been impacted by a flood of user traffic driven by the US oil pipeline outage. In fact, their front page seems to be very slow for me as I write this.
Salesforce
The outage was widespread and even affected their status page.
eBay
Microsoft Outlook

KubeWeekly #262 May 21st, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Deadline approaching: KubeCon + CloudNativeCon North America 2021 CFP closes on May 23!

KubeCon + CloudNativeCon North America 2021 is happening October 12–15 in Los Angeles, CA along with a virtual experience for those who can’t travel!

Are you ready to see your name in lights and potentially have the opportunity to speak on a real stage again? Apply to speak now — the Call for Proposals (CFP) is open until Sunday, May 23, 11:59 PM Pacific Daylight Time. Since the event will be a hybrid experience, you can submit to speak either in person or virtually.

Is this your first time submitting? Check out our submission guidelines to make your proposal shine.

A reminder of the CFP deadline for KubeCon + CloudNativeCon North America 2021, with information on how to submit a proposal.

Take the CNCF Cloud Native Survey — Part 1

We want to hear from you! Be sure to take the Cloud Native Survey — Part 1 to share your thoughts on cloud, containers, and Kubernetes. We have a full conference pass for KubeCon + CloudNativeCon Europe 2022 to give away — complete the survey by June 15 for a chance to win!

CNCF is conducting the survey described above, and it seems that if you answer it, you have a chance to win a pass for “KubeCon + CloudNativeCon Europe 2022”.
After answering, you are asked to enter your name, company name, and email address. The pass will presumably be awarded at a later date.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Where to begin your dev-centric cloud infosec journey

Guy Eisenkot & Ashley Ward, Palo Alto Networks

It explains the “infosec journey”, that is, where to start in the cloud and how to incorporate security into existing processes so as to minimize interruptions and maximize productivity.

Kubernetes 1.21 Release

Nabarun Pal & Anna Jung, Kubernetes 1.21 Release Team

A release webinar with the release team leads and enhancement leads, held to coincide with the release of Kubernetes 1.21. At the beginning, they confirm that the release cadence will change from four to three releases a year starting with 1.22.

Leveling Up Kubernetes with kube-vip

Dan Finneran, Equinix Metal

It shows you why they developed “kube-vip” for HA and load balancing inside and outside the Kubernetes cluster, and explains how it works with a demo.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Journey of a Cloud SQL Packet

Damian Peckett, kloeckner.i

It briefly describes Cloud SQL, then introduces how it is used at kloeckner.i and the tools built for workflows with low administrative overhead.

How to create a Ubuntu Packer Image and deploy on a Bare Metal Server

Chitrabasu Khare, InfraCloud

It describes how to use Packer to create a minimal raw Ubuntu image and deploy it to a bare metal server using the provisioning engine “Tinkerbell”.

Workload mobility in a service mesh world

Cody De Arkland, Kong

It mentions that the aspect of “Zero Trust Security” has been greatly focused on, and explains with the following structure under the theme of “What is the next chapter of Zero Trust for Service Mesh?”.
○ Same Problem, Different Data Center
○ Creating Environment Consistency
○ Progressive Delivery, Migration and Controlling Flow
○ The Next Generation of Service Mesh

Key Kubernetes metrics to monitor for peak cluster performance

Adnan Rahic, Sematext

Since it is covered in DEVOPS WEEKLY ISSUE #542 above, I will skip it.

Learn how to build functions faster using Rancher’s kim and K3s

OpenFaaS Ltd.

It focuses on how to test “kim (The Kubernetes Image Manager)”, a new project by Rancher Labs that allows you to build container images directly into the node’s image library.

Autoscaling Kubernetes clusters

Puja Abbassi, Giant Swarm

It focuses on “Cluster Autoscaler” and, in line with the title, explains it with the following structure.
○ Cluster Autoscaler
○ Cloud Providers
○ Scaling Up
○ Scaling Down
○ Scaling Latency
○ All the Autoscalers
○ Spot Instances
○ Conclusion

AWS Secrets Manager on Kubernetes using AWS Secrets CSI driver Provider

Theo “Bob” Massard, Particule

It follows up with a test of the AWS Secret Store Provider and explains how to use it as a bridge between the AWS Secrets Manager and your app’s environment.

Kube-Prometheus — A complete monitoring stack using Jsonnet

Lennart Jern, Elastisys

It explains how to use “kube-prometheus” to achieve accurate monitoring of apps.

The easiest way to debug Kubernetes workloads

Martin Heinz

It examines and explains the topic in the title with the following structure.
○ There Might Just Be a Better Way…
○ Configuring Feature Gates
○ Process Namespace Sharing
○ Putting It To Good Use
○ Bonus: Debugging Cluster Nodes
○ Conclusion

Kubernetes capacity planning: How to rightsize your cluster

Jesus Ángel Samitier, Sysdig

It explains how to identify unused resources and how to properly set the capacity of the cluster.
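For my own reference, the basic idea of finding over-requested resources can be sketched like this. It is a minimal sketch under my own assumptions, not the method from the Sysdig article: the `ContainerCpu` type, the `unused_cpu` helper, the 50% threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContainerCpu:
    name: str
    requested_millicores: int
    used_millicores: int  # e.g. a high percentile of observed usage


def unused_cpu(containers, min_waste_ratio=0.5):
    """Return (name, unused millicores) for containers whose unused share of
    their CPU request is at least min_waste_ratio, largest waste first."""
    wasteful = []
    for c in containers:
        unused = c.requested_millicores - c.used_millicores
        if c.requested_millicores > 0 and unused / c.requested_millicores >= min_waste_ratio:
            wasteful.append((c.name, unused))
    return sorted(wasteful, key=lambda item: item[1], reverse=True)


# Hypothetical sample data.
sample = [
    ContainerCpu("api", 1000, 200),
    ContainerCpu("worker", 500, 450),
    ContainerCpu("cache", 2000, 600),
]
print(unused_cpu(sample))
```

In practice the usage figures would come from a monitoring system, and memory would need the same treatment as CPU.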

Learn how Istio can provide a service mesh for your functions

Alex Ellis, OpenFaaS Blog

It starts with a brief introduction to get you started with Istio and OpenFaaS integration, and explains how to measure your cluster’s resource consumption and how to create an Istio Gateway TLS certificate.

GitOps Guide to the Galaxy (E15): Introducing the App of Apps and ApplicationSets

Christian Hernandez, Chris Short, Red Hat

It explains the concept of “App of Apps” and how to mitigate the issues that exist in GitOps.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Pixie, with Zain Asgar and Ishan Mukherjee

Craig Box, Kubernetes Podcast from Google

A new episode of the Kubernetes Podcast, which is run by Google employees. This time the host is Craig Box and the guest host is Alex Ellis. Alex Ellis previously appeared in the following episode.
○ Independent Open Source, with Alex Ellis
The Guests are Zain Asgar and Ishan Mukherjee, co-founders of Pixie software, which was recently acquired by New Relic.
The topics that interested me in the news of the week are as follows.
○ eBPF for Windows
○ GKE Dataplane V2 is GA
○ VMware Tanzu SQL, with MySQL, for Kubernetes, 1.0

It’s Been a Full Year Since we Launched OpenShift TV

Chris Morgan, Red Hat

The title and article explain that one year has passed since “OpenShift TV” was started. The embedded video is a session of over 2 hours with the theme of “Playing with Prometheus”.

The need for new DevSecOps tool arises as infrastructure drift proves to be a multidimensional problem

CloudSkiff

It introduces the OSS CLI tool “driftctl”, starting from the background of why such a tool is needed.

Ahoy! (new project post)

Yoko Kawamoto

Another introductory article on “Ahoy!”, which was also featured in DEVOPS WEEKLY ISSUE #542 above. This one has more images than text, which makes it good for getting a visual overview.

Kubernetes, (Almost) Love at First Sight with Chris Ferreira

Committing to Cloud Native Podcast

As of 2021/05/23 23:21 (JST) and 2021/05/24 23:31 (JST), the URL could not be accessed and returned a “DNS_PROBE_FINISHED_NXDOMAIN” error. I think I was able to access it last Saturday. I will try again later.
○ At the time of 2021/05/29 10:48(JST), it worked.

ICYMI: CNCF YouTube Channel featuring all talks from KubeCon EU 2021 are now available!

A playlist of published session videos for KubeCon EU 2021. I haven’t seen it at all, so I want to catch up.

Upcoming CNCF Online Programs

Live webinar

May 25 @10am PT: Service mesh configuration using xDS protocol on gRPC, and using Envoy presented by Megan Yahya, gRPC & Yan Avlaslov, Envoy (member submission by Google) — RSVP

Cloud Native Live

May 26 @8am PT: Universal Crossplane presented by Dan Mangum, Upbound — RSV

On-demand webinars

May 27: Containing your microservice sprawl presented by Tracy Ragan, DeployHub — RSVP
May 27: Defense strategy against Kubernetes attack TTPs presented by Manoj Ahuje, Tigera — RSVP
Looking for more great curated content? Visit our Online Programs playlist on YouTube.

Learn more about CNCF Online Programs

How about those articles? Did any of them interest you?

To be honest, there is some content that I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara