SRE / DevOps / Kubernetes Weekly Collection #49 (Week 1, 2021)

Jan 13, 2021

In this blog post series, I collect the following three weekly mailing lists that I subscribe to and add some comments and useful links as an aide-memoire.
Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #523 January 3rd, 2021
SRE Weekly Issue #251 January 3rd, 2021
KubeWeekly #245 January 8th, 2021

DEVOPS WEEKLY ISSUE #523 January 3rd, 2021

News

A good discussion of the benefits of compression. Starting out with a nice case study of migrating log data, but expanding to other data transfer use cases. Particularly interesting because of the numbers and showing the financial savings on cloud services.

The title is “How we compress Pub/Sub messages and more, saving a load of money”.
An article covering scenarios where the cost of compression is significantly outweighed by the savings, both in terms of computing resources and build time. It aims to encourage readers to consider compression outside of the standard boring use cases and to discover opportunities to apply it to their own systems.
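
As a rough illustration of the idea (this is not the article's code), here is a minimal Python sketch that compresses a batch of repetitive log records before it would be sent over a message bus. It assumes the payload is JSON-serializable and uses only the standard library `zlib`; the function names and the sample batch are hypothetical, and the actual savings depend heavily on how repetitive your data is.

```python
import json
import zlib


def compress_message(payload: dict) -> bytes:
    """Serialize a payload to compact JSON and compress it with zlib."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 6)


def decompress_message(data: bytes) -> dict:
    """Reverse compress_message: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(data).decode("utf-8"))


# Hypothetical batch of repetitive log records, made up for illustration.
batch = {
    "records": [
        {"service": "checkout", "level": "INFO", "msg": "request handled", "status": 200, "id": i}
        for i in range(500)
    ]
}

raw_size = len(json.dumps(batch, separators=(",", ":")).encode("utf-8"))
compressed = compress_message(batch)

print(f"raw: {raw_size} bytes, compressed: {len(compressed)} bytes")

# The round trip must give back exactly what was sent.
assert decompress_message(compressed) == batch
```

Batching many similar records into one message before compressing is what makes the ratio good here; compressing tiny individual messages usually saves much less.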

Working to reduce the time taken for CI/CD pipelines is nearly always a good investment of time. This post covers a few areas you can likely optimise.

The title is ““WHY ARE MY TESTS SO SLOW?” A LIST OF LIKELY SUSPECTS, ANTI-PATTERNS, AND UNRESOLVED PERSONAL TRAUMA.”.
Over the past two weeks, the author has commented on “lead time to deploy”, citing her own and other people's tweets. This refers to the time between the code being written and it being deployed to production, also known as “how long it takes you to run CI/CD”.

A great introduction to message queues, with a detailed look at various software options including AWS and GCP services, Kafka, NSQ and NATS.

The title is “The Big Little Guide to Message Queues”.
It describes the basic concepts underlying message queuing and how they apply to the popular queuing systems available today, covering the following points.
○ What message queues are and their history.
○ Why they’re useful and what mental models to use when reasoning about them.
○ Delivery guarantees that the queuing systems make (at-least-once, at-most-once, and exactly-once semantics).
○ Ordering and FIFO guarantees and how they affect sequencing, parallelism and performance.
○ Patterns for fan-out and fan-in: delivering one message to many systems or messages from many systems into one.
○ Notes on the pros and cons of many popular systems available today.
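
To make the delivery guarantees above a little more concrete, here is a toy Python sketch of at-least-once delivery combined with an idempotent consumer. It is an in-memory model written for this note, not the API of any real queue: `AtLeastOnceQueue`, `IdempotentConsumer`, and the message IDs are all hypothetical. The point is that a lost ack causes a redelivery, and deduplicating by message ID keeps the result correct.

```python
from collections import deque


class AtLeastOnceQueue:
    """Toy in-memory queue: the head message is redelivered until it is acked."""

    def __init__(self):
        self._pending = deque()

    def publish(self, msg_id, body):
        self._pending.append((msg_id, body))

    def receive(self):
        # Peek without removing, so an unacked message is delivered again.
        return self._pending[0] if self._pending else None

    def ack(self, msg_id):
        # Only an explicit ack removes the message from the queue.
        if self._pending and self._pending[0][0] == msg_id:
            self._pending.popleft()


class IdempotentConsumer:
    """Applies each message ID at most once, even if it is delivered twice."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def handle(self, msg_id, amount):
        if msg_id in self.seen:
            return False  # duplicate delivery, ignore it
        self.seen.add(msg_id)
        self.total += amount
        return True


queue = AtLeastOnceQueue()
queue.publish("m1", 10)
queue.publish("m2", 5)

consumer = IdempotentConsumer()

# First delivery of m1 is processed, but we simulate a lost ack by not acking.
msg_id, amount = queue.receive()
consumer.handle(msg_id, amount)

# Normal loop: m1 is redelivered (and ignored as a duplicate), then m2 arrives.
while (msg := queue.receive()) is not None:
    msg_id, amount = msg
    consumer.handle(msg_id, amount)
    queue.ack(msg_id)

print(consumer.total)
```

Without the `seen` set, the redelivered `m1` would be counted twice, which is the usual reason at-least-once systems ask consumers to be idempotent.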

A quick introduction to Kyverno, an open source policy based tool for Kubernetes. In this example we’re shown how to easily automate adding labels to namespaces.

The title is “Auto-labeling Kubernetes resources with Kyverno”.
It describes “Kyverno”, an open source policy engine designed for Kubernetes, and shows how to auto-label resources as the title says.

A strong argument for strong schemas when defining APIs.

The title is “API design is stuck in the past”.
As the title says, it questions the current state of API design.
○ The industry has embraced statically typed languages, but API design remains twenty years in the past. Schema driven development presents an opportunity to pull API design into the present.
It recommends schema-driven development and Protobuf as the schema-driven API protocol, and mentions the following as a topic for a future article, so I look forward to the next one. This article is dated November 12, 2020, and as of January 11, 2021, it seems that no follow-up has been published yet.
○ “We at Buf feel that the best option available today for schema driven development is Protocol Buffers (a topic we’ll explore in a future article), and we’re hard at work building such tooling to support organizations using Protobuf to define their services.”

Tools

Buf is providing a toolkit for making working with Protobuf APIs much easier, both for producers and consumers. Code generation as well as built-in linting and breaking change detection.

The GitHub page of “Buf”. It is developed by Buf, the company that advocates schema-driven development in the article above.
The long-term goal is to enable schema-driven development.
It is being developed to address the following current challenge.
○ “Defining APIs using an IDL provides a number of benefits over simply exposing JSON/REST services, and today, Protobuf is the most stable, widely-adopted IDL in the industry. However, as it stands, using Protobuf is much more difficult than using JSON as your data transfer format.”

Zap describes itself as a simple cross-platform configuration management and orchestration tool. Store reusable plans in HCL and run a variety of tasks against remote machines.

The GitHub page for “Zap”, a simple cross-platform orchestration and configuration management tool.
Zap’s main goal is to provide a simple mechanism for managing groups of computers with different configurations and needs.
This is achieved with “tasks” that can be configured in “plans” or run standalone. These tasks are a collection of scripts or statically linked binaries that are pushed to the target machine and executed.

Clutch provides a platform for runtime changes to infrastructure. Out of the box it has lots of features, but it’s mainly about making it easy to build custom developer dashboards with extensions.

The Web page of the open source Web UI and API platform “Clutch”. It is designed to simplify, accelerate, and mitigate common debugging, maintenance, and operational tasks.
On the “What is Clutch?” web page, the following three points are listed as “Goals”. There are other sections such as “Features”, “Vision”, “Why Clutch?”, and “FAQ”, so check them out if you are interested.
○ 🧰 Simplify operations.
○ 🕹️ Optimize and integrate the developer experience.
○ 🔧 Complement infrastructure-as-code

SRE Weekly Issue #251 January 3rd, 2021

Articles

Writing Runbook Documentation When You’re An SRE

Tips and tricks for writing effective runbook documentation when you aren’t a technical writer

I like the discussion of the “Curse of Knowledge” cognitive bias.

Taylor Barnett — Transposit

The author found that there are two main reasons that engineers don’t want to write documentation: 1. There isn’t an incentive structure for doing the work, and 2. they are unsure of how to write good documentation.
She explains her suggestions along the following points.
○ Runbook Templates
○ The Curse of Knowledge
○ SRE Documentation Glossaries
○ Prevent Runbook Search Failure
○ Readable Runbook Steps
○ Code in Runbooks
○ Writing Runbooks Documentation is Hard

SLO — From Nothing to… Production

Here’s one engineer’s SLO journey.

My main focus is on how I educated myself about SLOs and how I applied this to my organization.

Ioannis Georgoulas

As the title suggests, the author learned about SLOs (Service Level Objectives) from scratch in a few months and shares how they were applied to the production environment, under the following major headings.
○ Prepare yourself
○ Where to start
○ Take ownership and be an SLO advocate
○ Build the framework
○ Summary

How to sell SLOs to Engineering Directors

This blog is a redacted internal memo that aimed to familiarize SLOs with its audience, explain the value of an SLO culture, and describe how we would implement and roll them out.

Thomas Césaré-Herriau — Brex

I covered it in last week’s DEVOPS WEEKLY #522, so I will skip it.

Why I’ve Been Merging Microservices Back Into The Monolith At InVision

Why would you do this? It’s all about Conway’s Law.

Ben Nadel

I covered it in last week’s DEVOPS WEEKLY #522, so I will skip it.

Incident Phenomena: Shorthand Names, à la Danny Ocean

The folks at Adaptive Capacity Labs have seen a few patterns crop up over and over in their post-incident reviews. How many of these have you seen before?

John Allspaw — Adaptive Capacity Labs

It explains the shorthand names the team has given to patterns they repeatedly observe during incidents.

Home Alone: a Post-Incident Review

Lots of complex contributing factors led to the main character being left behind in the movie Home Alone… so let’s treat it like a production incident!

Fred Hebert

The author wrote up “Home Alone” as an incident review, the way he would for an issue at work, treating it as an incident in which Kevin was left behind and stuck at home fighting burglars.

Making sense of what happened is hard

This one includes a complex timeline showing the interplay of two pairs of bugs, where one in each pair masked the other.

Lorin Hochstein

The author has taken half of Dr. Hannah Harvey’s course, “The Art of Storytelling”, and is trying to explain incidents with a focus on oral storytelling. A diagram is used in the article because the content is complicated, but it seems that he will keep working to improve his oral storytelling technique. I want to improve my storytelling skills too.
○ “Now I just need to figure out how to tell this as a story without the benefit of a diagram.”

Outages

Apple iCloud

KubeWeekly #245 January 8th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

2020 CNCF Annual Report

The Cloud Native Computing Foundation (CNCF) annual report for 2020 is now available. The report highlights the growth of the community, events, projects, and more, over the past year.

As CNCF celebrated its fifth birthday in 2020, it achieved greater engagement through membership growth, event attendance growth, increased end user participation, and broad industry commentary.

An introductory article on CNCF’s “2020 CNCF Annual Report”.
The following points are excerpted and explained. The report linked above is easy to read, including figures and graphs.
○ Membership
○ End Users
○ Events
○ Projects

Kubernetes Security Essentials Course Now Available

Today Linux Foundation Training & Certification and the Cloud Native Computing Foundation are announcing the availability of their newest training course, LFS260 — Kubernetes Security Essentials. The course provides skills and knowledge on a broad range of best practices for securing container-based applications and Kubernetes platforms during build, deployment and runtime. It is also a great way to prepare to take the recently launched Certified Kubernetes Security Specialist (CKS) certification exam.

The training course “Kubernetes Security Essentials (LFS260)” has been released. I can now access the course I purchased on Cyber Monday in order to take the CKS exam this year.
It mainly consists of slides, lab exercises, and knowledge checks.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Self-hosting Kubernetes on your Raspberry Pi

Alex Ellis, OpenFaaS

It explains how to build Kubernetes clusters using Raspberry Pi 4s for self-hosting APIs, websites, and functions, and how to expose them on the Internet to serve traffic to your users.
The author notes that he has created and published many tutorials on containers and clustering on the Raspberry Pi over the last five years, and concludes with the comments below.
○ “Things are still evolving, and support is getting better. If things were rough last time you tried it out, give it another shot.”

The Level Up Hour (S1E20): Kubernetes and Docker Deprecation

Langdon White, Scott McCarty, and Chris Short, Red Hat

An episode of Red Hat’s Twitch video series “The Level Up Hour”. It explains the topic in the title.

Switching on the cluster insights using Headlamp

Saiyam Pathak, Civo

An introductory article on Kinvolk’s new open source, easy-to-use and extensible Kubernetes web UI “Headlamp”.

Build a Prometheus Dashboard for K3s with Wio Terminal

Janakiram MSV, The New Stack

As the title suggests, a tutorial using Seeed Studio’s “Wio Terminal”, a compact Arduino-compatible microcontroller device with a 2.4-inch LCD.

Tutorial: Host a Local Podman Image Registry

Jack Wallen, The New Stack

It explains the topic in the title, with some background, through the following points and steps.
○ Registry vs. Repository vs. Tag
○ Create the Local Registry
○ Push Your First Image to the New Registry

What happens when you create a Pod in Kubernetes? (Video)

Salman Iqbal, Data Science Campus

The topic in the title is explained carefully in a YouTube video.

Run Kubernetes Production Environment on EC2 Spot Instances With Zero Downtime: A Complete Guide

Kfir Schneider, Riskified

As the title suggests, it aims to show you how to use AWS EC2 Spot Instances to significantly reduce the cost of your k8s cluster and give you the confidence you need to run on Spot Instances with highly available workloads in production.
I was instinctively reluctant to use Spot Instances in a production environment and had ruled them out of my thinking, so this was very helpful.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

All About Calico

Alex Pollitt, Tigera; Saiyam Pathak, Civo

A careful 70-minute explanation in a YouTube video. It starts with Kubernetes networking, then moves on to Calico and covers network technologies and considerations, Calico’s vision, and more.

Red Hat OpenShift supports both Windows and Linux containers

Steven J. Vaughan-Nichols, ZDNet

It introduces Red Hat’s latest OpenShift Kubernetes feature, which will be available in early 2021 to run and manage both Linux and Windows containers from a single platform.

Top Considerations when Evaluating an Ingress Controller for Kubernetes

Harry Tsiligiannis, ReleaseOps

The following points are described as important considerations for guiding the decision-making process and avoiding costly mistakes.
○ 1) Traffic protocol support
○ 2) Client management
○ 3) Traffic routing
○ 4) Resiliency
○ 5) Load balancing algorithms
○ 6) Authentication
○ 7) Observability
○ 8) Kubernetes Integration
○ 9) Traffic routing
○ 10) Interface
○ Tip: Use multiple ingress controllers to fill in the gaps
○ Final thoughts

8 Kubernetes insights for 2021

Scott McCarty, Red Hat

As the title suggests, it offers eight insights. It would be better if they were numbered in the text so that they are easier to tell apart.

Infrastructure Engineering — The Kubernetes Way

Vignesh T V, Timecampus

The second article in a two-part series on Kubernetes and its ecosystem. It digs deeper into the infrastructure, one piece at a time.
Relevant webinar videos are embedded in the web page for reference.

Upcoming CNCF Online Programs

We have expanded our webinar program to Online Programs! Stay tuned for the content release schedule.

The “Upcoming CNCF webinars” section has been renamed “CNCF Online Programs”, and what used to be just a list of the latest webinars has been expanded starting in 2021. Below are some of the points I am interested in.
Overview:
○ Online programs include on-demand, livestream, and live webinars that are offered by members, CNCF incubating and graduated projects, and CNCF Ambassadors.
○ Please note: We are unable to share leads due to the Linux Foundation privacy policy.
Publishing schedule
○ Live webinars are hosted on Tuesdays at 10am PT.
○ On-demand webinars premiere on Wednesdays.
○ Livestreams are hosted on Thursday.
Who can participate in CNCF Online Programs
○ Platinum members are eligible for 4 activities for the year including live webinar, on-demand webinar, livestream, and YouTube playlist submission.
○ Gold members are eligible for 4 activities for the year including on-demand webinar, livestream, and YouTube playlist submission.
○ Silver members are eligible for 2 activities for the year including on-demand webinars and YouTube playlist submission.
○ End user supporter members cannot hold a webinar.
○ CNCF Member webinars will have “member webinar” in the YouTube description.

How about those articles? Are you interested in any of them?

Actually, there is some content that I cannot digest at this stage, so I’ll use this aide-memoire and the links to catch up myself.

Bye now!!

Yoshiki Fujiwara