SRE / DevOps / Kubernetes Weekly Collection #33 (Week 38)

Sep 24, 2020

In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and leave some comments as an aide-memoire along with useful links.
I have already published the same content on my Japanese blog and am catching up in English in this series.
I hope it serves as a useful reference for people who follow this kind of information.

DEVOPS WEEKLY ISSUE #507 September 13th, 2020
SRE Weekly Issue #235 September 13th, 2020
KubeWeekly #233 September 18th, 2020

DEVOPS WEEKLY ISSUE #507 September 13th, 2020

env0 sponsors DevOps Weekly

Setting up a CD pipeline in 5 minutes? It’s actually doable these days with off the shelf tools. Check out how

About env0: use Terraform to let your team manage their own environments in AWS, Azure and Google. Governed by your policies and with complete visibility & cost management.

The title is “Making Continuous Deployment of Terraform code easier with env0”.
“While setting up your CD pipeline can seem complicated, with a new generation of tools it’s actually gotten a lot more simple,” it says, before using env0 to show how to set up a simple CD pipeline.
To try it yourself, you need the following three things:
○ An env0 account (It’s free! Just join here)
○ A GitHub account
○ An AWS account

News

A post on how to build an SRE team, from other roles to find candidates in to how you might choose to structure the team to begin with.

The title is “How to Build Your SRE Team”.
Since it was covered in last week’s SRE Weekly Issue #234, I will skip it.

A new podcast series focused on chaos engineering, with several discussions already on the psychology of chaos engineering and learning from incidents.

A new podcast series sponsored by the Chaos Community, a community of chaos engineering practitioners.
Three episodes had already been released at the time of writing, featuring Nora Jones, Russ Miles, and Julie Gunderson as guests.

An introduction to the operator pattern in Kubernetes, and a look at the Operator Framework SDK for creating Go operators.

The title is “Kubernetes Operators 101”.
It describes what Kubernetes Operators are and how they are made up of two basic and powerful parts: Kubernetes Custom Resources and Controllers.
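To make the “custom resource + controller” split a bit more concrete for myself, here is a toy Python sketch of the reconcile idea: the controller repeatedly compares the desired state declared in a custom resource with what is actually running and acts to close the gap. WebAppSpec, Cluster, and reconcile are hypothetical names I made up; a real Go operator built with the Operator SDK watches the Kubernetes API server instead of mutating an in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class WebAppSpec:
    """Desired state, as a user would declare it in a (hypothetical) custom resource."""
    name: str
    replicas: int


@dataclass
class Cluster:
    """Toy stand-in for the cluster: maps an app name to its running pod names."""
    pods: dict = field(default_factory=dict)


def reconcile(spec: WebAppSpec, cluster: Cluster) -> list:
    """One pass of a level-based control loop: compare desired vs. observed
    state and return the actions taken to converge them."""
    actions = []
    running = cluster.pods.setdefault(spec.name, [])
    # Scale up: create pods until the observed count matches the desired count.
    while len(running) < spec.replicas:
        pod = f"{spec.name}-{len(running)}"
        running.append(pod)
        actions.append(f"create {pod}")
    # Scale down: delete surplus pods.
    while len(running) > spec.replicas:
        pod = running.pop()
        actions.append(f"delete {pod}")
    return actions
```

Calling reconcile again with an unchanged spec returns no actions, which is the idempotency that lets a controller safely re-run on every event.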

A good list of best practices for building Docker images for Node.js projects. While some are generally applicable, several are specific to Node.

The title is “Docker best practices with Node.js”.
A comprehensive list of Docker best practices, illustrated with Node.js examples.
Each bullet has detailed information and links to code examples. The entire list can be found in Node.js Best Practices.

A discussion of the importance of flow, and approaches to building empowered teams that can deliver products end-to-end.

The title is “Product Teams Need a Family Too! @ New Ways of Working — Modern Agile in Wellington meetup, Sep 2020”.
The slides from the above event, published on SlideShare. The two co-authors of “Team Topologies” present the essence of the book.

An interesting community attempt to standardize how to distribute WASM applications as OCI images. I’d like to see this conversation happen under the auspices of the OCI but interesting stuff.

The title is “Announcing the WebAssembly (wasm) OCI Image Spec”.
An article by Solo.io introducing the “WebAssembly (Wasm) OCI Image Specification”.

Tools

An online GUI tool for quickly creating Kubernetes YAML configuration files. Select the workload type and then go through the (long) list of available options.

The GitHub page of “oso”, an open source policy engine for authorization built into your application.
Since it was covered in last week’s KubeWeekly #232, I will skip it.
It’s an educational UI and I personally highly recommend it, so check it out.

Whispers is a new tool for identifying secrets (AWS tokens, hashed credentials, sensitive files, etc.) in source code and various configuration files.

The GitHub page of Whispers, a static code analysis tool that analyzes a variety of common data formats to search for hardcoded credentials and dangerous functions.
It can be run from the CLI or integrated into your CI/CD pipeline.
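As a rough idea of how rule-based secret detection works, here is a minimal Python sketch that scans text line by line with regular expressions. The two rules are hypothetical examples for illustration only, not Whispers’ actual rule set; Whispers itself parses structured formats rather than just grepping lines.

```python
import re

# Hypothetical rules for illustration only; these are NOT Whispers' actual rules.
RULES = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # key: value or key = value assignments where the key looks like a password.
    "hardcoded_password": re.compile(
        r"""(?i)\b(pass(word)?|passwd|pwd)\b\s*[:=]\s*["']?([^"'\s]+)"""
    ),
}


def scan(text: str) -> list:
    """Return (line_number, rule_name) pairs for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

In a CI/CD pipeline, a non-empty result from a check like this would typically fail the build.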

Polaris is an interesting looking audit tool for Kubernetes, showing a high-level health score for clusters and workloads and providing a list of issues to fix.

The GitHub page of Polaris, an open source project developed by Fairwinds that identifies misconfigurations in Kubernetes deployments.
It makes sure your Kubernetes pods and controllers are configured using best practices to avoid future problems.
It can run in the following three modes:
○ As a dashboard, so you can audit what’s running inside your cluster.
○ As a validating webhook, so you can automatically reject workloads that don’t adhere to your organization’s policies.
○ As a command-line tool, so you can test local YAML files, e.g. as part of a CI/CD process.
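To picture what such an audit does in the command-line mode, here is a minimal Python sketch that checks a Pod manifest (already parsed from YAML into a dict) for a few common issues: missing resource limits, unpinned image tags, and runAsNonRoot not being set. The checks and messages are hypothetical and only in the spirit of what Polaris reports; this is not Polaris’ code or check list.

```python
def audit_pod(pod: dict) -> list:
    """Return human-readable issues for a Pod manifest parsed into a dict.
    The checks and messages are hypothetical, not Polaris' own."""
    issues = []
    for c in pod.get("spec", {}).get("containers", []):
        name = c.get("name", "<unnamed>")
        # Missing resource limits can let one container starve its neighbours.
        limits = c.get("resources", {}).get("limits", {})
        for key in ("cpu", "memory"):
            if key not in limits:
                issues.append(f"{name}: missing {key} limit")
        # An image with no tag, or the "latest" tag, is not pinned to a version.
        image = c.get("image", "")
        if ":" not in image.rsplit("/", 1)[-1] or image.endswith(":latest"):
            issues.append(f"{name}: image tag not pinned")
        # Containers should explicitly refuse to run as root.
        sc = c.get("securityContext", {})
        if not sc.get("runAsNonRoot", False):
            issues.append(f"{name}: runAsNonRoot not set")
    return issues
```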

SRE Weekly Issue #235 September 13th, 2020

Articles

Alerting on SLOs

This isn’t just another boring article about SLOs. There’s a ton of good stuff in here about why they moved to SLO-based alerts, too.

we’re hoping that by implementing SLOs — and alerting on them — we’ll be able to improve communication during incidents, reduce the toil on on-callers, and help improve our reliability in a way that’s meaningful to our users.

Mads Hartmann

An article about Glitch’s project to move to SLO-based alerts.
It is based on two internal Glitch documents:
○ The first is the technical specification used to discuss with stakeholders whether the approach was appropriate.
○ The second is a presentation where the author introduced SLO-based alerts to the platform team.
Even the TL;DR is fairly long.
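A common way to alert on SLOs is to page on the error budget “burn rate”, checked over a long and a short window. The sketch below follows the multiwindow burn-rate approach from Google’s SRE Workbook; I am not claiming it is exactly what Glitch implemented, and the function names and example numbers are my own assumptions.

```python
def burn_rate(errors: int, total: int, slo: float) -> float:
    """How fast the error budget is being consumed: the observed error ratio
    divided by the error ratio the SLO allows. 1.0 means the budget would be
    used up exactly at the end of the SLO window."""
    if total == 0:
        return 0.0
    allowed_error_ratio = 1.0 - slo
    return (errors / total) / allowed_error_ratio


def should_page(long_window, short_window, slo=0.999, threshold=14.4) -> bool:
    """Page only if both the long (e.g. 1h) and the short (e.g. 5m) window burn
    faster than the threshold. Each window is an (errors, total) tuple.
    14.4 corresponds to spending 2% of a 30-day budget in one hour."""
    return (burn_rate(*long_window, slo) >= threshold
            and burn_rate(*short_window, slo) >= threshold)
```

The short window is there so the alert stops firing soon after the problem is fixed, instead of paging for as long as the long window still contains old errors.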

A nudge in the right direction

Often, serendipity gets us out of an incident or makes it less severe.

Unless we treat this sort of activity as first class when looking at incidents, we won’t really understand how it can be that some incidents get resolved so quickly and some take much longer.

Lorin Hochstein

An article in which the author describes encountering an operational “surprise”, and discusses the importance of colleagues voluntarily nudging each other in the right direction across teams.
It includes a short, concrete example of knowledge sharing on Slack.

Seamlessly Swapping the API backend of the Netflix Android app

It’s your classic “replace the engines on a jet while flying it” story. My favorite part is how they recorded real traffic and played it at the old and new backend API to compare the JSON responses.

Rohan Dhruva and Ed Ballot — Netflix

An article about a recently completed one-year migration project by Netflix’s Android team.
The backend has been redesigned and separated from the centralized model. It describes the approach to this transition, the strategies adopted, and the tools built to support it.
The project was a joint effort of multiple back-end and front-end teams, and the article is well worth reading.
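The replay idea from the comment above (send the same recorded request to the old and new backends and compare the JSON) can be sketched as a small recursive diff. This is a minimal Python illustration, assuming both responses are already captured as strings; it is not Netflix’s tooling, and the function names are hypothetical.

```python
import json


def json_diff(old, new, path="$") -> list:
    """Recursively compare two parsed JSON values and return the paths where they differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        diffs = []
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}"
            if key not in old:
                diffs.append(f"{child}: only in new")
            elif key not in new:
                diffs.append(f"{child}: only in old")
            else:
                diffs.extend(json_diff(old[key], new[key], child))
        return diffs
    if isinstance(old, list) and isinstance(new, list):
        if len(old) != len(new):
            return [f"{path}: length {len(old)} != {len(new)}"]
        diffs = []
        for i, (a, b) in enumerate(zip(old, new)):
            diffs.extend(json_diff(a, b, f"{path}[{i}]"))
        return diffs
    return [] if old == new else [f"{path}: {old!r} != {new!r}"]


def compare_replayed(old_body: str, new_body: str) -> list:
    """Compare raw response bodies recorded from replaying the same request
    against the old and new backends."""
    return json_diff(json.loads(old_body), json.loads(new_body))
```

In practice you would also need rules to ignore fields that legitimately differ between runs, such as timestamps or request IDs.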

Using feature flags during incident management

Feature flags can help with load shedding and throttling, and feature flag activity can even be useful data that points to contributing factors.

Dawn Parzych — LaunchDarkly

Starting from the definition of an incident, it looks at the impact on incident responders and the problems involved in resolving incidents, then explains how feature flags can improve incident management in the following five areas.

Kill switches or circuit breakers
Throttling
Reducing MTTR
Automation
Update your processes and runbooks
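The first two points (kill switches and throttling) can be sketched in a few lines. This is a minimal Python illustration with a hypothetical in-memory FlagStore and made-up flag names; it is not the LaunchDarkly SDK, where flags would be evaluated through the vendor’s client and changed without a redeploy.

```python
import random


class FlagStore:
    """Hypothetical in-memory flag store standing in for a feature-flag service."""

    def __init__(self):
        self.flags = {}

    def set(self, key, value):
        self.flags[key] = value

    def get(self, key, default):
        return self.flags.get(key, default)


def get_recommendations(flags: FlagStore, user_id: str, rng=random.random) -> list:
    # Kill switch: turn off a non-essential, expensive feature during an incident.
    if not flags.get("recommendations-enabled", True):
        return []  # degrade gracefully instead of failing the whole page
    # Throttling: serve only a fraction of requests (sample rate from 0.0 to 1.0).
    if rng() >= flags.get("recommendations-sample-rate", 1.0):
        return []
    return [f"item-for-{user_id}"]  # placeholder for the real, expensive call
```

During an incident, responders flip “recommendations-enabled” to False or lower the sample rate, and the change takes effect without deploying new code.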

Unimog — Cloudflare’s edge load balancer

Unimog uses a lot of really interesting techniques to balance layer 4 traffic, about which this article goes into in great detail.

David Wragg — Cloudflare

A commentary article on Cloudflare’s L4 load balancer “Unimog”.
About two years ago, Cloudflare realized that existing solutions for balancing load within its data centers couldn’t meet its needs, so it started the Unimog project. Unimog has now been in production for over a year.
Unimog builds on techniques used by other L4 load balancers, but many of its implementation details are tailored to the needs of the edge network. The article is a masterpiece that can’t be digested in one quick read, so I will read it again.
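As a starting point for re-reading it, here is a minimal Python sketch of hash-based bucket selection, a basic technique used by L4 load balancers: hash a connection’s 4-tuple to a bucket in a forwarding table, so every packet of that connection reaches the same server. This is only a generic illustration under my own assumptions (SHA-256 as the hash, round-robin bucket assignment, hypothetical function names); it is not Unimog’s implementation, which the article describes in far more detail.

```python
import hashlib


def bucket_for(src_ip, src_port, dst_ip, dst_port, n_buckets):
    """Map a connection's 4-tuple to a bucket deterministically, so every
    packet of the same connection lands in the same bucket."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def build_table(servers, n_buckets):
    """Spread buckets across servers round-robin. A real balancer would weight
    this by server load and change only a few buckets at a time."""
    return [servers[i % len(servers)] for i in range(n_buckets)]


def pick_server(table, conn):
    return table[bucket_for(*conn, len(table))]
```

Because only the table changes when a server is drained, connections in buckets that still point to the same server are not disturbed.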

Production testing with dark canaries

I like this idea: it’s like a normal canary, except that you only send it a copy of traffic and discard the result, so as to avoid impacting users.

David Hoa — LinkedIn

It introduces “dark canary clusters” as a way to detect problems before going into production.

Outages

KubeWeekly #233 September 18th, 2020

The Headlines

Editor’s pick of the highlights from the past week.

TiKV, TiDB and PingCAP, with Ed Huang

Congratulations to the TiKV team for graduating in the CNCF. How much do you know about this key-value store, and the TiDB SQL layer built atop it? Learn about its history in this Kubernetes Podcast from Google interview with PingCAP co-founder Ed Huang.

The Kubernetes Podcast from Google, hosted by Google employees. The current co-hosts are Craig Box and Adam Glick.
In this episode they welcome Ed Huang, PingCAP’s co-founder and CTO, and creator of the TiDB distributed database and the TiKV key-value store.
The topics from the “News of the week” segment that interested me are listed below. I left out many other news items because they are covered elsewhere in this post or were already well known.
○ Lens 3.6.0
○ Introducing Nutanix Platform Services by Amit Jain

TOC approves KubeEdge as incubating project

KubeEdge is an open source system for extending containerized application orchestration capabilities to hosts at the edge. It is built on top of Kubernetes and provides infrastructure support for network, application deployment, and metadata synchronization between the cloud and the edge. This brings the current number of incubating projects to 21.

A CNCF announcement that the CNCF TOC (Technical Oversight Committee) has approved KubeEdge as an incubating project.
KubeEdge entered the CNCF Sandbox project in March 2019 and released version 1.0 in June 2019. The KubeEdge team releases a new version every quarter, in line with upstream Kubernetes.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member Webinar: Declaratively managing apps in a multi-cluster world

Fernando Ripoll, Solution Engineer @Giant Swarm

It describes the open source tool app-operator and its sibling chart-operator. They are Helm-based and give platform teams a declarative way to manage their apps across clusters.
It explains why abstraction in the form of a simple CRD makes sense and how it can be used for a variety of use cases.

CNCF Member Webinar: Effective Kubernetes onboarding

Jamon Camisso, Developer Educator, Community @DigitalOcean

Under this theme, it covers the following two topics (DO = DigitalOcean):

Core concepts and insights from DO’s new Kubernetes for Full-Stack Developers curriculum
An on-the-ground perspective taken from the Community Platform team’s migration from a traditional VM environment to Kubernetes.

It highlights the following two key takeaways:

How to onboard teams successfully by establishing knowledge baselines and effectively organizing, structuring, and delivering Kubernetes concepts
How to situate Kubernetes in a larger arc of application development and integrate it into an existing development workflow

CNCF Member Webinar: How to run Kubernetes securely and efficiently

Joe Pelletier, VP, Products @Fairwinds & Robert Brennan, Director, Open Source @Fairwinds

It describes how to run Kubernetes securely and efficiently at scale, along with open source tools that engineering leaders can use to improve application reliability, security, and scalability on Kubernetes.

CNCF Ambassador Webinar: Hybrid serverless development using Quarkus and Kubernetes

Daniel Oh, Principal Technical Marketing Manager @RedHat and CNCF Ambassador

It explains how developers can use Quarkus, a Kubernetes-native Java stack, to create traditional microservices and deploy functions to multiple serverless platforms (Knative + Kubernetes, AWS Lambda, etc.).

CNCF Member Webinar: Achieving least privilege access in Kubernetes

Eran Leib, Co-Founder and VP Product Management @Apolicy & Daniel Pacak, Open Source Engineer @Aqua Security

Experts tackling security and compliance challenges in dynamic Kubernetes environments answer frequently asked questions such as:
What is RBAC?
How does access work in Kubernetes?
How do you define and enforce access policies?
Can roles be granted only the access they need to do their job?

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

5 tips for developing Kubernetes operators with the new Operator SDK

Laurent Broudoux, Red Hat

It introduces the following five tips that make it easier to develop Operators with the newly released Operator SDK 1.0.0.

Handling default CRD values
Preparing your Operator for OpenShift
Discovering the cluster you’re running on
Using extensions APIs in Go-based Operators
Adjusting Operator resource consumption

Let’s learn Kubermatic Kubernetes Platform

Saiyam Pathak, Civo

The Kubermatic Kubernetes Platform (KKP) and its concepts are explained in detail through slides and the following demos.
○ Demo 1 — Dashboard overview and Kubernetes cluster on Google Cloud
○ Demo 2 — Kubernetes cluster install on AWS via KKP
○ Demo 3 — Cluster Upgrade via KKP
○ Demo 4 — Resource on KKP platform and some dashboards

Terrascan extends policy as code to Kubernetes

Jon Jarboe, Accurics

An article announcing that Terrascan, an OSS tool that detects compliance and security violations across Infrastructure as Code, supports Kubernetes as of version 1.1.0.
Terrascan is an extensible tool that allows teams to detect compliance and security violations across their infrastructure as code and mitigate risk before provisioning cloud-native infrastructure.
Future releases will add support for k8s infrastructure managed through other IaC providers such as Terraform.

An introduction to Kubespray

Shiwani Biradar, Enable Sysadmin

An article that introduces the tool “Kubespray” for deploying Kubernetes clusters.
Before getting to Kubespray, it gives an overview of Kubernetes, including its functions and features.

Cruster: Easily create and manage Kubernetes clusters on Raspberry Pis

Zane Hitchcox

The project page of “Cruster”, a tool that lets you create a Kubernetes cluster on Raspberry Pis without connecting a keyboard or monitor to them. It includes a demo video of about four minutes.

Easier troubleshooting of cert-manager certificates

Haoxiang Zhou, Jetstack

It describes the “kubectl cert-manager status certificate” command, the latest addition to the kubectl plugin for cert-manager.
This command was designed to facilitate troubleshooting cert-manager issues and has been significantly improved in recent v1 releases.

High-availability PostgreSQL and more on OpenShift

Jonathan Katz, Crunchy Data and Diane Mueller, Red Hat

A webinar video by Red Hat’s OpenShift team, with Jonathan Katz (VP Platform Engineering at Crunchy Data) invited as guest speaker to explain the topic.

Performing a live CNI migration

Josh Van Leeuwen, Jetstack

It explains why you might need to change your CNI, what the author learned while developing a live migration solution, and how it all works.

pgnodemx

pgnodemx provides a set of functions facilitating the collection of host-node level operational metrics. In particular, pgnodemx has support for capturing metrics related to cgroups (one of the Linux kernel facilities underlying the various Container technologies) and the Kubernetes Downward API.

Crunchy Data’s “pgnodemx” public repository on GitHub.
“pgnodemx” is a PostgreSQL extension that provides SQL functions that can capture node OS metrics via SQL queries.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Managed Kubernetes services compared: GKE vs. EKS vs. AKS

Bharat Arimilli

The author looks at Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), and Azure Kubernetes Service (AKS) and compares their features and overall experience from his own point of view.
As he mentions in the note below, the information may already be out of date. Also, since evaluation criteria and the circumstances for selecting technology differ from team to team, I think it is best to read comparison articles like this with a critical eye, as a reference only. If there are any clear mistakes or biases in the content, it would be good to give direct feedback to the author.
○ Note: These services tend to evolve very quickly, so some of these details may be outdated by the time you read them.

Tech Breakfast Podcast, with Chris Short

Aaron Buley and Tyler Gates

Chris Short, the editor of KubeWeekly, joined the “Tech Breakfast Podcast” as a guest. He talks about the Kubernetes release team, NVIDIA’s acquisition of Arm, and more.

Must-read free Kubernetes books

Bilgin Ibryam, Red Hat

An article that introduces must-read Kubernetes books that are available free of charge. It is wonderful that this content is open to the public for free. All you have to do is find the time and read the ones that match your needs.

Crossplane: A Kubernetes control plane to roll your own PaaS

Joab Jackson, The New Stack

An article introducing an episode of the podcast “The New Stack Context” with Upbound’s Phil Prasek (principal product manager) as a guest. The episode is embedded in the page.

Confidential GKE nodes now in beta

Sunil Potti and Eyal Manor, Google Cloud

An article announcing the expansion of Google Cloud’s Confidential Computing portfolio, and the company’s vision, through the following two announcements.
○ First, Confidential GKE Nodes, the second product in our confidential computing portfolio, will soon be available in beta, starting with the GKE 1.18 release. This gives organizations additional options for confidential workloads when they want to utilize Kubernetes clusters with Google Kubernetes Engine (GKE).
○ We’re also making Confidential VMs generally available. This capability will be available to all Google Cloud customers in the coming weeks and will include new features we’ve added during beta.
A high-energy video introducing “Confidential Computing” is embedded.

Achieving multi-tenancy in monitoring with Prometheus & the mighty Thanos Receiver

Sayan Das, InfraCloud

As the title suggests, it introduces Thanos Receiver, a Thanos component that was marked EXPERIMENTAL until now and has become GA.
Thanos Receiver is designed to support multi-tenancy, one of the common challenges in distributed monitoring.
The next post in the series will apparently improve this architecture to implement tenant isolation in thanos-querier.
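To get a feel for how tenant-aware routing on the receive path can work, here is a minimal Python sketch: pick the hashring that lists the tenant (falling back to a catch-all ring), then hash the tenant plus the series labels onto that ring’s endpoints. The HASHRINGS structure is a simplified, hypothetical configuration loosely modeled on the idea of per-tenant hashrings, not Thanos’ exact config format or hashing algorithm.

```python
import hashlib

# Simplified, hypothetical hashring config: a tenant-specific ring plus a catch-all ring.
# Order matters: tenant-specific rings must come before the catch-all.
HASHRINGS = [
    {"hashring": "tenant-a", "tenants": ["tenant-a"],
     "endpoints": ["receive-a-0:10901", "receive-a-1:10901"]},
    {"hashring": "default", "tenants": [],
     "endpoints": ["receive-0:10901", "receive-1:10901", "receive-2:10901"]},
]


def route(tenant: str, series_labels: str, hashrings=HASHRINGS) -> str:
    """Pick the receiver endpoint for one series: choose the first hashring that
    lists the tenant (an empty tenant list acts as the catch-all), then hash
    tenant + series labels onto that ring's endpoints."""
    ring = next(r for r in hashrings if tenant in r["tenants"] or not r["tenants"])
    digest = hashlib.sha256(f"{tenant}|{series_labels}".encode()).digest()
    endpoints = ring["endpoints"]
    return endpoints[int.from_bytes(digest[:8], "big") % len(endpoints)]
```

Giving a noisy tenant its own ring, as with tenant-a here, keeps its ingestion load off the receivers that serve everyone else.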

Member Webinar: Using KubeVirt in telcos
Abhinivesh Jain, Distinguished Member of Technical Staff @Wipro
Sept 23, 2020 7:00 AM Pacific Time

Member Webinar: Mitigating Kubernetes attacks
Wei Lien Dang, Head of Strategy @StackRox
Sept 23, 2020 1:00 PM Pacific Time

Member Webinar: AWS controllers for Kubernetes — AWS services, now Kubified!
Jay Pipes, Principal Open Source Engineer @Amazon Web Services
Sept 24, 2020 10:00 AM Pacific Time

Project Webinar: Kubernetes 1.19
Kubernetes Release Team
Sept 25, 2020 8:00 AM Pacific Time

Ambassador Webinar: The evolution of Ingress through the Gateway API
Kaslin Fields, Developer Advocate @Google
Bowei Du
Sept 25, 2020 10:00 AM Pacific Time

Member Webinar: VanillaStack as a platform for a truly vendor-agnostic open-source ecosystem
Karsten Samaschke, CEO @Cloudical
Sept 29, 2020 10:00 AM Pacific Time

Member Webinar: Self service Kubernetes for enterprises
Jim Bugwadia, Founder and CEO @Nirmata
Sept 30, 2020 10:00 AM Pacific Time

Member Webinar: Dapr, Lego for microservices
Mark Chmarny, Principal Program Manager @Microsoft
Oct 1, 2020 10:00 AM Pacific Time

Member Webinar: Transactional microservices — The final frontier
Daniel Kozlowski, Minister of Engineering @PlanetScale
Oct 2, 2020 10:00 AM Pacific Time

Member Webinar: Multi-Cluster & multi-cloud service mesh with CNCF’s Kuma and Envoy
Marco Palladino, CTO & Co-Founder @Kong
Oct 6, 2020 10:00 AM Pacific Time

Member Webinar: The evolution of cloud orchestration systems from ephemeral to persistent storage
Boyan Krosnov, CPO @StorPool
Oct 7, 2020 8:00 AM Pacific Time

Member Webinar: Kubernetes native two-level resource management for AI/ML workloads
Diana Arroyo, Software Engineer @IBM Research
Alaa Youssef, Manager, Container Cloud Platform @IBM Research
Oct 7, 2020 10:00 AM Pacific Time

Member Webinar: Building dynamic machine learning pipelines with KubeDirector
Tom Phelan, Fellow, Software Organization @Hewlett Packard Enterprise
Oct 8, 2020 10:00 AM Pacific Time

Member Webinar: ephemeral.run: A full application environment for every PR–before you merge to master!
Vishal Biyani, CTO @InfraCloud
Jono Spiro, Staff Software Engineer, Engineering Operations @OpenGov
Oct 14, 2020 10:00 AM Pacific Time

Member Webinar: Building 12 factor streaming data apps on Kubernetes
Stelios Charmpalis, Frontend Engineer @Lenses.io
Francisco Perez, Senior Backend Engineer @Lenses.io
Oct 14, 2020 1:00 PM Pacific Time

How was it? Did you find any articles or information that interest you?

There is some content I have not been able to digest yet, so I’ll use this aide-memoire and these links to catch up myself, too.

Bye now!!

Yoshiki Fujiwara