SRE / DevOps / Kubernetes Weekly Collection#41(Week 46)

Nov 16, 2020

In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments and useful links as an aide-memoire.
I have already published the same content on my Japanese blog and am catching up in English in this series.
I hope it serves as a reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #515 November 8th, 2020
SRE Weekly Issue #243 November 8th, 2020
KubeWeekly #241 November 14th, 2020

DEVOPS WEEKLY ISSUE #515 November 8th, 2020

News

BPF is already super interesting. With BTF and CO-RE the distribution story gets much easier, with the ability to provide standalone executables that don’t rely on compilers and other tools on the client.

The title is “BPF binaries: BTF, CO-RE, and the future of BPF perf tools”.
BTF and CO-RE are abbreviations for the following. This is Brendan Gregg’s blog post discussing what these two new technologies are and what they make possible.
○ BTF: BPF Type Format
○ CO-RE: BPF Compile-Once Run-Everywhere

Both Go, and more recently Rust, are increasingly popular for infrastructure tooling. This post has a nice comparison of the languages, looking at the main similarities and differences.

The title is “Rust vs Go”.
An article that attempts a friendly and even-handed comparison between Rust and Go. Focusing on the strengths, suitable use cases, similarities and differences of both, it recommends trying both. The title uses “vs” to keep things simple, but I felt it was a fair and well-balanced comparison article.

A talk I gave recently about configuration security. The move to infrastructure as code brings with it some interesting security challenges, the slides talk about some patterns to help address.

The title is “Configuration security is a developer problem”.
Slides from a recent presentation by Gareth Rushgrove, the editor of DEVOPS WEEKLY. They raise configuration management issues from a security perspective and then offer proposals that are easy to understand.

A quick look at the future of OpenTelemetry and the place of open standards in advancing the state of the art of the observability and monitoring tool.

The title is “Reminiscing control theory and the future of observability”.
I will skip it because it was covered in KubeWeekly last week.

A nice introduction to contract testing, and the problem it solves. Having problems scaling integration tests? Features a Node.js example but it’s applicable to other stacks too.

The title is “Contract Testing for Node.js Microservices with Pact”.
The content is as the title says. Both Pact and contract testing were interesting.
Recently, Kentaro Wakayama’s articles have been featured in this newsletter several weeks in a row.
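
As an aide-memoire for myself, here is a minimal sketch of the idea behind consumer-driven contract testing. It is written in plain Python and does not use Pact or its API. The contract format, `ORDER_CONTRACT`, `contract_violations` and `provider_get_order` are all hypothetical names I made up for illustration. The consumer writes down the fields it relies on, and the provider's response is checked against that list without running a full integration test.

```python
# Hypothetical contract written by the consumer: field name -> expected type.
ORDER_CONTRACT = {"id": int, "status": str, "total": float}


def contract_violations(contract, response):
    """Return a list of violations; an empty list means the response honours the contract."""
    violations = []
    for field, expected_type in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            violations.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return violations


def provider_get_order(order_id):
    """Stand-in for the provider's real handler (hypothetical)."""
    # Extra fields are allowed: the consumer only checks what it actually uses.
    return {"id": order_id, "status": "shipped", "total": 19.99, "note": "gift"}


if __name__ == "__main__":
    print(contract_violations(ORDER_CONTRACT, provider_get_order(42)))
```

The point of the design is that the provider can add fields freely, but removing or retyping a field that a consumer depends on shows up as a violation on the provider side.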

A post on some of the challenges with serverless architectures. It mainly makes the case that the disadvantages and challenges are trade-offs that you should make for other advantages, which sometimes is going to be true and at other times not.

The title is “Mitigating Serverless Challenges”.
A continuation of last week’s article, “Microservices & Serverless Functions — The difference”.
It describes the challenges developers face when using serverless platforms and how to mitigate them.
When I hear “Catalyst”, I can only think of Cisco’s Catalyst. It confirms that I am still a network person at heart.

A nice introduction to using Traefik for canary deployments and weighted load balancing.

The title is “Traefik: canary deployments with weighted load balancing”.
A comparison of how to do canary releases with weighted load balancing in Traefik versions 1 and 2.
I have never used Traefik itself and have only read the article, but it was interesting that the configuration looks totally different between versions 1 and 2.
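
To remind myself what weighted load balancing for a canary means, here is a small sketch of a smooth weighted round-robin picker in Python. This is not Traefik's implementation or configuration, only an illustration of the general technique. The backend names `app-stable` and `app-canary` and the 3:1 weights are assumptions of mine.

```python
from itertools import islice


def smooth_weighted_round_robin(weights):
    """Yield backend names forever, in proportion to their integer weights."""
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    while True:
        # Every backend earns credit equal to its weight on each round.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit serves this request...
        chosen = max(current, key=current.get)
        # ...and pays back the total, which spreads its turns out evenly.
        current[chosen] -= total
        yield chosen


if __name__ == "__main__":
    weights = {"app-stable": 3, "app-canary": 1}  # hypothetical 75% / 25% split
    print(list(islice(smooth_weighted_round_robin(weights), 8)))
```

Shifting more traffic to the canary is then only a matter of changing the weights, which is what makes this style of rollout easy to automate.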

Events

WTF is Cloud Native, and Why Should I Care? Find out the answers to both of these questions in a special webinar hosted by Pini Reznik, co-founder of Container Solutions. Join him 12 November at 13:00 CET. This free, 90-minute event is part of the Bristol tech Festival. Register now.

Continuing from last week, this section covers a Container Solutions event. As described above, the 90-minute webinar was held on Thursday, November 12 at 13:00 CET (Central European Time).

Books

GitOps: What You Need to Know Now, a new e-book by Ian Miell, a Cloud Native engineer at Container Solutions, explains what this workflow is, the problems it was intended to solve, and how it does that. It also compares some common GitOps tools and explores alternatives. Get your free copy here:

There is no “Tools” section this week, and the “Books” section introduces the free e-book “GitOps: What You Need to Know Now” provided by Container Solutions.
Enter your full name and email address and you will receive a link to the download page by email. It has 38 pages.

SRE Weekly Issue #243 November 8th, 2020

Articles

Keeping Netflix Reliable Using Prioritized Load Shedding

Sometimes I come across a simple but mind-blowingly awesome new idea. This is one of those times.

During periods of high load and errors, Netflix’s edge load balancer sends feedback to the apps running on users’ devices, adjusting their retry and backoff strategy to keep the service running as smoothly as possible but avoid a thundering herd. Brilliant.

Manuel Correa, Arthur Gonigberg, and Daniel West — Netflix

A tech blog article that introduces how Netflix maintains the reliability of its streaming service, which is used by many people.
By building a request taxonomy, implementing prioritized load shedding, and validating it with chaos testing, they have made it possible to recover from incidents without affecting viewers, for example without a drop in SPS (stream starts per second).
Articles from this tech blog are covered here almost every week. I recommend it because it constantly implements and shares improvement efforts, including Chaos Engineering and the centralized SRE team (CORE). The service is familiar and easy to picture, so many people pay attention to it, and this article had received 1.2K claps as of November 14, 2020.
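
For my own understanding, here is a minimal sketch of prioritized load shedding in Python. It is not Netflix's implementation. The priority levels, the request names and the utilization cut-offs in `SHED_AT` are all assumptions I chose for illustration. The idea is that as utilization rises, the least important requests are rejected first, so that the most critical ones (such as starting playback) keep working.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    priority: int  # 0 = most critical; larger numbers are shed first


# Hypothetical cut-offs: at or above this utilization (0.0 to 1.0),
# requests of the given priority are rejected. Priority 0 is never shed.
SHED_AT = {3: 0.60, 2: 0.75, 1: 0.90}


def should_shed(request, utilization):
    """Return True if the request should be rejected at the current utilization."""
    limit = SHED_AT.get(request.priority)
    return limit is not None and utilization >= limit


if __name__ == "__main__":
    requests = [
        Request("start-playback", 0),
        Request("browse-catalog", 1),
        Request("prefetch", 2),
        Request("telemetry", 3),
    ]
    for utilization in (0.50, 0.65, 0.80, 0.95):
        served = [r.name for r in requests if not should_shed(r, utilization)]
        print(f"utilization {utilization:.2f}: serving {served}")
```

A rejected request would normally also tell the client when it may retry, which is where the retry and backoff feedback mentioned in the newsletter summary comes in.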

Correlation in Latency Analysis

I helped to invent new approaches to correlate telemetry signals (exemplars, correlation between tracing and logging, profiler labels) that helped our engineers to navigate latency problems faster.

This is the author’s response to Amazon’s writing assessment, written when she interviewed with AWS while working at Google. The content itself was interesting, and I thought, “I should always ask myself this kind of question.”
“What is the most inventive or innovative thing you’ve done? It doesn’t have to be something that’s patented. It could be a process change, product idea, a new metric or customer facing interface — something that was your idea… [retracted]

Scaling Live streaming for millions of viewers

Facebook has two very different users for live streaming: “normal” users and broadcasters streaming sporting events and the like.

Hemal Khatri, Alex Lambert, Jordi Cenzano and Rodrigo Broilo — Facebook

It explains Facebook’s efforts in live streaming.
It was interesting to read about efforts such as streaming the UEFA Champions League final, handling the different traffic patterns of New Year’s Eve and other events, laying new submarine cables, and working with ISPs.

Debugging incidents in Google’s distributed systems

This article covers the outcomes of research performed in 2019 on how engineers at Google debug production issues, including the types of tools, high-level strategies, and low-level tasks that engineers use in varying combinations to debug effectively.

Charisma Chan and Beth Cooper — Google

Google engineers describe the results of research conducted in 2019 on how engineers at Google debug production issues. It looks at the types of tools, high-level strategies, and low-level tasks that engineers use in varying combinations to debug effectively.

Basic patterns in how adaptive systems fail

The three patterns discussed in this paper are:

● decompensation
● working at cross purposes
● getting stuck in outdated behaviors

David Woods and Matthieu Branlat

A link to Chapter 10 “Basic patterns in how adaptive systems fail” in “Resilience Engineering in Practice”. The three basic patterns explained are as above.

Outages

Gmail
Microsoft 365
Apple iCloud
Netflix
GitHub
Apparently GitHub also had an expired TLS certificate later in the week.
Tabcorp

KubeWeekly #241 November 14th, 2020

The Headlines

Editor’s pick of the highlights from the past week.

Don’t forget to register for KubeCon + CloudNativeCon North America Virtual 2020!

KubeCon + CloudNativeCon North America 2020 Virtual — THE open source conference of the year — is happening next week, November 17-20. Have you reserved your spot?

Join us for nearly 200 sessions and the opportunity to hear from the cloud native community. Register now and begin planning your week today!

The event is finally only a few days away. This is also the last time I will announce it here.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member webinar: Developer-friendly platforms with Kubernetes and infrastructure as code

Lee Briggs, Staff Software Engineer @Pulumi

It explains how to build a Kubernetes-based platform using languages that are familiar not only to operations engineers but also to developers.
It introduces techniques that help developers automate Kubernetes configuration, management, and deployment tasks and improve their operational knowledge.

CNCF Member webinar: Kubernetes in the context of on-premises edge and network edge computing

Amr Mokhtar, Network Software Engineer @Intel Corporation and Prakash Kartha, Segment Director @Intel Corporation

This webinar introduces OpenNESS (Open Network Edge Services Software), an open source, cloud native reference architecture. It provides the following functions.
○ Abstracted platform & network complexity
○ Enhanced dataplane
○ Hardware accelerators management
○ Dynamic discovery & optimal apps/services placement
○ Open integration with Cloud Native Functions (CNFs)

CNCF Member webinar: MicroK8s HA under the hood: Kubernetes with Dqlite

Konstantinos Tsakalozos, Senior Software Engineer @Canonical

It explains how Canonical’s team made the most widely used database on the planet (SQLite) distributed, and how automated DevOps for such a distributed database provides seamless HA.

CNCF Project webinar: What’s new in Linkerd 2.9: mTLS for all TCP connections, ARM support, and more

Oliver Gould, Linkerd creator and CTO @Buoyant

Oliver Gould, the creator of Linkerd, describes Linkerd version 2.9 with the following points:
○ Linkerd performs encryption and authentication to the pod boundary, providing “encryption in transit” in a modern, zero-trust form.
○ The new multi-core proxy runtime further improves performance over Linkerd’s already lightning-fast latency profile
○ Linkerd’s new service topology support can provide significant performance improvements and cost savings for Kubernetes applications
○ What the future of Linkerd holds!

CNCF Member webinar: DevOps from a different data-set: what 11 million workflows reveal about high performing teams

Mike Stahnke, VP of Platform @CircleCI and Ron Powell, Technical Content Manager @CircleCI

It takes a view of anonymized team data from millions of DevOps workflows and shares insights, behaviors, and metrics to help teams build better software faster.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Geographically Distributed Stateful Workloads Part One: Cluster Preparation

Raffaele Spazzoli, Red Hat

It explains how to deploy stateful apps in three cloud regions with near zero RTOs and RPOs.

powerfulseal/powerfulseal: A powerful testing tool for Kubernetes clusters

The GitHub page of PowerfulSeal, a chaos engineering tool that injects failures into Kubernetes clusters so that problems can be detected as early as possible.

Create your first Knative app

Jessica Cherry, opensource.com

A tutorial on running an app using Knative and Minikube.

How to use Docker Security Scan Locally

Brian Christner

It introduces “Docker Scan”, a feature born from a partnership between Docker and Snyk that scans containers for vulnerabilities in a local environment.
If you pass both the built image and its Dockerfile, it also shows which layer of the Dockerfile is vulnerable.

Seccomp for Fun and Profit

Jim Ramsay, Red Hat

A challenge and guide exploring whether it is possible to restrict some of the privileges granted by NET_ADMIN using seccomp.

metal3-io / baremetal-operator: Bare metal host provisioning integration for Kubernetes

I will skip it because it was covered in the same part of last week’s article.

Low-budget self-hosted Kubernetes

Tobias Huebner

At the beginning, it says that “Kubernetes doesn’t need to be something you only use on insanely large projects. It provides countless benefits even for small enterprises”.
The goal of this series is to give you a hands on tutorial on setting up your own cluster, and everything you need to truly make it functional. It is published in four parts.

Platforms on k8s with Golang — Watch any CRD

Ryan Dawson, Hackernoon

It explains that, if you want to do more with Kubernetes than run off-the-shelf apps, Golang gives you a lot more flexibility in interacting with Kubernetes.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Announcing Linkerd 2.9: mTLS for all, ARM support, and more!

William Morgan, Linkerd

Similar to “CNCF Project webinar: What’s new in Linkerd 2.9: mTLS for all TCP connections, ARM support, and more” in “ICYMI: CNCF Webinars”, it describes the updates in Linkerd version 2.9 and the project's future.

Linkerd, with Thomas Rampelberg

Adam Glick and Craig Box, Kubernetes Podcast from Google

The Kubernetes Podcast is run by Google employees. The current co-hosts are Craig Box and Adam Glick.
This week’s guest is Thomas Rampelberg, a software engineer at Buoyant (the creator of Linkerd), a Linkerd core maintainer, Service Mesh Interface co-author, and DC/OS co-creator.
The topics that interested me in the news of the week are as follows.
○ Helm chart deprecation
● Episode 11, with Vic Iglesias
○ CyberArk looks at threats to Kubernetes

Episode 42 — Veterans Day Special with Red Hat’s Chris Short and Marky Jackson

Dan Papandrea (Sysdig), Chris Short (Red Hat), Marky Jackson (Equinix Metal)

The above members appear in a podcast with the theme “Veterans”. It was interesting because there are few opportunities to encounter this kind of theme in Japan.

Chaos Experiments on Kubernetes using Litmus to ensure your cluster is production ready

Saiyam Pathak, Civo

A tutorial that installs “Litmus”, an open source tool for performing chaos experiments, on a Kubernetes cluster, and then creates and runs the following experiments.
○ Pod Deletion
○ Pod Autoscaler

China’s government-anointed Git operator says it will become a Linux Foundation mirror

Simon Sharwood, The Register

A Chinese Git-as-a-service outfit named “Gitee” has struck a deal with the Linux Foundation to mirror the Linux Foundation’s projects behind the Great Firewall. Currently there are only two projects, the edge computing project “Baetyl” and the IoT edge computing framework “EdgeX Foundry”, but Gitee says that all projects will be mirrored in the future. The article notes that the Linux Foundation has acknowledged this, as follows.
The Linux Foundation has confirmed the new relationship.

CNCF Releases Free Training Course Covering Basics of Service Mesh with Linkerd

CNCF

Introducing a new training course for Service Meshes using Linkerd, “Introduction to Service Mesh with Linkerd” by CNCF and the Linux Foundation.
Anyone can attend for free. If you need a certificate of completion, you can upgrade for $149.
It is aimed at SREs, DevOps professionals, cluster administrators, and developers who want to know more about service meshes and Linkerd.
KubeWeekly features many Linkerd topics this week and is clearly giving the project a push.

What Will It Take to Shift Kubernetes Security Left?

Bill Doerrfeld, Container Journal

It says in the first part that “For experts in the Kubernetes arena, securing the platform involves shifting security left. In other words, security forethought must come earlier on in the development journey, with policy-driven automation in place to detect potential issues.”
The author explains the need to combine the following four elements.
○ Policies
○ Integrative Security Tooling
○ Developer Experience
○ Great Error Messages

Amazon Web Services will build its own public registry for Docker container images

Mike Wheatley, SiliconANGLE

It describes a new public container registry announced by AWS as a countermeasure for the Docker Hub image pull rate limit that started on November 2.

vSphere 7 with Tanzu Integrates with HAProxy for Load Balancing Enterprise-grade Kubernetes

HAProxy

As the title suggests, VMware has partnered with HAProxy Technologies to integrate HAProxy load balancer into vSphere 7 and adopt HAProxy as the default load balancer for Tanzu Kubernetes clusters.

Upcoming CNCF webinars

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: Discover, analyze, and secure your APIs…anywhere
Pranav Dharwadkar, VP of Products @Volterra.io
Jakub Pavlik, Director of Engineering @Volterra.io
Dec 1, 2020 10:00 AM Pacific Time

Member Webinar: A look at how hackers exploit Prometheus, Grafana, Fluentd, Jaeger & more
Omer Levi Hevroni, Application Security Engineer @Snyk
Dec 8, 2020 10:00 AM Pacific Time

Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time

How did you find those articles? Did any of them catch your interest?

To be honest, there is some content that I cannot fully digest at this stage, so I will also use this aide-memoire and these links to catch up myself.
Bye now!!

Yoshiki Fujiwara