SRE / DevOps / Kubernetes Weekly Collection #40 (Week 45)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
  • Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a useful reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #514 November 1st, 2020
SRE Weekly Issue #242 November 1st, 2020
KubeWeekly #240 November 6th, 2020

DEVOPS WEEKLY ISSUE #514 November 1st, 2020

A fun set of 10 short stories highlighting the reality of production incidents.

Digital transformation is increasingly a strategic priority for all large organisations. That means it’s important for business executives to be familiar with the need to modernise applications and platforms.

  • The title is “App Modernization 101: An Executive’s Guide to Shipping Better Software”.
  • As the title suggests, this is an article for executives. It closes with the following recommendations. Links to reference information are listed at the end.
    ○ Home in on the business capability you’re trying to build
    ○ Map the value streams of getting applications to production
    ○ Start thinking about modernization as an ongoing affair

Pulumi allows for defining infrastructure using general purpose programming languages. With the new automation API, it’s now possible to embed this capability in other programs, with initial support for TypeScript and Go.

  • The title is “The Pulumi Automation API — The Next Quantum Leap in IaC”.
  • It announces the Pulumi Automation API, a robust program layer on top of Pulumi’s declarative infrastructure.
  • The Automation API can be used to fully integrate Pulumi into software projects and enhance various custom cloud infrastructure automation projects. There is no CLI, no human-in-the-loop, just code.

A quick case study of operating a large multi-tenant Kubernetes cluster in the public cloud. Covers provisioning, management, visibility and more important operations challenges.

  • The title is “How Salesforce Operates Kubernetes Multi-tenant Clusters in Public Cloud at Scale”.
  • Salesforce, which took a very early bet on Kubernetes in 2015 to move from monoliths to microservices, shares some of the challenges it faces and the solutions it has identified.

The term cloud native has become increasingly prevalent. This post talks about why, and breaks down several tooling areas to focus on.

  • The title is “How to Become Cloud Native — And the Tools to Get You There”.
  • The content is just what the title says. The “Cloud Native Tools” section lists many items, yet I feel there are more still, so the road to Cloud Native is long and winding for me.

Both microservice and serverless architectures push for smaller units of execution. This post looks at the differences between the two.

  • The title is “Microservices & Serverless Functions — The difference”.
  • As the title suggests, this also explains the difference between the two concepts, and aims to enable readers to select the correct technology stack according to their application.

WTF Are Microservices? Join Sam Newman, author of Monolith to Microservices, on 5 November at 11:30 CET for a 90-minute crash course in microservices architecture: WTF it is, but also when you should and shouldn’t use it. Register now

  • As mentioned above, a 90-minute course will be held on Thursday, 5 November at 11:30 CET (Central European Time).

WTF Is Cloud Native? It’s blogs, videos, events, and more, about an ever-changing world of strategy, culture, technology, and more, brought to you by Container Solutions. Let’s f*#king do this! Subscribe to the newsletter.

  • An information page for Container Solutions’ blogs, videos, events, and newsletter about Cloud Native. Container Solutions is also the provider of the “WTF Are Microservices?” course above.

Earthly is an interesting new build tool focused on repeatable builds. It combines Dockerfile and Make and makes it easier to run isolated tests and other commands.

  • The web page of “Earthly”, a build automation tool for the container era. You can run all builds in containers and create Docker images and artifacts (binaries, packages, arbitrary files, etc.).
  • Click here for the GitHub page.

Tempo is an easy-to-use and high-scale distributed tracing backend. Tempo is integrated with cloud-based object storage and can be used with a variety of tracing protocols, including Jaeger, Zipkin and OpenTelemetry.

  • The GitHub page of “Grafana Tempo”, an open source, high-scale distributed tracing backend.
  • It’s cost-effective, requires only object storage to operate, and is tightly integrated with Grafana, Prometheus, and Loki.
  • Compatible with Jaeger, Zipkin, OpenCensus, and OpenTelemetry, it imports batches in these formats, buffers them, and then writes to GCS, S3, or a local disk.

Ripgrep is a local code search tool that’s optimised for performance and nicely integrates with other developer tools like gitignore files.

  • The GitHub page of the line-oriented search tool “Ripgrep” that recursively searches for regular expression patterns in the working directory.

SRE Weekly Issue #242 November 1st, 2020

Here are 4 Ways SRE Helps New Employees Onboard

The work of SREs and the material we produce can be an excellent source of information to onboard new employees (not just SREs!).

Author Emily Arnott — Blameless

  • The article starts from the premise that “The SRE mentality can provide insights into many areas, including onboarding itself.” It then explains how SRE can take onboarding to the next level through the following four points.
    ○ Runbooks as guides for new employees
    ○ Incident retrospectives as a library of learning
    ○ SLIs, SLOs, and error budgets as focal points and confidence boosters
    ○ Refining onboarding with an SRE mentality

Sharp tools for emergencies and the --clowntown flag

Having safeguards in your tools to prevent errors is wise. Allowing the user to disable those safeguards when the need arises is even wiser.

Rachel by the bay

  • Using the Facebook-internal term “clowntown” (or “clown town”) as an example, it explains how to prepare sharp tools for emergencies.
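
The idea of a safeguard with an emergency override can be sketched in a few lines of Python. This is only an illustration under assumptions: the host limit, the function names and the restart scenario are hypothetical, and only the flag name is borrowed from the article's title.

```python
import argparse
from typing import List, Optional

# Hypothetical safeguard: refuse to act on more than this many hosts at once.
MAX_HOSTS_WITHOUT_OVERRIDE = 10


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Plan a service restart (illustrative only).")
    parser.add_argument("hosts", nargs="+", help="hosts to act on")
    # Flag name taken from the article's title; its behaviour here is an assumption.
    parser.add_argument("--clowntown", action="store_true",
                        help="disable safety checks; for emergencies only")
    return parser


def plan_restart(hosts: List[str], clowntown: bool = False) -> List[str]:
    """Return the hosts that would be restarted, or exit if the safeguard trips."""
    if len(hosts) > MAX_HOSTS_WITHOUT_OVERRIDE and not clowntown:
        raise SystemExit(
            f"refusing to restart {len(hosts)} hosts at once "
            f"(limit {MAX_HOSTS_WITHOUT_OVERRIDE}); pass --clowntown to override"
        )
    return list(hosts)


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    return plan_restart(args.hosts, clowntown=args.clowntown)


# A small change passes the safeguard without any override.
print(main(["web1", "web2"]))
```

With eleven or more hosts the same call exits with an error unless `--clowntown` is appended, so the default stays safe while the operator keeps a way out during an emergency.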

United States Air Force Aircraft Accident Investigation Board Report — F-35A, T/N 12–005053

Lots of factors contributed to the crash and destruction of this $175 million USD aircraft. The pilot escaped with minor injuries.

Colonel Bryan T. Callahan et al. — USAF

  • U.S. Air Force fighter accident report. When I saw the list of “ACRONYMS AND ABBREVIATIONS”, I thought “There are so many!”

The Future of Ops Careers

Serverless isn’t going to make ops go away. NoOps is a myth.

Charity Majors — Honeycomb

  • The article opens with: “Even if you don’t run any servers or have any infrastructure of your own, you’ll still have to deal with operability and operations engineering problems. I hate to be the bearer of bad news (not really), but the role of operations isn’t going away. At best, the shifts that supposedly reduce your ops are simply delegating the operability of your stack to someone that does it better. The reality for most teams is that operations engineering is more necessary than ever.” It then goes on to explain the future of Ops.

The KPIs of improved reliability

In this blog post, we’ll present reliability-centric metrics and key performance indicators (KPIs) that show the positive impact that reliability has on businesses.

Andre Newman — Gremlin

  • It introduces reliability-centric indicators and KPIs that show the positive impact of reliability on business.
  • It’s a Gremlin blog, so of course it links to chaos engineering.

The failure of a computer you didn’t even know existed

“Outage of a CRL server” isn’t the first thing that would come to mind when diagnosing a database connection failure.

Oren Eini — RavenDB

  • He describes an incident in which his blog went down, and asks for readers’ opinions.
  • The title looked familiar, and indeed it refers to Leslie Lamport’s words.
    ○ “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”.

Telltale: Netflix Application Monitoring Simplified

Telltale combines anomaly detection, alerting, dashboarding, and incident management.

Andrei Ushakov, Seth Katz, Janak Ramachandran, Jeff Butsch, Peter Lau, Ram Vaithilingam, and Greg Burrell — Netflix

  • It explains Netflix’s in-house application monitoring tool “Telltale”, starting from the background that made it necessary.
  • Someone asked about open sourcing it in response to this article, but unfortunately that is still a long way off. The reply was as follows.
    Unfortunately, we’re not planning to open source Telltale anytime soon. Right now it’s too Netflix-specific internally. We are starting to think about what would it take to make it more abstract and pluggable but thinking is still a long way from being open-source ready.

File Descriptor Transfer over Unix Domain Sockets

What?! I had no idea this was possible! You can transfer file descriptors (and the open files they point to) to another process, even outside of the normal parent/child process relationship.

Cindy Sridharan

  • The author describes a paper she read, what it covers, and the impression it made on her.
  • The paper describes how Facebook releases, without disruption, services that speak different protocols and handle different types of requests (long lived TCP / UDP sessions, requests involving huge chunks of data etc.)
  • It seems to be the most interesting article I’ve seen this week, but I could not make sense of it at all, so I asked an expert for some pointers. It looks interesting, so I’ll reread it later.
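
To make the mechanism a little more concrete, here is a minimal Python sketch of passing a file descriptor over a Unix domain socket with SCM_RIGHTS ancillary data. It is not the article's or Facebook's code: the helper names `send_fd`, `recv_fd` and `demo` are made up for illustration, and both ends live in one process for brevity, whereas in practice they would be separate and possibly unrelated processes.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one open file descriptor over a Unix domain socket."""
    # The descriptor travels as SCM_RIGHTS ancillary data; at least one byte
    # of ordinary data has to accompany it.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial item before decoding the integers.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo() -> bytes:
    """Pass the read end of a pipe across a socket pair and read through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(left, read_end)
        received = recv_fd(right)  # a new descriptor referring to the same open pipe
        os.write(write_end, b"hello")
        data = os.read(received, 5)
        os.close(received)
        return data
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()


print(demo())
```

The receiver gets its own descriptor for the same open file, which is what would let a new process take over an already open socket from an old one during a release.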

Outages

  • GeoComply
    GeoComply, a geo-location service used by most online gaming sites in the US to monitor the physical location of their customers, experienced a major outage.
  • Coinbase
  • Twitter

  • Outage information for each of the above companies.

KubeWeekly #240 November 6th, 2020

Editor’s pick of the highlights from the past week.

Honoring Dan Kohn

This weekend, we lost a titan of the open source community with the passing of Dan Kohn. CNCF, the foundation Dan helped build as its Executive Director, will always be home to Dan’s legacy as a pioneer and innovator in the world of technology. As a community, we remain humbled and grateful to the tireless effort Dan gave to this foundation, his colleagues, and his friends. His work in creating an inclusive foundation that was welcoming and safe was momentous and beneficial to all. The strong and diverse leadership we experience today stems from Dan’s determination. Dan was unwavering in his passion for and belief in open source. His presence will be severely missed, but never forgotten by those who knew his gentle nature and felt his supportive touch. Our thoughts and prayers remain with the Kohn family, who so gracefully shared Dan’s light with us for so many years. While it’s almost impossible to imagine CNCF without Dan, we know there would never be a CNCF without him, either, and for that, we are truly thankful. Thank you, Dan.

  • An article by the CNCF in honor of the late Dan Kohn, who contributed significantly to the founding and growth of the CNCF as its Executive Director.
  • If you would like to leave a memorial message to him, please send a pull request to this GitHub page.

You can view all CNCF recorded and upcoming webinars here.

CNCF Member webinar: Security in the world of service meshes

John A. Joyce, Principal Engineer @Cisco

  • It provides an overview of security in the world of service meshes, starting with an introduction to key security concepts and describing the key system components that implement those concepts.
  • There is a demo that combines several CNCF projects. It mentions Envoy, Linkerd, SPIFFE, SPIRE, and Network Service Mesh.

CNCF Member webinar: Managing your policies and standards

Ahmed Badran, Chief Technology Officer @Magalix

  • It explains the following points and gives a real-world example of implementing a simple governance framework using Rego and OPA.
    ○ What is governance and why it is important
    ○ How to establish a governance framework
    ○ How Open Policy Agent and the Rego language could help
    ○ Example policies for Kubernetes
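
To give a feel for what an example policy for Kubernetes might check, here is a small sketch written in plain Python rather than Rego. It is not taken from the webinar: the two rules (a required `owner` label and no unpinned image tags) and the function name `violations` are assumptions for illustration, and a real setup would express such rules in Rego and have OPA evaluate them.

```python
from typing import Dict, List

# Hypothetical organisational standard: every workload must carry these labels.
REQUIRED_LABELS = ("owner",)


def violations(manifest: Dict) -> List[str]:
    """Return human-readable policy violations for a Deployment-like manifest."""
    problems = []

    labels = manifest.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            problems.append(f"missing required label: {label}")

    containers = (
        manifest.get("spec", {}).get("template", {}).get("spec", {}).get("containers", [])
    )
    for container in containers:
        image = container.get("image", "")
        # Look only at the part after the last "/" so a registry port is not
        # mistaken for a tag. An image without a tag defaults to "latest".
        last_segment = image.rsplit("/", 1)[-1]
        if ":" not in last_segment or last_segment.endswith(":latest"):
            name = container.get("name", "?")
            problems.append(f"container {name} uses an unpinned image: {image}")

    return problems


deployment = {
    "metadata": {"labels": {"owner": "team-a"}},
    "spec": {"template": {"spec": {"containers": [{"name": "web", "image": "nginx:1.19"}]}}},
}
print(violations(deployment))
```

Returning a list of messages instead of a single boolean mirrors how policy engines commonly report every violation at once, which makes the feedback more useful to whoever has to fix the manifest.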

CNCF Member webinar: Building edge as a service

Dr. Bin Ni, CTO @Wangsu Science & Technology / CDNetworks

  • It shares a conceptual model for efficiently achieving the goal of establishing a standard way to provide edge computing to developers as a service.

Tutorials, tools, and more that take you on a deep dive into the code.

Ensuring YAML best practices using KubeLinter

Saiyam Pathak, Civo

Set up your K3s cluster for high availability on DigitalOcean

Alex Ellis, OpenFaaS

  • It provides an overview of the reference architecture for setting up K3s in a high availability (HA) configuration.

metal3-io / baremetal-operator : Bare metal host provisioning integration for Kubernetes

  • The GitHub page of “Bare Metal Operator” that implements the Kubernetes API for managing bare metal hosts.
  • It maintains an inventory of available hosts as instances of the BareMetalHost CRD (Custom Resource Definition) and can do the following.
    ○ Inspect the host’s hardware details and report them on the corresponding BareMetalHost. This includes information about CPUs, RAM, disks, NICs, and more.
    ○ Provision hosts with a desired image
    ○ Clean a host’s disk contents before or after provisioning.

Using WireGuard to extend OpenShift networks

Sebastian Jug, Red Hat

  • Red Hat’s PSAP (Performance Sensitive Applications) team presents its work on the topic in the title, using WireGuard.

Security hardening Kubernetes


  • A YouTube webinar video by Johan Tordsson, CTO of Elastisys.
  • It opens by explaining that “Today what is needed most is guidance on what exists and how best to use the right resources to meet the security and compliance requirements while still benefiting from the speed and agility Cloud Native environments offer”.
  • It provides a deep dive on security development tools and open source Kubernetes services available to meet these growing needs.

The road to Flux v2 — November update

Daniel Holbach, Weaveworks

  • It opens by pointing readers to the following documents and an embedded video.
    ○ If you are new to the community and GitOps, you might want to check out our GitOps manifesto or the official GitOps FAQ.
    ○ If you want to see the latest demo of GitOps Toolkit in action, check out this video:
  • The content is exactly as the title says, but the “November update” holds more than you might expect. It covers the differences between v1 and v2 and the future maintenance policy, so anyone affected should check it.

CI/CD with Chris Short (2/2) — YouTube

  • A YouTube video featuring Chris Short, one of KubeWeekly’s editors. Episode 216 of the YouTube channel “Roaring Elephant”.

How to use skopeo to migrate off Docker Hub

JJ Asghar

  • It shows how to migrate from Docker Hub to another registry, such as GitHub Container Registry, using “skopeo”, a tool provided by Red Hat.

Oracle continues building DTrace for Linux atop BPF


  • As the title says, it explains the history and latest developments of Oracle’s DTrace for Linux.

Disposable Kubernetes clusters

Garry Wilson, Curve

  • Curve’s case study article. It gives an overview of how they manage the Kubernetes clusters that handle live Curve card transactions and upgrade them without downtime. They have switched from kops to EKS and upgraded the EKS version, and are aiming for full automation in the future.

Reminiscing control theory and the future of observability

Michael Hausenblas, AWS

  • He starts with his own background in control theory and its connection to observability (o11y), then explains recent developments in observability and where it is heading.
  • Finally, after stating that he is wearing an AWS hat, he recommends checking out the AWS Distro for OpenTelemetry, which is a downstream implementation of the OpenTelemetry API and SDK.

Articles, announcements, and more that give you a high-level overview of challenges and features.

CNCF welcomes Katie Gamanji as Ecosystem Advocate

Cheryl Hung, CNCF

  • An article reporting that Katie Gamanji of American Express, a member of the CNCF’s TOC (Technical Oversight Committee), has been appointed as the CNCF’s Ecosystem Advocate. An interview video with Cheryl Hung is embedded.
  • She will help develop and execute programs to expand the visibility and growth of the End User Community.
  • She will play an important role in exposing original insights from cloud native end users through formats like CNCF Tech Radar, and own initiatives end-to-end from conception, to execution, tracking engagement, and operationalizing for growth.

Antrea, with Antonin Bas

Adam Glick and Craig Box, Kubernetes Podcast from Google

What’s new in CKA/CKAD with CKS coming up!

Saiyam Pathak (Civo) and Walid Shaari

  • A YouTube video that describes the latest changes in CKA / CKAD and some use cases, as well as the new CKS certification.
  • I appreciate that the video description on YouTube links to a summary of study points and tips for each exam.

Preparing Google Cloud deployments for Docker Hub pull request limits

Michael Winser and Dhaivat Pandit, Google Cloud

D2iQ takes the next step forward

Tobi Knaup, D2iQ

  • An announcement that D2iQ will shift its focus from the Mesosphere platform to the Kubernetes-based DKP (D2iQ Kubernetes Platform). The Mesosphere platform has begun moving towards end of life.

Cloud native explained. An interview with Cheryl Hung, VP Ecosystem at CNCF

John Leonard, Computing

  • The article requires membership registration. I tried to register, but gave up because the form would not accept an address in a non-UK format.

A sysadmin’s guide to containerizing applications

Scott McCarty, Red Hat

Argo CD and Tekton: Match made in Kubernetes heaven

Siamak Sadeghianfar and Burr Sutter, Red Hat

  • A web page with an embedded webinar that explains how to combine the power of Tekton Pipelines with Argo CD to achieve a declarative approach to CI/CD based on GitOps principles.

4 ways to run Kubernetes locally

Mike Callzo

A fireside chat to demystify KEPs

Amanda Katona, VMware

  • A CNCF interview article that gives an overview of the Kubernetes Enhancement Proposal (KEP) and the effort to renew the Approval Plugin.

How Discord (somewhat accidentally) invented the future of the internet

David Pierce, Protocol

  • The path Discord has taken is very interesting.
  • It ended up far from what was originally intended, it is amazing how widely it has spread, and the twists and turns of the story itself are fun.

Upcoming CNCF webinars

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: Kubernetes in the context of on-premises edge and network edge computing
Amr Mokhtar, Network Software Engineer @Intel Corporation
Nov 10, 2020 10:00 AM Pacific Time

Member Webinar: MicroK8s HA under the hood: Kubernetes with Dqlite
Konstantinos Tsakalozos, Senior Software Engineer @Canonical
Nov 11, 2020 7:00 AM Pacific Time

Member Webinar: The what and why of distributed tracing
Dave McAllister, Sr. Technical Evangelist @Splunk
Nov 13, 2020 10:00 AM Pacific Time

Member Webinar: Discover, analyze, and secure your APIs…anywhere
Pranav Dharwadkar, VP of Products
Jakub Pavlik, Director of Engineering
Dec 1, 2020 10:00 AM Pacific Time

Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time

How did you find these articles? Did any of them catch your interest?

Actually, there are some items I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself too.
Bye now!!

Yoshiki Fujiwara

Written by

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
