SRE / DevOps / Kubernetes Weekly Collection #12 (Week 17)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, adding some comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English with this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #486 April 19th, 2020
SRE Weekly Issue #215 April 20th, 2020
KubeWeekly #213 April 25th, 2020

DEVOPS WEEKLY ISSUE #486 April 19th, 2020

Post-incident reviews are increasingly common, but how many of them focus on learning from incidents vs blame or simplistic understanding? This deck contains lots of tips for how to improve the practice.

  • The title is “Incident Analysis: How Learning is Different Than Fixing”. Article dated January 31st.
  • The main gist, according to the author:
    * Current and typical approaches to “learning from incidents” have very little to do with actual learning.
    * Learning is not the same as fixing.
    * Most post-incident review documents are written to be filed, not written to be read.
    * Changing the primary focus from fixing to learning will result in a significant competitive advantage.
  • As other articles have pointed out, focusing on reducing the number of incidents rewards those who make the count look smaller (by changing how, or how often, incidents are reported, etc.) and ends up hurting the organization.

A detailed tutorial on how to build a GitHub Action. Covers everything from managing dependencies to testing and releasing the Action.

  • The title is “Releasing GitHub Actions”. Article dated January 31st.
  • If you have never used GitHub Actions, I recommend checking the previous article (September 19, 2019), “Working with GitHub Actions”. You can follow this article hands-on while reading the explanation, so I will work through it and get my hands dirty.

A podcast recording (and handily notes) on microservices testing approaches, covering everything from bridging technology generations to the pros and cons of local tools vs remote/staging environments.

  • A transcribed podcast about microservices testing strategies with Matt Klein, the creator of Envoy and a software engineer at Lyft. The podcast itself is embedded in the lower part of the page, so check it out if you would rather listen to it.

An in-depth post on the need for coordination between operations and information security folks, looking at the benefits of organising games to improve coordination.

  • The title is “Shall We Play a Coordination Game?”
  • Following on from the author's earlier article, “Security as a Product”, this piece examines the relationship between the two teams in terms of security as a product and cooperation, through the lens of moral hazard and the cooperative games of game theory and behavioral economics.
  • When I saw the estimated reading time of “26 minutes”, I was rather discouraged.

A series of blog posts looking at the foundations of Open Policy Agent, and exploring the Rego policy language.

An opinionated take on API transports, comparing gRPC, OpenAPI and REST style APIs and when to choose which option.

  • The title is “API design: Understanding gRPC, OpenAPI and REST and when to use them”.
  • It compares HTTP-style and RPC-style APIs by looking at gRPC, OpenAPI, and REST. Since I lack the background knowledge and experience, I will read it again.

A look at building a LinuxKit virtual machine to run on Azure, specifically for running container workloads.

  • The title is “Creating a minimal OS for containers with LinuxKit and Azure.”
  • An account of creating a minimal OS using LinuxKit in an Azure environment. The author says, “Historically I thought only massive companies such as Canonical or RedHat were capable of building out a Linux distribution, but the LinuxKit tool drastically lowers that barrier to entry”.

A post on hosting your own Helm Chart repository on Google Cloud, using Terraform and CircleCI in the mix.

  • The title is “Creating a Helm repo on Google Cloud.”
  • The author had the following requirements, so he got hands-on to solve them:
    * As a platform engineer, he wanted new chart versions to be available as quickly as possible across all environments.
    * HelmReleases should not fail on startup because a version does not exist.
    * Dependencies of HelmReleases should be kept outside of the cluster.

If you find yourself writing or editing lots of Kubernetes YAML files in Vim this post is for you. It shows how to configure a language server to provide autocompletion and inline hints for the various properties.

  • The title is “Vim Kubernetes YAML Support”.
  • When writing Kubernetes YAML files in Vim, you can configure it to provide completion and inline descriptions of resource properties. A YouTube video covering the same content is also linked, so please have a look. The default is the Kubernetes 1.14.0 schema, but there are instructions on how to update it.

Pomerium is an identity-aware proxy that enables secure access to internal applications, providing an interesting alternative to custom authentication systems and VPNs.

  • The GitHub page of “Pomerium”, an open-source IAP (identity-aware proxy) that enables secure access to internal applications. The project also has its own io page.

SRE Weekly Issue #215 April 20th, 2020

Embracing the beautiful mess

The “messy” details of our human/computer systems is their hidden strength.

Lorin Hochstein

  • We tend to think the ideal human/computer system is an orderly one, but in reality it is not (e.g., people who are not officially on call pitching in, or help from people who just happened to be pulled into the Slack channel). The author argues that this mess is a necessary element of the system.

Accident Case Study: Just a Short Flight

In this accident report, learn how two pilots lost situational awareness, with disastrous consequences.

Air Safety Institute

  • A case study on the subject of an American plane crash.
  • Many factors were involved, such as pilot skills, supervision by the captain, and flight plan briefings. All of them come down to causes that are familiar from our own work (lack of situational awareness, lack of skills, lack of a shared understanding, and so on), so there are lessons to take from it.

Succeeding With Service Level Objectives

Without a structured strategy, and careful consideration of the full SLO lifecycle, SLOs risk partial implementation. This can result in low ROI and, in many cases, a complete failure.

Danny Mican — Squadcast

  • An article by Danny Mican, Senior SRE at Auth0, on how to build an SLO from scratch using the IIDARR process. It’s interesting, so I want to read it over carefully.
  • IIDARR is taken from the initials of the following elements:
    * Identify
    * Instrument (Measure)
    * Define
    * Alert (Action)
    * Report/Refine
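
The Define and Alert steps above can be pictured with a small error-budget calculation. This is only a minimal sketch in Python, not code from the article: the function names, the event counts, and the 25% alert threshold are all assumptions made up for illustration.

```python
def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Return the fraction of the error budget that is still unspent.

    1.0 means untouched, 0.0 means exhausted, and a negative value means overspent.
    slo_target is a ratio such as 0.999 (hypothetical example value).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1 - slo_target) * total_events  # failures the SLO tolerates
    actual_bad = total_events - good_events        # failures actually observed
    return 1 - actual_bad / allowed_bad


def should_alert(good_events: int, total_events: int, slo_target: float,
                 threshold: float = 0.25) -> bool:
    """Alert when less than `threshold` of the budget is left (assumed policy)."""
    return error_budget_remaining(good_events, total_events, slo_target) < threshold
```

With a 99.9% target and 100,000 requests, 50 failed requests leave about half of the budget, while 90 failed requests leave about 10% and would trigger the assumed alert.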

Back to Basics: Why Global Infrastructure Matters

The cloud’s multiple availability zones and regions can be powerful, but it’s hard to get a multi-region architecture correct.

Serhat Can — OpsGenie

  • This article focuses on “global infrastructure”, an aspect underpinning the reliability of the cloud that I often overlook.

SLA Uptime calculator

A useful little JavaScript tool: plug in an availability percentage (e.g. 99.99%), and get back the number of minutes you can be down in a day, month, quarter, or year.


  • “SLA Uptime calculator”! The allowed downtime for an SLA can be calculated instantly on a day/week/month/quarter/year basis, which is convenient!
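
The arithmetic behind such a calculator is simple enough to sketch. The following Python snippet is my own illustration, not the tool's code; the function name and the period lengths (30-day month, 90-day quarter, 365-day year) are assumptions.

```python
# Period lengths in seconds. The month, quarter, and year lengths are assumptions.
PERIOD_SECONDS = {
    "day": 24 * 3600,
    "week": 7 * 24 * 3600,
    "month": 30 * 24 * 3600,    # assumes a 30-day month
    "quarter": 90 * 24 * 3600,  # assumes a 90-day quarter
    "year": 365 * 24 * 3600,    # assumes a non-leap year
}


def allowed_downtime_seconds(availability_percent: float) -> dict:
    """Return the allowed downtime in seconds per period for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return {
        period: seconds * unavailable_fraction
        for period, seconds in PERIOD_SECONDS.items()
    }


if __name__ == "__main__":
    for period, seconds in allowed_downtime_seconds(99.99).items():
        print(f"{period}: {seconds:.2f} s ({seconds / 60:.2f} min)")
```

Under these assumptions, 99.99% availability allows about 8.64 seconds of downtime per day and about 52.56 minutes per year.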

Hosted Pools Availability Degradation

Azure Pipelines had an incident of delayed builds at the end of March. Find out more in this post-incident analysis.

Chad Kimes — Microsoft

  • An Azure post-mortem regarding build and release delays for Windows and Linux agents that occurred in the EU and UK from 3/24 to 3/26 due to the pandemic (COVID-19).
  • My stomach ached just imagining it: on the first day, something that should have been noticed in about 10 minutes took about 5 hours to notice because of a communication problem (which seems to have been a monitoring design problem).

Free Google Book: Building Secure and Reliable Systems

Google published another book in their SRE series. This short summary gives an overview of what’s inside along with an explanation of the motivation for another book. See also: Google’s announcement

Todd Hoff — High Scalability

  • The third SRE book from Google, “Building Secure and Reliable Systems”, has been released online for free! It has 21 chapters and 557 pages. I would like to hold a reading group for it.

One Team at Uber is Moving from Microservices to Macroservices

The pendulum is swinging back, and folks are starting to see the downsides of a plethora of microservices, including early champions, Uber.

Todd Hoff — High Scalability

  • The article starts from the news that a team at Uber has announced a move from microservices to macroservices (well-sized services).
  • “Building reliable and testable microservices is a lot harder than most folks think”. “It’s a macro service, not a monolith”. “May or may not have/need monorepo”. “Better observability and debugging”. The article lines up thought-provoking quotes like these. It is better to read them in their proper context to avoid misunderstanding or over-interpretation.
  • Click here for the corresponding tweet.

KubeWeekly #213 April 25th, 2020

Editor’s pick of the highlights from the past week.

Kubernetes Podcast episode 100: Community Redux, with Paris Pittman

To celebrate the 100th episode of the Kubernetes Podcast from Google, hosts Adam Glick and Craig Box welcome back their first ever guest, Paris Pittman. Paris is an open source program manager at Google Cloud, member of the Kubernetes steering committee, and founder of the CNCF Contributor Strategy SIG and the Kubernetes contributor communication committee. Paris looks at how the Kubernetes community has changed and ways in which it has stayed the same, as well as how other projects can adopt learnings from Kubernetes.

Bundle Training and Certification to Jump Start Your Career

Dan Brown, The Linux Foundation

The Linux Foundation offers training and certification bundles, which provide the courses needed to gain the knowledge necessary to succeed in a chosen open source career, and a certification exam to enable you to confidently demonstrate that knowledge to potential employers. Bundles are the more direct way to get qualified for a new open source career, or add new skills to advance your current one.

And there is a sale! 30% off offer for all bundles, courses and certification exams. Use code ANYWHERE30 at checkout. LF also has nearly two dozen completely free training courses that are always available to help you get started and determine the open source technology area in which to focus.

  • The campaign provided by CNCF, which originally ran until 4/6, has been extended to 4/30 (as of writing).

Weekly recap of CNCF member and project webinars that you might have missed.

You can view all CNCF recorded and upcoming webinars here

CNCF Project Webinar: Announcing Open Source gRPC Kotlin

James Ward, Developer Advocate @Google Cloud Platform

  • A webinar video, recorded the day after the announcement, by Google Developer Advocate James Ward, who was named as a contributor in the article “gRPC, meet Kotlin” that I covered in a previous post. It introduces the open-source gRPC Kotlin project and how to use it.
  • Other contributors also take part and back each other up, and they seem to work well as a team.

CNCF Member Webinar: Ensuring compliance, without sacrificing development agility and operational independence, in K8s with OPA Gatekeeper

Sertaç Özercan, Software Engineer @Microsoft and Lachie Evenson, Principal Program Manager @Microsoft

  • Webinar video by Microsoft Software Engineer Sertaç Özercan and Microsoft Principal Program Manager Lachie Evenson.
  • “Gatekeeper”, a subproject of OPA (Open Policy Agent), is introduced as “a method of ensuring compliance without compromising development agility and operational independence.”
    It works like a customizable admission webhook for your Kubernetes cluster that enforces policies executed by OPA.

CNCF Member Webinar: Kubernetes RBAC 101

Oleg Chunikhin, CTO @Kublr

  • A webinar video by Kublr CTO Oleg Chunikhin explaining the concepts and objects of RBAC (Role-Based Access Control) in Kubernetes.

CNCF Member Webinar: 如何让你的Windows应用运行在Kubernetes平台 (How to run your Windows applications on the Kubernetes platform)

杨雨 Alex Yang, Solution Architect @Mirantis and 张文墨 Larry Zhang, Solution Architect @Mirantis

  • A webinar delivered in Chinese by Alex Yang and Larry Zhang, both Solution Architects at Mirantis. I wonder if the day will come when Japanese webinars are listed here too. I think that having languages other than English listed side by side would broaden diversity and the community base. Even as my English improves, it is still easier for me to absorb and express things in my native language (given my current English and other skills).

Tutorials, tools, and more that take you on a deep dive into the code.


kubesort is a tool that helps you sort the results from kubectl get in an easy way

  • The GitHub page of the OSS tool “kubesort” that easily sorts the results of kubectl get.
  • For example, you can just type kubesort status to sort pods by status, instead of something like kubectl get po --sort-by=.status.containerStatuses[0].restartCount. You don't have to spell out the nested field with a JSONPath expression.
  • v0.1.0 supports kubectl get pod only. According to the roadmap, v0.2.0 will support deployments, services, and namespaces, and will add auto-completion.
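
To make the JSONPath in the kubectl example concrete, here is a minimal Python sketch that sorts a pod list by the same field, .status.containerStatuses[0].restartCount. It is not how kubesort is implemented; the function names and the sample pods are assumptions for illustration, and the input is assumed to have the shape of kubectl get pods -o json output.

```python
import json


def first_container_restarts(pod: dict) -> int:
    """Return restartCount of the first container, or 0 if no status exists yet."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    return statuses[0].get("restartCount", 0) if statuses else 0


def pod_names_by_restarts(pod_list: dict) -> list:
    """Return pod names sorted by the first container's restart count, ascending."""
    pods = sorted(pod_list.get("items", []), key=first_container_restarts)
    return [pod["metadata"]["name"] for pod in pods]


# Hypothetical sample data in the shape of `kubectl get pods -o json`.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "db"},
     "status": {"containerStatuses": [{"restartCount": 5}]}},
    {"metadata": {"name": "api"},
     "status": {}},
    {"metadata": {"name": "worker"},
     "status": {"containerStatuses": [{"restartCount": 2}]}}
  ]
}
""")

if __name__ == "__main__":
    print(pod_names_by_restarts(SAMPLE))
```

A pod without container statuses (for example, one that is still pending) is treated as having zero restarts here, which is a choice made for this sketch.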

Videos: Intro to Vitess — its powerful capabilities and how to get started

Abhi Vaidyanatha, PlanetScale

Learn from our maintainer: build/run/test your Velero code locally and in cluster

Carlisia Campos, VMware

  • A YouTube video in which a Velero maintainer from VMware explains how to build, run, and test Velero code locally and in a cluster, as the title says. The terminal and the steps are easy to follow.

Multicluster-Scheduler and Argo (Workflows and CD): a Deep Dive

Gokul Chandra

  • A deep dive that combines Multicluster-Scheduler, Argo Workflows, Argo CD, Virtual-Kubelet, Cilium, etc., in order to explore configurations that support multi-cluster environments.
  • There are plenty of explanations and screenshots, and it covers a lot of ground, but I would like to try it. Homework.

Virtual 4G Simulation Using Kubernetes And GNS3

Christopher Adigun, Loodse

  • Article deploying a virtual 4G stack using Kubernetes and GNS3.
  • GNS3 was a great help to me at a previous workplace, and I’m curious about networking, so I’d like to try this later.

Building a Kubernetes-Based Platform: Progressive Delivery, the Edge, and Observability


  • The author explains that Kubernetes has been widely adopted and provides a solid foundation on which to support the other three capabilities of a cloud native platform that enables full cycle development:
    * Continuous Delivery Pipelines
    * The Edge Stack
    * The Observability Stack

How GKE surge upgrades improve operational efficiency

Tamas Ragoncsa and Kobi Magnezi, Google

  • An article by Google that explains how GKE surge upgrades improve operational efficiency. Surge upgrades will be enabled by default from 4/20, and existing node pools will also be migrated during the quarter (as of writing).

Rolling Updates and Blue-Green Deployments with Kubernetes and HAProxy

Nick Ramirez, HAProxy

  • An article by HAProxy, dated 2/11, on rolling updates and blue-green deployments using the “HAProxy Kubernetes Ingress Controller”. The execution environment is Minikube.

EKS Service Accounts Explained

Jason Smith

  • The author describes what confused him when, a few months earlier, he implemented AWS’s feature for adding IAM permissions to pods. He helpfully clears up some of the confusion about what AWS is actually doing, and explains what he believes they did right and what they did wrong.

Articles, announcements, and more that give you a high-level overview of challenges and features.

Navigating the Kubernetes Hype Cycle

Cornelia Davis and Liz Rice

  • An introduction to the new podcast “The Art Of Modern Ops”, hosted by Cornelia Davis, CTO of Weaveworks and author of “Cloud Native Patterns”.
  • Liz Rice, VP of Open Source Engineering at Aqua Security and chair of the CNCF’s TOC (Technical Oversight Committee), is welcomed as a guest, and the theme is “Navigating the Kubernetes hype cycle”. The number of podcasts I want to listen to has increased again. I subscribed immediately.

Kubernetes architecture for beginners

Kevin Casey, Red Hat

  • An article that explains the basics of Kubernetes architecture and its key factors.

Pancake Podcast: Cassandra and the Kubernetes Data Plane

Joab Jackson, The New Stack

  • A podcast panel discussion on “What is the role that the data plane plays in a Kubernetes ecosystem?”

GigaOm Radar for Hosted Kubernetes Solutions

Enrico Signoretti, Gigaom

  • Report of “Hosted Kubernetes Solutions” by GigaOm.
  • Only the radar on the opening page and the summary can be viewed for free.

Is Kubernetes becoming the driving force of enterprise IT?

Graham Berry, RedHat

  • The author discusses Kubernetes under the theme “Is Kubernetes becoming the driving force of enterprise IT?”, and concludes that “Ultimately, what do you want your teams to focus on? If the answer is building world-class services for customers and getting them to market faster than ever before, then Kubernetes would be a potent weapon in your armour”.

NetApp to make stateful applications easier to do in Kubernetes

Steven J Vaughan-Nichols, ZDNet

  • An introduction to NetApp’s efforts through Project Astra to make it easier to deploy stateful apps on Kubernetes storage and container platforms.

The important things I know which helped me pass the CKAD exam

Vishwas Javalgekar

  • An article titled “Important things that helped me to obtain CKAD (Certified Kubernetes Application Developer)” that introduces tips such as using aliases and deleting resources without a grace period.

Istio Service Mesh in 2020: Envoy In, Control Plane Simplified

Alon Berger, Alcide

  • An article that describes Istio’s current trends and updates in 2020.

Master Shifu & His Cloud-Native Mentoring Sessions

Vishal Biyani, Infracloud

  • An article introducing the history of InfraCloud and its mentorship program “Talk-To-Us” for students and engineers who are interested in cloud-native technology, modeled on “Kung Fu Panda” and characters such as Master Shifu.

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars as of writing.

Member webinar: Kuma: Service Mesh and the Future of Application Connectivity
Marco Palladino, Kong
April 28, 2020 10:00 AM Pacific Time

Member webinar: KubeCarrier: the Operator of Operators
徐嘉诚 Jiacheng Xu, Software Engineer @Loodse
This webinar will be delivered in Chinese.
April 29, 2020 10:00 AM China Standard Time

Member Webinar: Building Zero Trust based Authentication in Healthcare with SPIRE
Bobby Samuels, Vice President, AI Technology @Anthem, Inc.
Frederick Kautz, Head of Edge Infrastructure
Emiliano Berenbaum, Chief Technologist, HPE Labs @Hewlett Packard Enterprise
April 29, 2020 10:00 AM Pacific Time

Member webinar: Best Practices In Implementing Container Image Promotion Pipelines
Baruch Sadogursky, Head of DevOps Advocacy @JFrog
April 30, 2020 10:00 AM Pacific Time

Community webinar: How to Conduct a GREAT Live Stream
Alex Lustenberg, Jorge Castro, Chris Short
April 30, 2020 1:00–3:00pm Pacific Time

Project webinar: Kubernetes 1.18
Kubernetes release team
May 1, 2020 9:00 AM Pacific Time

Member Webinar: How AWS uses Firecracker and Fargate to run serverless Kubernetes pods in Amazon EKS
Mo Ziyuan 莫梓元, Solution Architect @AWS
This webinar will be delivered in Chinese.
May 7, 2020 10:00 AM China Standard Time

Member webinar: Data Services for Cloud Native Workloads
May 12, 2020 10:00 AM Pacific Time

Member Webinar: Piraeus: Dynamic Provisioning, Resource Management and High Availability for Local Persistent
Philipp Reisner, CEO @Linbit
Sun Liang, Senior Storage Architect @DaoCloud
Alex Zheng, Senior Storage Engineer @DaoCloud
This webinar will be delivered in Chinese.
May 13, 2020 10:00 AM China Standard Time

Member webinar: Cloud Native Monitoring: Scaling Prometheus
Aaron Newcomb, Director, Product Marketing, Monitoring @Sysdig
Carlos Arilla Navarro, Technical Marketing Engineer @Sysdig
May 19, 2020 10:00 AM Pacific Time

Member webinar: Kubernetes Cost Allocation Done Right
Webb Brown, Co-founder and CEO @Kubecost
June 24, 2020 10:00 AM Pacific Time

Member Webinar: Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
June 30, 2020 10:00 AM Pacific Time

How did you find those articles? Did any of them catch your interest?

To be honest, there is some content I cannot fully digest at this stage, so I will use this aide-memoire and the links to catch up myself too.

Bye now!!

Written by Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
