SRE / DevOps / Kubernetes Weekly Collection #9 (Week 14)

  • In this blog post series, I go through the following three weekly mailing lists I subscribe to, adding brief comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a useful reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #483 March 29th, 2020
SRE Weekly Issue #213 March 29th, 2020
KubeWeekly #210: April 3rd, 2020

DEVOPS WEEKLY ISSUE #483 March 29th, 2020

A series of posts on common message-based middleware patterns using NATS. An introduction on the benefits of message architectures, setup instructions and more.

Part 2 / Part 3 / Part 4 / Part 5

  • The title is “NATS Messaging-Part 1-Part 5”.
  • A five-part series by Systems Architect RI Pienaar on NATS, a messaging tool and CNCF Incubating Project.
  • It has great commentary, diagrams, and demo videos. This one is homework for me!

A detailed look at improving the performance of disk encryption in Linux.

  • The title is “Speeding up Linux disk encryption”.
  • Cloudflare’s Ignat Korchagin explains in a blog post how Cloudflare more than doubled the performance of Linux disk encryption, both internally and for customers.
  • Encryption layers, source code, processes, protocols, performance tests, and various other elements unfamiliar to me all come at once. This is also homework.

Many CI systems have adopted Docker containers to run the compute. This post explains why that’s useful, and how and why to build custom images to support your toolchain.

  • The title is “A Quick Guide to Building a Custom Docker Image for CI”.
  • An article by John Ruble, Software Consultant & Developer at Atomic Object, that explains “why Docker is such a great fit for CI and how it can be made even better with custom images”.
  • There is also a simple hands-on walkthrough for creating a custom image and pushing it to Docker Hub.

A useful post on observability and tracing, with some good explanatory diagrams and discussion of opentracing.

  • The title is “Tracing and Observability”.
  • The everyday idea of checking a piece of luggage’s location, route, and expected arrival from its tracking information also applies to software systems. An easy-to-read introduction.

An explanation of how to catch recent API changes in your Kubernetes configuration using Open Policy Agent, Conftest and GitHub Actions.

  • The title is “How to detect outdated Kubernetes APIs”.
  • An article introducing Deprek8 and Conftest as tools to detect out-of-support Kubernetes APIs.
  • Deprek8 is a set of OPA (Open Policy Agent) policies that check for out-of-support API versions in your repository. The article walks through a Deprek8 policy written in the Rego query language as one example of how such policies are used.
  • Conftest can be used with Deprek8 to apply Rego policies to any number of configuration files. It supports YAML, JSON, CUE, Dockerfile, HCL, HCL2 (Experimental), XML, etc.
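To make the idea concrete for myself, here is a minimal Python sketch of the kind of check such a policy performs. It is not Deprek8's actual rule set and it does not use Rego or Conftest: the `DEPRECATED_APIS` table and the `find_deprecated` function are hypothetical names for illustration, and the manifests are assumed to be already parsed into dictionaries. The table lists only a few Deployment and DaemonSet API versions that Kubernetes stopped serving in 1.16.

```python
# Illustrative mapping only (an assumption for this sketch, not Deprek8's rules):
# (apiVersion, kind) -> apiVersion to migrate to.
DEPRECATED_APIS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
}


def find_deprecated(manifests):
    """Return one warning string per manifest that uses a listed API version."""
    warnings = []
    for manifest in manifests:
        api_version = manifest.get("apiVersion")
        kind = manifest.get("kind")
        replacement = DEPRECATED_APIS.get((api_version, kind))
        if replacement is not None:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            warnings.append(
                f"{kind}/{name}: {api_version} is deprecated, use {replacement}"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical sample manifests, already parsed from YAML or JSON.
    sample = [
        {"apiVersion": "extensions/v1beta1", "kind": "Deployment",
         "metadata": {"name": "web"}},
        {"apiVersion": "apps/v1", "kind": "Deployment",
         "metadata": {"name": "api"}},
    ]
    for warning in find_deprecated(sample):
        print(warning)
```

In the setup the article describes, this logic lives in Rego policies and Conftest runs them against the configuration files, which keeps the rules declarative and reusable in CI.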

A post in praise of AWS S3. The influence of S3 is hard to argue against and this post explains why.

  • An article by A Cloud Guru praising AWS’s S3. The company helped me when I passed the AWS SAA exam in English.

A starter kit for managing Helm charts using Helmfile. The repository introduces an opinionated workflow and provides examples of separating out per-environment configuration.

  • GitHub page of the Helmfile Starter Kit, a starter kit for deploying complex Helmfile-based software projects to Kubernetes.

ssmsh is a handy shell for AWS EC2 Parameter store. The UI is modelled after a simple filesystem, so browsing parameters is immediately intuitive with ls, mv, rm and the like.

  • GitHub page for ssmsh, a handy shell tool for AWS EC2 Parameter Store.

A set of small tools for using AWS Systems Manager, including opening an interactive shell and running a command on multiple instances based on instance tags or names.

  • GitHub page of SSM Helpers, a set of auxiliary tools for “AWS Systems Manager that gives AWS infrastructure control and visibility”.

Major incidents lead to more alerts, more downtime and unhappy customers. See how modern DevOps-minded teams are building virtual war rooms to quickly mobilize cross-functional engineering and IT teams around major incidents — improving incident remediation while reducing burnout:

  • A blog post from VictorOps, a sponsor of DevOps Weekly.
  • The title is “The War Room for Major Incident Response and Remediation”.
  • When a major failure occurs, a large team has to work together to identify the root cause and resolve it. The post proposes a “War Room” service, with a 14-day free trial, as the solution.

SRE Weekly Issue #213 March 29th, 2020

COVID-19: Why We Should All Wear Masks — There Is New Scientific Rationale

This is important, and well worth a read. Where’s the SRE connection? The article explains that the U.S. Surgeon General’s comment that masks are “not effective” led to a stigma against those that wear them here. That kind of unintended sociological effect is uncovered commonly in incident post-analysis.

Sui Huang

  • An article that argues logically, from a perspective relevant to SRE, for why we should all wear masks as a countermeasure against COVID-19.

Keeping the Internet “Always On” — the Pressure of COVID-19 on Incident Response Teams

PagerDuty ran the numbers and discovered an increase in incidents recently, especially in certain companies.

Rachel Obstler — PagerDuty

  • According to PagerDuty’s research, the pressure of COVID-19 has led to an increase in incidents across all companies using the PagerDuty platform, with an especially significant increase at companies offering specific services such as online learning.

February service disruptions post-incident analysis

Here’s the scoop on all those GitHub incidents in February.

Keith Ballinger — GitHub

  • GitHub’s post-incident analysis of four incidents in February, which added up to 8 hours and 14 minutes of service interruption.
  • Originally, SQL data was stored in a single cluster. As the service grew, it was split into new clusters by feature group, and new features went into new clusters, but many core datasets remained in the original cluster.
  • They have consistently scaled their database to accommodate the increasing load from new users and products. In this case, unexpected database load balancing behavior degraded a cluster and left it inaccessible.

Embrace Resilience for Business Continuity in Times of Uncertainty

No, it won’t be possible to continue operating business-as-usual. For the unforeseeable future, teams across the world will be dealing with cutbacks, infrastructure instability, and more. However, with SRE best practices, your team can embrace resilience and adapt through this difficult time.

Hannah Culver — Blameless

  • From an SRE perspective, the article explains how to treat the current difficult situation (an uncertain outlook, team cutbacks around the world, infrastructure instability, etc.) as an incident and respond to it with resilience.
  • My takeaway: precisely because times are hard, rather than being bound by the wording and rules derived from the principles, we should stay grounded in the principles themselves and improve according to reality.
  • The article also says that “learning resources will also need to be flexible”, and introduces several resources and online events.

Remote incident management

5 tips for incident management when you’re suddenly remote

I love the concept of “ephemeral information”, that is, discussions that happen out-of-band, making it much harder to analyze the incident after the fact.

Blake Thorne — Atlassian

  • “I’ve suddenly become a remote worker, what should I do about incident management?” Atlassian’s Product Marketing Manager Blake Thorne gathered advice from many teams and shares five tips.
  • Atlassian’s “Incident Management Handbook”, which reflects its practice of “remote first incident management”, is a useful reference. It is distributed free of charge if you fill in the required information (name/company name/title/email address).

Elastic Cloud January 18, 2019 Incident Report

Grey failure turned a seemingly reasonable auto-recovery mechanism into a DoS caused by a thundering herd.

Panagiotis Moustafellos, Uri Cohen, and Sylvain Wallez — Elastic

  • As the title suggests, a report issued on January 31, 2019, on the incident of January 18, 2019.
  • Customers using Elastic Cloud experienced an estimated 3 hours of severe access failures in their AWS eu-west-1 (Ireland) region deployments, and almost 20 minutes of inaccessibility across all deployments during the same time period.
  • The report apologizes for the service impact, states that the root cause was found and that measures were taken to prevent a recurrence, points to a contact for any further concerns or questions, and apologizes once more before reviewing the failure.
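As a note to myself on the "thundering herd" mentioned in the blurb above: a common general mitigation is to have clients retry with exponential backoff plus random jitter, so that they do not all reconnect at the same instant. The sketch below is a generic illustration of that technique, not what Elastic did; the function name `backoff_delays` and the parameters `base` and `cap` are my own choices.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return a list of retry delays (seconds) using exponential backoff with full jitter.

    For attempt n (starting at 0) the delay is drawn uniformly from
    [0, min(cap, base * 2**n)], which spreads retries from many clients over time.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    # Each client computes its own schedule, so retries are unlikely to line up.
    for client in range(3):
        schedule = backoff_delays(5)
        print(f"client {client}:", [round(d, 2) for d in schedule])
```

Without the jitter, every client that failed at the same moment would retry at the same moments too, which is exactly the herd behavior that turns a recovery mechanism into extra load.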

KubeWeekly #210: April 3rd, 2020

Editor’s pick of the highlights from the past week.

Join us for Cloud Native Summit Online!

With the postponement of KubeCon + CloudNativeCon EU, and many of our other favorite face-to-face industry events, CNCF, GitLab, Kong, and NetApp are excited to announce the Cloud Native Summit Online as another event to get the community together!

Cloud native open source projects, SIGs, and working groups are fundamental to many of our jobs. As we adjust to working remotely and maintaining productivity, we are excited to bring together experts from the community to provide insights and support around cloud native technologies and CNCF projects.

The virtual event will take place on Tuesday, April 7 from 6:00 am — 2:00 pm PT / 15:00–23:00 CET. We hope you’ll join us next week!

  • CNCF held Cloud Native Summit Online on Tuesday, April 7, local time. In Japan time, that is 22:00 on Tuesday, April 7 to 08:00 on Wednesday, April 8 (according to Google Calendar).
  • The contents are as follows.
    * Graduated CNCF project updates: Kubernetes, Prometheus, Envoy, Jaeger, Fluentd, Containerd, CoreDNS, Vitess, TUF.
    * Conversations with key SIG and WG contributors.
    * CNCF updates on upcoming cloud native technology.
    * Some nice extras for remote work, such as gifts, jokes, and tips.

CNCF projects surpass one billion lines of code: A Q&A with DevStats creator Łukasz Gryglicki

CNCF Staff

In monitoring DevStats, the community came across an incredible milestone — all CNCF projects combined have surpassed one billion lines of code. That’s right, one billion!

To mark this achievement, we sat down with DevStats creator Łukasz Gryglicki to learn more about the tool, its history, and how our community can benefit from it. Read the blog here.

  • According to DevStats, an OSS tool that collects and visualizes CNCF project data, the code of all CNCF projects combined exceeds 1 billion lines.
  • CNCF staff interviewed DevStats creator Łukasz Gryglicki about what passing a billion lines of code means for DevStats and CNCF.

Weekly recap of CNCF member and project webinars that you might have missed.

You can view all CNCF recorded and upcoming webinars here.

CNCF Ambassador Webinar: Continuous Profiling Go Application Running in Kubernetes

Gianluca Arbezzano, Site Reliability Engineer @InfluxData

  • A webinar video in which InfluxData SRE and CNCF Ambassador Gianluca Arbezzano explains Profefe, an OSS profiling tool. This blog has previously covered an article he wrote.

CNCF Member Webinar: MindSpore and Cloud Native Ecosystem

Zhipeng Huang, Open Source Community Manager @MindSpore and Yedong Liu, Open Source Engineer @Huawei

  • A webinar video in which MindSpore’s Open Source Community Manager Zhipeng Huang and Huawei’s Open Source Engineer Yedong Liu explain MindSpore, a new OSS deep learning training and inference framework, and the cloud native ecosystem.

CNCF Member Webinar: Container Security at Scale: Lessons Learned from the Front Lines with ABN AMRO and Palo Alto Networks

Wiebe de Roos, CI/CD Consultant @Flusso and ABN Amro and Keith Mokris, Technical Marketing Engineer @Palo Alto Networks

  • A webinar video in which Wiebe de Roos, CI/CD Consultant at Flusso and ABN AMRO, and Keith Mokris, Technical Marketing Engineer at Palo Alto Networks, address the question “Will containers secure DevOps compliance in large-scale environments?”.
  • The explanations and illustrations are easy to understand, and the audio is clear.

CNCF Member Webinar: Taming Your AI/ML Workloads with Kubeflow — The Journey to Version 1.0

Johnu George, Technical Lead @CPSG-AI at Cisco, David Aronchick, Head of Open Source Machine Learning Strategy @Microsoft and Elvira Dzhuraeva, Technical Product Manager AI/ML @ Cisco

  • A webinar video in which Johnu George, Technical Lead of Cisco’s CPSG-AI team, Elvira Dzhuraeva, Technical Product Manager AI/ML at Cisco, and David Aronchick, Head of Open Source Machine Learning Strategy at Microsoft, explain how to master AI/ML workloads using Kubeflow.

Tutorials, tools, and more that take you on a deep dive into the code.

How to detect outdated Kubernetes APIs

Tyler Auerbeck, Red Hat

  • Skipped here because it is already covered under DEVOPS WEEKLY ISSUE #483 above.

GitOps for Kubernetes

Agustin Romano, Caylent

  • An article covering an overview of GitOps, its benefits and best practices, and a tool for implementing it: Flux, a CNCF Sandbox project.

Evaluating Predictive Autoscaling in Kubernetes

Jamie Thompson, IBM

How to Secure Your Kubernetes Cluster on GKE

Lewis Marshall, Appvia

  • GKE is easy to use, but you have to manage security yourself, and the documentation shows that many features and changes depend on the Kubernetes version.
  • The author recommends that “if you handle sensitive workloads in a production environment, configure your cluster along the lines described in this article”.

With Kubernetes Operators comes great responsibility

Jason Shepherd, Red Hat

  • An article explaining how to run Operators with appropriately scoped permissions, using RBAC and Service Accounts.
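As a rough illustration of what "appropriately scoped" can mean, the sketch below flags RBAC rules that use the `*` wildcard in `apiGroups`, `resources`, or `verbs`. It is my own simplified check, not something taken from the article: the function name `overly_broad_rules` is hypothetical, and the Role is assumed to be already parsed into a dictionary with the usual `rules` list.

```python
def overly_broad_rules(role):
    """Return (rule_index, field) pairs for every wildcard found in a Role's rules."""
    findings = []
    for index, rule in enumerate(role.get("rules", [])):
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in rule.get(field, []):
                findings.append((index, field))
    return findings


if __name__ == "__main__":
    # Hypothetical Role for an Operator's Service Account.
    operator_role = {
        "kind": "Role",
        "metadata": {"name": "example-operator"},
        "rules": [
            # Narrow rule: read-only access to pods.
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
            # Broad rule: everything on everything, worth a second look.
            {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
        ],
    }
    for index, field in overly_broad_rules(operator_role):
        print(f"rule {index}: wildcard in {field}")
```

A wildcard is not always wrong, but listing each one makes it easier to ask whether the Operator really needs that much access.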

Using UBI images to minimize container vulnerabilities

Rags Srinivas, Snyk

  • An article introducing Red Hat Universal Base Images (UBI) announced at the 2019 Red Hat Summit as an image that minimizes container vulnerabilities.

Build a Kubernetes Operator in 10 minutes with Operator SDK

Manuel Dewald, Red Hat

  • As the title says, an article that walks through building a Kubernetes Operator in 10 minutes using the Operator SDK CLI. It makes getting started easy and lowers the barrier to entry. It seems best to start here before reading the article above on Operator security.

Kpt: Packaging up your Kubernetes configuration with git and YAML since 2014

Phillip Wittrock, Google

  • An article from Google’s Open Source Blog introducing Kpt, an OSS tool for managing Kubernetes YAML configuration.

Provisioning cloud resources (AWS, GCP, Azure) in Kubernetes

Daniele Polencic, LearnK8s

Articles, announcements, and more that give you a high-level overview of challenges and features.

Edge Computing Requires Cloud Native Thinking Today

Bill Mulligan, Loodse

  • An article arguing that running Kubernetes and cloud native technology at the edge requires an operational business model, that the field is still in its infancy, and that the market is expected to grow by about 30% per year through 2024. It closes by calling for participation in the community and its discussions.

Optimising UE4 Project Builds With Cloud Native Infrastructure And Containers

Jose Moreira

  • An article by an author who, while struggling to change jobs during COVID-19, contributes to the Unreal Containers community with the ambition of freeing up resources for freelance developers and small to medium-sized game companies so that they can make great games.

Migrating to Kubernetes

Todd Campbell, Sensu

  • An article focusing on the decisions that readers will have to make when migrating to Kubernetes, while briefly touching on the differences from other platforms.

14 Kubernetes interview questions: For hiring managers and job seekers

Kevin Casey, Red Hat

  • The article’s premise is that filling Kubernetes jobs can be tricky because the technology is relatively young, so it offers 14 questions that hiring managers can use and job seekers can prepare for in interviews.
  • The phrase “good questions lead to good answers” came to my mind recently, and I thought this list asks good questions that get to the key points.

Service Mesh Adds Security, Observability and Traffic Control to Kubernetes

Emily Omier, The New Stack

  • The introductory article in a two-week series by The New Stack on the theme “The Value of Service Mesh for Kubernetes Deployment”.
  • It explains the relationship between Kubernetes and service mesh, the relationship between Kubernetes and Istio, and the service mesh themes of security, observability, and routing.


BotKube can be integrated with multiple messaging platforms like — Slack, Mattermost to help you monitor your Kubernetes cluster(s), debug critical deployments and give recommendations for standard practices by running checks on the Kubernetes resources.

  • The page for “BotKube”, a bot for monitoring, debugging, and running checks on Kubernetes.
  • The UI looks appealing: you can monitor through the bot in chat and debug with the CLI.

MKIT — Managed Kubernetes Inspection Tool

Brad Geesaman, Darkbit

  • Kubernetes io page of “MKIT (Managed Kubernetes Inspection Tool)”, a tool for quickly and easily checking for misconfigurations. A GitHub page is also available.

HashiCorp Joins the CNCF

Adam Fitzgerald, HashiCorp

Ansible for Kubernetes by Jeff Geerling Free until end of April

Jeff Geerling, Ansible

  • Jeff Geerling, the author of “Ansible for DevOps” and “Ansible for Kubernetes”, had been giving the books away free until the end of March “to help people who are self-isolated or have lost their jobs gain automation skills”. The article announces that, thanks to sponsorship from Device42, the free distribution had been extended until the end of April (at that time).
  • He also thanked those who made personal donations and those who sent him kind words.

You can check some Recorded Webinars and Upcoming Webinars here. The following were listed as upcoming CNCF webinars at the time.

Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
Ambassador webinar
April 3, 2020 10:00 AM Pacific Time

Pravega: Rethinking storage for streams
Member webinar
April 7, 2020 10:00 AM Pacific Time

Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
Member webinar
April 8, 2020 10:00 AM Pacific Time

New thoughts on distributed file system in the cloud native era
Member webinar
April 9, 2020 10:00 AM Pacific Time

Declarative Host Upgrades From Within Kubernetes
Adrian Goins, Director of Community and Evangelism @Rancher Labs
Dax McDonald, Software Engineer @Rancher Labs
Jacob Blain Christen, Principal Software Engineer @Rancher Labs
Member webinar
April 14, 2020 10:00 AM Pacific Time

Enabling Cloud Native Storage for the Enterprise
Chris Merz, Principal Technologist for DevOps @NetApp
George Tehrani, Product Manager for Kubernetes and Cloud Native Data @NetApp
Member webinar
April 16, 2020 10:00 AM Pacific Time

KubeCarrier: The Operator of Operators
Nico Schieder, Software Engineer @Loodse
Member webinar
April 22, 2020 10:00 AM Pacific Time

杨雨 Alex Yang, 解决方案架构师 Solution Architect @Mirantis
张文墨 Larry Zhang, 解决方案架构师 Solution Architect @Mirantis
Member webinar
This webinar will be delivered in Chinese
April 23, 2020 10:00 AM China Standard Time

Kubernetes 1.18
Kubernetes team
Project webinar
April 23, 2020 9:00 AM Pacific Time

Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
Member webinar
June 30, 2020 10:00 AM Pacific Time

How did you find these articles? Did any of them catch your interest?

There is some content here that I cannot fully digest at this stage, so I’ll use this aide-memoire and these links to catch up myself too.

Bye now!!

Written by Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
