SRE / DevOps / Kubernetes Weekly Collection #23 (Week 28)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, adding brief comments as an aide-memoire along with useful links.
  • Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #497 July 5th, 2020
SRE Weekly Issue #226 July 5th, 2020
KubeWeekly #224 July 10th, 2020

DEVOPS WEEKLY ISSUE #497 July 5th, 2020

Dashboards aren’t just used in computer operations, and we can always learn from other disciplines. This excellent essay on the history of urban dashboards is well worth reading.

  • The title is “Mission Control: A History of the Urban Dashboard”.
  • An article that describes the history of the urban dashboard as an example of dashboards being used outside computer operations.
  • This first article was dense and caught my interest, so I spent a long time on it. I will read it again.

A project to describe a set of reference architectures for AWS Serverless applications. Lots of hard won knowledge here. Starting with a simple web service, scalable webhook and strangler pattern.

  • The title is “Serverless Reference Architectures”.
  • A project by an author who keeps publishing reference architectures for serverless. The author’s “Serverless Microservice Patterns for AWS”, written about two years ago, is a popular reference for serverless newcomers and veterans alike.

An interesting paper on Residuality Theory, and the design of complex software systems. In particular looking at non-functional properties as first class citizens of design efforts.

  • The title is “An Introduction to Residuality Theory: Software Design Heuristics for Complex Systems”.
  • The page for material presented at “The 7th International Workshop on Computational Antifragility and Antifragile Engineering”. You can download the full text and materials in PDF format from the link.
  • It introduces “Residuality Theory” as a way to explore business, software, and infrastructure architectures across many different platforms and paradigms, and to describe architecture in the same way regardless of perspective.

A nice example of a team using Open Policy Agent and Conftest to enforce best practices and various policies when authoring Kubernetes configuration.

  • The title is “Accelerated Feedback Loops when Developing for Kubernetes with Conftest”.
  • The article opens with the problem that “The feedback loop when deploying to Kubernetes can be quite slow,” then explains how Conftest and other tools address the following concerns.

・The Open Policy Agent always expects JSON in order to evaluate policies. Kubernetes, on the other hand, speaks YAML.
- Conftest handles converting multiple file formats such as .hcl, Dockerfile, and even YAML into JSON so that they can be interpreted by OPA.
・Verify API compatibility with Deprek8ion
- Deprek8ion is a set of Rego policies that can be used to see if any of our resources are currently, or will be, deprecated in a given Kubernetes release.
・Find security concerns with Kubesec
- Kubesec is a set of Rego policies that can be used to see if any of our resources have any insecure configurations.
・Other notable uses mentioned are continuous Kubernetes cluster auditing with Gatekeeper and infrastructure security compliance with Regula.
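
The kind of check these tools run can be sketched in a few lines. The following is only a minimal Python illustration, not Conftest, OPA, or the actual Deprek8ion Rego policies. It assumes the manifest has already been converted to JSON (the step Conftest performs for YAML input) and looks it up in a small hand-written table. The names `find_deprecations` and `REMOVED_APIS` are hypothetical; the table entries reflect that Deployment and DaemonSet under extensions/v1beta1 stopped being served in Kubernetes 1.16.

```python
import json
from typing import List

# Hypothetical lookup table for illustration only:
# (apiVersion, kind) -> Kubernetes release that no longer serves that API version.
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): "1.16",
    ("extensions/v1beta1", "DaemonSet"): "1.16",
}


def find_deprecations(manifest_json: str) -> List[str]:
    """Return warning messages for a single manifest given as a JSON string."""
    resource = json.loads(manifest_json)
    key = (resource.get("apiVersion"), resource.get("kind"))
    removed_in = REMOVED_APIS.get(key)
    if removed_in is None:
        return []  # nothing known to be wrong with this apiVersion/kind pair
    name = resource.get("metadata", {}).get("name", "<unnamed>")
    return [f"{key[1]}/{name}: {key[0]} is removed in Kubernetes {removed_in}"]


if __name__ == "__main__":
    old_manifest = json.dumps({
        "apiVersion": "extensions/v1beta1",
        "kind": "Deployment",
        "metadata": {"name": "web"},
    })
    for warning in find_deprecations(old_manifest):
        print(warning)
```

Real policies are better left to Conftest and Rego; the sketch only shows why having every input format in one JSON shape makes such rules easy to express and fast to run before deploying.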

Notes from the recent HashiConf Digital. Lots of news and case studies around Terraform, Consul, Vault and more.

  • The title is “HashiConf Digital Wrapup, June 2020”.
  • A summary of HashiConf Digital by Rich Burroughs, Senior Developer Advocate, published on the website of FireHydrant, which sponsored the event.
  • Rich shares his point of view: “One of the biggest surprises for me was that HashiCorp not only moved the event online but built a platform to host it. When I heard they had done that, I have to admit that I scratched my head a little. But as I thought about it more, it made more sense. First, the company closed a big round of funding in March, so it’s not hurting for resources. And I’m sure the folks making these decisions understood that events moving online isn’t going to end when there’s a COVID-19 vaccine. Virtual conferences are likely to be much more of a trend going forward, so building the tool you want to run them is a better investment than it might sound like.”

A nice introduction to chaos engineering, putting it in context with other types of testing.

  • The title is “Chaos Engineering”.
  • An article on the chaosmesh website that describes chaos engineering.
  • The two important points the author makes are:
    1. “First, chaos engineering helps me to detect and eliminate technical debts and the so-called dark debts in my system.”
    2. “Secondly, Chaos Engineering helps us to better understand the systems we build and operate and to regain confidence and trust.”

A presentation on the state of serverless, with lots of thoughts about where serverless is heading and what the most interesting problems are in 2020.

  • The title is “Serverless: Past, Present, and Future”.
  • A slide deck uploaded to SlideShare that explains the history of serverless, along with “Tim’s 2020 AWS wishlist” and his thoughts.

A look at the Azure Service Operator, a set of custom resource definitions for Kubernetes for managing Azure services like Storage Account, Virtual Machines and Azure SQL.

  • The title is “Azure Service Operator — manage your Azure resources with Kubernetes”.
  • An article that explains why you might use Azure Service Operator and how it helps you manage Azure resources with Kubernetes, illustrated with two examples.

Vector is a lightweight, ultra-fast tool for building observability pipelines. Take data from files, syslog, statsd and more, then transform and output to S3, elasticsearch, prometheus, etc.

  • The GitHub page of the OSS tool “Vector”, which builds lightweight, fast observability pipelines.

Ortelius is a tool for mapping microservices. Version and track application configuration and help with adopting domain driven design.

  • The GitHub page of the OSS tool “Ortelius”, which maps microservices according to their relationships with the applications that use them. Click here for the io page.
  • The page says, “Your microservice configurations are versioned and tracked so they are never lost”.

BeeMesh is intended to run services on a cluster of machines, using a peer-to-peer model and podman under the hood. Potentially interesting in edge scenarios.

  • The io page of the OSS tool “BeeMesh”, which combines zero-trust data-centric security with a peer-to-peer concept. Click here for the GitHub page.
  • “Distributed processing, service-based meshing, and ad hoc storage are the current and predictable requirements for evolution.”

SRE Weekly Issue #226 July 5th, 2020

A Doctor Confronts Medical Errors — And Flaws In The System That Create Mistakes

This is an article version of an interview with Dr. Danielle Ofri, author of a new book When We Do Harm, on NPR’s Fresh Air. I especially loved the part about near misses.

Bridget Bentz, Molly Seavy-Nesper, Deborah Franklin, Sam Briger, and Thea Chaloner — NPR

  • An interview article marking the release of a new book by a physician who has worked at a hospital in New York for over 20 years. It is the story of a doctor confronting medical mistakes. The history of the electronic medical record system was also interesting.

Heroku incident 2081 follow-up

Maintenance of the logging system had unintended downstream effects including log loss and failure of the system that manages dynos.

  • Follow-up information for the incident that occurred at Heroku from June 29, 2020 19:09 UTC to 20:20 UTC.
  • User impact included missing application logs, build failures, and errors from dyno-related commands issued via the API, CLI, or dashboard.

Heroku incident 2045 follow-up

In this incident, a TLS certificate was deployed without its intermediate, resulting in failures for some clients.

  • As with the above, follow-up information for the incident that occurred at Heroku from June 12, 2020 18:33 UTC to 19:13 UTC.
  • The impact was that TLS connections to applications on the domain * failed for some clients.

Software engineering responses to COVID-19 — My take on REA’s Webinar

I wrote this after attending the Resilience Engineering Association’s webinar with panelists Dr. Richard Cook, John Allspaw, and Nora Jones, moderated by Laura Maguire. Once the recording is posted, I highly recommend watching!

Lex Neva

  • An article sharing the author’s notes from the webinar “Software engineering responses to COVID-19” held by the REA (Resilience Engineering Association). The video does not seem to have been published yet, but the editor says, “Once the recording is posted, I highly recommend watching!”

How SLIs Help You Understand Users’ Needs

As SREs, we need to be laser focused on the user’s experience. Our SLIs should reflect that.

Emily Arnott — Blameless

  • An article on the Blameless website that explains how SLIs help you understand users’ needs.

Twitter’s Reliability Journey

This two-part series is an in-depth look at how Twitter adopted SRE, before SRE was even a thing.


  • Another article on the Blameless website. It summarizes interviews with Brian Brophy (Sr. Staff SRE), Carrie Fernandez (Head of Site Reliability Engineering), JP Doherty (Engineering Manager), and Zac Kiehl (Sr. Staff SRE) from Twitter Inc. about how SRE is practiced there.

Part 1 of a two-part article. Click here for Part 2.

KubeWeekly #224 July 10th, 2020

Editor’s pick of the highlights from the past week.

CNCF Scales Sandbox Approval Process to Meet Growing Demand from New Projects

Introduced this week, the new Sandbox approval process will increase the acceptance of new projects into the CNCF, as well as reduce barriers for open source projects seeking neutral grounds to accelerate their innovation, adoption velocity, and community building efforts.

“The CNCF Sandbox has long played an important role enabling neutral collaboration and experimental cloud native projects to thrive, but with record demand by projects to join the CNCF community, we agreed that the process could be refined in new ways to speed the review and approval process,” said Chris Aniszczyk, CTO of the Cloud Native Computing Foundation. “I’m thrilled that the CNCF TOC has put in place a great new process that simplifies the barrier to entry for worthy projects and increases innovation, which recently led to 11 new Sandbox projects being accepted”.

Learn more about the new approval process here.

  • A CNCF announcement about efforts to lower the barrier to entry for new projects by simplifying and speeding up the review and approval process for the CNCF Sandbox.

CNCF Project News: TOC approves Operator Framework and Contour as Incubating Projects

Exciting project news — Operator Framework, which is made up of two main components Operator SDK and Operator Lifecycle Manager (OLM) is now an incubation-level hosted project.

The TOC also approved Contour, a high-performance ingress controller for Kubernetes that provides a control plane for Envoy, as an incubation-level hosted project.

Congratulations to both projects and their respective teams!

  • An announcement from CNCF that Operator Framework and Contour have been approved as CNCF Incubating projects (the maturity level one stage above the Sandbox mentioned earlier).

KubeCon + CloudNativeCon EU Virtual Session Spotlight

The countdown to KubeCon + CloudNativeCon EU Virtual on August 17–20, 2020 is on! As we approach the event, we curated a few recommended sessions that we don’t want you to miss. Please see the feature for this week and be sure to register today!

Tutorial: Communication Is Key — Understanding Kubernetes Networking

Presented by Jeff Poole, Vivint Smart Home

Networking in Kubernetes has several aspects, including DNS, iptables, routing, software bridges, IP assignment, network policies, etc. While the practices for understanding the network were fairly easy to translate from physical servers to virtual machines, the level of complexity increases greatly when moving to containers in Kubernetes.

This tutorial will explain several of the networking concepts used in Kubernetes with accompanying lab exercises in a virtualized environment so that participants will become comfortable looking under the hood at how a Kubernetes cluster is working (or not working, as the case may be).

The material will be designed for people comfortable with SSH, bash, kubectl, and basic networking concepts, and will fill in the more advanced networking knowledge as the tutorial progresses. Please have Vagrant + VirtualBox installed to run the labs locally.

Register now!

  • This week’s KubeCon + CloudNativeCon EU Virtual spotlight is the tutorial session listed above. Schedule: 8/17 (Monday) 16:55–18:15 CEST (Central European Summer Time).
  • Jeff Poole’s tutorial session aims to help participants understand Kubernetes networking concepts. Through exercises in a virtualized environment, it helps you understand how a Kubernetes cluster works internally.

Weekly recap of CNCF member and project webinars that you might have missed.

You can view all CNCF recorded and upcoming webinars here.

CNCF Member Webinar: Optimize your Kubernetes Clusters on Azure with Built-in Best Practices

Jorge Palma, Senior Program Manager @Microsoft

  • It explains how an Azure solution, combined with a CNCF project, provides actionable best practices to optimize and streamline a Kubernetes cluster automatically and to identify potential problems before the cluster is affected.

CNCF Member Webinar: Building Production-ready Services with Kubernetes and Serverless Architectures

Mike Metral, Software Architect and Engineer @Pulumi and Jason (Jay) Smith, App Modernization Specialist @Google Cloud

  • A webinar that covers the following using Pulumi and Knative:
  1. How to deploy/manage Kubernetes clusters and workloads using a real programming language.
  2. How to build a production-ready stack and advance the lifecycle of your cluster and apps.
  3. How to build and deploy serverless eventing buses and pipelines.
  4. How to run a real-time streaming application.

CNCF Project Webinar: What’s new in Linkerd 2.8: Multi-cluster Kubernetes made simple and secure by default

Oliver Gould, Linkerd Project Lead, co-founder and CTO @Buoyant

  • Linkerd creator Oliver Gould describes the new features of Linkerd 2.8 and how to get started using them right now.

CNCF Member Webinar: The Challenges and Countermeasures of Service Mesh Practice

裴斐 (Fei Pei), Qingzhou Cloud Native Technical Expert and Architect @NetEase Hangzhou Research Institute

  • The language for this webinar is Chinese.
  • The speaker describes the problems encountered, design ideas, and solutions from NetEase’s service mesh practice in popular areas such as e-commerce, AI, and news, offering a reference for enterprises adopting a service mesh.

CNCF Member Webinar: The top 7 most useful Kubernetes APIs for comprehensive cloud native observability

Caleb Hailey, Co-founder and CEO @Sensu

  • Sensu Co-founder and CEO Caleb Hailey discusses the various Kubernetes APIs needed for full visibility into the Kubernetes platform using examples of OSS monitoring tools such as Prometheus and Sensu.

CNCF Member Webinar: How to land Service Mesh — From technology selection to practice

马若飞 (Ma Ruofei), Chief Engineer, Beijing R&D Center @FreeWheel

  • The language for this webinar is Chinese.
  • The speaker covers service mesh from technology selection through to best practices.

Tutorials, tools, and more that take you on a deep dive into the code.

Install a Kubernetes load balancer on your Raspberry Pi homelab with MetalLB

Chris Collins,

  • An article that explains how to install MetalLB as a load balancer for Kubernetes on a Raspberry Pi homelab.

Windows Server Containers in Red Hat OpenShift 4.4

Red Hat OpenShift Team

  • A Twitch video in which Red Hat members explain Windows Server containers in Red Hat OpenShift 4.4. The use of an Operator and the future roadmap are also mentioned.

A guide to Terraform for Kubernetes beginners

Jessica Cherry,

  • An article that explains how to use Terraform, aimed at Kubernetes beginners. One strength of Terraform is that you can import an existing environment into its state, and the article explains Terraform state management using a Kubernetes cluster on Minikube as the environment.

How to run Keycloak in HA on Kubernetes

Ramiro Algozino, SIGHUP

  • An article that introduces key concepts to keep in mind when deploying a cluster of Keycloak, an OSS IAM tool, on Kubernetes in an HA configuration.


A tool to sync images from one container registry to another

  • The GitHub page of the OSS tool “Sinker”, which syncs container images from one registry to another.

Introduction to WebAssembly on Kubernetes with Krustlet

David McKay, InfluxData

  • An article that introduces “Krustlet” as a tool for running WebAssembly binaries on Kubernetes.

How to architect for Kubernetes: Part 1

Tomás Pinho

  • Part 1 of an article series that summarizes best practices for a quick and safe transition to Kubernetes. Part 2 does not seem to have been published yet.
  • The author’s motivation is the sense that “Many people who would benefit from the Kubernetes implementation don’t know how to properly deploy, manage, and secure it.”

Presslabs is the First Managed WordPress Hosting Platform running on Kubernetes

Ioana Vasi, Presslabs

  • A case study of Presslabs migrating to Kubernetes using GKE.
  • “We are proud to say that to our knowledge, we are the first managed WordPress platform fully based on Kubernetes and we are working full speed to develop new and exciting features and offer our clients a seamless, highly-scalable hosting experience.” I wonder what survey method or criteria they used to reach that conclusion.

The world’s simplest Kubernetes dashboard: k1s

Daniel Weibel, ITNext

  • An article introducing “k1s” as “The world’s simplest Kubernetes dashboard”.
  • A minimal Kubernetes dashboard that allows you to monitor Kubernetes resources of any type in arbitrary Namespace (or all Namespaces) in real time.
  • It is implemented as 50 lines of Bash code.

Leverage PodSpec to customize the Fission runtime and builder pods

InfraCloud Team

  • An article that explains how to use PodSpec to customize the runtime and builder pods of “Fission”, an OSS serverless framework for Kubernetes.
  • They recommend this article for those who are not familiar with the basic structure of Fission.

Minimum Viable Kubernetes

Emanuel Evans

  • An article that sets up and explains what a minimal “Kubernetes cluster” actually looks like.

Articles, announcements, and more that give you a high-level overview of challenges and features.

Linux company SUSE outbids competitors for fast-growing start-up Rancher Labs

Jordan Novet, CNBC

  • News of the acquisition of Rancher Labs by SUSE. It also gives background on trends such as containers and Kubernetes.

Kubernetes Operators Explained

Piotr Perzyna, Container Solutions

  • An article that carefully explains Kubernetes Operators and touches on Prometheus Operator as a case study.

Can Kubernetes be an IT budget hero?

Kevin Casey, The Enterprisers Project

  • An article that uses four examples to explain how “Kubernetes helps IT teams increase efficiency, reliability, and consistency — which helps reap budget savings.”

1. Creating a shared language across teams
2. Talent development and allocation
3. People can literally do more — without burning out
4. Fewer moving parts equals budget predictability

  • Personally, I think budget savings cannot be achieved without a considerable investment of scale and time. In my opinion, the third point in particular, “People can literally do more — without burning out”, only fits cases where the team already has the knowledge and tools to operate Kubernetes and enough people to keep up with updates.

Scalability, with Wojciech Tyczynski

Adam Glick and Craig Box, Kubernetes Podcast from Google

Deploy HAProxy Ingress Controller from Rancher’s Apps Catalog

Nick Ramirez, HAProxy

  • As the title says, an article on the Rancher Labs website that explains how to deploy HAProxy Ingress Controller from Rancher’s app catalog.

Building a Multi-Tenant gRPC Development Platform with Ambassador and AWS EKS

Brian Annis, Hacker Noon

  • An article that explains how to build a multi-tenant gRPC development platform using Ambassador and AWS EKS. Ambassador appears frequently in this blog as well, so I’d like to set aside some time to try it out.

LOTE #12: Daniel Mangum on Crossplane, building a PaaS, and Multi-Cluster Kubernetes

Ambassador Podcast

  • A transcript of the “Livin’ on the Edge” podcast. This week’s guest is Daniel Mangum (Software Engineer) from Upbound.
  • It introduces the OSS tool “Crossplane”, a CNCF sandbox project.

Upcoming CNCF webinars

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time.

Member Webinar: Securing and Accelerating the Kubernetes CNI Data Plane with Project Antrea and NVIDIA Mellanox ConnectX SmartNICs
Antonin Bas, Maintainer of Project Antrea and Staff Engineer @VMware
Moshe Levi, Sr. Staff Engineer @NVIDIA
July 14, 2020 10:00 AM Pacific Time

Member Webinar: How Alibaba Extends K8s scheduler to support AI and big data workloads
Zhang Kai, Staff Engineer @Alibaba
Wang Qingcan, Senior Engineer @Alibaba
This webinar will be delivered in Chinese. July 15, 2020 10:00 AM China Standard Time

Member Webinar: Serving Millions of Customers with Cloud Native and DevSecOps
Chris Hollies, CTO, Oracle Practice @Capgemini
Akshai Parthasarathy, Principal Director, Cloud Native and DevOps @Oracle Cloud
July 15, 2020 7:00 AM Pacific Time

Member Webinar: Advancing image security and compliance through Container Image Encryption!
Brandon Lum, Senior Software Engineer @IBM
July 15, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes and storage. Kubernetes for storage. An overview.
Kiran Mova, Chief Architect at MayaData and core maintainer of OpenEBS @MayaData
July 16, 2020 10:00 AM Pacific Time

Member Webinar: Learn how to clean up your cloud-native “DevOps Dumping Ground”
Melissa Sussmann, Product Marketing Lead @Puppet
Kenaz Kwa, Principal Product Manager @Puppet
July 17, 2020 10:00 AM Pacific Time

Project Webinar: Fluent Bit v1.5
Eduardo Silva, Principal Engineer @Arm Treasure Data
July 17, 2020 1:00 PM Pacific Time

Member Webinar: Kubernetes Security Anatomy and the Recently Disclosed CVEs
Gadi Naor, CTO & Co-Founder @Alcide
July 21, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes Secrets Management: Build Secure Apps Faster Without Secrets
Jody Hunt, Director of DevOps Security @CyberArk
July 22, 2020 7:00 AM Pacific Time

Member Webinar: Implementing Canary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Oleg Chunikhin, CTO @Kublr
July 22, 2020 1:00 PM Pacific Time

Member Webinar: Observability of multi-party computation with OpenTelemetry
Antoine Toulme, Engineering Manager @Splunk
Dave McAllister, Sr. Technical Evangelist @Splunk
July 23, 2020 10:00 AM Pacific Time

Member Webinar: One large cluster or lots of small ones? Pros, cons and when to apply each approach
Flavio Castelli, Distinguished Engineer @SUSE
July 24, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes Policies 101
Eran Leib, Founder, VP Product Management @Apolicy
Spenser Paul, Director of Sales, North America @DoiT International
July 28, 2020 10:00 AM Pacific Time

Member Webinar: GitOps Continuous Delivery with Argo and Codefresh
Dan Garfield, Chief Technology Evangelist @Codefresh
July 29, 2020 1:00 PM Pacific Time

Member Webinar: Cluster API — Yesterday, Today, Tomorrow
Saad Malik, CTO & Co-Founder @Spectro Cloud
Jun Zhou, Chief Architect @Spectro Cloud
July 30, 2020 10:00 AM Pacific Time

Project Webinar: How We Doubled System Read Throughput with Only 26 Lines of Code
TiKV team
July 31, 2020 10:00 AM Pacific Time

Member Webinar: Comparing eBPF and Istio/Envoy for Monitoring Microservice Interactions
Roko Kruze, Solutions Engineer @Flowmill
Mike Cohen, Co-Founder and COO @Flowmill
Aug 4, 2020 10:00 AM Pacific Time

Member Webinar: Debugging your debugging tools; What to do when your service mesh goes down in production?
Neeraj Poddar, Co-founder and Chief Architect @Aspen Mesh
Aug 5, 2020 7:00 AM Pacific Time

Member Webinar: Making Data Work for Developers with Kubernetes & Cassandra
Chris Splinter, Sr. Product Manager — Developer Solutions @DataStax
Patrick McFadin, VP of Developer Relations @DataStax
Aug 5, 2020 1:00 PM Pacific Time

Member Webinar: Hardware for Kubernetes, Peeling Back the Layers
Erik Reidel, SVP Compute & Storage Solutions @ITRenew
Aug 11, 2020 10:00 AM Pacific Time

Project Webinar: Kubernetes 1.19
Kubernetes release team
Aug 28, 2020 10:00 AM Pacific Time

Member Webinar: Getting started with container runtime security using Falco
Loris Degioanni, CTO and Founder @Sysdig
Sept 2, 2020 1:00 PM Pacific Time

What did you think of these articles? Did any of them catch your interest?

To be honest, there are some items I could not fully digest at this stage, so I will use this aide-memoire and these links to catch up myself.

Bye now!!

Yoshiki Fujiwara

Written by

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
