SRE / DevOps / Kubernetes Weekly Collection #30 (Week 35)

  • In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments as an aide-memoire, along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #504 August 23rd, 2020
SRE Weekly Issue #232 August 23rd, 2020
KubeWeekly #230 August 28th, 2020

DEVOPS WEEKLY ISSUE #504 August 23rd, 2020


A great post on the changing role of operations. Some good tips for those wondering what modern ops looks like, with tips on vendor management, outsourcing infrastructure and the importance of understanding sociotechnical systems.

  • The title is “The Future of Ops Jobs”.
  • The future of ops jobs is described through the following three changes that are afoot.
    ○ From monolith to microservices
    ○ From monitoring to observability
    ○ From magic autoinstrumentation to instrumenting with intent
  • For those whose hearts truly beat for infrastructure problems, the author recommends joining an infrastructure company, since such problems are increasing, and wishes them good luck. Otherwise, she recommends building systems that allow a team of engineers to ship software that creates core business value. I have heard a lot about vendor management in other industries, and if it can be elevated into an engineering discipline, it is certainly worth getting involved in. She frames this work from four perspectives:
    ○ Vendor engineering
    ○ Product engineering
    ○ Sociotechnical systems engineering
    ○ Managing the portfolio of technical investments.

A good introduction to NAT networks, for anyone wanting to understand this area of networking better. Good diagrams and examples and lots of details.

  • The title is “How NAT traversal works”.
  • An article on NAT that describes the various issues, protocols and firewall components involved, starting from a simple peer-to-peer connection.
  • The article is long, so I skipped the details for now. I will read it again.

Metrics are used for lots of different purposes, including reporting to the top of an organisation. This post explores engineering KPIs for board room conversations.

  • The title is “How to Choose Software Development KPIs for Your Board Deck”.
  • For CTOs, it explains which software development KPIs to prepare for a board meeting and gives pointers for a productive discussion in the room.
    ○ Start with Engineering Success Metrics
    ○ Drill Down with Revealing Engineering KPIs
    ○ Put Engineering Metrics in Conversation
    ○ Make Board Meetings Work for You

Ever wanted to ensure that messages between services are kept in order, with a retry mechanism for any lost messages? This post describes a specific pattern, but is also part of a set of articles on distributed computing patterns that’s worth exploring.

  • The title is “Single Socket Channel”.
  • From the blog of Martin Fowler, a software development author, speaker and critic. It explains the problem that the Single Socket Channel pattern addresses and the solution it offers.
  • Within the explanation, the article links to related topics that were covered on the same blog in the past, which is very good. You can dig deep into web-related technologies.
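
As an aide-memoire for the idea behind the pattern, here is a minimal Python sketch, not the code from the article. It assumes that one long-lived connection carries every message between a follower and a leader, so the transport's in-order delivery and retransmission preserve message order. `socket.socketpair()` stands in for that single TCP connection, and the names `send_msg`, `recv_msg` and `demo` are hypothetical.

```python
import socket
import struct
import threading


def send_msg(sock, payload: bytes) -> None:
    # Length-prefix each message so the receiver can split the byte stream.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    # Read exactly n bytes, looping because recv() may return fewer.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo(count: int = 5) -> list:
    # Assumption: socketpair() plays the role of the single socket channel.
    follower, leader = socket.socketpair()
    received = []

    def leader_loop():
        # The leader reads from the one connection with a single thread,
        # so requests are processed in the order they were sent.
        for _ in range(count):
            received.append(recv_msg(leader).decode())

    worker = threading.Thread(target=leader_loop)
    worker.start()
    for i in range(count):
        send_msg(follower, f"request-{i}".encode())
    worker.join()
    follower.close()
    leader.close()
    return received
```

Calling `demo(3)` returns the three requests in the order they were sent. Opening a new connection per message would lose that guarantee, which is the point of keeping a single channel.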

Incident reviews are increasingly common but often hard to do well. This video and detailed transcript has various tips for improving the process.

  • The title is “Improving Postmortems from Chores to Masterclass with Paul Osman”.
  • I will skip it because it was covered in SRE Weekly Issue #231 last week.

An ambitious idea for a new journal for Systems research. Definitely relevant to the interests of some readers of Devops Weekly I think.

  • The title is “A new journal for systems research”.
  • An article that lists the current problems with the systems research review process and introduces the Journal of Systems Research, with its open model, as a way to improve it.

Pulumi, the Infrastructure as Code tool, now supports using Open Policy Agent to validate the resulting resources. This post explores why and how.

  • The title is “Authoring CrossGuard Policy with Open Policy Agent (OPA)”.
  • An article by Pulumi explaining that CrossGuard, Pulumi’s policy as code framework, now supports the Rego language of OPA (Open Policy Agent).

Even if you’re not writing applications in Java, it’s often useful to have some knowledge of how logging works as you’ll probably end up running at least some Java applications. These posts provide a solid foundation.

  • The titles are “Java Logging Tutorial: Basic Concepts to Help You Get Started” (linked above) and “Java Logging Best Practices: 10+ Tips You Should Know to Get the Most Out of Your Logs”.
  • The first article focuses on how to properly configure logging for your code to avoid known logging mistakes in Java and covers the following:
    ○ Logging abstraction layers for Java
    ○ Out of the box Java logging capabilities
    ○ Java logging libraries, their configuration, and usage
    ○ Logging the important information
    ○ Log centralization solutions.
  • The second article discusses 14 best practices for Java logging:
  1. Use a Standard Logging Library
  2. Select Your Appenders Wisely
  3. Use Meaningful Messages
  4. Logging Java Stack Traces
  5. Logging Java Exceptions
  6. Use Appropriate Log Level
  7. Log in JSON
  8. Keep the Log Structure Consistent
  9. Add Context to Your Logs
  10. Java Logging in Containers
  11. Don’t Log Too Much or Too Little
  12. Keep the Audience in Mind
  13. Avoid Logging Sensitive Information
  14. Use a Log Management Solution to Centralize & Monitor Java Logs
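
The articles are about Java, but tips 7, 9 and 13 (log in JSON, add context, avoid sensitive information) are largely language-agnostic. As a note to myself, here is a minimal sketch of those three tips using Python's standard `logging` module rather than a Java library. The logger name, the `context` field and the `SENSITIVE_KEYS` set are assumptions made for illustration and do not come from the articles.

```python
import io
import json
import logging

# Hypothetical list of keys whose values must never reach the logs.
SENSITIVE_KEYS = {"password", "token"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a consistent structure."""

    def format(self, record):
        # "context" is attached by the caller through the `extra` argument.
        context = getattr(record, "context", {})
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Mask sensitive values instead of logging them.
            "context": {
                key: ("***" if key in SENSITIVE_KEYS else value)
                for key, value in context.items()
            },
        }
        return json.dumps(entry)


def build_logger(stream):
    logger = logging.getLogger("orders-demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output on our handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def demo():
    stream = io.StringIO()
    logger = build_logger(stream)
    logger.info(
        "order placed",
        extra={"context": {"order_id": 42, "token": "secret"}},
    )
    return json.loads(stream.getvalue())
```

`demo()` returns the parsed log entry, in which `order_id` is kept as context and the `token` value is masked.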


Tags are critical to managing AWS resources at scale. Awstaghelper provides a command line tool to ease adding and managing tags to and from CSV files across the wide range of AWS resources.

  • The GitHub page of the OSS tool “awstaghelper”, which can tag hundreds of AWS resources with a few commands.

The GitOps Toolkit is a set of composable APIs and specialized tools that can be used to build a Continuous Delivery platform on top of Kubernetes. They should provide the underpinnings for the v2 of Flux, but could also be used to build other interesting high-level tools that take the same control loop approach.

  • The website of the “GitOps Toolkit”, a set of composable APIs and specialized tools that you can use to build a continuous delivery platform on top of Kubernetes.

Kip is a Virtual Kubelet provider that allows a Kubernetes cluster to transparently launch pods onto their own cloud instances. Handy if you require additional workload isolation.

  • The GitHub page of Kip (Kubernetes Cloud Instance Provider), a Virtual Kubelet provider that allows Kubernetes clusters to transparently launch pods onto their own cloud instances.

SRE Weekly Issue #232 August 23rd, 2020


Incident updates, interruptions and the 30 minute window

An engineer’s observation of a really effective Incident Command pattern.

Dean Wilson

  • An article that analyzes how an incident response structure built around a 30-minute internal clock made it possible to respond effectively.

Thoughts on STAMP

Here’s Lorin Hochstein’s take on the STAMP (Systems-Theoretic Accident Model and Processes) workshop he attended recently.

Lorin Hochstein

HRO and RE: a pragmatic perspective

What’s the difference between Resilience Engineering and High Reliability Organizations? This paper (and excellent summary) explains.

Torgeir Haavik, Stian Antonsen, Ragnar Rosness, and Andrew Hale (original paper)

Thai Wood — Resilience Roundup (summary)

The Future of Ops Jobs

This one focuses on what I feel are really important parts of SRE, taken from the article’s subheadings:

● Vendor engineering
● Product engineering
● Sociotechnical systems engineering
● Managing the portfolio of technical investments

Charity Majors — Honeycomb

  • I will skip it because it is covered in DEVOPS WEEKLY ISSUE #504 above.

Outage report 7 July 2020 — PythonAnywhere

Now that’s a for-serious incident report. Nice one, folks! This is an interesting case of theory-meets-reality for disaster planning.

giles — PythonAnywhere

  • PythonAnywhere’s report on a major outage, its first since July 2017. The cause was a storage system failure.


KubeWeekly #230 August 28th, 2020

The Headlines

Editor’s pick of the highlights from the past week.

Kubernetes 1.19 released

Congratulations to the release team on getting Kubernetes 1.19 out the door. This release is all about extra time: the timelines were adjusted due to world events, and it will be the first to be supported for 12 months. This should allow an extra 30% of Kubernetes users to remain on a supported version on their regular upgrade cadence. The release includes 33 enhancements, including Ingress finally going to GA. Check out an interview with the release manager Taylor Dolezal on this week’s Kubernetes Podcast to learn more.

  • The Kubernetes 1.19 release article. COVID-19, the George Floyd protests and other events changed the normal release cycle. There are many changes, such as the support period being extended to one year. More detail is given in the Kubernetes Podcast interview linked above.
  • The interview linked above is an episode of the Kubernetes Podcast, which is hosted by Google employees. The current co-hosts are Craig Box and Adam Glick.
  • The guest is Taylor Dolezal, HashiCorp’s senior developer advocate, Kubernetes 1.19 release lead, and CNCF Ambassador.
  • A joke about "communication being as difficult as DNS" came up while the guest was talking about OSS on the Kubernetes Podcast.
  • The topics of interest in News of the week are:
    ○ k3s to join the CNCF Sandbox
    ○ Serverless Framework Knative component
    ○ Palinurus, from Mailchannels
    ○ The Kubernetes Handbook by Farhan Hasin Chowdhury

A Look Back at our FIRST KubeCon + CloudNativeCon Virtual Conference

Priyanka Sharma, CNCF

Priyanka Sharma recaps the first virtual KubeCon + CloudNativeCon and the event’s success thanks to our amazing community of doers — builders, operators and advocates! She writes, “we are so thrilled that the cloud native community came together with hope and positivity to make this a truly community-driven event we will remember for a long time. We may not have been able to meet in person this year but we are indomitable!” Read the recap blog here.

  • A recap blog of KubeCon + CloudNativeCon Virtual by CNCF staff.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member Webinar: Modern Software Development Pipeline: A Security Reference Architecture

Vinay Venkataraghavan, Cloud CTO, Prisma Cloud @Palo Alto Networks

  • The webinar covers the following points.
  1. Survey the typical deployment pipeline and the threats that we should mitigate
  2. Propose a reference architecture for embedding security controls
  3. Conclude with some practical examples of security tools that can be embedded across the software delivery lifecycle

CNCF Member Webinar: MLOps automation with Git Based CI/CD for ML

Yaron Haviv, Co-Founder and CTO @Iguazio

  • It describes how the ML pipeline works, its main challenges, and the various steps involved in creating models and data products (data collection, preparation, training/AutoML, validation, model deployment, drift monitoring, etc.).
  • It demonstrates the following methods that greatly simplify and automate the development and deployment process:
  1. Maximize the efficiency and collaboration between the various teams
  2. Harness Git review processes to evaluate models
  3. Abstract away the complexity of Kubernetes and DevOps.

CNCF Member Webinar: Local Development in The Age of Kubernetes

Misha Gusarov, Software Architect @Ridge Cloud

  • It describes how to regain interactivity by making application development and debugging as easy as possible. It does so by exploring the Kubernetes components and recreating their functionality in a local development environment.

CNCF Member Webinar: How to migrate databases into Kubernetes?

Alex Chircop, CEO & Founder @StorageOS and Ferran Castell, Product Reliability Engineer @StorageOS

  • The following topics are explained for those looking to migrate stateful workloads such as databases to Kubernetes.
    ○ How to deploy databases in production in Kubernetes
    ○ How to implement automatic failover with high availability
    ○ How to migrate a database into a Kubernetes cluster
    ○ How to build a database as a service with Kubernetes

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Introducing Hierarchical Namespaces

Adrian Ludwin, Google

  • An introductory article on “Hierarchical Namespaces”, a new concept developed by the Kubernetes Working Group for Multi-Tenancy (wg-multitenancy).
  • Based on the concept of ownership across namespaces, the following two behaviors are added: policy inheritance and delegated creation.
  1. Policy inheritance: if one namespace is a child of another, policy objects such as RBAC RoleBindings are copied from the parent to the child.
  2. Delegated creation: you usually need cluster-level privileges to create a namespace, but hierarchical namespaces adds an alternative: subnamespaces, which can be manipulated using only limited permissions in the parent namespace.
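
To help me remember the two behaviors, here is a toy Python model of them. It is not the API of the real hierarchical namespace controller: the class `NamespaceTree`, its methods, and the `(user, role)` tuples standing in for RoleBindings are all hypothetical. The article describes policy objects being copied from parent to child, whereas this sketch gets the same effect by walking up the ancestor chain.

```python
from collections import defaultdict


class NamespaceTree:
    """Toy model of hierarchical namespaces (illustration only)."""

    def __init__(self):
        self.parent = {}                      # namespace -> parent or None
        self.own_bindings = defaultdict(set)  # namespace -> {(user, role)}

    def add_namespace(self, name, parent=None):
        # Stands in for cluster-level namespace creation.
        self.parent[name] = parent

    def bind(self, namespace, binding):
        self.own_bindings[namespace].add(binding)

    def effective_bindings(self, namespace):
        # Policy inheritance: a namespace sees its own bindings plus
        # everything bound on its ancestors.
        result = set()
        current = namespace
        while current is not None:
            result |= self.own_bindings[current]
            current = self.parent[current]
        return result

    def create_subnamespace(self, user, parent, child):
        # Delegated creation: a permission in the parent is enough,
        # no cluster-level privilege is required.
        if (user, "admin") not in self.effective_bindings(parent):
            raise PermissionError(f"{user} may not create under {parent}")
        self.add_namespace(child, parent)
```

For example, if `alice` is bound as `admin` in `team-a`, she can create `team-a-dev` under it and is automatically an admin there, while a binding added only in `team-a-dev` does not flow back up to `team-a`.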

Moving Forward From Beta

Tim Bannister, The Scale Factory

Design Considerations at the Edge of the ServiceMesh

Raffaele Spazzoli, Trevor Box, and Joshua Mathianas at Red Hat

  • An article that introduces a series of design patterns for traffic to and from the mesh.

Zero-Downtime Kubernetes Deployments

Oliver Leaver-Smith, Sky Betting & Gaming

  • It describes the Core Customer team’s work over the last few months to migrate their OIDC/OAuth2 identity services from a tactical container platform to an on-premises Kubernetes cluster, and how to achieve deployments on Kubernetes with no downtime.

Google chooses Cilium for GKE networking

Thomas Graf, Isovalent

  • An article that explains the story behind GCP’s announcement that GKE’s Dataplane V2 will use Cilium and eBPF.

ArgoCD and Tekton: Match made in Kubernetes heaven

Burr Sutter and Siamak Sadeghianfar, Red Hat

  • A webinar video on Twitch by the Red Hat OpenShift team.
  • I would like to try this with something I want to deploy myself through CI/CD.

An introduction to installing Prometheus with Minikube

Shashank Nandishwar Hegde, Red Hat

  • It describes the basic concepts of Prometheus and how to install it on Minikube. The next article in the series will explain application monitoring.

How To Manage Your Kubernetes Configurations with Kustomize


  • It explains how to manage Kubernetes configurations with Kustomize through the following three steps.
  1. Build a small web application and then use Kustomize to manage your configuration sprawl
  2. Deploy your app to development and production environments with different configurations
  3. Layer these variable configurations using Kustomize’s bases and overlays so that your code is easier to read and thus easier to maintain
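
The base-and-overlay idea in step 3 can be pictured with a small Python sketch. Kustomize itself works on YAML manifests and a `kustomization.yaml` file and has richer patch semantics, so this is only a toy model of the layering, not how Kustomize is implemented. The function `apply_overlay` and the simplified manifests `BASE`, `DEV` and `PROD` are hypothetical.

```python
import copy


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new dict where overlay values are layered on top of base."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings key by key.
            merged[key] = apply_overlay(merged[key], value)
        else:
            # Scalars and lists in the overlay replace the base value.
            merged[key] = copy.deepcopy(value)
    return merged


# Shared configuration, written once (simplified, not a full manifest).
BASE = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"replicas": 1},
}

# Each environment states only what differs from the base.
DEV = {"metadata": {"namespace": "dev"}}
PROD = {"metadata": {"namespace": "prod"}, "spec": {"replicas": 3}}
```

`apply_overlay(BASE, PROD)` yields the production variant with three replicas in the `prod` namespace, while `BASE` stays unchanged and can be reused for `DEV`.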

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Terrascan Leverages OPA to Make Policy as Code Extensible

Cesar Rodriguez, Accurics

  • It explains the history of Terrascan, an OSS tool that detects compliance and security violations across infrastructure as code to mitigate risk before cloud-native infrastructure is provisioned, including the replacement of its regular-expression-based rules with the OPA engine.
  • See also Terrascan’s GitHub page and its documentation.

Kubernetes engineers keep your favorite software running

Megan Friedman, The Keyword

  • An article on Google’s blog “The Keyword” that interviews three engineers (Michelle Au, Janet Kuo and Purvi Desai) who have contributed to GKE and Kubernetes, in commemoration of GKE’s 5th anniversary.
  • It covers their work on GKE and Kubernetes, favorite customer use cases, advice for developers entering the field, and more.

Looking ahead as GKE, the original managed Kubernetes, turns 5

Chen Goldberg and Drew Bradstock, Google Cloud

  • An article on GCP’s webpage. On the occasion of GKE’s fifth anniversary and the start of the virtual KubeCon, they thank the community for making Kubernetes the industry standard for managing containerized applications.
  • For the future, they share the following five ways in which they will continue their efforts to make GKE the best place to run Kubernetes.
  1. Leaving no app behind
  2. Saving money with optimal price-to-performance by default
  3. Container-native networking: no more square pegs in round holes
  4. Bringing BeyondProd to containerized apps
  5. Democratizing access to learning Kubernetes

KubeCon EU: Accurics, Snyk Release Tools to Secure Infrastructure-as-Code Deployments

Joab Jackson, The New Stack

  • The New Stack’s article. It explains that Accurics’s OSS tool “Terrascan”, covered above, and Snyk’s Snyk IaC were released to coincide with KubeCon EU.

Use Virtual Clusters to Tame Sprawl in Kubernetes

Emily Omier, Nirmata

  • An article on Nirmata’s webpage. Starting from the observation that “Now as more organizations adopt Kubernetes and start to struggle with best practice enforcement as well as the management and resource utilization problems related to cluster sprawl, they are starting to apply the same virtualization techniques to clusters.”, it explains how to use virtual clusters and leads into an introduction of their service.

Complexity: Your Day 2 Enemy

Emily Omier, Nirmata

  • Another article on Nirmata’s webpage. It describes the complexity of Kubernetes.
  • They conclude that “Organizations should focus on both minimizing Kubernetes inherent complexity by ensuring consistent configurations and consistent application design across clusters while also using tools that simplify the developer and operator experience.” and lead into an introduction of their service, which helps organizations tame complexity at both the deployment and operations stages so that Day 2 operations are as simple as possible.

JaegerTracing announces v1.19 release

  • The Jaeger v1.19 release page on GitHub.

Simplify Edge Networking for Different Kubernetes Providers

Noah Krause, ITNext

  • An introductory article on K8s Initializer from Ambassador Labs, a tool that bootstraps networking, Ingress, CI/CD and observability for a new Kubernetes cluster.

Upcoming CNCF webinars

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: Running the next generation of cloud-native applications using Open Application Model (OAM)
Ryan Zhang, Staff Software Engineer @Alibaba Cloud
Sept 3, 2020 10:00 AM Pacific Time

Member Webinar: Arm Developer Experience Spanning Cloud, 5G and IoT
Darragh Grealish, Co-Founder @56K.Cloud
Marc Meunier, Sr. Manager, SW Ecosystem Development @Arm
Sept 8, 2020 10:00 AM Pacific Time

Member Webinar: Building a Cloud-Native Technology Stack that Supports Full Cycle Development
Daniel Bryant, Product Architect @Datawire
Sept 9, 2020 7:00 AM Pacific Time

Member Webinar: Highly scalable SaaS Apps on Kubernetes: Real Life Case Studies
Ram Kailasanathan, Senior Director Product Management @Oracle
Sept 9, 2020 1:00 PM Pacific Time

Member Webinar: Kubernetes and Networks: why is this so dang hard?
Tim Hockin, Principal Software Engineer @Google
Sept 10, 2020 10:00 AM Pacific Time

Member Webinar: Achieving Least Privilege Access in Kubernetes
Eran Leib, Co-Founder and VP Product Management @Apolicy
Gregg Ogden, Senior Product Marketing Manager @Aqua Security
Sept 11, 2020 10:00 AM Pacific Time

Ambassador Webinar: Hybrid Serverless Development using Quarkus and Kubernetes
Daniel Oh, Principal Technical Marketing Manager @RedHat and CNCF Ambassador
Sept 11, 2020 1:00 PM Pacific Time

Member Webinar: ChubaoFS Best Practices
Wei Ding, Staff Engineer
Sept 15, 2020 10:00 AM Pacific Time

Member Webinar: How To Run Kubernetes Securely and Efficiently
Joe Pelletier, VP, Products Fairwinds @Fairwinds
Robert Brennan, Director, Open Source @Fairwinds
Sept 16, 2020 7:00 AM Pacific Time

Member Webinar: Effective Kubernetes Onboarding
Kathleen Juell, Developer, DODX @DigitalOcean
Sept 16, 2020 1:00 PM Pacific Time

What did you think of those articles? Did any of them catch your interest?

Actually, there is some content that I cannot digest at this stage, so I will use this aide-memoire and the links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #CKA, #CKAD, #Certified AWS SAP
