SRE / DevOps / Kubernetes Weekly Collection #3 (Week 08)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to and add some comments, as an aide-memoire, along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #477 February 16th, 2020
SRE Weekly Issue #207 February 16th, 2020
KubeWeekly #204: February 21st, 2020

DEVOPS WEEKLY ISSUE #477 February 16th, 2020

A post on organisational friction, and getting things done where work crosses team and other boundaries. Lots of examples and a useful framing of this type of problem.

  • Drawing on his own experience and his colleague Dan Na’s presentation “organizational friction”, the author discusses what miscommunication within an organization means and how to overcome friction and achieve goals.

Microservices are one architectural approach, rather than the answer to all systems problems, at least according to this post on the tradeoffs and design decisions of choosing to adopt microservices.

  • The title is “Should I use microservices? — Considerations for when — and when not — to apply microservices in your organization.”.
  • An excerpt from Sam Newman’s report “What are Microservices?”.
  • The full version is available here.
  • At the time of writing, Sam Newman was co-chair of the “O’Reilly Infrastructure & Operations Conference” scheduled for June 15–18 in Santa Clara, California. The conference was later canceled; instead, O’Reilly will continue to invest in and grow O’Reilly online learning.
  • The author believes that the application of microservices has a myriad of challenging issues that need careful consideration.
  • It highlights the characteristics of organizations where microservices work well and of those where they do not.
  • He concluded that “A microservice architecture is one that can give you a lot of flexibility as you continue to evolve your system. That flexibility has a cost of course, but if you want to keep your options open regarding changes you might want to make in the future, it could be a price worth paying”.

An interesting discussion of the role of database migrations, looking at historical approaches and a new automated approach using GitHub Actions.

  • From The GitHub Blog, the title is “Automating MySQL schema migrations with GitHub Actions and more”.
  • GitHub performs an average of 2 MySQL schema migrations per day, and 6 on some days.
  • It describes the tremendous toil that migrations imposed on the database infrastructure team and how they automated the manual process.
  • I am not in a position to work on databases myself, but it’s interesting, so I’ll read it again.

Most things in Kubernetes go through the API server. That makes the Kubernetes audit log, which records those requests, a powerful monitoring tool. This post explores what it contains and how you can use that data effectively.

  • The title is “How to monitor Kubernetes audit logs”.
  • It shows how to leverage Kubernetes audit logs for deep insight into your clusters.
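
As a rough illustration of what this kind of monitoring could look like, here is a minimal Python sketch that summarizes audit events. It assumes the audit log is written as one JSON event per line and relies only on the event fields `verb`, `user.username`, `objectRef.resource`, and `responseStatus.code`. The sample events and the helper name `summarize_audit_events` are made up for illustration and are not taken from the linked post.

```python
import json
from collections import Counter

# Hypothetical sample: two audit events, one JSON object per line.
SAMPLE_LOG = """\
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"get","user":{"username":"alice"},"objectRef":{"resource":"pods","namespace":"default"},"responseStatus":{"code":200}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"delete","user":{"username":"bob"},"objectRef":{"resource":"secrets","namespace":"prod"},"responseStatus":{"code":403}}
"""


def summarize_audit_events(lines):
    """Count requests per (user, verb, resource) and collect denied requests."""
    counts = Counter()
    denied = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Any of these objects may be missing or null, so fall back to {}.
        user = (event.get("user") or {}).get("username", "unknown")
        verb = event.get("verb", "unknown")
        resource = (event.get("objectRef") or {}).get("resource", "unknown")
        counts[(user, verb, resource)] += 1
        # Treat 401/403 responses as denied requests worth a closer look.
        if (event.get("responseStatus") or {}).get("code") in (401, 403):
            denied.append((user, verb, resource))
    return counts, denied


counts, denied = summarize_audit_events(SAMPLE_LOG.splitlines())
for key, n in counts.items():
    print(key, n)
print("denied:", denied)
```

In a real cluster the same function could be fed the lines of the audit log file instead of `SAMPLE_LOG`; where that file lives depends on how the API server is configured.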

A fantastically comprehensive look at what’s new in the latest Salt release. Far more than just a list of features, the post puts new capabilities in context and shows lots of examples.

  • The title is “What’s new in Salt 3000 Neon”.
  • This article is an unofficial summary of the new features in the Salt Neon release. Like Salt Fluorine, it started out as a series of tweets, mostly referring to new features.
  • Salt is a configuration management system and a distributed remote execution system. As a configuration management system, it keeps remote nodes in a defined state. For example, you can ensure that a particular package is installed and that a particular service is running.
  • As a distributed remote execution system, it executes commands and queries data on remote nodes, either on individual nodes or on nodes matched by arbitrary selection criteria.
  • If you want to read about other changes and deprecations (e.g. RAET being deprecated), I recommend checking out the official release notes and the new homemade changelog.
  • I didn’t know Salt at all, so finding out what it was took me the most time.

A nice migration story, moving a CI/CD pipeline from Jenkins to Concourse. Observations about improvements and expectations as well as difficulties. The fly CLI tool looks nice.

  • The title is “We killed the butler: Replacing Jenkins with Concourse”.
  • Literally a story of moving from Jenkins to Concourse.
  • They are trying to manage all infrastructure configurations with GitOps.

Infrastructure described with Terraform is still code, and applying programming techniques like refactoring to keep it maintainable is important. This post explains why and shows a few examples.

  • The title is “Refactoring Terraform, The Right Way”
  • This article is a recap of the slide deck at the end of the article.
  • The story begins with the common idea of “I’ll just make it work now and figure out later how to improve it when I have some extra time to make it better” and the approach that follows from it.
  • That approach has its challenges, and the author could not think of a way to make Terraform a bit more user-friendly and easier to debug.
  • Then he read Yevgeniy Brikman’s blog post “5 lessons learned from writing over 300,000 lines of infrastructure code” and learned how to make Terraform more robust, clean, and friendly and, above all, how to gain the confidence needed to make the necessary changes or improvements.

Another post on writing maintainable Terraform code. Lots of examples and a set of rules around modules, data, interpolation, state and more.

  • The title is “Terraform Poka-Yokes — Writing Effective, Scalable, Dynamic, and Error-Resistant Terraform”.
  • At first glance, I wondered what the word “Poka-Yoke” was. It is actually a Japanese term meaning “mistake-proofing” or “inadvertent error prevention”. It is intriguing.
  • Some time ago, the author was impressed by an article describing an IaC approach called the Infrastructure Application Pattern (I-A). The full six-part series is available on Martin Atkins’ blog.
  • This post expands on the ideas in that series, looks at other possible benefits and implementations of the I-A pattern, and discusses the rules by which poka-yokes can be created.

This post nicely summarises why Go has made such an impact on infrastructure and cloud-based applications, outlining several of the advantages of the runtime and toolchain in particular.

  • The title is “Go for Cloud”.
  • The article also covers Go’s unique strengths in cloud systems and the ambiguities that could trip up first-time users.

Devops Days New York is coming up on March 3rd and 4th with the usual mix of talks, ignites and open spaces. Interesting topics like lifecycle management, product management for operations teams, CI/CD pipeline sprawl and more. Tickets are available now and you can get a 15% discount with the code “devopsweekly”.

  • An announcement of Devops Days New York, which at the time was coming up on March 3rd and 4th.

Preflight is a tool for verifying a Kubernetes cluster is configured correctly using an opinionated set of Open Policy Agent policies.

  • The GitHub page of “Preflight”, a tool that verifies that Kubernetes clusters are configured correctly using OPA (Open Policy Agent) policies.
  • The repository hosts the agent part of Preflight. It sends data to the Preflight SaaS platform.

Host extraordinaire, Benton Rochester, talks with Gene Kim about DevOps and his excellent new book, The Unicorn Project. Don’t miss this highly-anticipated episode of Ship Happens, the Splunk + VictorOps podcast:

  • I skipped this one, because I covered it last week.

SRE Weekly Issue #207 February 17th, 2020

You see pilot error, I see normal work

The scenario: a seemingly botched landing, a finding of human error, and retraining for the errant pilots. The author recasts the entire incident in a much more realistic light that shows that the pilots’ actions were perfectly reasonable.

Robyn Ironside — Safety Differently

  • An article in which the author re-examines an incident that was attributed to pilot error, in which Jetstar pilots had forgotten to lower the landing gear before landing.

Running servers (and services) well is not trivial

Just exactly what would it take to (reliably) run your own git server internally?

Chris Siebenmann

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages

In this two-part series, The Morning Paper takes on John Allspaw’s master’s thesis from Lund University. Here’s part two.

Adrian Colyer — The Morning Paper (summary)
John Allspaw — Lund University (original paper)

  • The Morning Paper is a series by Adrian Colyer that gives a short summary, every weekday, of an important, influential, topical or otherwise interesting paper in the field of computer science.
  • The Part 2 link is the same as the one above.
  • It is based on the master’s thesis John Allspaw published in 2015.

Team Structure for Software Reliability within your Organization

The section toward the end under the heading “Things need to get worse before they get better.” especially resonated with me.

Hannah Culver — Blameless

  • The author was inspired by Will Larson’s blog post “Modeling Reliability” and tried to abstract its mathematical formulas for a larger audience, giving an intuitive understanding of team building for reliability.

Music in Resilience: The Practice of Practice

Incident response and improvisational music share a lot in common.

Matt Davis — Verica

  • The comment that “incident response and improvised music have a lot in common” reminded me of the slide deck “Core Kubernetes: Jazz Improv over Orchestration” by Joe Beda, who is known as one of the creators of Kubernetes, and of his presentation at Kubernetes Meetup Tokyo (I recommend it because the English-to-Japanese interpreter was so wonderful! I thanked him directly).

KubeWeekly #204: February 21, 2020

The Headlines

Editor’s pick of the highlights from the past week.

Why I Contribute to the Open Source Community — and You Should Too

Marky Jackson

Marky Jackson shares his journey to becoming a passionate, open-source contributor that all started with a rocket scientist. His post reminds us that open source is welcome to all and invites you to join the fun.

  • The writer is working as a senior software engineer at Sysdig.
  • He is working on OSS as a contributor to Jenkins Prometheus plugins.
  • While mentoring and sponsoring next-generation engineers, he also tells readers how rewarding it is to contribute, asking “How do you make a difference in the world today?”
  • It is interesting that he writes while looking back on his career so far, listing the companies he passed through and how he arrived at his current values and technical skills.

Roaring Elephant podcast: KubeCon + Cloud NativeCon EU preview

Not a regular news episode this time. Instead, we are starting our KubeCon/CloudNativeCon Amsterdam coverage and have co-chairs Vicki Cheung and Constance Caramanolis on as a guest to tell us all about these conferences. If you’ve never attended one of these, this discussion will give a good idea on what to expect and for seasoned attendees, there is a little bit of a behind-the-scenes look at how these events take form.

  • Co-chairs Vicki Cheung and Constance Caramanolis appeared as guests on the Roaring Elephant podcast for a preview of “KubeCon + CloudNativeCon EU”.
  • There are a few behind-the-scenes stories from the organizers’ side of the events.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Open Policy Agent: Microservices Authorization Simplified

Gaurav Chaware, Infracloud

  • The author often runs into problems with authentication and authorization when developing microservices.
  • This article explores how OPA (Open Policy Agent) can help simplify authorization.

Deploying Envoy and Kafka to collect broker-level metrics

Adam Kotwasinski, Workday

  • It introduces two ways of deploying Envoy with Kafka:
  • Routing all Kafka-related traffic through Envoy (including internal cluster communication).
  • Routing only Kafka-client traffic through Envoy.

Introduction to SPIFFE and SPIRE Projects (Lightboard)

Evan Gilman, maintainer SPIFFE and SPIRE

  • Evan Gilman from Scytale gives an overview of SPIFFE and SPIRE in a YouTube video, using a lightboard (a transparent board that speakers write on while presenting).
  • Two episodes ago, I picked up the news of HPE’s acquisition of Scytale.

Find an optimal set of nodes for a Kubernetes cluster


  • An introduction to Kubecost, an open-core tool.
  • To install it, register with your email address on this page, then use Helm or a flat manifest.

GKE Node Pool Location and Surge upgrades GA

Two new Google Kubernetes Engine (GKE) features are Generally Available. Node pool location allows GKE users to specify the location for node pools independently from the location of clusters.

Surge upgrades allow users to specify the number of extra (surge) nodes and number of accepted unavailable nodes during an upgrade, improving reliability and reducing disruption to workloads.

  • It announces that two GKE features, Node Pool Location and Surge upgrades, are now GA.
  • I think both are important features and I want to try them. It looks like both are supported only via the CLI.
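
To get a feel for how the two surge upgrade settings interact, here is a small Python sketch of the arithmetic. It is a simplified model, not GKE’s actual upgrade logic: it assumes that the number of nodes upgraded in parallel is the sum of the surge and unavailable counts, and the function and parameter names (`surge_upgrade_plan`, `max_surge`, `max_unavailable`) are hypothetical.

```python
import math


def surge_upgrade_plan(node_count, max_surge, max_unavailable):
    """Simplified model of a surge upgrade; not GKE's actual scheduler."""
    if node_count < 0 or max_surge < 0 or max_unavailable < 0:
        raise ValueError("all counts must be non-negative")
    # Assumption: nodes upgraded in parallel = surge + unavailable.
    parallel = max_surge + max_unavailable
    if parallel == 0:
        raise ValueError("at least one of the two settings must be positive")
    return {
        "nodes_upgraded_in_parallel": parallel,
        "rounds": math.ceil(node_count / parallel),
        # Extra surge nodes temporarily increase the pool size.
        "peak_node_count": node_count + max_surge,
        # Unavailable nodes temporarily reduce serving capacity.
        "min_available_nodes": max(node_count - max_unavailable, 0),
    }


# Favor availability: add extra nodes, never take existing ones offline early.
print(surge_upgrade_plan(node_count=5, max_surge=2, max_unavailable=0))
# Favor cost: one extra node, and accept one node being unavailable.
print(surge_upgrade_plan(node_count=5, max_surge=1, max_unavailable=1))
```

In this model, raising the surge count trades temporary extra cost for availability, while raising the unavailable count trades capacity for speed without adding nodes.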

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Architecting for Multicluster Kubernetes

Thomas Rampelberg, Linkerd

  • This article outlines the minimum requirements for a multi-cluster solution that enhances reliability, security, and observability of cross-cluster traffic.
  • Subsequent blog posts will cover some implementation options. (As of 2/22, it seems they have not been published yet.)
  • The follow-up post, “Multicluster Kubernetes with Service Mirroring”, was published on 2/25.

Five Surprising Ways Enterprises Are Putting Kubernetes To Work

Murli Thirumale, Portworx (published in Forbes)

  • The author introduces five surprising (at least for non-engineers) ways enterprises are leveraging Kubernetes and provides insight into how the platform might evolve in the near future.

1. On-prem use as well as the cloud
2. Deploying stateful apps as easily as stateless apps
3. Full control over app lifecycle management
4. A new control plane for infrastructure (in the data center and cloud)
5. A rapid enabler for AI

eBPF and Falco, with Leonardo Di Donato

Adam Glick and Craig Box, Kubernetes Podcast from Google

  • An episode of the weekly Kubernetes Podcast, hosted by community members from Google.
  • The guest is Leonardo Di Donato, an open source engineer at Sysdig.
  • Leonardo works full time on the Falco project.
  • He talks about the architecture of eBPF (extended Berkeley Packet Filter), which is used for listening to the Linux kernel, its past and future usage, what is coming in Falco, and more.
  • “News of the week” overlaps a lot with KubeWeekly, including last week’s issue, but it also has many items that KubeWeekly does not.
  • I considered copying all of the “News of the week” links here, but it felt like overkill for something I am not sure anyone needs, so I did not. It is probably better to pick the items you are interested in from the Kubernetes Podcast link above.

Security concerns hampering the adoption of containers and Kubernetes

Jonathan Greig, TechRepublic

  • The article opens with the subtitle, “According to a StackRox study, more than 90% of respondents have experienced a security incident in deployments in the last year.”
  • The first half presents concerns and opinions based on the survey results, quoting for example: “The findings in this survey of 541 respondents make clear that organizations are putting at risk the core benefit of faster application development and release by not ensuring their cloud-native assets are built, deployed, and running securely”.
  • Finally, the author notes that competition among the three container providers, AWS, Azure, and GCP, is intensifying, especially between Azure and GCP, which are competing for second place.

Google Identifies Kubernetes-Ready Storage for Anthos

Mike Melanson, The New Stack

Managed Kubernetes Services Make K8s Simple for Platform Teams and App Developers

Emily Omier, The New Stack

  • This article is part of a three-month series examining the challenges of Kubernetes in 2020.
  • In January, the author’s team tested the Kubernetes developer experience.
  • It explains the background behind the need for managed Kubernetes and introduces products that make the complexity of Kubernetes easier for platform teams and app developers to deal with.

Linux Foundation study throws the open-source sustainability debate into question

Matt Asay, TechRepublic

  • “Open source developers tend to be well-compensated” was drawn as one possible conclusion from a recent Linux Foundation report (PDF).
  • The report found that over 75% of the top maintainers of the 200 most active open source projects were paid to work on open source, full-time or part-time.
  • The Linux Foundation is rolling out a more comprehensive survey of thousands of developers to get more details.
  • He concluded the article with: “I’m equally sure that cash isn’t the only thing that matters here, and we need to do a lot more to make open source development something as emotionally sustainable as it (likely is) financially sustainable.”

Webinar Registration

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time.

Helm Security — a Look Below Deck
Matt Farina, Helm Maintainer @Samsung SDS
Hayley Denbraver, Developer Advocate @Snyk
Raghavan “Rags” Srinivas, Lead Container Developer Advocate @Snyk
Member webinar
Feb 25, 2020 10:00 AM Pacific Time

Managing Observability in Modern Apps — Microservices
Ran Ribenzaft, co-founder of and CTO @ Epsagon
Member webinar
Feb 26, 2020 10:00 AM Pacific Time

From Notebook to Kubeflow Pipelines with MiniKF & Kale
Vangelis Koukis, CTO & Founder @Arrikto
Stefano Fioravanzo, Software Engineer @Arrikto
Member webinar
Feb 27, 2020 9:00 AM Pacific Time

Kubernetes Security Best Practices for DevOps
Frédéric Harper, Senior Developer Advocate @DigitalOcean
Member webinar
March 3, 2020 10:00 AM Pacific Time

What’s New in Linkerd 2.7
Linkerd team
Project webinar
March 6, 2020 10:00 AM Pacific Time

Kubernetes Security Best Practices for DevOps
Connor Gorman, Principal Engineer @StackRox
Member webinar
March 11, 2020 10:00 AM Pacific Time

Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
Ambassador webinar
March 13, 2020 10:00 AM Pacific Time

How to migrate a MySQL Database to Vitess
Liz van Dijk, @PlanetScale
Project webinar
March 20, 2020 10:00 AM Pacific Time

Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
Member webinar
June 30, 2020 10:00 AM Pacific Time

How did you like these articles? Did any of them catch your interest?
To be honest, there is some content I cannot fully digest at this stage, so I’ll use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

Written by

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
