SRE / DevOps / Kubernetes Weekly Collection #91 (Week 43, 2021)

Yoshiki Fujiwara
11 min read · Oct 31, 2021
  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
  • Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #565 October 24th, 2021

News

KubeCon finished up in LA a week and a bit ago, and we have several posts this week recapping the event, with lots of links, observations and some opinions.

  • This item covers three articles recapping KubeCon + CloudNativeCon NA 2021.
  • The title of the first article in the above link is “KubeCon NA 2021 Key Takeaways: DevX, Security, and Community”. The explanation is based on tweets by the author and other participants.
  • The title of the second article is “KubeCon 2021 Los Angeles Wrapup”. It looks back at the event through tweets.
  • The title of the third article is “KubeCon 2021 Top 3 Announcements: APIClarity, HashiCorp Waypoint, and Dell EMC CSM”. The author introduces three new products that caught the author's eye due to their high potential of pushing back roadblocks that are currently slowing down application modernization.

An insightful post on the sometimes hard-to-define distinction between application and infrastructure. A static/dynamic linking analogy, how the Kubernetes API and Crossplane fit in, and the potential for a new type of marketplace for applications.

  • The title is “INFRASTRUCTURE IN YOUR SOFTWARE PACKAGES”.
  • By detailing the current situation and assessing how software delivery has evolved over time, it analyzes what the future of shipping infrastructure alongside software will look like.

Game servers are a super interesting scaling challenge. This post, about recent outages for a large game, goes into some great operational, data storage and architecture details.

  • The title is “Diablo II: Resurrected Outages: An explanation, how we’ve been working on it, and how we’re moving forward”.
  • To provide some transparency, the post explains the causes of the outages that have occurred since the launch of “Diablo II: Resurrected”, which involved multiple server issues, and the steps the team in charge has taken. It also provides insight into how the team is moving forward.

A look at how one team is evolving a large NFS file storage setup towards something that is easier to scale horizontally and automatically.

  • The title is “Iterating on how we do NFS at Wikimedia Cloud Services”.
  • As the editor commented above, Wikimedia’s cloud services team reviewed how it runs NFS and shared the improvements.

More deep internet networking insights, this time looking under the hood about what makes a valid hostname. It’s worse than you think.

  • The title is “What’s in a hostname?”.
  • It dives deep into what makes a valid hostname, checking each point against the relevant RFCs.
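
As an aide-memoire for myself, here is a minimal sketch of the commonly cited hostname rules from RFC 952 and RFC 1123: each dot-separated label is 1 to 63 characters of letters, digits and hyphens, with no leading or trailing hyphen, and the whole name is at most 253 characters. The function name `is_valid_hostname` is my own, and the check is deliberately simplified; the article shows that real-world behavior has many more corner cases than this.

```python
import re

# One label: 1-63 characters of letters, digits or hyphens,
# not starting and not ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Check a name against the simplified RFC 952/1123 hostname syntax."""
    # Assumption: a single trailing dot (fully qualified form) is tolerated.
    if name.endswith("."):
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    # Every dot-separated label must match the label pattern in full.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    for candidate in ["example.com", "-bad.example", "under_score.example"]:
        print(candidate, is_valid_hostname(candidate))
```

Underscores are rejected here because they are not part of the hostname syntax, even though they can appear in other DNS names.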

A good introduction to the extensibility benefits of Kubernetes, looking at the high-level API, custom resources and the operator pattern.

  • The title is “Exploring Kubernetes Operator Pattern”.
  • It takes a closer look at the Operator pattern, using as many images as possible to show which Kubernetes parts are involved in implementing an Operator and why Operators feel like “first-class Kubernetes citizens”.
  • It explains that the Kubernetes API is probably the main driver of Kubernetes extensibility.
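
The core of the operator pattern is a control loop that compares the desired state declared in a custom resource with the observed state and acts on the difference. The following is only a toy sketch of that idea in plain Python: it does not talk to a real cluster, and the names `Observed` and `reconcile` are hypothetical, not taken from the article or from any Kubernetes library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observed:
    """Observed state of a hypothetical workload (illustration only)."""
    replicas: int


def reconcile(desired_replicas: int, observed: Observed) -> List[str]:
    """Return the actions that would move the observed state to the desired state."""
    diff = desired_replicas - observed.replicas
    if diff > 0:
        # Too few replicas: create the missing ones.
        return [f"create replica {observed.replicas + i + 1}" for i in range(diff)]
    if diff < 0:
        # Too many replicas: delete the surplus, highest index first.
        return [f"delete replica {observed.replicas - i}" for i in range(-diff)]
    # Already converged: nothing to do.
    return []


if __name__ == "__main__":
    print(reconcile(3, Observed(replicas=1)))
```

A real operator would run this kind of comparison repeatedly, triggered by changes it watches through the Kubernetes API.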

A post on introducing a production readiness review process, in particular in smaller teams.

Tools

hcltm is a tool for describing a threat model in HCL, and then generating various outputs from it including markdown documents and data flow diagrams.

  • The GitHub page of “hcltm”, which provides a DevOps-first approach to documenting system threat models, focusing on the following targets:
    ○ Simple text-file format
    ○ Simple cli-driven user experience
    ○ Integration into version control systems (VCS)

Snowcat is a tool that gathers and analyzes the configuration of an Istio cluster and audits it for potential violations of security best practices.

  • As mentioned above, this is the GitHub page of “Snowcat”, a tool that collects and analyzes the configuration of Istio clusters and audits it for potential violations of security best practices.
  • Click here for the GitHub page.

SRE Weekly Issue #293 October 24th, 2021

Articles

The Downside of Hospitals Becoming “Highly Reliable”

It’s one thing to say you accept call-outs of unsafe situations — it’s another to actually do it. This cardiac surgeon shares what it’s like when high reliability organizations get it wrong.

Robert Poston, MD

  • About hospitals that aim to become high reliability organizations (HROs), the author says, “A lack of transparency and passion leaves them with a series of well packaged ideas that end up looking like high reliability but never able to operate like one.”
  • Article dated November 6, 2019.

Diablo II: Resurrected Outages: An explanation, how we’ve been working on it, and how we’re moving forward

The game has been a victim of its own success, and the developers have had to put in quite a lot of work to deal with the load.

PezRadar — Blizzard

  • Since it is covered in DEVOPS WEEKLY ISSUE #565 above, I will skip it.

An Introduction to Incident Response Roles

This includes some lesser-known roles like Social Media Lead, Legal/Compliance Lead, and Partner Lead.

JJ Tang — Rootly

This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.

  • It explains, through the following points, how to define incident response roles in order to build a team that works as effectively and efficiently as possible.
    ○ What is an incident response team?
    ○ Structuring incident response roles
    ○ Other potential incident response roles
    ○ Conclusion: The best incident response team is a flexible team

Postmortem Pitfalls

There are a couple of great sections in this article, including “blameless” retrospectives that aren’t actually blameless, and being judicious in which remediation actions you take.

Chris Evans — incident.io

  • As the title suggests, it explains the pitfalls of postmortems through the following points.
    ○ When blameless postmortems actually aren’t
    ○ Incidents are always going to happen again
    ○ Take time before you commit to all the actions
    ○ Incidents as a process, not an artifact

The danger of hidden functional roles

I love the idea that chaos monkey could actually be propping your infrastructure up. Oops.

Lorin Hochstein

  • Both the opening story, about something that unintentionally plays the role of a family alarm clock, and the way it is connected to Chaos Monkey in the latter half are good. I had never considered the possibility that Chaos Monkey, by terminating instances, was replacing them before a problem could occur.

What’s in a hostname?

I have to say, I’m really liking this DNS series.

Jan Schaumann

  • Since it is covered in DEVOPS WEEKLY ISSUE #565 above, I will skip it.

Crew member yelled ‘cold gun’ as he handed Alec Baldwin prop weapon, court document shows

What? Why the heck am I including this here?

First, let’s all keep in mind that this situation is still very much unfolding, and not much is concretely known about what happened. It’s also emotionally fraught, especially for the victims and their families, and my heart goes out to them.

The thing that caught my eye about this article is that this looks like a classic complex system failure. There’s so much at play that led to this horrible accident, as outlined in this article and others, like this one (Julia Conley, Salon).

Aya Elamroussi, Chloe Melas and Claudia Dominguez — CNN

  • At first glance, I wondered, “Why is this article here?” As mentioned in the editor’s comment above, it is included because it looks like a classic complex system failure.

Outages

KubeWeekly #281 October 29th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Kubernetes Podcast from Google: Jasmine James, KubeCon + CloudNativeCon co-chair

Jasmine James is an Engineering Manager within the Engineering Effectiveness organization at Twitter, focused on their internal developer experience. She was also the co-chair of the recent KubeCon + CloudNativeCon. Jasmine talks about the events she’s led and the ones to come, and her feelings about being in a room in front of other people — up to 3,000 of them — for the first time in a long while.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Securing your workload communications with Open Service Mesh

Phillip Gibson, Microsoft

  • An approximately 46-minute session that introduces the latest integrations and techniques for enhancing workload communication using Open Service Mesh.

Introducing Kubescape — open-source tool to test Kubernetes deployment

Amir Kaushansky, ARMO

  • An approximately 50-minute session that explains how to operate Kubescape, supported frameworks, key features, and CI/CD integration.

How to design a multi-cloud deployment

Dave Blakely, Snapt

  • An approximately 40-minute session that explains the purpose of migrating to multi-cloud, how to select a cloud provider, how to deploy to multi-cloud, and how to keep multi-cloud secure.

Project Calico network policies

Nigel Douglas, Tigera

  • An approximately 41-minute session that explains the content of the title with the following points.
    ○ How does Project Calico enable network policies in K8s?
    ○ How to implement basics?
    ○ Creating and managing policies in your clusters

Understanding GitOps usecases

Abubakar Siddiq Ango, GitLab

  • An approximately 30-minute session explaining GitOps, its use cases, and if/when you need GitOps.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

What you need to know about Kubernetes Network Policy

Mike Calizo, Red Hat

  • Kubernetes NetworkPolicy is explained through the following points, with example YAML manifests.
    ○ The NetworkPolicy concept
    ○ Applying a network policy
    ○ NetworkPolicy limitations
    ○ Summary
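
To remind myself what such a manifest looks like, here is a minimal sketch that builds a NetworkPolicy as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The helper name `allow_ingress_policy`, the namespace `demo`, and the labels and port are my own illustrative assumptions, not values from the article. Note that a policy like this only takes effect when the cluster's network plugin supports NetworkPolicy.

```python
import json


def allow_ingress_policy(name, namespace, target_labels, source_labels, port):
    """Build a NetworkPolicy that allows TCP ingress to the target pods
    only from pods matching source_labels in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods with these labels may connect...
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    # ...and only on this TCP port.
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical example values for illustration.
policy = allow_ingress_policy(
    "allow-frontend", "demo", {"app": "backend"}, {"app": "frontend"}, 8080
)

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```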

The life of an API gateway request (part 1)

Enrique García Cota, Kong

  • Part 1 of a series that discusses how Kong Gateway handles requests by breaking the abstraction space into the four layers listed below. A video of about 13 minutes is embedded.
  1. Infrastructure
  2. Nodes
  3. Phases
  4. Plugins

Optimizing Kubernetes applications with Kubecost and Spinnaker

Alex Thilen, Kubecost

  • The topic in the title is explained with a processing flow and UI screenshots. The following two videos are embedded.
  1. Demo of Kubecost + Spinnaker integration in action
  2. Spinnaker Workshop: Cost Optimization with Kubecost’s founders

Announcing HAProxy Kubernetes Ingress Controller 1.7

Ivan Matmati & Zlatko Bratkovic, HAProxy

  • The changes that come with the release of version 1.7 of the HAProxy Kubernetes Ingress Controller are introduced in detail through the following points.
    ○ Custom Resource Definitions
    ○ CRD Examples
    ○ Distribution of connections to services/pods
    ○ New ALPN option
    ○ Implementation specific path type in ingress rules
    ○ Multiarch Support
    ○ s6 Init system
    ○ Nightly builds
    ○ External mode
    ○ Contributions
    ○ Conclusion

Connecting services to Kubernetes clusters with inlets, VPC Peering and direct uplinks

Alex Ellis, OpenFaaS Ltd.

  • It explains how to connect services to Kubernetes clusters using Inlets, VPC Peering, and direct uplinks.

Transitioning from Monolith to Microservices

Michael Bogan, Dev Spotlight

  • I really like the structure of this article: it first lists the following points under “You might not need microservices architecture if …”, and only then introduces the transition to microservices that the title suggests.
    ○ You’re not having trouble scaling.
    ○ Your monolithic architecture is already flexible enough to meet market demands.
    ○ You’re not having issues with deploying your application.

Securing a Kubernetes pod with Regula and Open Policy Agent

Becki Lee, Fugue

  • It shows how to run Regula against a Kubernetes manifest to detect insecure pods, and then explains how to secure them.

Structure testing for Docker containers

Tomas Fernandez, Semaphore CI

  • As a way to test Docker containers before deploying, it introduces Google’s open source container testing tool “Container Structure Tests”.

Kustomize tutorial: Creating a Kubernetes app out of multiple pieces

Nick Chase, Mirantis

  • The content of the title is explained in the following items.
    ○ What is Kustomize?
    ○ Benefits of Using Kustomize
    ○ Installing Kustomize
    ○ Combining Specs
    ○ Managing Multiple Directories
    ○ Changing Parameters for a Component Using Kustomize Overlays
    ○ Creating a Kustomize Patch
    ○ Using Kubectl with Kustomize
    ○ Example: Kustomize Secret Generator
    ○ Conclusion

Kube-fledged: Cache container images in Kubernetes

Senthil Raja Chermapandian, Ericsson

  • It explains how to use the open source project “kube-fledged” to build and manage a cache of container images in a Kubernetes cluster.

Kubernetes logging in production

Kentaro Wakayama

  • The topic in the title is explained in the following structure. The points are very well organized and easy to follow.
    ○ Logging Architectures
    ○ Logging Patterns
    ○ Pros and Cons
    ○ Putting Theory into Practice
    ○ Conclusion

How to develop a custom provider in Terraform

Saravanan Gnanaguru, InfraCloud Technologies

  • This article is intended for users who already have a basic knowledge of Terraform and are likely to develop custom Terraform providers.

Database security best practices on Kubernetes

Jonathan S. Katz, Crunchy Data

  • The content of the title is explained in the following items.
    ○ Run as an Unprivileged User
    ○ Encrypt your Data
    ○ Credential Management
    ○ Keep Database Software Up-to-Date
    ○ Follow Configuration Best Practices
    ○ Limit Where You Can Write
    ○ Securing The “Weakest Link”
    ○ Conclusion

How Linkerd retries HTTP requests with bodies

Eliza Weisman, Linkerd

  • It describes how Linkerd proxies minimize copies and allocations to reduce the performance overhead of buffering request bodies, how the proxies determine which requests can be retried, and some edge cases to consider.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Kubernetes co-founder Joe Beda interview

euro interview

Kubernetes cost management and analysis guide

Kasper Siig, CloudForecast

  • It examines the main reasons why it’s so difficult to manage costs with Kubernetes. And as a way to significantly improve cost management, it shows you how to use the AWS Pricing Calculator to estimate the costs associated with running a workload on a custom Kubernetes cluster compared to running an EKS cluster.

I attended Kubecon 2021 in-person, here are my top six takeaways

Amanda Mitchell, Chronosphere

  • The author, who attended KubeCon + CloudNativeCon NA 2021 in person, shares the following six takeaways.
  • 1) A green light for more (safe) in-person events
  • 2) Quantity isn’t everything
  • 3) KubeCon 21 felt like old times (aka two years ago)
  • 4) Love notes and theCube
  • 5) Observability and other key themes
  • 6) Inclusivity themes abound at KubeCon 21

KaaS, KPaaS & CaaS: Explained and compared

Lars Larsson, Elastisys

  • It compares managed services for modern, containerized applications.

Announcing Vitess 12

Alkin Tezuysal, Vitess

  • An overview article accompanying the release of Vitess 12. See the release notes for details.

Upcoming CNCF Online Programs

Please note that no Online Programs are scheduled for this upcoming week. Check out our full playlist of content on the button below!

Visit our Online Programs playlist on YouTube.

Learn more about CNCF Online Programs

What did you think of these articles? Did any of them catch your interest?

Honestly, there is some content here that I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself, too.

Bye now!!

Yoshiki Fujiwara


・Cloud Solutions Architect - AWS@NetApp in Tokyo, Japan. #AWS Certified Solution Architect&DevOps Professional, #Kubernetes, ・Opinions are my own.