SRE / DevOps / Kubernetes Weekly Collection #62 (Week 14, 2021)

Yoshiki Fujiwara
10 min read · Apr 12, 2021

  • In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments and useful links as an aide-memoire.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #536 April 4th, 2021
SRE Weekly Issue #264 April 4th, 2021
KubeWeekly #258 April 9th, 2021

DEVOPS WEEKLY ISSUE #536 April 4th, 2021

News

A pitch for a Distributed Operating System Interface (DOSi) and that operating systems should be reimagined to support higher level workloads that are managed in a distributed environment.

  • The title is “The Distributed Operating System Void”.
  • It defines and describes a distinct interface DOSi (Distributed Operating System Interface) between kubernetesland and userland that complements existing interfaces (CNI, CRI, CSI, OCI).

How do you fulfill the promise of continuous deployment? A presentation on the importance of high performance teams and how to build and measure progress.

  • The title is “It is time to fulfill the promise of CI/CD”.
  • The message and points are clear and good.
  • I would like to be able to create materials like this, which are visually easy to understand and do not feel difficult.

A nice explanation of how to trigger a GitHub Action from a webhook, using the repository dispatch configuration and API.

  • The title is “GitHub Actions Trigger Via Webhooks”.
  • It explains how to build a webhook that manually triggers a GitHub Action workflow.
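As a rough illustration of the repository dispatch API mentioned above, the Python sketch below builds (but does not send) the HTTP request that fires a `repository_dispatch` event. The owner, repository, event type, token, and payload are placeholder values I made up, not taken from the article, and the target workflow is assumed to declare an `on: repository_dispatch` trigger.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, event_type, token, payload=None):
    """Build the POST request for GitHub's repository dispatch endpoint."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = {"event_type": event_type, "client_payload": payload or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_dispatch_request(
    "example-org", "example-repo", "deploy", "TOKEN", {"env": "staging"}
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send the request.
```

Keeping the request construction separate from the network call makes it easy to inspect or test the payload before wiring it into a real webhook handler.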

A post on what you should be logging, at least from a security point of view. A good overview of the Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) taxonomy from MITRE.

  • The title is “What exactly should we be logging?”.
  • The article distills the author's experience as a security architect and technical leader answering questions about logging. It aims to give you the tools and knowledge to ask the right questions about your system.

A look at how one security team started building security into the development process by integrating various code scanning tools.

  • The title is “Software Security at Rocketship Pace”.
  • It outlines the approach they took when designing the code scanning platform “Intersect” and the lessons learned in the process.
  • In the areas of SAST (Static Application Security Testing) and SCA (Software Composition Analysis), there was no single tool on the market that met all of their needs. To achieve the required coverage, they use multiple tools and built an orchestration layer so that all the tools work together.

How to use a multilayer cache to improve cache hit rate on long tail content.

  • The title is “CDN for long-tail content? Fight the cache miss with multilayer caching!”.
  • The article briefly explains the topic in the following three points.
    ○ Ideal content structure
    ○ Long-tail content
    ○ Multilayer cache
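The multilayer idea can be sketched generically as follows. This is my own simplified model, not the article's implementation: the layer names, the `origin` function, and the absence of eviction or TTLs are all assumptions made for illustration. Two edge caches share a mid-tier "shield" cache, so a rarely requested object only reaches the origin once.

```python
class CacheLayer:
    """A minimal cache layer that falls back to its parent on a miss."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent  # another CacheLayer, or a callable origin
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        if isinstance(self.parent, CacheLayer):
            value = self.parent.get(key)
        else:
            value = self.parent(key)
        self.store[key] = value  # no eviction in this sketch
        return value


origin_calls = []


def origin(key):
    """Hypothetical origin server; records every request it receives."""
    origin_calls.append(key)
    return f"content-for-{key}"


shield = CacheLayer("shield", origin)
edge_tokyo = CacheLayer("edge-tokyo", shield)
edge_paris = CacheLayer("edge-paris", shield)

# The same long-tail object is requested once from each edge.
edge_tokyo.get("rare-video")
edge_paris.get("rare-video")
print(len(origin_calls), shield.hits, shield.misses)
```

Both edges miss, but the second miss is absorbed by the shared layer instead of the origin, which is where the hit-rate gain for long-tail content comes from in this model.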

What’s the argument for adopting a service mesh? This post explores the question, and some of the advantages and challenges.

  • The title is “When Is Service Mesh Worth It?”.
  • The article presents lessons from Zack Butcher, Tetrate's founding engineer and one of Google's original Istio builders, in the following three points.
    ○ Service Mesh For The Rest of Us
    ○ Usability Improvements to Ease Adoption
    ○ When Service Mesh is Worth It

What’s New in Salt 3003 Aluminium: Beacons, Cloud, Development, Salt Extensions, Performance and caching, Juniper minion, FIPS mode and more.

  • The title is “What’s New in Salt 3003 Aluminum Release”.
  • As the title suggests, it summarizes and explains the Salt Aluminum release.

Tools

The first sigstore tool I’ve come across. Cosign allows for signing a container image and storing the signature in the registry, and finding and verifying signatures for a container image.

  • The GitHub page of “cosign”, a tool for signing container images, verifying the signatures, and storing them in an OCI registry. I will skip it because it was covered in KubeWeekly #255.

SRE Weekly Issue #264 April 4th, 2021

Articles

Balancing act: the current limits of AWS network load balancers

This well-researched article caught me by surprise. It’s shocking that Ably received advice from AWS to stay under 400,000 simultaneous connections, despite Amazon’s own documentation stating support for “millions of connections per second”.

Paddy Byers — Ably

  • The article covers the topic in the following four sections.
    ○ The ask: practically infinite scalability
    ○ The application: millions of real time subscriptions
    ○ Limit 1: maximum target group size
    ○ Limit 2: Connection stability

A Journey Into SRE

This blog is about how a group of hard-working individuals, with unique skills and working methods, managed to create a successful SRE team.

There’s a lot of detail about what their SREs do and how they communicate, with 3 projects as case studies.

Sergio Galvan — Algolia

  • The article describes, in the following sections, how Algolia's group of hard-working individuals with their own skills and working styles built a good SRE team.
    ○ What SREs do at Algolia
    ○ How we work as a team
    ○ Pairing creates a team
    ○ Three Projects
    ○ The journey continues ..

March 2 incident update

This is an incident followup from an incident at Deno earlier this year. Their CDN saw their heavy use of .ts files (TypeScript, a JavaScript variant) and mistakenly assumed they were MPEG transport segments, a violation of the CDN’s ToS. Oops.

Luca Casonato — Deno

  • As mentioned above, this is a follow-up article on Deno's outage. They have confirmed the following with Cloudflare, which they use as their CDN.
    ○ Cloudflare has assured us this issue will not occur again, and that they will implement changes in their systems to make sure this will not happen to any other Cloudflare customers.

Kubernetes Supports Nine Pillars of SRE

Wait, there are 9 now?

Marc Hornbeek — Container Journal

  • As the title suggests, the following nine pillars are explained.
  1. Leadership and Culture
  2. Work Sharing
  3. Measurement
  4. SLOs and SLIs, Error Budgets
  5. Toil Reduction
  6. Deployments
  7. Performance Management
  8. Incident Management
  9. Anti-Fragility

Frequently Asked Questions on Deviations

There’s a nice little discussion of why “human error” is not a good enough answer for why a deviation (from standard operating procedure) happened.

Susan J. Schniepp and Steven J. Lynn — Pharmaceutical Technology

  • As mentioned above, the article is written in FAQ format. The questions are as follows.
    ○ What is a deviation and do all deviations need to be investigated?
    ○ What is a planned deviation?
    ○ What’s the best process for investigating deviations?
    ○ Why is human error not an acceptable finding for deviations?
    ○ How much time should I allow for a deviation to be investigated?
    ○ Are out-of-specification (OOS) results considered deviations?

How To Get Fooled By Metrics

They deployed an optimization that skipped sending some requests to the backend… and the backend metrics got worse. Why? Hint: aggregate metrics.

Dominik Sandjaja — Trivago

  • As mentioned above, a metric behaved unexpectedly, so they investigated it, found the cause, and confirmed that the system had in fact improved.
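The hint about aggregate metrics can be made concrete with invented numbers. The sketch below is my own illustration, not Trivago's data or necessarily their exact cause: it assumes the optimization skips only the cheap requests, so the backend's average response time rises even though the backend does strictly less work.

```python
from statistics import mean

# Hypothetical backend response times in milliseconds.
cheap = [10] * 80       # requests the optimization now skips
expensive = [200] * 20  # requests that still reach the backend

before = cheap + expensive  # backend traffic before the optimization
after = expensive           # backend traffic after the optimization

avg_before = mean(before)
avg_after = mean(after)
total_before = sum(before)
total_after = sum(after)

print(f"average: {avg_before} ms -> {avg_after} ms")
print(f"total backend time: {total_before} ms -> {total_after} ms")
```

The average gets worse only because the fast requests left the population being averaged; looking at totals or per-request-type percentiles tells the real story.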

Outages

KubeWeekly #258 April 9th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

CNCF joins Google Summer of Code 2021 — Calling all student applications by April 13!

We are excited to announce that Cloud Native Computing Foundation is participating in GSoC 2021, one of the most popular programs for new contributors in the world of open source development.

For those who are not familiar, GSoC is a global program focused on introducing student developers to the world of open source software development. Through the program, students work with participating open source organizations like CNCF on a 10-week programming project during their break from school. Read the blog post to learn more.

  • CNCF joins Google Summer of Code 2021 (GSoC 2021). Student applications are open until April 13, 2021, 14:00.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

What is continuous improvement?

Pini Reznik, Container Solutions

  • The speaker answers the following two questions, aiming to help listeners understand why the transformation to cloud native fails so often and choose a winning strategy for adopting effective technology and transforming the organization.
  1. Why did you need to change in the first place?
  2. What is wrong with your traditional approach to building software?
  • I thought that “Cloud Native is more than Tech” was obvious from the CNCF definition, but it is important to make the individual elements concrete.

A Deep Dive into Kubestr — A new way to explore your Kubernetes options

Michael Cade & Sirish Bathina, Kasten by Veeam

  • I will skip it because it was covered in last week's “ICYMI: CNCF online programs this week” section.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Windows containers on Windows 10 without Docker (using Containerd)

James Sturtevant

  • The author has been working on getting containers to run properly on Windows with Kubernetes. He needed to do local development with containerd and set up his local machine for it, but could not find any comprehensive documentation, so he wrote down his steps and shared them.

Oxidizing the Kubernetes operator

Pavel Pscheidl

  • The article first introduces Kubernetes operators and Rust, then walks through building an operator with the two in the following sections.
    ○ Implementing an operator
    ○ Project setup
    ○ Creating a CustomResourceDefinition
    ○ How Kubernetes and Operator work together
    ○ Creating a custom Controller
    ○ Implementing the operator logic
    ○ Finalizers
    ○ Creating and deleting deployments
    ○ Running the operator
    ○ Additional resources

Site Reliability Engineering (SRE) best practices

Rayan Das, Infracloud

  • As the title suggests, the following seven SRE best practices are explained.
  1. Error Budgets
  2. Define SLOs Like a User
  3. Monitoring Errors and Availability
  4. Efficiently Planning Capacity
  5. Paying Attention to Change Management
  6. Blameless Postmortem
  7. Toil Management
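Since error budgets head the list, here is a small sketch of the usual arithmetic behind them. The function names, the 30-day window, and the sample figures are my own assumptions, not taken from the article: an availability SLO of 99.9% over 30 days leaves roughly 43.2 minutes of allowed downtime.

```python
def error_budget_minutes(slo, window_days=30):
    """Allowed downtime in minutes for an availability SLO over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)


def budget_remaining(slo, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


budget = error_budget_minutes(0.999)
remaining = budget_remaining(0.999, downtime_minutes=10.8)
print(f"budget: {budget:.1f} min, remaining: {remaining:.0%}")
```

A team could use the remaining fraction as a simple signal for whether to keep shipping changes or to prioritize reliability work.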

Essential tips to manage your gRPC services with Kong like a pro

Guilherme Salazar, Kong

  • A step-by-step tutorial on how to set up Kong to proxy gRPC services. Explains two possible scenarios.

Implementing zero downtime deployments on Kubernetes — the plan

Matthew Flatt

  • As the title suggests, it considers a plan to carry out multiple deployments on Kubernetes with no downtime. The contrast between “Kubernetes rolling updates”, “Blue/Green deployments” and “Rainbow deployments” was easy to understand.

Bringing your VMs to Kubernetes with KubeVirt

Irina Lindt, Kubermatic

  • The article introduces the open source project “KubeVirt.io”, which lets you manage VM workloads with Kubernetes, and explains how to use it.
  • The next article in the series will explain how to use KubeVirt on the Kubermatic Kubernetes Platform.

A new era of Kubernetes integrations on GitLab.com

Viktor Nagy, GitLab

  • It introduces the GitLab Kubernetes Agent, which provides a secure connection between your GitLab instance and your Kubernetes cluster and enables pull-based deployments and alerts based on network policies.

The distributed operating system void

Kris Nova, Twilio

  • I will skip it because it is covered in DEVOPS WEEKLY ISSUE #536 above.

Generating Kubernetes network policies by sniffing network traffic

Murat Celep, VMware

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Kubernetes 1.21: Power to the community

Kubernetes Release Team

Weaveworks (part 2), with Alexis Richardson

Craig Box, Kubernetes Podcast from Google

Why I run Django on Kubernetes as a one-man SaaS

Anthony Simon

  • The author explains his reasoning in the following sections.
    ○ An elephant in the room
    ○ There’s no holy grail
    ○ Why I use Kubernetes
    ○ Why I use Django
    ○ Standing on the shoulders of giants
    ○ What to make of this

DevOps and Kubernetes: a perfect match?

Gilad David Maayan, Container Journal

  • I find the following sentence questionable. Is Kubernetes really “perfectly suited” to help transition infrastructure to public clouds?
    ○ Kubernetes is perfectly suited to help transition infrastructure to public clouds like Azure or AWS.
  • I agree with the following.
    ○ In short, DevOps and Kubernetes are not a perfect match, but Kubernetes can certainly be a powerful tool when properly configured. Just make sure you are not getting in too deep, and understand that K8s is not an all-encompassing solution.

PODCAST: How to manage a successful CNCF project with William Morgan of Linkerd

Justin Dorfman & Richard Littauer

  • The episode features Buoyant CEO William Morgan as a guest, talking about his career from Twitter to Linkerd and his focus on Linkerd.

Kubernetes jobs market (Q1 2021)

Derek Newman

  • It presents numbers on what to expect when looking for a Kubernetes job.
  • Please keep the following caveats from the author in mind when looking at the numbers. The job descriptions that we collected are slightly skewed:
  1. At Kube careers we only focus on Kubernetes jobs.
  2. If a job doesn’t have a clear salary range we discard it. Many job offers don’t indicate a salary range and we think this is not good for engineers looking for work.
  3. We discarded job offers from recruitment agencies.
  4. We analysed listings on platforms used by European and American audiences.
  5. The dataset is small — only 86 job descriptions from January, February and March 2021.

Upcoming CNCF Online Programs

Cloud Native Live

  • 4/14/21: Enforce configuration and security checks for your YAML Files and Helm Charts with KubeLinter, by Viswajith Venugopal, StackRox — RSVP

On-demand

  • 4/15/21: What’s new in Argo Workflows 3.0, by Alex Collins, Intuit — RSVP

YouTube playlist submissions

Learn more about CNCF Online Programs

How about those articles? Are you interested in any of them?

There is some content here that I cannot fully digest at this stage, so I will also use this aide-memoire and these links to catch up myself.

Bye now!!

Yoshiki Fujiwara

・Cloud Solutions Architect - AWS@NetApp in Tokyo, Japan. #AWS Certified Solution Architect&DevOps Professional, #Kubernetes, ・Opinions are my own.