SRE / DevOps / Kubernetes Weekly Collection #92 (Week 44, 2021)

Yoshiki Fujiwara
Nov 7, 2021
  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, adding some comments as an aide-memoire along with useful links.
  • Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a useful reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #566 October 31st, 2021
SRE Weekly Issue #294 October 31st, 2021
KubeWeekly #282 November 5th, 2021

DEVOPS WEEKLY ISSUE #566 October 31st, 2021

News

Some thoughts about the future of scripting languages, interesting given this is where much of the code operations teams write resides.

  • The title is “Scripting languages of the future”.
  • As the title suggests, it considers the future of scripting languages from the following points of view.
    ○ Scalability
    ○ Tune-ability
    ○ Easy parallelism
    ○ IDE support
    ○ Looking ahead

An interview with a good discussion about service ownership, service maturity, and service level indicators.

  • The title is “Seth Lochen of Groupon talks ownership and the bystander effect, platform engineering, and frogs in boiling water”.
  • As the editor's comment above notes, it covers service ownership, service maturity, SLIs, and more under the following headings.
    ○ Service ownership: Who’s really responsible?
    ○ Caching–or why service maturity tasks matter
    ○ Translating monitoring metrics into business KPIs
    ○ When to start a dedicated platform engineering team

Stored procedures got a bad name for spreading business logic between application and database, and not all cloud databases support them. But they have some powerful use cases, as this post argues.

  • The title is “Are Stored Procedures and Triggers Anti-Patterns in the Cloud Native World?”.
  • It provides practical examples where features such as triggers, stored procedures, expression indexes, and asynchronous replication are important when building cloud-native applications.

A good post on rapidly scaling a data team. Observations on team structure, rapidly onboarding new team members, technology choices and more.

  • The title is “A Behind-the-Scenes Look at How Postman’s Data Team Works”.
  • It gives a behind-the-scenes view of the rapidly expanding Postman data team: its structure, who they hire for different roles, how they plan and prioritize their work democratically, and how they use sprints to constantly identify problems and make improvements.

Patterns for Kubernetes logging, looking at the pros and cons of node-level agents and sidecar approaches.

  • The title is “Kubernetes Logging in Production”.
  • Since it was covered in KubeWeekly #281 last week, I will skip it.

A quick look at open source tools and standards in the observability space.

  • The title is “Open Source for Better Observability”.
  • Observability is typically built on three pillars: metrics, logs, and traces. The article describes how they tell us the “what”, “why” and “where”, and how they enable teams to answer questions about their systems.

A story of one systems administrator moving from maintaining legacy middleware applications to becoming an active part of the cloud native open source community. Good to share with anyone wondering how to advance their career.

  • The title is “From zero to WIP”.
  • By sharing the basic tenets behind the author's decisions, the post hopes to make the journey a little less daunting for people stuck in a similar position with no previous experience in cloud native. It is organized into the following sections.
  1. Introspect
  2. Assess
  3. Chart
  4. Course correction
  5. So, what’s next?

Comprehensive release notes, observations and opinions for the latest Salt release.

  • The title is “New features in Salt 3004 Silicon”.
  • Salt 3004 Silicon did not follow the usual 4-month release cycle and was released 7 months after the previous major version. “Click here for official announcement”.
  • Below is an excerpt of the new features.
  • New features in Salt 3004 Silicon: Pluggable transports, DeltaProxy, Loader refactoring, Vault Enterprise, VMware extensions, Transactional systems, Salt SSH, Memory leaks mitigations

Tools

Ottr is a serverless Public Key Infrastructure framework that handles end-to-end certificate rotations without the use of an agent.

APIClarity allows for reconstructing OpenAPI specifications from real-time workload traffic, using traffic information from a service mesh like Istio.

  • The web page of the open source cloud native visualization tool “APIClarity”. It leverages a service mesh framework to capture and analyze API traffic and identify potential risks.
  • Click here for the GitHub page.

SRE Weekly Issue #294 October 31st, 2021

Articles

Five Steps To Reduce SRE Toil And Add More Value

The steps are:

* Know How Much Time Is Spent On Toil
* Find The Toil
* Determine The Root Causes Of Toil
* Find And Prioritize The Low-Hanging Fruit
* Promote Toil Reduction

Aater Suleman — Forbes

  • It explains the five steps listed in the editor's excerpt above. According to the recent Catchpoint 2021 SRE Report, few organizations have effectively measured how SRE time is spent. Indeed, only 22% of surveyed organizations report measuring toil in any systematic way, even though toil is the leading activity that prevents SREs from focusing on activities that add business value.

How we’re building a production readiness review process at Grafana Labs

I like how they try to strike a balance and avoid reviewing too far in depth, while still hitting everything important.

Milan Plžík — Grafana Labs

Seth Lochen of Groupon talks ownership and the bystander effect, platform engineering, and frogs in boiling water

Lots of good stuff in this one about one of my favorite topics, service ownership.

Kenneth Rose — OpsLevel

  • Since it is covered in DEVOPS WEEKLY ISSUE #566 above, I will skip it.

How do CRDTs solve distributed data consistency challenges?

This is the intro I needed to understand Conflict-Free Replicated Data Types.

Jo Stichbury — Ably

  • An article about the complexity of maintaining data consistency in distributed environments. It introduces conflict-free replicated data types (CRDTs) as a way to resolve concurrent data changes and answers the following questions along the way.
    ○ What is strong consistency? What is eventual consistency?
    ○ What is replication conflict?
    ○ How does Google Docs resolve conflicts?
    ○ What are operational transformations (OTs)?
    ○ What are conflict-free replicated data types (CRDTs)?
    ○ Why use CRDTs?
    ○ Do CRDTs simplify software design?
    ○ What are the disadvantages of CRDTs?
    ○ What are the use cases for CRDTs?
    ○ Where can I find out more about CRDTs?
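
To make the idea of a CRDT a little more concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest textbook CRDTs. It is not taken from the Ably article; the class and method names are my own assumptions for illustration. Each replica only increments its own slot, and merging takes the per-replica maximum, so replicas converge regardless of the order in which they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT (illustrative sketch, hypothetical names)."""

    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-replica max is commutative, associative, and idempotent,
        # which is what lets replicas converge without coordination.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

After the exchange, both replicas report the same value even though neither saw the other's increments when they happened.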

Defining Availability, Maintainability and Reliability in SRE

Availability, maintainability and reliability all have distinct — if related — meanings, and they each play different roles in reliability operations.

JJ Tang — DevOps.com

  • As the title suggests, it explains the definitions of availability, maintainability, and reliability, and how they differ.
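
As a memo for myself, one common textbook way to relate these terms is steady-state availability = MTBF / (MTBF + MTTR), where MTBF (mean time between failures) reflects reliability and MTTR (mean time to repair) reflects maintainability. The sketch below assumes that formulation; it is not necessarily how the article defines the terms, and the function names are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from MTBF and MTTR (textbook formulation)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def allowed_downtime_minutes(target: float, period_days: int = 30) -> float:
    """Downtime budget in minutes for an availability target over a period."""
    return (1.0 - target) * period_days * 24 * 60


# A service that fails on average every 999 hours and takes 1 hour to repair.
a = availability(mtbf_hours=999, mttr_hours=1)
budget = allowed_downtime_minutes(a)
```

With these example numbers the service is available 99.9% of the time, which corresponds to roughly 43 minutes of downtime in a 30-day period.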

Five-P factors for root cause analysis

The five Ps come from medicine and understanding medical accidents, but they apply equally well to analyzing incidents in IT.

Lydia Leong

  • It explains the following six “P”s used for root cause analysis: one presenting problem and five P factors.
    ○ The presenting problem
    ○ The precipitating factors
    ○ The perpetuating factors
    ○ The predisposing factors
    ○ The protective factors
    ○ The present factors

Incident Review and Postmortem Best Practices

I really love the focus on de-emphasizing finding action items in incident retrospectives, in favor of learning.

Gergely Orosz — The Pragmatic Engineer

  • As the title suggests, it addresses the following questions.
    ○ Common incident handling practices across the industry. What are the trends on how tech companies approach incidents today?
    ○ Incident review best practices. What are processes, tools and approaches that we can point to as sensible practices?
    ○ Incident review practices of tomorrow. A few teams and companies have moved beyond what we’d call the best practices of today. What is their approach and how is it working?
    ○ What tech can learn from incident handling in other industries. Incidents are not unique to tech; fields like healthcare, the military and many others have a long history of efficiently dealing with incidents. What can we learn from them?
    ○ Incident review/postmortem examples and templates. A selection of case studies you can get inspiration from and 🔒subscriber-only postmortem templates.

Outages

  • AT&T SMS in the US
    This week, I saw several status pages point to some kind of problem in their ability to send SMS notifications to AT&T phones. I thought this was interesting because usually I don’t learn about an outage solely from other companies’ status pages.
  • Google Meet
  • Tesco
  • Coinbase
  • Zomato
  • Barclays
  • HSBC

KubeWeekly #282 November 5th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Knative 1.0 released!

Congratulations to Knative on their 1.0 release! This important milestone is made possible by the contributions and collaboration of over 600 developers. The Knative project was released by Google in July 2018, and it was developed in close partnership with VMware, IBM, Red Hat, and SAP. Over the last 3 years, Knative has become the most widely-installed serverless layer on Kubernetes.

  • Knative 1.0 release article. It consists of the following items.
    ○ What’s new
    ○ What does it mean to be 1.0?
    ○ Learn more
    ○ Get involved
    ○ Thank you to our contributors!

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Notary v2 — Promoting signed artifacts

Steve Lasker, Microsoft

  • A 52-minute session introducing Notary v2.
  • Notary v2 enables the signing of all artifacts placed in an OCI-compliant registry to ensure that the container image you deploy was built by a vendor or a trusted team.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

kubepug: Kubernetes PreUpGrade (Checker)

Ricardo Katz, VMware and other contributors

  • The GitHub page of the kubectl plugin “KubePug / Deprecations”, which does the following.
    ○ Downloads a swagger.json from a specific Kubernetes version
    ○ Parses this JSON to find deprecation notices
    ○ Verifies the current Kubernetes cluster or input files, checking whether any objects exist in these deprecated API versions, so the user can check before migrating
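
The three steps above can be sketched roughly as follows. This is only my own illustration of the idea, not kubepug's actual implementation: it uses a tiny inline sample instead of downloading a real swagger.json, assumes that deprecation shows up as the word "deprecated" in a definition's description, and the function names `deprecated_kinds` and `find_deprecated_objects` are hypothetical.

```python
import json

# Tiny stand-in for a Kubernetes swagger.json (assumed shape, heavily trimmed).
SAMPLE_SWAGGER = json.loads("""
{
  "definitions": {
    "io.k8s.api.extensions.v1beta1.Ingress": {
      "description": "DEPRECATED - This group version of Ingress is deprecated by networking.k8s.io/v1 Ingress.",
      "x-kubernetes-group-version-kind": [
        {"group": "extensions", "kind": "Ingress", "version": "v1beta1"}
      ]
    },
    "io.k8s.api.networking.v1.Ingress": {
      "description": "Ingress is a collection of rules that allow inbound connections to reach the endpoints defined by a backend.",
      "x-kubernetes-group-version-kind": [
        {"group": "networking.k8s.io", "kind": "Ingress", "version": "v1"}
      ]
    }
  }
}
""")


def deprecated_kinds(swagger):
    """Return a set of (apiVersion, kind) pairs whose description mentions deprecation."""
    found = set()
    for definition in swagger.get("definitions", {}).values():
        if "deprecated" not in definition.get("description", "").lower():
            continue
        for gvk in definition.get("x-kubernetes-group-version-kind", []):
            group = gvk.get("group", "")
            # Core-group resources have an empty group, so apiVersion is just the version.
            api_version = f"{group}/{gvk['version']}" if group else gvk["version"]
            found.add((api_version, gvk["kind"]))
    return found


def find_deprecated_objects(objects, deprecated):
    """Return the objects (manifests as dicts) that use a deprecated apiVersion/kind."""
    return [o for o in objects if (o.get("apiVersion"), o.get("kind")) in deprecated]


# Manifests that would normally come from the cluster or from input files.
manifests = [
    {"apiVersion": "extensions/v1beta1", "kind": "Ingress", "metadata": {"name": "old-ingress"}},
    {"apiVersion": "networking.k8s.io/v1", "kind": "Ingress", "metadata": {"name": "new-ingress"}},
]
deprecated = deprecated_kinds(SAMPLE_SWAGGER)
flagged = find_deprecated_objects(manifests, deprecated)
```

In this sample only the manifest that still uses `extensions/v1beta1` is flagged. For real clusters, the plugin itself is the right tool; the sketch is just for understanding what it checks.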

konfig: Helps to merge, split or import kubeconfig files

Cornelius Weig, Google and other contributors

  • As the title says, this is the GitHub page of “konfig”, which helps you merge, split, and import kubeconfig files.

November 2021 update

Flux

  • The November update for Flux. It recaps October under the following headings.
    ○ News in the Flux family
    ○ Recent & Upcoming Events
    ○ In other news
    ○ Over and out

Why write a new planner

Andres Taylor, Vitess

  • It describes the history of Vitess’ V3 query planner, why the team decided to write a new one, and the development of the new Gen4 query planner.

Keeping Kubernetes clusters clean and tidy

Martin Heinz

  • It introduces common issues that can be caused by forgotten resources such as pods, unused persistent volumes, completed jobs, or old ConfigMaps and Secrets, and describes the options for cleaning up your Kubernetes cluster.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Bottlerocket, A year in the life

Jesse Butler, AWS

  • It explains the features of Bottlerocket in line with the recent launch of Bottlerocket support for Amazon EKS managed node groups.

What the arrival of IPv6 support in Kubernetes means for you

Andy Holtzmann, Equinix Metal

  • As the title suggests, it describes what IPv6 support brings, now that Kubernetes 1.23 supports dual-stack networking.

Google employee startup Chainguard focuses on open-source and supply chain security

Alice Gillin, SiliconANGLE

  • In early October 2021, a group of five former Google employees started Chainguard Inc., a startup focused on open source supply chain security. The company’s co-founder talks about the security risks of open source software, DevOps security changes, and the future of Chainguard. An interview video of about 12 minutes from KubeCon + CloudNativeCon NA 2021 is embedded.

16 CNCF interns graduated from Google Summer of Code (GSoC) 2021!

CNCF

  • As the title suggests, 16 interns from Google Summer of Code (GSoC) 2021 have completed the program. It introduces the participating projects, achievements, mentors, etc. of each mentee.

Dapr (Distributed Application Runtime) joins CNCF Incubator

CNCF

Kubernetes: what are the key benefits for companies?

SparkFabrik

  • It answers the question in the title by focusing on the following “5 key advantages of Kubernetes”.
  1. REDUCING DEVELOPMENT AND RELEASE TIMEFRAMES
  2. OPTIMIZING IT COSTS
  3. INCREASED SOFTWARE SCALABILITY AND AVAILABILITY
  4. FLEXIBILITY IN MULTI-CLOUD ENVIRONMENTS
  5. CLOUD MIGRATION PATHS

Microservices and cloud native applications vs. monolithic applications

SparkFabrik

  • It analyzes the benefits of adopting the new development model in the title and explains when it pays off to make the switch.

Observability trends 2021

Saiyam Pathak

  • As the title suggests, it introduces observability trends and the following tools.
    ○ SigNoz
    ○ Opstrace
    ○ Open

Maintaining OpenFaaS with CNCF Ambassador Alex Ellis

Curiefense podcast

  • A 41-minute podcast with Alex Ellis, the founder of the open source serverless project OpenFaaS, as a guest, talking about how OpenFaaS was founded.

Solve these common Kubernetes challenges early

Margherita Andreassi, Kong

  • It explains the following points to be aware of when migrating to Kubernetes.
    ○ Container Sprawl
    ○ Gaps in Kubernetes Visibility Options
    ○ Complexities of Kubernetes’ Non-Native API Management
    ○ Difficulty of Building Security

Upcoming CNCF Online Programs

Live Webinar

Cloud Native Live

YouTube playlist submissions

Learn more about CNCF Online Programs

How about those articles? Did any of them interest you?

Actually, there is some content I cannot digest at this stage, so I will use this aide-memoire and these links to catch up myself, too.

Bye now!!

Yoshiki Fujiwara
