SRE / DevOps / Kubernetes Weekly Collection #51 (Week 3, 2021)

11 min read · Jan 28, 2021

In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments and useful links as an aide-memoire.
I have already published the same content on my Japanese blog and am catching up in English in this series.
I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #525 January 17th, 2021
SRE Weekly Issue #253 January 17th, 2021
KubeWeekly # 247 January 22nd, 2021 ← It seems that the web page has not been uploaded yet (as of 15:00 on January 23, 2021)

DEVOPS WEEKLY ISSUE #525 January 17th, 2021

News

A good argument for service mesh disappearing out of sight, making the point that service mesh is the dynamic linker for cloud based environments.

The title is “Why The Service Mesh Should Fade Out Of Sight”.
Before reading the article, I misread the title as questioning whether the service mesh has any value at all, but that was not the point. The author questions the current state of the service mesh. This is easy to understand if you read the following summary alongside the title.
○ In sum, the service mesh should be a platform feature, not a product category — as far out of sight and mind from the DevOps team as possible.

A good checklist of things to do to protect your GitHub projects. Supply chain attacks are increasingly in the news.

The title is “Securing Your GitHub Project”.
The author, who has been thinking and talking about open source security lately, found that the more conversations he had, the more complex the topic became, and arrived at the following idea:
○ Making good security practices the path of least resistance is a solid way to raise the bar in this space.
Here is a 15-item checklist to help you get started even if you do not yet know how to protect your open source projects.

Use a credential manager to protect your access credentials
Configure two-factor authentication (2FA)
Enforce signed commits
Protect the release branch
Require pull request reviews and approvals
Scan source code for sensitive data leaks
Scrub leaked secrets from git history
Only use trusted GitHub Actions
Protect the secrets used by GitHub Actions
Review project dependencies for vulnerabilities
Patch dependencies with vulnerabilities
Scan project source code for vulnerabilities
Publish a security policy
Collaborate on fixes for security vulnerabilities in private forks
Publish maintainer advisories for security fixes

A set of posts on best practices for creating container images for your .NET applications, including configuration and connecting to a database.

As the Editor describes, this is a set of two articles. The title of the article linked above is “A container journey: .NET 5 web app dockerization”.
The title of the second article is “The journey continues: Containerized .NET5 web app on Docker connects to database-container”.

A few posts on less-well-known capabilities of the Kubernetes role-based-access system, looking closely at bind and escalate.

The Editor picks up two articles by the same author on the RBAC verbs “bind” and “escalate”. The title of the article linked above is “Escalating Away”.
The title of the second article is “Getting into a bind with Kubernetes”.

An interesting walkthrough of the test suite of a reasonably complex project, discussing tradeoffs, configuration and the importance of optimising CI.

The title is “Improving Testing & Continuous Integration in Phoenix”.
It describes how to approach the testing and CI of the “Phoenix” project and how recent changes have made this process much smoother.

Most internal development teams have documentation for new starters to get set up with all of the needed software. It’s an interesting insight into a team’s stack. But it’s interesting to see this set of documentation posted publicly for others to explore.

The title is “Deploying Software at GoCardless: Open-Sourcing our “Getting Started” Tutorial”.
As the Editor describes, it announces that “Utopia”, the documentation and framework for new starters at the company, has been published on GitHub.
Click here for GitHub page.

A good post for anyone needing to learn Gradle, or interested in building understandable software.

The title is “The Problem with Gradle”.
It shares the following issues to reduce frustration when learning Gradle.

You’re not Configuring, You’re Programming
Groovy is Not Java
Gradle Uses a Domain-Specific Language
There are Many Ways to do the Same Thing
Magic

A comprehensive guide to vertical pod autoscaling in Kubernetes.

The title is “VERTICAL POD AUTOSCALING: THE DEFINITIVE GUIDE”.
I will skip it because it was covered in KubeWeekly # 246 last week.

A big list of patterns for working with environment variables on the shell.

The title is “How to Set Environment Variables in Linux and Mac: The Missing Manual”.
The contents are as described in the title and the Editor’s comment above. I want to try these out hands-on, so I bookmarked it.
At the end, the article lists the following resources for taking your learning to the next level.
○ Bite Size Bash by Julia Evans (not free but totally worth it)
○ Shellcheck static analysis tool
○ Google shell style guide

Tools

driftctl tracks how well your Terraform/AWS codebase covers your cloud configuration and warns you about drift.

As the Editor describes above, this is the GitHub page of the OSS tool “driftctl”, which tracks how well your IaC codebase covers your cloud configuration and warns you about drift.
The features are as follows.
○ Scan cloud provider and map resources with IaC code
○ Analyze diff, and warn about drift and unwanted unmanaged resources
○ Allow users to ignore resources
○ Multiple output formats
Click here for the web page.
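
The feature list above can be pictured as a comparison between two inventories. The following is a minimal sketch of that idea, not driftctl's actual implementation: the function names, the resource IDs, and the attribute dictionaries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    unmanaged: set  # exists in the cloud, but is not declared in IaC
    missing: set    # declared in IaC, but not found in the cloud
    changed: set    # present in both, but the attributes differ


def detect_drift(declared, actual, ignored=frozenset()):
    """Compare declared (IaC) and actual (cloud) resources, keyed by ID."""
    declared_ids = set(declared) - set(ignored)
    actual_ids = set(actual) - set(ignored)
    common = declared_ids & actual_ids
    return DriftReport(
        unmanaged=actual_ids - declared_ids,
        missing=declared_ids - actual_ids,
        changed={rid for rid in common if declared[rid] != actual[rid]},
    )


def coverage_percent(declared, actual, ignored=frozenset()):
    """Share of non-ignored cloud resources that are declared in IaC."""
    actual_ids = set(actual) - set(ignored)
    if not actual_ids:
        return 100.0
    return 100.0 * len(actual_ids & set(declared)) / len(actual_ids)


# Hypothetical inventories: one bucket was changed by hand,
# and one IAM user was created outside of IaC.
declared = {
    "aws_s3_bucket.logs": {"versioning": True},
    "aws_instance.web": {"type": "t3.micro"},
}
actual = {
    "aws_s3_bucket.logs": {"versioning": False},
    "aws_instance.web": {"type": "t3.micro"},
    "aws_iam_user.temp": {},
}

report = detect_drift(declared, actual)
print("unmanaged:", sorted(report.unmanaged))
print("changed:", sorted(report.changed))
print("coverage: %.1f%%" % coverage_percent(declared, actual))
```

Passing `ignored={"aws_iam_user.temp"}` to both functions mirrors the "allow users to ignore resources" feature: the temporary user no longer counts as unmanaged, and coverage is computed over the remaining resources only.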

Please is a cross-language build system with an emphasis on high performance, extensibility and reproducibility. It supports a number of popular languages and can automate nearly any aspect of your build process.

As the Editor describes above, this is the web page for “Please”, a cross-language build system that focuses on high performance, extensibility, and reproducibility.
Click here for the GitHub page.

SRE Weekly Issue #253 January 17th, 2021

Articles

May 30 SSL incident

TLS can be such a headache.

This was an interesting situation. There was a valid path to the USERTrust RSA Certification Authority, and there was also an expired path. The browser was able to find the valid chain, but the curl was not able to find it.

Adam Surak — Algolia

A retrospective article, dated 2020/06/02, on the SSL-related failure of 2020/05/30.
At first glance, the part quoted by the Editor above puzzled me, which is what made it interesting.

Shifting Modes: Creating a Program to Support Sustained Resilience

A well-researched article on shifting emphasis from incident prevention to learning and resilience.

Incidents cannot be prevented, because incidents are the inevitable result of success.

Alex Elman

To find out what resilience means to researchers and organizations, the author (1) reviewed the resilience literature and (2) collected case studies from engineers and engineering managers across the industry about their own organizations. The article shares the lessons learned by the study subjects.
“Key Takeaways” other than those covered by the above Editor’s comment are as follows.
○ Organizations must shift from a “prevent and fix” safety mode to a “learn and adapt” (Learn & Adapt) safety mode to manage reliability and resilience. This shift helps to more effectively cope with increasing complexity and scale.
○ Finding advocates to help socialize the movement and communicating broadly are key aspects of creating a sustained shift to a Learn & Adapt mode.
○ Normalizing behaviors — such as stating assumptions, asking more questions, increasing cooperation between diverse roles, and broadly sharing incident write-ups across the organization — help with the mode shift by increasing the flow of information.
○ Developing the cultural traits of opportunity creation, flexibility, agility, and trust are necessary for an organization poised to shift to Learn & Adapt.

Error budgets and the legacy of Herbert Heinrich

This one’s worth reading through twice to let it sink in. It puts me in mind of this article by Will Gallego, which is another thoughtful critique of error budgets.

Here are the claims I’m going to make:

1. Large incidents are much more costly to organizations than small ones, so we should work to reduce the risk of large incidents.

2. Error budgets don’t help reduce risk of large incidents.

Lorin Hochstein

It raises the questions faced by software developers and explains the problems with using the “error budget” of the SRE world as a way to approach these questions.
○ How can we best use our knowledge about the past behavior of our system to figure out where we should be investing our time?
The author is skeptical of quantitative, metric-based approaches like error budgets and prefers qualitative approaches that leverage the experimental judgment of engineers.
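
For readers unfamiliar with the term, an error budget is usually derived from an availability SLO: the budget is whatever downtime the target still permits. The sketch below only illustrates that arithmetic and is not taken from the article; the function names and the 30-day window are assumptions made for this example. A 99.9% target over 30 days, for instance, permits about 43.2 minutes of downtime.

```python
def error_budget_minutes(slo, window_days=30):
    """Downtime in minutes permitted by an availability SLO.

    `slo` is a fraction such as 0.999 (99.9%).
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo, downtime_minutes, window_days=30):
    """Budget left after subtracting each incident's downtime in minutes."""
    return error_budget_minutes(slo, window_days) - sum(downtime_minutes)


# Hypothetical incident history for one 30-day window.
incidents = [10, 5.5]
print(round(error_budget_minutes(0.999), 1))         # budget for 99.9%
print(round(budget_remaining(0.999, incidents), 1))  # budget still unspent
```

Note that the calculation only adds up minutes of downtime, so it says nothing about how that downtime was distributed across incidents.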

97 things every SRE should know — Part 01

This is a review of a few of the chapters of the book of the same title by Emil Stolarsky and Jaime Woo.

Have you read it too? I’d love to read your take on it!

Dean Wilson

The author read the O’Reilly book “97 Things Every SRE Should Know” and published reading notes for his future self. The notes are organized chapter by chapter.

Understanding Incidents: Three Analytical Traps

This one’s worth reading the next time you need to do an incident retrospective. The traps are:

1. Counterfactual reasoning

2. Normative language

3. Mechanistic reasoning

John Allspaw — Adaptive Capacity Labs

It is a transcript of a 7-minute video explaining three common analytical traps that incident analysts and accident investigators can fall into, with the parts the author wants to emphasize shown in red and bold.

Counterfactual reasoning
Normative language
Mechanistic reasoning

This Is the Most Underappreciated Skill for SREs

The skill in question is glue work, and I sure appreciate a good gluer when I see one.

Emily Arnott — Blameless

In line with the title, it gives the following examples of glue work performed by SREs: tasks that are essential to a successful project even though they do not contribute to the code base.
○ SREs align stakeholders’ goals with common language
○ SREs bring people together in inspiring ways
○ SREs grow an empathetic, trusting culture

Building and Scaling Your SRE Team

This one starts out by defining SRE, then goes into how to define your team and fill it with people.

Julie Gunderson — PagerDuty

Based on best practices that Tammy Bryant, Gremlin’s Principal SRE, shared on the Page it to the Limit podcast, it defines the role of SREs and explains how to build and scale SRE teams. The practical methods are explained in detail under the following headings.
○ What is an SRE?
○ SRE Skills & Responsibilities
○ Establishing an SRE Team
○ Scaling Your SRE Team

Outages

Fastly
Fastly is my employer.
Slack
Tyro Payments
Signal
.ke TLD (Kenya)
Microsoft Teams, Office 365 and OneDrive
Instagram

KubeWeekly # 247 January 22nd, 2021 ← It seems that the web page has not been uploaded yet (as of 21:00 on January 28, 2021 (JST))

The Headlines

Editor’s pick of the highlights from the past week.

The First Six Months: CNCF Observations and 2021 Vision

Priyanka Sharma, CNCF

Priyanka Sharma reflects on her role as the General Manager of CNCF (over the last six months) and shares her philosophy for enabling #teamCloudNative going into 2021.

As mentioned above, CNCF GM (General Manager) Priyanka Sharma looks back on her first six months as GM and shares her vision for 2021 through her philosophy. The following phrase from her presentation at KubeCon EU impressed me. It made me feel motivated to move forward as part of #teamCloudNative.
CNCF is a foundation of doers

CNCF and the Linux Foundation, with Chris Aniszczyk

Adam Glick and Craig Box, Kubernetes Podcast from Google

With his unique vantage point of cloud native trends, Chris Aniszczyk shares his technology journey and his predictions for 2021.

This is the Kubernetes Podcast, hosted by Google employees. The current co-hosts are Craig Box and Adam Glick.
The guest is Chris Aniszczyk, VP of DevRel at the Linux Foundation, CTO of CNCF, and Executive Director of the Open Container Initiative.
The topics that interested me in the news of the week are as follows.
Nutanix now supports Anthos
Tanzu Advanced is GA
New CSI driver for Google Kubernetes Engine
Grafana Cloud introduces free tier

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Hoot: Advanced Istio Configuration with Envoy CRDs

Scott Weiss, Solo.io

A 26-minute webinar video in which Scott Weiss, an architect at Solo.io, explains the topic in the title, focusing on Envoy Filter.

Implement Policy-based Governance Using Configuration Management of Red Hat Advanced Cluster Management for Kubernetes

Jaya Ramanathan and Christian Stark, Red Hat

It outlines best practices for developing declarative policies on aspects of security and compliance, resiliency, and software engineering. All of this can be done without programming.
It describes how to use the Red Hat Advanced Cluster Management built-in configuration policy controller to complete the following actions:
Use best practices to configure Kubernetes resources used to ensure various security aspects such as access control and encryption.
Deploy operators, check if they are operating and are configured properly, as well as receive status results from the operators.

Kubernetes Readiness Probes — Examples & Common Pitfalls

Levent Ogut

In line with the title, it demonstrates the effect of readiness probes and explains the parameters that can be configured.
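
Two of the configurable probe parameters in Kubernetes are `successThreshold` and `failureThreshold`, which control how many consecutive results are needed before the ready state flips. The sketch below is a simplified model of that behavior written for this post, not the kubelet's code and not taken from the article; the function name is hypothetical, and timing parameters such as `periodSeconds` and `timeoutSeconds` are deliberately left out. The defaults of 1 and 3 match the Kubernetes defaults.

```python
def readiness_timeline(results, success_threshold=1, failure_threshold=3):
    """Return the ready state after each probe result.

    `results` is an iterable of booleans (True means the probe succeeded).
    The container starts as not ready, becomes ready after
    `success_threshold` consecutive successes, and becomes not ready
    again after `failure_threshold` consecutive failures.
    """
    ready = False
    successes = 0
    failures = 0
    timeline = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0
            if not ready and successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0
            if ready and failures >= failure_threshold:
                ready = False
        timeline.append(ready)
    return timeline


# Hypothetical sequence of probe results, one per probe period.
probes = [False, True, True, False, False, True, False, False, False, True]
print(readiness_timeline(probes))
print(readiness_timeline(probes, success_threshold=2))
```

In this model, a single failed probe does not remove a ready Pod from service: with the default `failure_threshold` of 3, only the run of three consecutive failures near the end of the sequence flips it back to not ready.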

Kubernetes Cost Reporting using Kubecost

Aman Juneja, Infracloud Technologies

It explains in detail how to use Kubecost on a multi-tenant EKS cluster to improve cost visibility.
It closes with the following conclusion, which sounds promising.
Kubecost covered almost all our requirements but it comes with a slight operational overhead to set it up properly compared to many other paid solutions in the market. But I feel the value it provides is way more than efforts to configure it correctly.
Kubecost support is also very prompt and the team is always up for help. If you are looking for any open source tool to get your Kubernetes cluster cost insights coupled with your cloud provider’s costing details then Kubecost is worth trying.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Cilium, with Thomas Graf

Adam Glick and Craig Box, Kubernetes Podcast from Google

This issue features two episodes of the Kubernetes Podcast, counting the one in The Headlines.
The guest is Thomas Graf, Cilium’s inventor and Isovalent’s co-founder.
The topics that interested me in the news of the week are as follows.
○ KubeCon NA 2020 Transparency Report
○ Crossplane 1.0
○ Vitess project journey report

GitOps-based Policy Management: How to Scale in a Multi-Node, Multicloud World

Anita Buehrle, WeaveWorks

The following sections describe common challenges faced in a multi-cluster environment and how GitOps and effective policy management can make large-scale Kubernetes deployments easy everywhere.
○ Diverse cluster stacks add complexity
○ Manage cluster configuration definitions with GitOps
○ What does Git-based policy look like?
○ Self-service Kubernetes with guardrails
○ Managing fleets of clusters
○ Achieving consistency between local environments and the cloud
○ Streamlining access control across the organization
○ Conclusion

Cloud DevOps With OpenShift and JFrog

Alex Handy, Red Hat and Jeff Fry, JFrog

It explains how to provide software for developers using OpenShift, OpenShift Pipelines, and the JFrog Platform.

# 61 — Containers and Security with Liz Rice (in French)

Electro Monkeys Podcast

The episode is in French, with Liz Rice as the guest, on the theme of “Container Security”. I tried my best to listen, but I couldn’t follow it. I want to study French.

Project Agumbe: Share Objects Across Namespaces in Kubernetes

Savithru Lokanath, Salesforce Engineering

It explains how Salesforce uses the extensibility of Kubernetes, through a custom controller called “Agumbe”, to support the following use case from the title. Agumbe is named after a small coastal town in Karnataka, India.
○ At Salesforce, we use Kubernetes to orchestrate our services layer and recently ran into a use case where we wanted to apply and manage certain common objects across Kubernetes namespaces.

Upcoming CNCF Online Programs

CNCF Project Webinar: Kubernetes 1.20

Jeremy Rickard, Software Engineer @VMware
Kirsten Garrison, Software Engineer @Red Hat
January 27, 2021 at 11:00 am PT
Register Now

For more information, please visit our updated Online Programs page.

The page linked above now has “Future events”, “Past events”, and “Organizer” sections, and the upcoming events are listed there.
Since a group has been created for CNCF Online Programs, I registered as its 8th member.

What did you think of these articles? Did any of them interest you?

To be honest, there is some content I cannot digest at this stage, so I will use this aide-memoire and these links to catch up myself, too.

Bye now!!

Yoshiki Fujiwara