SRE / DevOps / Kubernetes Weekly Collection #58 (Week 10, 2021)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a useful reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #532 March 7th, 2021
SRE Weekly Issue #260 March 7th, 2021
KubeWeekly #254 March 12th, 2021 ← Received as an e-mail newsletter. It has not been uploaded to the web page yet, so the link is pending.

DEVOPS WEEKLY ISSUE #532 March 7th, 2021


A solid architecture post on a large scale compute platform that aims to combine the best aspects of microservices with asynchronous workflows and serverless functions. Combines migration, design, future thoughts and more.

  • The title is “The Netflix Cosmos Platform”.
  • It explains why Netflix built its computing platform “Cosmos”, how it works, and what the team learned in the process.
  • Its sweet spot is applications that involve resource-intensive algorithms coordinated via complex, hierarchical workflows that last anywhere from minutes to years.
  • In the article’s Q&A, a reader asked whether Cosmos would be open-sourced, and the answer was as follows.
    ○ We like the idea, but haven’t done the work necessary to open source the code.

3 interesting observations about adopting public cloud when coming from traditional infrastructure; the cloud has its own CMDB, the only thing that knows about a thing is itself, and the advantage of cloud is flexibility.

  • The title is “3 Things to know when moving to public cloud”.
  • As the title suggests, it makes the following three points.
    ○ Say adiós to your CMDB (configuration management database)
    ○ The only thing that knows about a thing is itself
    ○ Flexibility

A great post on devops and security, from the security engineering perspective. Some good practical tips including embedding security engineers in engineering teams (and vice versa), gamification and more.

  • The title is “Shifting Engineering Right: What security engineers can learn from DevSecOps”.
  • It focuses on how to build meaningful partnerships between security engineers and software engineers, and discusses the following points:
    ○ How to apply some of the lessons of DevOps to security engineering
    ○ How to make your first change in production code
    ○ How to speak the language of product and engineering
    ○ Ensure that security features are prioritized and built correctly
    ○ Steps to take after you’ve cooperatively delivered a feature
    ○ How to show appreciation and recognize those that helped you succeed

Argo introduces some interesting higher-level concepts into Kubernetes, including AppProject which allows for a project container to tie together other resources.

  • The title is “Hassle-free multi-tenant K8S clusters management using Argo CD”.
  • In line with the title, it introduces Argo CD through a hands-on walkthrough that uses Minikube as a sample cluster.

Containers are for stateless services, or so the tale goes. In reality lots of people are running databases in containers. This post, focused on PostgreSQL, makes the case that the benefits often outweigh the disadvantages.

  • The title is “Deep PostgreSQL Thoughts: Resistance to Containers is Futile”.
  • The article is a counterargument to the following opinion that the author recently encountered.
    ○ containers are not ready for prime time as a vehicle for deploying your databases

A couple of handy posts on how to monitor AWS Fargate. What metrics should you be interested in, and how can you best access them?

I like arguments against the zeitgeist, like this post on scaling issues with GitOps. I think there are some assumptions here that, if you are building for scale you can address with automation, or are fundamental to scale rather than the approach, but I like the debate.

  • The title is “The Fundamental Flaws of GitOps — A Statistical Analysis”.
  • It uses statistical analysis to explain how and why GitOps choices become less satisfying as the number of YAML files and microservices grows.

Cloud migration in large, complex organisations is invariably a lot more complicated than it might first appear, partly because it’s as much about people and structures as technology. Some good observations along those lines in this next post.

  • The title is “The Struggle with Cloud Adoption”.
  • As the title suggests, it explains why cloud adoption is a struggle: the reasons are not linear and not only technical. The explanation covers the following points.
    ○ Disservice
    ○ It’s all about the Benjamins Baby
    ○ Staffing and Legacy IT
    ○ The Complexity of an Enterprise Data Center
    ○ This is not an “anti-cloud” rant
    ○ Pandemic
    ○ Digital Transformation & Edge Computing

SRE Weekly Issue #260 March 7th, 2021


[Increment: Reliability] Interview: Dr. David D. Woods

People throw around “resiliency” quite often when they mean “reliability” or “high availability”. Dr. Woods sets the record straight.

Ipsita Agarwal — Increment

  • It describes the differences between reliability and resiliency (and dependencies) and how to build complex systems that work under stress and surprise.

[Increment: Reliability] The process: Implementing Yelp’s failover strategy

A key part of their strategy is to keep their service running at 50% capacity or less, allowing them to lose a datacenter without overloading the remaining data center.

Mathieu Frappier, Dorothy Jung, and Qui Nguyen — Increment

  • It tells the story of Yelp’s production engineering and computing infrastructure teams implementing a failover strategy by finding a balance between reliability, performance, and cost effectiveness.
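The capacity rule in the editor’s comment above can be checked with a little arithmetic. The following is only a minimal sketch, not Yelp’s implementation: the function names are hypothetical, and it assumes identical data centers with load spread evenly across them.

```python
def max_safe_utilization(num_datacenters: int, tolerated_failures: int = 1) -> float:
    """Highest per-datacenter utilization that still leaves room for failover.

    Assumption (not from the article): all data centers are identical and
    the load is spread evenly across them.
    """
    if num_datacenters <= tolerated_failures:
        raise ValueError("need more data centers than tolerated failures")
    survivors = num_datacenters - tolerated_failures
    return survivors / num_datacenters


def utilization_after_failure(current_utilization: float,
                              num_datacenters: int,
                              failures: int = 1) -> float:
    """Per-datacenter utilization once the failed sites' load is redistributed."""
    survivors = num_datacenters - failures
    if survivors <= 0:
        raise ValueError("no data centers left")
    return current_utilization * num_datacenters / survivors


if __name__ == "__main__":
    # With two data centers, 50% is the ceiling for surviving the loss of one.
    print(max_safe_utilization(2))
    # At 50% the survivor ends up exactly full; at 60% it would be overloaded.
    print(utilization_after_failure(0.5, 2))
    print(utilization_after_failure(0.6, 2))
```

Under the same assumptions, adding a third data center raises the ceiling to about 67%, which is one way to see the trade-off between reliability and cost that the article discusses.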

[Increment: Reliability] On adaptive capacity in incident response

In issue #236, I linked to an excellent paper by Dr. Richard Cook and Beth Long about engineering resilience in incident response. Now they’re back, teaming up with John Allspaw to summarize and expand on that paper!

John Allspaw, Beth Adele Long, and Dr. Richard Cook — Increment

  • A summary and extension of Cook and Long’s findings. It examines one company’s approach to incident response as a framework for understanding adaptability and explains the key points for organizations looking to adopt a resilience engineering perspective.

Security Chaos Engineering: How to Security Differently

A quick s/security/reliability/g and this is an SRE article; the same principles apply to both fields.

Aaron Rinehart — Verica

  • It elaborates on the title and the editor’s comment above, with examples such as Maginot thinking.

SRE2AUX: How Flight Controllers were the first SREs

How can we apply the tenets and principles of NASA mission controllers to our SRE work?

Geoff White — Blameless

  • An article that draws lessons from NASA’s past accidents. It includes many references, such as the following, cited as one of the causes of the accidents.
    ○ “reliance of past success as a substitute for sound engineering practices.”

SRE as Organizational Transformation: Lessons from Activist Organizers

Genius idea: we can take our lead from activists as we try to win over our organization to adopt SRE principles.

Chris Hendrix — Blameless

  • It touches on the impact of agile methods and SRE, and provides a hand-picked list of tips and practices that can be used to enhance a company’s transformation efforts.

Atlas: Our journey from a Python monolith to a managed platform

This insightful observation caught my eye:

It’s unnecessary overhead for a product team to plan capacity, set up good alerts and multihoming (automatically running in multiple data centers) for small, simple functionality.

Naphat Sanguansin and Utsav Shah — Dropbox

  • It explains why and how Dropbox developed and deployed “Atlas”, a platform that provides most of the benefits of a service-oriented architecture while minimizing the operational costs normally associated with owning a service.


KubeWeekly #254 March 12th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Kubernetes Podcast from Google: Crossplane, with Daniel Mangum

CNCF sandbox project Crossplane lets you automate creation of infrastructure using Kubernetes APIs. Daniel Mangum is a Crossplane maintainer working at its creator Upbound, a TL of Kubernetes SIG Release, and a YouTube streaming star. He chats about tech with host Craig Box, who is helped this week by returning guest Ken Massada from the GKE Support team.

CNCF TOC votes to move Flux from Sandbox to Incubation

The CNCF Technical Oversight Committee (TOC) has voted to promote Flux from the CNCF Sandbox to an incubating project. The Flux project provides a complete Continuous Delivery (CD) platform on top of Kubernetes, supporting standard practices and tooling in the ecosystem.

  • As mentioned above, the TOC approved the promotion of the “Flux” project from Sandbox to Incubation. The article also gives an overview of Flux and its feature roadmap.
  • The “Notable Milestones”, which are easy-to-grasp indicators, are as follows.
    ○ 14 maintainers from 5 organizations
    ○ More than 40k contributions
    ○ Over 10k GitHub Stars
    ○ 1894 contributors

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Helm 2nd Security Audit
Helm project

  • As the title suggests, a second audit was conducted to investigate the Helm client source code and the threat model for using Helm.
  • The initial audit focused on the Helm client source code and the process Helm uses to handle security.

Ask an OpenShift Admin (Ep 21): Etcd — the heart of the control plane
Andrew Sullivan, Chris Short, Anandnatraj Chandramohan, Red Hat

  • An approximately one-hour session digging into etcd, with a focus on performance requirements, data protection, and regular maintenance.

Meet Brigade 2
Kent Rancourt, Brigade

  • It introduces the release of v2.0.0-alpha.1 of “Brigade”, a tool for creating pipelines on Kubernetes, along with a tutorial.

Top 20 Dockerfile best practices
Álvaro Iradier, Sysdig

  • As the title suggests, it explains 20 best practices, grouped into the following 5 categories.
    ○ Avoid unnecessary privileges.
    ○ Reduce attack surface.
    ○ Prevent confidential data leaks.
    ○ Others.
    ○ Beyond image building.

Breaking down and fixing etcd cluster
Andrei Kvapil

  • The etcd version of “Breaking down and fixing Kubernetes” mentioned in DEVOPS WEEKLY ISSUE #531.

Hierarchical resource quotas come to GKE

  • The post introducing “hierarchical resource quotas” is also available in Japanese and other languages.

10 Kubernetes security context settings you should understand
Eric Smalling and Matt Jarvis, Snyk

  • As the title suggests, it walks through the following 10 securityContext settings and explains their meaning and usage.
    ○ runAsNonRoot
    ○ runAsUser / runAsGroup
    ○ seLinuxOptions
    ○ seccompProfile
    ○ privileged / allowPrivilegeEscalation
    ○ capabilities
    ○ readOnlyRootFilesystem
    ○ procMount
    ○ fsGroup / fsGroupChangePolicy
    ○ sysctls
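To show where several of the settings listed above sit in a manifest, here is a minimal sketch that builds a Pod definition as a Python dictionary and prints it as JSON. The Pod name, image, UID/GID, and every chosen value are illustrative assumptions, not recommendations taken from the article. seLinuxOptions, procMount, and sysctls are left out to keep the example short.

```python
import json

# Pod-level securityContext: applies to every container in the Pod.
pod_security_context = {
    "runAsNonRoot": True,
    "runAsUser": 10001,                       # hypothetical non-root UID
    "runAsGroup": 10001,                      # hypothetical GID
    "fsGroup": 10001,                         # group ownership for mounted volumes
    "fsGroupChangePolicy": "OnRootMismatch",
    "seccompProfile": {"type": "RuntimeDefault"},
}

# Container-level securityContext: these fields exist only per container.
container_security_context = {
    "privileged": False,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
    "capabilities": {"drop": ["ALL"]},
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hardened-demo"},    # hypothetical name
    "spec": {
        "securityContext": pod_security_context,
        "containers": [
            {
                "name": "app",
                "image": "example.com/app:1.0",   # placeholder image
                "securityContext": container_security_context,
            }
        ],
    },
}

if __name__ == "__main__":
    # Kubernetes accepts JSON manifests as well as YAML.
    print(json.dumps(pod, indent=2))
```

The split into two dictionaries mirrors the API: fsGroup and fsGroupChangePolicy can only be set at the Pod level, while privileged, allowPrivilegeEscalation, capabilities, and readOnlyRootFilesystem can only be set per container.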

Goodbye minikube (Hello KinD)
Nicolas Fränkel

  • The author had been using minikube as a local cluster and switched to kind after a demo failed.

Jetstack Preflight
Automatically perform Kubernetes cluster configuration checks using Open Policy Agent (OPA)

  • The GitHub page of “Preflight”, a tool that uses OPA (Open Policy Agent) policies to verify that a Kubernetes cluster is configured correctly. It was also mentioned in DEVOPS WEEKLY ISSUE #477.

NetworkPolicy Editor: Create, visualize, and share Kubernetes NetworkPolicies
NetworkPolicies Cilium

  • An introduction to “Network Policy Editor”, a tool that helps create YAML files for Kubernetes NetworkPolicy, also mentioned in KubeWeekly #251. At that time, I couldn’t use it because of an error when I moved it to my vertical monitor, but I have confirmed that it works on the main display.

CNCFMinutes 1 — OPA (Open Policy Agent)
Saiyam Pathak

  • A YouTube video that gives an overview of OPA in about 2 minutes.

Simplifying object storage as a service with Kubernetes and MinIO’s operator
Daniel Valdivia, MinIO

  • In line with the title, it explains the topic with CLI examples and figures. It recommends that you try the MinIO operator yourself to explore other features such as Prometheus metrics and audit logs, and protecting MinIO tenants with external identity providers such as LDAP / Active Directory and OpenID providers.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Deploying K3s at the edge for multiplayer gaming
Marco Mancini, OpenNebula

  • A 30-minute session explaining how to easily deploy K3s clusters at the edge for multiplayer gaming using OpenNebula, Firecracker, and Agones.

Kubernetes Community Days: Ask me anything
Bill Mulligan, CNCF

  • Following the relaunch of CNCF’s “Kubernetes Community Days” program, a one-hour Q&A session with Bill Mulligan, Marketing Manager at CNCF.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

A look inside the KubeCon + CloudNativeCon schedule selection process
CNCF Staff Blog Post

  • As part of its commitment to transparency within the cloud native community, CNCF shares a behind-the-scenes look at the work that goes into building the KubeCon + CloudNativeCon schedule.
  • The numbers show how competitive it is to get a CFP submission accepted.

Kubernetes Infrastructure: know the inner dev loop
Vignesh T.V., The New Stack

k0s 0.11 released

  • It announces the release of k0s version 0.11 and focuses on the new zero-downtime cluster upgrade, which is the highlight of this version.

Google Summer of Code 2021 mentoring organizations announced

  • As the title suggests, Google Summer of Code will run again this year, with the following dates:
    ○ Student applications will open on Monday, March 29, 2021 at 19:00 UTC and the deadline to submit your application is Tuesday, April 13, 2021 at 19:00 UTC.

Harbor 2.2 and 2021 roadmap

  • The following are listed as additional features in the Harbor v2.2 release.
    ○ Multi-projects scoped robots
    ○ Prometheus-driven telemetry
    ○ Proxy caching capability extended to GCR, ECR, & ACR
    ○ OIDC auth admin group support, achieving parity with LDAP auth
    ○ Dell EMC ECS S3 storage support
    ○ Aqua CSP Enterprise Scanner Integration
    ○ Clair image scanner deprecated
  • The article covers the following topics.
    ○ System level robot accounts
    ○ Prometheus integration
    ○ Deprecation of Clair
    ○ 2020 in review
    ○ Focus for 2021
    ○ Contributors to v2.2
    ○ Collaborate with the Harbor Community

47 things to become a Kubernetes expert
Yamamoto Hirotaka

  • I saw many people sharing the Japanese version of this article on my Twitter timeline, and the English version is featured here. It looks amazing.

Kubernetes is not just about containers — It’s about the API
Viktor Farcic, The New Stack

The mindset shift needed for Kubernetes adoption (Part 1)
Archana Chillala, Krishnaswamy Subramanian, and Sunit Parekh, ThoughtWorks

  • As the title suggests, this is Part 1 of a series. The mindset shift needs to be considered from two important viewpoints, the organization’s and the development team’s, and Part 1 explains the following four Cs from the organization’s viewpoint.
    ○ Culture
    ○ Complexity
    ○ Capability
    ○ Costs
  • Part 2 will apparently look at the necessary mindset shift from the perspective of the development team.

Tetrate, a company born out of Istio’s open-source app networking project, raises $40 million
Jonathan Shieber, TechCrunch

  • As the title suggests, this is news about Tetrate’s funding. As an engineer, I mainly paid attention to the following sentence.
    ○ The company said it would use the cash to further develop its hybrid cloud application networking platform and support a new product, based on Istio, that makes the application service mesh easier to use, according to a statement from the company.

Snyk raises $300 million at a $4.7 billion valuation
Jonathan Shieber, TechCrunch

  • News about Snyk’s funding. Personally, I often see Snyk’s presentations and materials, but I haven’t heard much about the company in Japan, so the following information caught my attention.
    ○ So far, that suite of services has meant more than 27 million developers around the world are using Snyk tools and the company also provides a marketplace for security coders to pitch their own tools on the Snyk platform.

NetApp brings data portability to Kubernetes apps with Astra
Mike Wheatley, Silicon Angle

  • An article about NetApp Astra, a new service from data storage giant NetApp Inc. that has reached GA and aims to realize the vision of portability for Kubernetes applications.

4 best practice steps for Kubernetes policy enforcement
Robert Brennan, The New Stack

  • It describes the importance of Kubernetes policies and the following four best practices for applying them.
  1. Understand Your Kubernetes Strategy
  2. Create Kubernetes Policies
  3. Enforce Those Policies
  4. Use Policy Enforcement to Gain Multicluster Visibility

Upcoming CNCF Online Programs

Data protection in a Kubernetes native world
Michael Cade @Kasten

March 16, 2021 at 10am PT
Register Now

Cloud Native Live: Hacking Kubernetes
Ben Hirschberg @ARMO

March 17, 2021 at 12pm PT
Register Now

Your own Kubernetes castle
Adam Kozlowski @GrapeUp

March 18, 2021
Register Now

CNCF Online Programs Playlist on YouTube

Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.

How about those articles? Did any of them catch your interest?

To be honest, there is some content I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #CKA, #CKAD, #Certified AWS SAP
