SRE / DevOps / Kubernetes Weekly Collection #53 (Week 5, 2021)

Yoshiki Fujiwara
11 min read · Feb 8, 2021
  • In this blog post series, I go through the following three weekly mailing lists I subscribe to, leaving some comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog, and I am catching up in English in this series.
  • I hope it serves as a reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #527 January 31st, 2021
SRE Weekly Issue #255 January 31st, 2021
KubeWeekly #249 February 5th, 2021

DEVOPS WEEKLY ISSUE #527 January 31st, 2021

News

A detailed writeup and timeline of a security incident. It’s important to learn from events like this, and the post finishes up with some concrete recommendations.

  • The title is “A deeper dive into our May 2019 security incident”.
  • They share this now because, after discussions with law enforcement agencies over time, they can explain in more detail what happened, how it happened, and what they did to address the underlying issues that caused the security incident in May 2019.

A discussion of various facets of building reliable systems. From SLOs to runbooks, service catalogues to feature flags and more.

  • The title is “2021 is the Year of Reliability”.
  • It gives an overview of what is expected of software in 2021 and how to meet those expectations, organized under the following items.
    ○ Customers want reliable software.
    ○ Operators want reliable software.
    ○ How do we achieve reliable reliability?
    ○ SLOs
    ○ Every day is a chance to be more reliable.

A post describing the high-level security architecture for a web site. What makes it interesting is the focus on proportionality, being as secure as needed rather than as secure as possible.

  • The title is “Securing the NCSC’s web platform”.
  • The UK’s NCSC (National Cyber Security Centre) explains the key points of the website it operates under the following items.
    ○ As secure as necessary
    ○ The cost of security controls
    ○ Sensible security architecture
    ○ The web platform will never be ‘done’
    ○ The balancing act is hard
  • It lists perspectives and ideas such as proportionate risk management, usability, functionality, cost, and Game Days, which struck me as more advanced and flexible than the organization’s name would suggest.

A good post on the risks associated with permissive permissions and privilege escalation with Kubernetes pods.

  • The title is “Bad Pods: Kubernetes Pod Privilege Escalation”.
  • It describes the following eight insecure pod configurations and the corresponding ways to perform privilege escalation.
    ○ Bad Pod #1: Everything allowed
    ○ Bad Pod #2: Privileged and hostPid
    ○ Bad Pod #3: Privileged only
    ○ Bad Pod #4: hostPath only
    ○ Bad Pod #5: hostPid only
    ○ Bad Pod #6: hostNetwork only
    ○ Bad Pod #7: hostIPC only
    ○ Bad Pod #8: Nothing allowed
  • The article and its accompanying repository were created to help penetration testers and administrators better understand common misconfiguration scenarios.
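
To make the list above more concrete, here is a minimal sketch of how the settings behind those "Bad Pods" could be spotted in a pod manifest. It is not taken from the article or its repository. The function name `risky_pod_settings` and the sample pod are hypothetical, and the check only looks at the `hostPID`, `hostNetwork`, `hostIPC`, `securityContext.privileged`, and `hostPath` fields of the Kubernetes Pod spec.

```python
from typing import Any, Dict, List


def risky_pod_settings(pod: Dict[str, Any]) -> List[str]:
    """Return the risky settings enabled in a pod manifest given as a dict.

    Hypothetical helper for illustration; it inspects only the fields
    named in the "Bad Pods" list.
    """
    spec = pod.get("spec") or {}
    findings: List[str] = []

    # Pod-level flags that share the node's namespaces with the pod.
    for field in ("hostPID", "hostNetwork", "hostIPC"):
        if spec.get(field) is True:
            findings.append(field)

    # Any container or init container that runs in privileged mode.
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    if any((c.get("securityContext") or {}).get("privileged") is True for c in containers):
        findings.append("privileged")

    # Any volume that mounts a path from the node's filesystem.
    if any("hostPath" in volume for volume in (spec.get("volumes") or [])):
        findings.append("hostPath")

    return findings


# Hypothetical manifest resembling "Bad Pod #2: Privileged and hostPid".
example_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example"},
    "spec": {
        "hostPID": True,
        "containers": [
            {"name": "shell", "image": "ubuntu", "securityContext": {"privileged": True}}
        ],
    },
}

print(risky_pod_settings(example_pod))  # ['hostPID', 'privileged']
```

An empty result corresponds roughly to "Bad Pod #8: Nothing allowed", which the article still treats as worth examining, so a check like this is only a first filter and not a security verdict.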

Why does it take so long to build software? Lots of observations in this post, about accidental complexity, about increasing demands vs 10 or 20 years ago on several fronts, on the rise of frameworks and tools that solve problems you might not have and more.

  • The title is “Why does it take so long to build software?”.
  • It explores the question under the following items.
    ○ Different types of complexity? That’s complex.
    ○ Here comes the accidental complexity.
    ○ How does this apply to software?
    ○ We are asking more and more of our software.
    ○ The volume of software within companies is exploding.
    ○ The pace of new technology adoption is increasing.
    ○ Is there hope?
  • A future post will discuss the impact of accidental complexity on software projects, and how to avoid it more effectively while still meeting the needs of the business.

Hex is the package manager for Erlang and Elixir. A new feature, called Hex Preview, allows for checking the contents of source files from specific versions of packages. An important use case that’s easy to miss with the growth of supply chain attacks.

  • The title is “Introducing Hex Preview”.
  • As the title suggests, it introduces the release of “Hex Preview”, an online tool for viewing the source files of Hex packages.

An example of writing unit tests for Helm charts using Go.

  • The title is “How to unit-test your helm charts with Golang”.
  • I will skip it because it was covered in KubeWeekly#248 last week.

Events

An event all about the low-level bits of containers. Container runtimes, image building, image scanning, container security and isolation, virtualization inside containers, etc. Taking place March 9th/10th, with the CFP submissions due by the 10th of February.

  • It introduces “The Container Plumbing Days”, a two-day ‘lower-level’ open source container technologies event.
  • The schedule is as follows.
    ○ Tuesday March 9th, 2021, 15:00 to 19:00 UTC (10am to 2pm Eastern)
    ○ Wednesday, March 10th, 2021, 15:00 to 19:00 UTC (10am to 2pm Eastern)
  • The main projects and technologies the event expects to cover, as listed in its About section, are as follows.
    ○ Buildah, CRI-O, Katacontainers, Kubevirt, Clair, Skopeo, Cgroups2, Krustlet, Seccomp, Podman, KIND, Tern, and many others.

Tools

Simdjson is a JSON parsing library that aims to make parsing gigabytes of JSON per second trivial. Interesting design and API, and bindings available in lots of languages.

  • A GitHub page of the library “simdjson” that parses JSON at high speed.

Etok provides a nice user interface for running Terraform on Kubernetes. Avoid needing local credentials or access to APIs. Some handy integration with GCP as well.

  • A GitHub page of the tool named “Etok”, which stands for Execute Terraform On Kubernetes.
  • The “Why” in the README is as follows.
    ○ Leverage Kubernetes’ RBAC for terraform operations and state
    ○ Single platform for end-user and CI/CD usage
    ○ Queue terraform operations
    ○ Leverage GCP workload identity and other secret-less mechanisms
    ○ Deploy infrastructure alongside applications

Litestream is a standalone streaming replication tool for SQLite. It runs as a background process and safely replicates changes incrementally to another file or S3.

  • A GitHub page of “Litestream”, a standalone streaming replication tool for SQLite.
  • It runs as a background process and safely replicates changes incrementally to another file or S3. Litestream communicates with SQLite only through the SQLite API, so it will not corrupt the database.

SRE Weekly Issue #255 January 31st, 2021

Articles

Why It Should Be Service, Not Site Reliability

It really should! Even Google is much more accurately described as a “service” than a “site”.

Chris Riley — Splunk

  • The S in SRE stands for “site”, but the article argues that it should stand for “service”, which is more consistent with what developers deliver today, based on the following points:
    ○ Subscription-based business models
    ○ Application architectures
    ○ Modern delivery chain
    ○ Cross-platform
    ○ Customer-centric

Migrations: the sole scalable fix to tech debt.

There are migrations, and then there’s the time between migrations.

Will Larson

  • The article tells the story of Uber moving from a Puppet-managed service to a fully self-service provisioning model.
  • Personally, I found one point especially important: keep the cost of a migration down by first verifying that it actually solves the intended problem, and only then proceeding with the rest of the migration at a deliberate pace.

2021 is the Year of Reliability

2020 was the year mainstream folks realized how important reliability is. Will overall reliability improve in 2021?

Robert Ross — FireHydrant

  • I will skip it, because it is covered in DEVOPS WEEKLY ISSUE#527 above.

This SRE attempted to roll out an HAProxy config change. You won’t believe what happened next…

I love this for the click-bait title and the content. An HAProxy feature designed for HA had a surprising and unexpected behavior.

Andre Newman — GitLab

  • It details what they discovered while investigating strange behavior from HAProxy.
  • The TL;DR is as follows.
    ○ HAProxy has a server-state-file directive that persists some of its state across restarts.
    ○ This state file contains the port of each backend server.
    ○ If a haproxy.cfg change modifies the port, the new port will be overwritten with the previous one from the state file.
    ○ A workaround is to change the backend server name, so that it is considered to be a separate server that does not match what is in the state file.
    ○ This has implications for the rollout procedure we use on HAProxy.
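
The override described in the TL;DR can be illustrated with a deliberately simplified model. This is a sketch of the behavior summarized above only. It does not reproduce HAProxy's implementation or the real state file format, and the function `effective_ports` and the server names are hypothetical.

```python
from typing import Dict


def effective_ports(config: Dict[str, int], state_file: Dict[str, int]) -> Dict[str, int]:
    """Simplified model: for any server whose name matches an entry in the
    state file, the saved port wins over the port written in the config."""
    return {name: state_file.get(name, port) for name, port in config.items()}


# State saved before the restart: server "web01" was on port 8080.
saved_state = {"web01": 8080}

# Changing only the port in the config: the old port is still used.
print(effective_ports({"web01": 9090}, saved_state))     # {'web01': 8080}

# Workaround: rename the server so it no longer matches the saved entry.
print(effective_ports({"web01-v2": 9090}, saved_state))  # {'web01-v2': 9090}
```

In this model the rename works because the lookup is keyed by server name, which matches the workaround described in the post.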

Tyler Wells on building a culture of reliability at Twilio

Twilio builds customer trust through a reliability culture, customer empathy, and accountability.

Andre Newman — Gremlin

  • The following points are excerpted from a talk by Tyler Wells, Senior Director of Engineering at Twilio.
    ○ Reliability is built on customer trust
    ○ Culture
    ○ Customer empathy
    ○ Accountability
    ○ Reliability is a journey

WTF is SRE WTFinar

This WTFinar tackles the beginning of understanding SRE. It focuses on service level indicators (SLIs) and service level objectives (SLOs) — components of error budgets.

Container Solutions

  • It features the Container Solutions webinar “WTF is SRE?”.
  • It focuses on SLIs and SLOs as a starting point for understanding SRE.
  • It starts at 15:00 CET (Central European Time) on Tuesday, February 9, which is 23:00 Japan time.
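
Since SLIs and SLOs are described above as components of error budgets, a small worked example may help. This is a minimal sketch using the common definition of an error budget as the share of failure that an SLO still permits. The function names and the sample numbers are assumptions for illustration and do not come from the webinar.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left, using an event-based SLI.

    The SLI is good_events / total_events; the budget is the failure
    rate (1 - slo) that the SLO tolerates.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    sli = good_events / total_events
    return 1.0 - (1.0 - sli) / (1.0 - slo)


# A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
print(f"{error_budget_minutes(0.999):.1f} minutes")

# 500 failed requests out of 1,000,000 use about half of a 99.9% budget.
print(f"{budget_remaining(0.999, 999_500, 1_000_000):.0%} of the budget left")
```

A negative result from `budget_remaining` means the budget is overspent for the period being measured.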

Outages

KubeWeekly #249 February 5th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Welcome to our 5 new TOC members!

Chris Aniszczyk, CNCF

Help us give a warm welcome to the newest members of the TOC:

* Erin Boyd, Apple
* Cornelia Davis, Weaveworks
* Lei Zhang, Alibaba
* Dave Zolotusky, Spotify
* Ricardo Rocha, CERN

Learn more about the TOC and newest members in the latest blog post.

  • An article from the CNCF announcing the selection of the above five new members of its TOC (Technical Oversight Committee).
  • It introduces the role of the TOC and the biographies of the members selected by the Governing Board (GB) and the End User Community (EUC).
  • It also thanks the following three members who have completed their terms.
    ○ Brendan Burns (@brendandburns)
    ○ Matt Klein (@mattklein123)
    ○ Xiang Li (@xiangli0227)

Cloud Native Computing Foundation Announces Open Policy Agent Graduation

CNCF blog

Congratulations to Open Policy Agent (OPA) for hitting graduated status! OPA has demonstrated widespread adoption, an open governance process, feature maturity, and a strong commitment to community, sustainability, and inclusivity to graduate.

  • As the title suggests, the article announces that OPA has reached the Graduated maturity level at the CNCF.
  • OPA was accepted by the CNCF sandbox in April 2018 and was promoted to incubation a year later. More than 90 people from about 30 organizations have contributed to OPA, and maintainers consist of members from four organizations: Google, Microsoft, VMware, and Styra.
  • See CNCF Graduation Criteria v1.3 (as of 2021/02/06) for the promotion conditions at each maturity level.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Killing Containers at Scale

Connor Brewster, Replit

  • To address the problem that Docker took more than 30 seconds to forcibly terminate all containers on a VM, it explains, based on their investigation, how they kill the containers themselves and what effect that had.

Kubernetes — How to Debug CrashLoopBackOff in a Container

David Giffin, Release App

  • It doesn’t explain how to configure Kubernetes properly. Instead, it focuses on debugging your own and other people’s code when a “CrashLoopBackOff” error occurs in a container.

Hunting for Malware with Falco

Dan Lorenc

  • It explains how to build a platform to look for malicious behavior hidden behind the scenes.

Deliver your applications to edge and IoT devices in rootless containers

Ilkka Tengvall, Red Hat

  • It explains how to use systemd, Podman, and Red Hat Ansible Automation to automate software and push it as a container to small edge and Internet of Things (IoT) gateway devices.

Building a Kubernetes CI/CD Pipeline with GitLab and Helm

Dan Slapelis, NextLink Labs

  • It treats CI/CD on Kubernetes as a puzzle and explains how to fit in the continuous delivery (CD) pieces, build the CI/CD pipeline, and deploy an app to Kubernetes. As a premise, it starts with an explanation of Helm, which is an important piece of the puzzle.

Kubernetes vs Docker: Understanding Containers in 2021

Tomas Fernandez, Semaphore

  • A few weeks ago, the Kubernetes development team announced that they would deprecate Dockershim. This article answers the most common questions that followed, starting from the basics of containers, Docker, and Kubernetes.
  • For those who already know Docker and Kubernetes, I recommend skipping ahead to the “How does the Dockershim deprecation impact you?” section of the article.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

CNCF On-demand webinar: Policy as Code to manage security risk in K8s before & after deployment

Cesar Rodriguez @Accurics

  • It introduces the on-demand webinar with the above title. If you are interested, please register and watch. It is available only to registrants, from February 4, 2021 0:00 to February 10, 2021 23:59 (PST).
  • The webinar explains how to use open standards such as OPA (Open Policy Agent) and open source IaC scanners such as Terrascan to improve security with policy as code.

This Week in Cloud Native (Livestream): Kubernetes Policies-as-Code

Jim Bugwadia @Nirmata

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Backstage, with Lee Mills and Matt Clarke

Craig Box, Kubernetes Podcast from Google

  • A Kubernetes podcast by Google employees. The current co-host is Craig Box. Adam Glick has moved on to greener pastures, so past guests are being invited as guest hosts for several weeks.
  • This week’s guest host is John Belamaric, who appeared in Episode #106. He is a Senior SWE at Google, co-chair of Kubernetes SIG Architecture, a core maintainer of the CoreDNS project, and an author of the O’Reilly book “Learning CoreDNS: Configuring DNS for Cloud Native Environments”.
  • The guests are Lee Mills and Matt Clarke from Spotify, the maintainers of “Backstage.”
    ○ Backstage is a platform for building a developer portal using a centralized service catalog.
    ○ Open source developed by Spotify and donated to CNCF in 2020
  • The topics that interested me in the News of the week are as follows.
    ○ Longhorn 1.1
    ○ Sonobuoy adds reliability scanning
    ○ Announcing the Linkerd steering committee

Release Orchestration

Vamp.io Introduces Research Report The 2021 State of Cloud-Native

  • In line with the report content in the title, it describes challenges, trends, and opportunities for improvement regarding software release and verification in production as of 2021.
  • The study highlights the tough challenges that small businesses and enterprises face on their cloud, Kubernetes, and microservices journeys.

Upcoming CNCF Online Programs

CNCF Live webinar: How to Manage Kubernetes Application Life Cycle Using Carvel
presented by VMware
February 9, 2021 at 10:00 am PT
Register Now

CNCF On-demand webinar: Debugging Kubernetes On The Fly
presented by Rookout
February 11, 2021
Register Now

CNCF On-demand webinar: Otomi Container Platform Open Source Announcement
presented by Red Kubes RV
February 11, 2021
Register Now

For more information, please visit our updated Online Programs page.

How about those articles? Do you have any interest in any?

To be honest, there is some content I have not been able to digest yet, so I will also use this aide-memoire and these links to catch up myself.

Bye now!!

Yoshiki Fujiwara

・Cloud Solutions Architect - AWS@NetApp in Tokyo, Japan. #AWS Certified Solution Architect&DevOps Professional, #Kubernetes, ・Opinions are my own.