SRE / DevOps / Kubernetes Weekly Collection #54 (Week 6, 2021)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, adding short comments as an aide-memoire along with useful links.

DEVOPS WEEKLY ISSUE #528 February 7th, 2021
SRE Weekly Issue #256 February 7th, 2021
KubeWeekly #250 February 12th, 2021

DEVOPS WEEKLY ISSUE #528 February 7th, 2021

News

A detailed post on the Bottlerocket build system. You may not have quite as complex a project, but lots of interesting tricks in here for using Cargo for much more than just building Rust code.

  • The title is “How the Bottlerocket build system works” from the AWS Open Source Blog.

A post on how to best defend your software build pipeline from targeted supply chain attacks.

  • The title is “Defending software build pipelines from malicious attack”.

Architecture diagrams often feature lots of boxes and arrows, but how do you overlay more useful information without visual overload? This post provides a handy visual language.

  • The title is “A visual language for digital integration”.

Dockerfiles are ubiquitous for building container images. But if you’re looking for something that provides a higher level interface and stronger opinions then buildpacks are worth a look. This post compares the two.

  • The title is “Build packs vs Docker files”.

Threat modelling is a useful tool for getting people thinking about the security of their systems. It’s also a great way of encouraging collaboration between development and security teams. This new manifesto is a good starting point.

  • The website of the “Threat Modeling Manifesto”.

Ever wanted to understand how Kubernetes allocates IP addresses when you run a high-level command like kubectl expose? This post has you covered.

gRPC is optimised for fast, secure over-the-wire transfer. But that makes it harder to debug than something like JSON over HTTP. Here’s how to use Wireshark for analyzing gRPC messages.

  • The title is “Analyzing gRPC messages using Wireshark”.

A case study for building a Kubernetes-powered CI/CD pipeline using GitLab and Helm.

  • The title is “Building a Kubernetes CI / CD Pipeline with GitLab and Helm”.

Tools

Vorteil provides a super interesting toolkit for building and running fast micro-VMs. You can even convert an OCI-compliant container image directly to a VM and run it using Vorteil.

  • The GitHub page for “Vorteil”, an operating system for running cloud applications in micro virtual machines.

Kubenav provides desktop, web and mobile apps for monitoring the status of a Kubernetes cluster.

  • The GitHub page for “kubenav”, a mobile, desktop, and web app for managing Kubernetes clusters and getting an overview of resource status.

SRE Weekly Issue #256 February 7th, 2021

Articles

Slack’s Outage on January 4th 2021

Here’s a blog post from Slack giving even more information about what went wrong on January 4. Bravo, Slack, there’s a lot in here for us to learn from.

Laura Nolan — Slack

  • The article on this outage that I covered in SRE Weekly Issue #254 the other day was Slack’s incident report; this one is from Slack’s engineering blog.

Zero Downtime Release: Disruption-free Load Balancing of a Multi-Billion User Website

This academic paper from Facebook explains how they release code without disrupting active connections, even for a small number of users.

Usama Naseer, Luca Niccolini, Udip Pant, Alan Frindell, Ranjeeth Dasineni, and Theophilus A. Benson — Facebook

  • The abstract page of the Facebook paper. You can download the paper from the link.

Brand SRES

Another lesson we can learn from aviation: have one place where engineers can find out about temporary infrastructure changes that are important.

Bill Duncan

  • It explores how to communicate the temporary state of each environment to the whole team in a many-to-many situation, where SREs handle many environments alongside many other team members. It borrows the aviation term “NOTAM” (Notice to Airmen) to frame the idea.

Incident Post Mortem: January 29, 2021 [Coinbase]

Coinbase posted this detailed analysis of their January 29th incident.

Coinbase

  • It details the outage, explains what caused it, and describes changes to prevent similar failures in the future.

Council Post: How Cloud Services Platform Teams Can Drive The Adoption Of Effective SRE Practices

Interesting thesis: a company moving into the cloud is in a unique position to adopt SRE practices — and better situated than cloud-first companies.

Tina Huang (CTO, transpose) — Forbes

  • In line with the title, it explains two contrasting approaches that legacy software companies can adopt in their cloud strategies, which lead to significantly different SRE outcomes.
  1. Adopt Cloud Services For Individual Services And Teams

“I’m Just Doing my Job,” An SRE Myth

We need to push past surface-level mitigation of an incident and really dig in and learn.

Darrell Pappa — Blameless

  • The author, who heard the line in the title from a customer support representative, suggests that SREs should take the customer’s perspective, treat problems systematically, and apply SRE best practices.

GitHub Availability Report: January 2021

GitHub’s database failed in a manner that wasn’t detected by their automated failover system.

Keith Ballinger — GitHub

  • It describes one incident in January that significantly impacted and reduced the availability of the GitHub Actions service, along with the countermeasures taken.

Open source update: School of SRE

LinkedIn published their SRE training documentation in the form of a full curriculum covering a range of topics.

Akbar KM and Kalyanasundaram Somasundaram — LinkedIn

  • Introducing the School of SRE, a curriculum curated for ambitious SREs published by LinkedIn on GitHub.

Push some big numbers through your system and look for bugs

Your code may be designed to handle 64-bit integers, but what if a library (such as a JSON decoder) converts them to floating point numbers?

rachelbythebay

  • It shows how to hunt for bugs by pushing large numbers through a system, using JSON as the example.
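
A minimal sketch of the float-conversion problem raised in this item (my own illustration, not code from the article). Python’s json module keeps integers exact, so the hypothetical helper decode_like_double_only_parser passes parse_int=float to imitate a decoder that stores every number as an IEEE 754 double. The helper names and the "id" field are assumptions made only for this example.

```python
import json


def decode_like_double_only_parser(text):
    """Imitate a JSON decoder that turns every integer into a double.

    Python's json module normally returns exact ints; parse_int=float
    is used here only to simulate a double-only decoder.
    """
    return json.loads(text, parse_int=float)


def survives_round_trip(value):
    """Return True if an integer comes back unchanged after encode/decode."""
    payload = json.dumps({"id": value})  # hypothetical field name
    decoded = decode_like_double_only_parser(payload)["id"]
    return int(decoded) == value


if __name__ == "__main__":
    # Doubles carry a 53-bit significand, so not every integer above 2**53
    # can be represented exactly.
    for candidate in (12345, 2**53, 2**53 + 1, 2**63 - 1):
        print(candidate, survives_round_trip(candidate))
```

Pushing values such as 2**53 + 1 or 2**63 - 1 through a pipeline and comparing what comes out the other end is one cheap way to find out whether a float conversion is hiding somewhere along the path.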

KubeWeekly #250 February 12th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Last chance to register for KubeCon + CloudNativeCon Europe 2021 — Virtual for $10!

KubeCon + CloudNativeCon Europe 2021 Virtual is happening on May 4–7, 2021. Be sure to register for a full All Access Pass for just $10 through February 14 at 23:59 CEST! The price will increase to $75 on February 15, so act fast to take advantage of this great deal.

Don’t forget — the CFP deadline for KubeCon + CloudNativeCon Europe 2021 Virtual co-located events closes on February 19!

See the full list of co-located events below:

Cloud Native Rust Day– hosted by CNCF — May 3
Cloud Native Security Day Europe — May 4
Cloud Native Wasm Day — May 4
FluentCon Cloud Native Logging day with Fluent Bit & Fluentd — May 4
Kubernetes AI Day — May 4
Kubernetes on Edge Day — May 4
ServiceMeshCon Europe — May 4

  • KubeCon + CloudNativeCon Europe 2021 Virtual is scheduled for May 4–7, 2021, along with its co-located events. The $10 All Access Pass is available through February 14. I have already registered, and I’m considering which co-located events to sign up for.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Getting started with Kubernetes audit logs and Falco

Pawan Shankar, Sysdig

  • It describes what Kubernetes audit logs are, the information they provide, and how to integrate them with the open source runtime security tool “Falco” to detect suspicious activity in your cluster.

Building container images in Go

Ahmet Alp Balkan

  • It explains how to build an OCI container image without using Docker by programmatically building the layers and image manifests using the go-containerregistry module.

Cloud Development Environments: Using Skaffold and Telepresence on Kubernetes for fast dev loops

Peter O’Neill, Ambassador Labs

  • It explains how to use Skaffold to build and deploy a local environment, launch Telepresence, project the local services you are building to a remote cluster, and loop through development.

Achieving Cloud Native Security and Compliance with Teleport

Ninad Desai, InfraCloud

  • It touches on the need for Zero Trust Architecture and introduces “Teleport” as a product that fits into the area of “Zero Trust Network” for cloud-native apps.

Kubernetes Liveness Probes — Examples & Common Pitfalls

Levent Ogut, Loft

  • It notes that the readiness probe, described in a previous post, and the liveness probe behave differently, and explains each component, the configuration, and how to troubleshoot.

Let’s Learn Harvester

Saiyam Pathak, Civo

  • It introduces Harvester, open source hyper-converged infrastructure (HCI) software that runs on Kubernetes. It is presented as an open source alternative to products such as vSphere and Nutanix.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

How to Manage Kubernetes Application Lifecycle Using Carvel

Helen George and Joao Pereira @VMware

  • It introduces Carvel, an open source project that provides a reliable, single-purpose, configurable set of tools to help you build, configure, and deploy your apps to Kubernetes.

Debugging Kubernetes On The Fly

Josh Hendrick @Rookout

  • It describes the traditional challenges of debugging Kubernetes-based apps and how real-time debugging of production workloads can help solve them.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

The State of Cloud Native Application Security survey — 2021

Matt Jarvis, Snyk

  • It introduces Snyk’s Cloud native application security (CNAS) 2021 survey and shares plans for its report.

Garden: The Configure-Once Kubernetes Platform for Seamless Dev/Prod Integration

Thor Sigurdsson & Mike Winters, Garden

  • It argues that “Most of the problems developers run into CI are caused by a) discrepancies between dev and CI environments and b) insufficient, slow integration testing.” One possible approach to these problems is to use a consistent configuration for every pre-production environment, from development to testing to CI.
    In this context, it introduces “Garden”, which takes that approach.

Upcoming CNCF Online Programs

CNCF Live webinar: Toward Hybrid Cloud Serverless Transparency with Lithops Framework
presented by IBM
February 16, 2021 at 10:00 am PT

This Week in Cloud Native (Livestream): KCD El Salvador
February 17, 2021 at 12:00 pm PT

CNCF Online Programs Playlist on YouTube
Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.

How about those articles? Do any of them interest you?

Actually, there is some content I have not been able to digest yet, so I’ll use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #Certified AWS SAP
