SRE / DevOps / Kubernetes Weekly Collection #35 (Week 40)

- In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments as an aide-memoire, along with useful links.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #509 September 27th, 2020
SRE Weekly Issue #237 September 27th, 2020
KubeWeekly #235 October 2nd, 2020
DEVOPS WEEKLY ISSUE #509 September 27th, 2020
News
- The title is “Rebuilding Linkerd’s continuous integration (CI) with Kubernetes in Docker (kind) and GitHub Actions”.
- An article transcribing the author’s presentation at KubeCon EU 2020.
- Two demo videos are embedded, which is especially helpful when you want to build CI/CD with OSS.
- The title is “The evolution of DevOps and why we are here”.
- An article that explains the role of system administrators, how their responsibilities changed over time into those of what we now call DevOps engineers, and where the role stands today.
- The title is “Fake COTS and the one-day rule”.
- When a government agency procures a commercial-off-the-shelf (COTS) product, it is expected to be usable immediately. A product that does not live up to that name and cannot be put to use within one day is called “Fake COTS”. The article names specific products as examples.
- The following disclaimer is strongly stated at the beginning.
- Extra-prominent disclaimer: The views expressed here are my own. Products mentioned in the examples below are not endorsements.
- The title is “Watchman: monitoring dependency conflicts for Python library ecosystem”. A blog that explains recent papers.
- It explains the prevalence of dependency conflicts in Python projects and their causes, touching on “Dependency Hell”.
- The title is “Terraspace All: Deploy Multiple Stacks or Terraform Modules At Once”.
- It introduces Terraspace, a framework for Terraform. It provides an organized structure and conventions, keeps the code DRY, and adds useful tooling.
- A video is also embedded.
- The title is “AWS Account Structure: Think twice before using AWS Organizations”.
- It explains how AWS account management approaches have evolved and the key points to consider. A podcast episode is also embedded.
- The title is “Using Custom Plugin Indexes”.
- It explains how to use the “Custom Plugin Index” with some kubectl krew commands.
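As a rough illustration of the kind of dependency conflict the Watchman item above describes, the sketch below checks whether the version ranges that several packages place on one shared library can be satisfied together. The package names, the ranges, and the helper functions `parse` and `intersect` are all hypothetical. This is not how Watchman works, and real resolvers handle far richer version specifiers.

```python
# Hypothetical illustration of a dependency conflict; not Watchman's algorithm.
from typing import List, Optional, Tuple

Version = Tuple[int, ...]
# A constraint is an inclusive lower bound and an exclusive upper bound.
Constraint = Tuple[Version, Version]


def parse(version: str) -> Version:
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def intersect(constraints: List[Constraint]) -> Optional[Constraint]:
    """Return the range satisfying every constraint, or None on conflict."""
    low = max(c[0] for c in constraints)
    high = min(c[1] for c in constraints)
    return (low, high) if low < high else None


# Made-up example: three packages in one project all need "libfoo".
app_a = (parse("1.0"), parse("2.0"))  # app_a requires libfoo>=1.0,<2.0
app_b = (parse("2.1"), parse("3.0"))  # app_b requires libfoo>=2.1,<3.0
app_c = (parse("1.5"), parse("2.5"))  # app_c requires libfoo>=1.5,<2.5

print(intersect([app_a, app_c]))  # the ranges overlap, so a version exists
print(intersect([app_a, app_b]))  # no single libfoo version fits both
```

Installing `app_a` and `app_b` side by side is the conflicting case: whichever `libfoo` version is chosen, one of the two ends up with a dependency outside its declared range.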
Tools
- The GitHub page of “version-checker”, a Kubernetes utility for monitoring the versions of images currently running in a cluster against the latest available upstream.
- These checks are exposed as Prometheus metrics that can be viewed on a dashboard or used to soft-alert cluster operators.
- This tool is currently experimental.
- The io page of “Portus”, an open source authorization service and user interface for the next generation Docker container registry.
- An on-premises app that allows users to manage and protect their Docker container registry.
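To make the version-checker idea above a little more concrete, here is a minimal sketch of comparing the image tag running in a cluster with the newest upstream tag. The image names, the tag lists, and the `is_latest` output are made up for illustration. This is not version-checker’s actual implementation or its metric names, and it assumes simple semver-like tags.

```python
# Hypothetical sketch; not version-checker's actual implementation or metric names.
from typing import Dict, List, Tuple


def parse_tag(tag: str) -> Tuple[int, ...]:
    """Parse a semver-like tag such as 'v1.19.2' into (1, 19, 2)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def latest_tag(tags: List[str]) -> str:
    """Pick the highest version among the upstream tags."""
    return max(tags, key=parse_tag)


def is_latest(running: Dict[str, str], upstream: Dict[str, List[str]]) -> Dict[str, bool]:
    """Map each image to True if the running tag is the newest upstream tag."""
    return {
        image: parse_tag(tag) >= parse_tag(latest_tag(upstream[image]))
        for image, tag in running.items()
    }


# Made-up inventory of images running in a cluster and their upstream tags.
running = {"example/web": "v1.4.0", "example/db": "v2.10.1"}
upstream = {
    "example/web": ["v1.3.9", "v1.4.0"],
    "example/db": ["v2.9.0", "v2.10.1", "v2.11.0"],
}

for image, current in is_latest(running, upstream).items():
    # A real exporter would publish this as a gauge; here we just print it.
    print(f"{image} is_latest={int(current)}")
```

Parsing tags into integer tuples matters because a plain string comparison would rank `v2.9.0` above `v2.10.1`.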
SRE Weekly Issue #237 September 27th, 2020
Articles
Postmortem — why Allegro went down
They fully expected their deep-discount sale to drive traffic, but they didn’t expect their system to handle the increase in the way that it did.
Michał Kosmulski — Allegro
- An article dated August 31, 2018. Allegro’s post-mortem on an outage where the website went down for 20 minutes at noon on July 18, 2018.
- Based on an internal post-mortem, it explains to customers and the tech community how the failure occurred and what technical steps were taken to prevent similar events in the future.
Zero-Downtime Kubernetes Deployments
Pre-stop hooks, liveness probes, and readiness probes were key to smoothly transitioning their services from a home-grown container system to Kubernetes.
Oliver Leaver-Smith — Sky Betting & Gaming
- Sky Betting & Gaming describes the work that it has done over the past few months to move the OIDC / OAuth2 ID service from a tactical container platform to an on-premises Kubernetes cluster.
Feelings during incident response
The experience of responding to an incident can evoke emotions that run the gamut.
Mads Hartmann
- It excerpts and expands on the part of the Glitch podcast “Shift Shift Forward” where the author was asked about feelings during incident response.
Join SRE Classroom NALSD workshops
Google has released course materials for the first of a series of classes on NALSD (“non-abstract large systems design”). This first one is about a distributed Pub-Sub system.
Jenny Liao and Salim Virji — Google
- Google has published the first workshop in its SRE Classroom series on NALSD (Non-Abstract Large System Design): the “Distributed Pub/Sub” workshop.
Why you should write up your own incident
Usually, doing a post-analysis on an incident you were in is an anti-pattern because you’re likely to introduce bias. But sometimes, it can lead you to learn more than you would have otherwise.
Lorin Hochstein
- The author explains why you should write up your own incident, drawing on his latest experience.
- He states that responders should generally avoid conducting the post-incident analysis themselves, although sometimes it is unavoidable. Based on what he newly discovered while analyzing an incident he recently responded to, he also argues that doing so can be worthwhile.
- You shouldn’t write up your own incident if you can avoid it. To write up an incident well, you need to be able to capture the perspectives of the different people who were involved. If the write-up author was also one of the responders, then the writeup will be biased towards their perspective, at the expense of capturing the perspectives of the other engineers who were engaged.
Outages
- Datadog
- G Suite
- Google Cloud Platform
- Let’s Encrypt
Google CT logs had an issue, impairing Let’s Encrypt’s ability to issue.
- Tesla
- Apple
- Heroku
- Connectivity Issues
- Crypto.com (cryptocurrency exchange)
The CEO says a database issue (nearly) opened up the possibility for arbitrage.
KubeWeekly #235 October 2nd, 2020
The Headlines
Editor’s pick of the highlights from the past week.
KubeCon + CloudNativeCon North America 2020 Virtual — schedule now available!
We’re so excited to announce that the schedule for KubeCon + CloudNativeCon North America 2020 Virtual is live! The fourth virtual event from CNCF this year will host ~200 maintainer sessions, tutorials, keynotes, and breakout sessions, including insights from end users on cloud native technology in production. This educational event will arm attendees — from beginner to advanced — with the insights they need to successfully implement and manage cloud native architectures within their organization. Don’t forget that you can save $25 when you register by the end of October!
- It announces the release of the KubeCon + CloudNativeCon North America 2020 Virtual schedule and reminds readers to register. Paid registration is $75 through the end of October (a $25 discount) and $100 for registrations in November.
- It’s already next month. I will choose the sessions to watch in advance.
ICYMI: CNCF Webinars
You can view all CNCF recorded and upcoming webinars here.
CNCF Project webinar: Kubernetes 1.19
Kubernetes release team
- The changes in Kubernetes 1.19 are explained per SIG, grouped by status (“Alpha”, “Beta”, “Stable”, etc.). Starting with 1.19, the support period is one year.
CNCF Member webinar: VanillaStack as a platform for a truly vendor-agnostic open-source ecosystem
Karsten Samaschke, CEO @Cloudical
- It introduces the open source version of VanillaStack, explains the ideas behind the platform, and provides a future roadmap for the integration and deployment of open source projects.
CNCF Member webinar: Effective disaster recovery strategies for Kubernetes
Rasheed Amir, CEO @Stakater AB
- It describes how companies are leveraging Kubernetes through DevOps for mission-critical cloud-native apps with the following points.
○ Some concepts and terms to consider for disaster recovery business needs
○ Kubernetes architecture for ensuring fault tolerance and high availability
○ Factors to consider while creating a Disaster recovery plan
○ The components for which to implement backup and restore
CNCF Member webinar: Self service Kubernetes for enterprises
Jim Bugwadia, Founder and CEO @Nirmata
- It describes best practices and new patterns that can help you achieve a self-service Kubernetes cluster across your enterprise.
- It is aimed at platform teams that need visibility and governance while enabling enterprise-wide business agility and driving adoption of cloud-native tools.
CNCF Member webinar: Dapr, Lego for microservices
Mark Chmarny, Principal Program Manager @Microsoft
- It introduces how to use the distributed application runtime Dapr to efficiently build cloud-native apps deployed on Kubernetes and other hosting platforms.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
The Level Up Hour (podman play kube)
Langdon White and Chris Short, Red Hat
- An episode of the Twitch video program “The Level Up Hour” on Podman, by Chris Short of Red Hat, the editor of KubeWeekly, with Langdon White of the same company as a guest.
Our online analytical processing journey with ClickHouse on Kubernetes
Sudeep Kumar, Mohan Garadi, Xiancheng Li, Amber Vaidya and Liangfei Su, eBay
- It describes the latest evolution of eBay’s online analytical processing (OLAP), built around ClickHouse (a column-oriented database) running on Kubernetes.
A Linux sysadmin’s introduction to cgroups
Steve Ovens, Red Hat
- The first article in a four-part series. It defines cgroups and describes how they can help with resource management and performance tuning.
RabbitMQ monitoring on Kubernetes
Piotr Minkowski
- It explains how to run a monitoring stack for RabbitMQ on Kubernetes.
- You can use the RabbitMQ monitoring tool to see general metrics of the nodes and detailed logs of all messages.
- Spring Boot AMQP provides application-specific metrics that interact with RabbitMQ.
Build a data streaming pipeline using Kafka Streams and Quarkus
Kapil Shukla, Red Hat
- It walks through building a Quarkus app that uses Kafka Streams to stream and process data in real time.
Chaos Mesh 1.0: Chaos Engineering on Kubernetes made easier
Chaos Mesh Maintainers
- The announcement of the v1.0 GA release of “Chaos Mesh®”, which joined CNCF as a sandbox project in July 2020, along with an overview of the project.
Rootless containers with Podman: The basics
Prakhar Sethi, Red Hat
- It covers the benefits of using containers and Podman, what rootless containers are and why they matter, and examples of how to use rootless containers with Podman.
Unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow
- The GitHub page of “Couler”, a unified interface for building and managing workflows on various workflow engines such as Argo Workflows, Tekton Pipelines, and Apache Airflow. It offers the following.
○ Simplicity: Unified interface and imperative programming style for defining workflows with automatic construction of directed acyclic graph (DAG).
○ Extensibility: Extensible to support various workflow engines.
○ Reusability: Reusable steps for tasks such as distributed training of machine learning models.
○ Efficiency: Automatic workflow and resource optimizations under the hood.
Kubernetes utility for exposing image versions in use, compared to latest available upstream, as metrics
- I will skip it because it is covered in DEVOPS WEEKLY ISSUE #509 above.
TiDB Operator: Your TiDB operations expert in Kubernetes
Aylei Wu, PingCap
- It explores how TiDB Operator runs TiDB smoothly on Kubernetes and ensures data security, and explains how companies use TiDB Operator in production along with best practices.
Pooja Dhoot
- It shares a GitHub Actions workflow that creates a pipeline for deploying Fission on a GKE Kubernetes cluster, along with some code validation actions and, finally, some monitoring actions.
Use Terraform to create and manage a HA AKS Kubernetes cluster in Azure
Kentaro Wakayama, Coder Society
- It explains how to use Terraform to manage a highly available Azure AKS Kubernetes cluster with Azure AD integration and Calico network policies enabled.
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Contributing to the Development Guide
Erik L. Arneson
- A new contributor describes his experience writing and submitting changes to the Kubernetes Development Guide.
- I have also started getting involved in localizing the website documentation, and I’d like to expand my scope.
Anthos in depth: Easy load balancing for your on-prem workloads
Mahesh Narayanan, Product Manager, GKE and Yuan Liu, Software Engineer, GKE
- It introduces three different options that Anthos offers to deploy an external load balancer and details the load balancer bundled with Anthos.
Kubernetes: When to use, and when to avoid, the operator pattern
Mary Branscombe, The New Stack
- It quotes a tweet by Rancher Labs Chief Technology Officer Darren Shepherd and explores when and how to use the Operator pattern.
Leader Election, with Mike Danese
Adam Glick and Craig Box, Kubernetes Podcast from Google
- Kubernetes Podcast by Google employees. The current Co-hosts are Craig Box and Adam Glick.
- The guest is Mike Danese, a software engineer and tech lead at Google and chair of Kubernetes SIG Auth.
- The topics that caught my interest in the news of the week are as follows.
○ OpenServiceMesh joins the CNCF Sandbox
○ Chaos Mesh 1.0
○ Determined AI in Kubernetes
○ KubeAcademy Pro from VMware
○ Scholarships for KubeCon NA 2020 are open for application
Electro Monkeys podcast (in French)
- The French podcast “Electro Monkeys podcast” seems to cover the mechanism by which Falco detects unwanted behavior.
The Cloud Native Landscape: The runtime layer explained
Catherine Paganini and Jason Morgan
- Part of a series of articles explaining each category of CNCF’s “Cloud Native Landscape”. This one focuses on the runtime layer, which covers everything a container needs to run in a cloud-native environment.
CNCF Case Study
- A CNCF article introducing a user case study. It covers Verizon Media’s adoption of cloud-native technology.
- Verizon Media owns Yahoo, HuffPost, TechCrunch, and many other brands. Click here for the full text of the case study.
Upcoming CNCF webinars
You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.
Member Webinar: Multi-Cluster & multi-cloud service mesh with CNCF’s Kuma and Envoy
Marco Palladino, CTO & Co-Founder @Kong
Oct 6, 2020 10:00 AM Pacific Time
Member Webinar: The evolution of cloud orchestration systems from ephemeral to persistent storage
Boyan Krosnov, CPO @StorPool
Oct 7, 2020 8:00 AM Pacific Time
Member Webinar: Kubernetes native two-level resource management for AI/ML workloads
Diana Arroyo, Software Engineer @IBM Research
Alaa Youssef, Manager, Container Cloud Platform @IBM Research
Oct 7, 2020 10:00 AM Pacific Time
Member Webinar: Building dynamic machine learning pipelines with KubeDirector
Tom Phelan, Fellow, Software Organization @Hewlett Packard Enterprise
Oct 8, 2020 10:00 AM Pacific Time
Member Webinar: You can be a Kubernetes contributor too!
Jeremy L. Morris, Software Engineer @DigitalOcean
Oct 13, 2020 10:00 AM Pacific Time
Member Webinar: ephemeral.run: A full application environment for every PR–before you merge to master!
Vishal Biyani, CTO @InfraCloud
Jono Spiro, Staff Software Engineer, Engineering Operations @OpenGov
Oct 14, 2020 10:00 AM Pacific Time
Member Webinar: GitOps at scale for a multicloud, multi-region stateful application
Rick Spencer, Head of Platform @InfluxData
Oct 14, 2020 1:00 PM Pacific Time
Member Webinar: S&P experience report: multi-cloud serverless on Knative
Evan Anderson, Software Engineer @VMware
Mark Wang, Head of Cloud Engineering @S&P Global Ratings
Oct 15, 2020 10:00 AM Pacific Time
Member Webinar: Delivering cloud native apps to Kubernetes using werf
Dmitry Stolyarov, CTO @Flant
Oct 16, 2020 10:00 AM Pacific Time
Member Webinar: How to migrate NF or VNF to CNF without vendor lock-in
Grzegorz Sikora, VP Business Development @OVOO
Oct 20, 2020 10:00 AM Pacific Time
Member Webinar: Deploying Kubernetes to bare metal using cluster API
Seán McCord, Principal Senior Software Engineer @Talos Systems, Inc.
Oct 21, 2020 1:00 PM Pacific Time
Member Webinar: K8s audit logging deep dive
Randy Abernethy, Managing Partner @RX-M
Oct 22, 2020 10:00 AM Pacific Time
Member Webinar: Building 12 factor streaming data apps on Kubernetes
Stelios Charmpalis, Frontend Engineer @Lenses.io
Francisco Perez, Senior Backend Engineer @Lenses.io
Oct 23, 2020 10:00 AM Pacific Time
Member Webinar: Admission controllers: one part of your Kubernetes security and governance toolkit
Gunjan Patel, Cloud Architect @Palo Alto Networks
Robert Haynes, Cloud Security Evangelist @Palo Alto Networks
Oct 28, 2020 7:00 AM Pacific Time
Member Webinar: Developer-friendly platforms with Kubernetes and infrastructure as code
Lee Briggs, Staff Software Engineer @Pulumi
Nov 6, 2020 10:00 AM Pacific Time
Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time
How did you find these articles? Did any of them catch your interest?
There is some content I cannot fully digest at this stage, so I will also use this aide-memoire and these links to catch up myself.
Bye now!!