SRE / DevOps / Kubernetes Weekly Collection #12 (Week 17)

- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments and useful links as an aide-memoire.
- Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #486 April 19th, 2020
SRE Weekly Issue #215 April 20th, 2020
KubeWeekly #213 April 25th, 2020
DEVOPS WEEKLY ISSUE #486 April 19th, 2020
News
- The title is “Incident Analysis: How Learning is Different Than Fixing”. Article dated January 31st.
- The author summarizes the main gist as follows:
* Current and typical approaches to “learning from incidents” have very little to do with actual learning.
* Learning is not the same as fixing.
* Most post-incident review documents are written to be filed, not written to be read.
* Changing the primary focus from fixing to learning will result in a significant competitive advantage.
- As previously pointed out in other articles, focusing on reducing the number of incidents benefits those who make the number go down (by changing reporting methods, frequency, etc.) while adversely affecting the organization.
- The title is “Releasing GitHub Actions”. Article dated January 31st.
- If you have never used GitHub Actions, I recommend checking the previous article (September 19, 2019), “Working with GitHub Actions”. You can follow it hands-on while reading the explanation, so I will work through it by getting my hands dirty.
- A transcript of a podcast about microservices testing strategies with Matt Klein, Envoy creator and software engineer at Lyft. The podcast is embedded in the lower part of that page, so check it out if you would rather listen to it.
- The title is “Shall We Play a Coordination Game?”
- Following on from the author’s earlier article, “Security as a Product”, this piece examines the relationship between both teams, in terms of security as a product and cooperation, through the lens of moral hazard and the cooperative games of game theory and behavioral economics.
- When I saw the estimated reading time of “26 minutes”, I was rather discouraged.
- A three-part blog series that introduces Rego, the OPA (Open Policy Agent) policy language. Click here for Part 2. Click here for Part 3.
- OPA is a topic that I want to dig deeper into. This is my homework.
- The title is “API design: Understanding gRPC, OpenAPI and REST and when to use them”.
- It examines HTTP and RPC-style API design through gRPC, OpenAPI, and REST. Since I lack both the background knowledge and the experience, I will read it again.
- The title is “Creating a minimal OS for containers with LinuxKit and Azure.”
- A story of creating a minimal OS using LinuxKit and Azure environment. He says that “Historically I thought only massive companies such as Canonical or RedHat were capable of building out a Linux distribution, but the LinuxKit tool drastically lowers that barrier to entry”.
- The title is “Creating a Helm repo on Google Cloud.”
- The author had the following requirements, so he got hands-on to solve them.
* As a platform engineer, he wanted new chart versions to be available as quickly as possible across all environments.
* HelmReleases should not fail on startup because a version does not exist.
* Dependencies of HelmReleases should be kept outside of the cluster.
- The title is “Vim Kubernetes YAML Support”.
- When writing Kubernetes YAML files with vim, you can configure it to provide completion and descriptions of resources. A YouTube video covering this is also linked, so please have a look. The default schemas are for Kubernetes 1.14.0, but there are instructions on how to update them.
Tools
- GitHub page of “Pomerium”, an IAP (identity-aware proxy) OSS tool that enables secure access to internal applications. Click here for the io page.
SRE Weekly Issue #215 April 20th, 2020
Articles
The “messy” details of our human/computer systems is their hidden strength.
Lorin Hochstein
- We tend to think a human/computer system should ideally be orderly, but in reality it is not (e.g., people who are not officially on call, or help from people who happen to be pulled into a Slack channel), and the author argues that this messiness is a necessary element of the system.
Accident Case Study: Just a Short Flight
In this accident report, learn how two pilots lost situational awareness, with disastrous consequences.
Air Safety Institute
- A case study on the subject of an American plane crash.
- Many factors were involved, such as pilot skill, captain supervision, and flight-plan briefings. All of them led to trouble through factors familiar to us (lack of situational awareness, lack of skills, misaligned understanding, questionable appropriateness), so there are lessons to take from it.
Succeeding With Service Level Objectives
Without a structured strategy, and careful consideration of the full SLO lifecycle, SLOs risk partial implementation. This can result in low ROI and, in many cases, a complete failure.
Danny Mican — Squadcast
- An article by Danny Mican, Senior SRE at Auth0, on how to build an SLO from scratch using the IIDARR process. It’s interesting, so I want to read it over carefully.
- IIDARR is formed from the initials of the following elements.
* Identify
* Instrument (Measure)
* Define
* Alert (Action)
* Report/Refine
Back to Basics: Why Global Infrastructure Matters
The cloud’s multiple availability zones and regions can be powerful, but it’s hard to get a multi-region architecture correct.
Serhat Can — OpsGenie
- This article focuses on “global infrastructure”, an aspect underpinning the reliability of the cloud that I often overlook.
A useful little JavaScript tool: plug in an availability percentage (e.g. 99.99%), and get back the number of minutes you can be down in a day, month, quarter, or year.
Hexadecimal
- “SLA Uptime Calculator”: the allowed downtime for a given SLA can be calculated instantly on a day/week/month/quarter/year basis, which is convenient!
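The arithmetic behind such a calculator is simple enough to sketch. The following is a minimal Python illustration, not the tool’s actual code: the function name `allowed_downtime_minutes` and the period lengths (30-day month, 91-day quarter, 365-day year) are my own assumptions.

```python
# Period lengths in minutes. The month, quarter, and year lengths are
# assumptions for illustration (30, 91, and 365 days), not the tool's values.
PERIOD_MINUTES = {
    "day": 24 * 60,
    "week": 7 * 24 * 60,
    "month": 30 * 24 * 60,
    "quarter": 91 * 24 * 60,
    "year": 365 * 24 * 60,
}


def allowed_downtime_minutes(availability_percent: float) -> dict:
    """Return the downtime budget in minutes for each period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    # Fraction of time the service is allowed to be unavailable.
    unavailable_fraction = (100 - availability_percent) / 100
    return {
        period: minutes * unavailable_fraction
        for period, minutes in PERIOD_MINUTES.items()
    }


if __name__ == "__main__":
    for period, minutes in allowed_downtime_minutes(99.99).items():
        print(f"{period:>7}: {minutes:.2f} minutes")
```

With these assumed period lengths, 99.99% leaves a budget of well under a minute per day and roughly 52 and a half minutes per year.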
Hosted Pools Availability Degradation
Azure Pipelines had an incident of delayed builds at the end of March. Find out more in this post-incident analysis.
Chad Kimes — Microsoft
- An Azure post-mortem on build and release delays for Windows and Linux agents in the EU and UK, caused by the pandemic (COVID-19), from 3/24 to 3/26.
- My stomach hurt just imagining it: on the first day, something that should have been noticed within 10 minutes took about 5 hours to notice because of a communication problem (which seems to have been a monitoring design problem).
Free Google Book: Building Secure and Reliable Systems
Google published another book in their SRE series. This short summary gives an overview of what’s inside along with an explanation of the motivation for another book. See also: Google’s announcement
Todd Hoff — High Scalability
- The third SRE book by Google, “Building Secure and Reliable Systems”, has been released online for free! It has 21 chapters and 557 pages. I would also like to hold a reading group.
One Team at Uber is Moving from Microservices to Macroservices
The pendulum is swinging back, and folks are starting to see the downsides of a plethora of microservices, including early champions, Uber.
Todd Hoff — High Scalability
- The article starts from the news that a team at Uber has announced a move from microservices to macroservices (well-sized services).
- “Building reliable and testable microservices is a lot harder than most folks think”. “It’s a macro service, not a monolith”. “May or may not have/need monorepo”. “Better observability and debugging”. The piece is lined with phrases like these that invite all sorts of interpretations. It is better to read them in context to avoid misunderstanding and over-interpretation.
- Click here for the corresponding tweet.
Outages
- Quibi
Quibi had issues on their launch day.
- Deliveroo
- Google Cloud Platform IAM
Click through for their interesting post-incident analysis.
- Cloudflare
Here’s their post-incident analysis that details a remote hands request gone awry.
- Chef
- Hulu
- Lots of Banks in the US
Banks went down around the time when customers were checking to see if their economic stimulus payments had arrived.
- Petnet (smart pet feeder)
- Snapchat
- Fastly
- DoorDash
- StackPath
KubeWeekly #213 April 25th, 2020
The Headlines
Editor’s pick of the highlights from the past week.
Kubernetes Podcast episode 100: Community Redux, with Paris Pittman
To celebrate the 100th episode of the Kubernetes Podcast from Google, hosts Adam Glick and Craig Box welcome back their first ever guest, Paris Pittman. Paris is an open source program manager at Google Cloud, member of the Kubernetes steering committee, and founder of the CNCF Contributor Strategy SIG and the Kubernetes contributor communication committee. Paris looks at how the Kubernetes community has changed and ways in which it has stayed the same, as well as how other projects can adopt learnings from Kubernetes.
- Kubernetes Podcast by Google employees. The current co-hosts are Craig Box and Adam Glick.
- This is the 100th episode!! Paris Pittman, a Kubernetes Steering Committee member and Open Source Program Manager at Google, is welcomed as the guest.
- There are many topics this week, aren’t there? It’s hard to keep up, but that’s a good thing. The topics of interest in the News of the week are:
○ New Tanzu announcements
○ Magicpak by Hiromi Ogawa
○ Pluto from Fairwinds
○ Anthos: Under The Hood by the Google Cloud Developer Advocacy team
○ Announcing the Kubernetes Contributor Communications team
○ How to join
○ CFP opens for KubeCon US
Bundle Training and Certification to Jump Start Your Career
Dan Brown, The Linux Foundation
The Linux Foundation offers training and certification bundles, which provide the courses needed to gain the knowledge necessary to succeed in a chosen open source career, and a certification exam to enable you to confidently demonstrate that knowledge to potential employers. Bundles are the more direct way to get qualified for a new open source career, or add new skills to advance your current one.
And there is a sale! 30% off offer for all bundles, courses and certification exams. Use code ANYWHERE30 at checkout. LF also has nearly two dozen completely free training courses that are always available to help you get started and determine the open source technology area in which to focus.
- The campaign offered by CNCF, originally running until 4/6, has been extended to 4/30 (as of this writing).
ICYMI: CNCF Webinars
Weekly recap of CNCF member and project webinars that you might have missed.
You can view all CNCF recorded and upcoming webinars here
CNCF Project Webinar: Announcing Open Source gRPC Kotlin
James Ward, Developer Advocate @Google Cloud Platform
- A webinar video, recorded the day after the announcement, by Google Developer Advocate James Ward, who was named as a contributor in the article “gRPC, meet Kotlin” that I covered in a previous post. It introduces the open source gRPC Kotlin project and how to use it.
- Other contributors also take part and back each other up, and they seem to work well as a team.
Sertaç Özercan, Software Engineer @Microsoft and Lachie Evenson, Principal Program Manager @Microsoft
- Webinar video by Microsoft Software Engineer Sertaç Özercan and Microsoft Principal Program Manager Lachie Evenson.
- “Gatekeeper”, a subproject of OPA (Open Policy Agent), is introduced as “a method of ensuring compliance without compromising development agility and operational independence.”
- In effect, it applies policies evaluated by OPA through an Admission Webhook that you can customize for your Kubernetes cluster.
CNCF Member Webinar: Kubernetes RBAC 101
Oleg Chunikhin, CTO @Kublr
- A webinar video by Kublr CTO Oleg Chunikhin explaining the concepts and objects of RBAC (Role-Based Access Control) in Kubernetes.
CNCF Member Webinar: 如何让你的Windows应用运行在Kubernetes平台 (How to run your Windows applications on the Kubernetes platform)
杨雨 Alex Yang, 解决方案架构师 Solution Architect @Mirantis and 张文墨 Larry Zhang, 解决方案架构师 Solution Architect @Mirantis
- A webinar delivered in Chinese by Alex Yang and Larry Zhang, both Solution Architects at Mirantis. I wonder if the day will come when Japanese webinars are listed here too. I think that listing languages other than English side by side would broaden diversity and the community base. Even as my English improves, my native language remains easier for me to retain and express myself in (given my current English and other skills).
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
kubesort is a tool that helps you sort the results from kubectl get in an easy way
- The GitHub page of the OSS tool “kubesort” that easily sorts the results of kubectl get.
- For example, you can just type
kubesort status
instead of
kubectl --sort-by=.status.containerStatuses[0].restartCount get po
to sort by pod status. You don’t have to spell out the nested field path using JSONPath.
- v0.1.0 supports
kubectl get pod
only. According to the roadmap, v0.2.0 will support Deployments, Services, and Namespaces, and auto-completion will be added.
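To make it concrete what sorting by that nested field means, here is a small Python sketch. It is not how kubesort is implemented. It only mimics the ordering on a hypothetical sample shaped like the output of `kubectl get po -o json`; the pod names and the helper names `first_container_restarts` and `sort_by_restarts` are made up for illustration.

```python
import json

# Hypothetical sample shaped like `kubectl get po -o json`,
# trimmed to the fields used here.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "web-1"},
   "status": {"containerStatuses": [{"restartCount": 3}]}},
  {"metadata": {"name": "db-0"},
   "status": {"containerStatuses": [{"restartCount": 0}]}},
  {"metadata": {"name": "job-x"}, "status": {}}
]}
""")


def first_container_restarts(pod: dict) -> int:
    """Restart count of the pod's first container, or 0 if none is reported."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    return statuses[0].get("restartCount", 0) if statuses else 0


def sort_by_restarts(pod_list: dict) -> list:
    """Return pod names ordered by ascending restart count of the first container."""
    pods = sorted(pod_list.get("items", []), key=first_container_restarts)
    return [pod["metadata"]["name"] for pod in pods]


if __name__ == "__main__":
    print(sort_by_restarts(SAMPLE))
```

Treating a pod without container statuses as having zero restarts is a choice made for this sketch so that the sort never fails on pending pods.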
Videos: Intro to Vitess — its powerful capabilities and how to get started
Abhi Vaidyanatha, PlanetScale
- Introduces a YouTube video featuring Sugu Sougoumarane, PlanetScale’s CTO and Vitess co-creator.
- The video is in FAQ format and answers the following questions:
* How Vitess was tested on Borg at YouTube
* What attributes make Vitess cloud-native and how does it run on Kubernetes?
* What is VReplication?
* What can be done with VReplication?
* Real time rollups explained
Learn from our maintainer: build/run/test your Velero code locally and in cluster
Carlisia Campos, VMware
- As the title says, a YouTube video in which a VMware maintainer explains how to build, run, and test Velero code locally and in a cluster. The terminal and the steps are easy to follow.
Multicluster-Scheduler and Argo (Workflows and CD): a Deep Dive
Gokul Chandra
- A deep dive that combines Multicluster-Scheduler, Argo Workflows, Argo CD, Virtual-Kubelet, Cilium, etc., to explore configurations that support multi-cluster environments.
- There are plenty of explanations and screenshots, and it covers a lot of ground, but I would like to try it. Homework.
Virtual 4G Simulation Using Kubernetes And GNS3
Christopher Adigun, Loodse
- An article on deploying a virtual 4G stack using Kubernetes and GNS3.
- GNS3 helped me a lot at a previous workplace, and I’m interested in networking, so I’d like to try this later.
Building a Kubernetes-Based Platform: Progressive Delivery, the Edge, and Observability
Datawire
- The author explains that “Kubernetes has been widely adopted and it provides a solid foundation on which to support the other three capabilities of a cloud native platform that enables full cycle development”:
* Continuous Delivery Pipelines
* The Edge Stack
* The Observability Stack
How GKE surge upgrades improve operational efficiency
Tamas Ragoncsa and Kobi Magnezi, Google
- An article by Google explaining how GKE surge upgrades improve operational efficiency. Surge upgrades will be enabled by default from 4/20, and existing node pools will also be migrated during the quarter (as of this writing).
Rolling Updates and Blue-Green Deployments with Kubernetes and HAProxy
Nick Ramirez, HAProxy
- An article by HAProxy, dated 2/11, on rolling updates and Blue-Green deployments using the “HAProxy Kubernetes Ingress Controller”. The execution environment is Minikube.
EKS Service Accounts Explained
Jason Smith
- The author describes what confused him when he implemented AWS’s ability to add IAM permissions to pods a few months ago. He helpfully clears up some of the confusion about what AWS is actually doing, and explains what he believes they did right and what they did wrong.
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Navigating the Kubernetes Hype Cycle
Cornelia Davis and Liz Rice
- Introducing the new podcast “The Art Of Modern Ops” hosted by Cornelia Davis, CTO of Weaveworks and author of “Cloud Native Patterns”.
- Liz Rice, VP Open Source Engineering at Aqua Security and chair of the CNCF’s TOC (Technical Oversight Committee), is welcomed as a guest, and the theme is “Navigating the Kubernetes hype cycle”. The number of podcasts I want to listen to has increased again. I subscribed immediately.
Kubernetes architecture for beginners
Kevin Casey, Red Hat
- An article that explains the basics of Kubernetes architecture and its key elements.
Pancake Podcast: Cassandra and the Kubernetes Data Plane
Joab Jackson, The New Stack
- A podcast with a panel discussion on “What is the role that the data plane plays in a Kubernetes ecosystem?”
GigaOm Radar for Hosted Kubernetes Solutions
Enrico Signoretti, Gigaom
- Report of “Hosted Kubernetes Solutions” by GigaOm.
- Only the radar on the public page and the summary can be viewed for free.
Is Kubernetes becoming the driving force of enterprise IT?
Graham Berry, RedHat
- The author explained Kubernetes along with the theme “Is Kubernetes becoming the driving force of enterprise IT?”, and concluded that “Ultimately, what do you want your teams to focus on? If the answer is building world-class services for customers and getting them to market faster than ever before, then Kubernetes would be a potent weapon in your armour”.
NetApp to make stateful applications easier to do in Kubernetes
Steven J Vaughan-Nichols, ZDNet
- An introduction to NetApp’s efforts through Project Astra to make it easier to deploy stateful apps on Kubernetes storage and container platforms.
The important things I know which helped me pass the CKAD exam
Vishwas Javalgekar
- An article on the important things that helped the author pass the CKAD (Certified Kubernetes Application Developer) exam. It introduces tips such as using aliases and deleting resources without a grace period.
Istio Service Mesh in 2020: Envoy In, Control Plane Simplified
Alon Berger, Alcide
- An article that describes Istio’s current trends and updates in 2020.
Master Shifu & His Cloud-Native Mentoring Sessions
Vishal Biyani, Infracloud
- An article introducing the history of InfraCloud and its “Talk-To-Us” mentorship program for students and engineers interested in cloud-native technology, modeled on “Kung Fu Panda” characters such as Master Shifu.
Upcoming CNCF webinars
You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars as of this writing.
Member webinar: Kuma: Service Mesh and the Future of Application Connectivity
Marco Palladino, Kong
April 28, 2020 10:00 AM Pacific Time
Member webinar: KubeCarrier: the Operator of Operators
徐嘉诚 Jiacheng Xu, 软件开发工程师 Software Engineer @Loodse
This webinar will be delivered in Chinese.
April 29, 2020 10:00 AM China Standard Time
Member Webinar: Building Zero Trust based Authentication in Healthcare with SPIRE
Bobby Samuels, Vice President, AI Technology @Anthem, Inc.
Frederick Kautz, Head of Edge Infrastructure @Doc.ai
Emiliano Berenbaum, Chief Technologist, HPE Labs @Hewlett Packard Enterprise(HPE)
April 29, 2020 10:00 AM Pacific Time
Member webinar: Best Practices In Implementing Container Image Promotion Pipelines
Baruch Sadogursky, Head of DevOps Advocacy @JFrog
April 30, 2020 10:00 AM Pacific Time
Community webinar: How to Conduct a GREAT Live Stream
Alex Lustenberg, Jorge Castro, Chris Short
April 30, 2020 1:00–3:00pm Pacific Time
Project webinar: Kubernetes 1.18
Kubernetes release team
May 1, 2020 9:00 AM Pacific Time
Member Webinar: How AWS uses Firecracker and Fargate to run serverless Kubernetes pods in Amazon EKS
Mo Ziyuan 莫梓元, 解决方案架构师 Solution Architect @AWS
This webinar will be delivered in Chinese.
May 7, 2020 10:00 AM China Standard Time
Member webinar: Data Services for Cloud Native Workloads
Diamanti
May 12, 2020 10:00 AM Pacific Time
Member Webinar: Piraeus: Dynamic Provisioning, Resource Management and High Availability for Local Persistent
Philipp Reisner, CEO @Linbit
Sun Liang, 资深存储架构师 Senior Storage Architect @DaoCloud
Alex Zheng, 资深存储工程师 Senior Storage Engineer @DaoCloud
This webinar will be delivered in Chinese.
May 13, 2020 10:00 AM China Standard Time
Member webinar: Cloud Native Monitoring: Scaling Prometheus
Aaron Newcomb, Director, Product Marketing, Monitoring @Sysdig
Carlos Arilla Navarro, Technical Marketing Engineer @Sysdig
May 19, 2020 10:00 AM Pacific Time
Member webinar: Kubernetes Cost Allocation Done Right
Webb Brown, Co-founder and CEO @Kubecost
June 24, 2020 10:00 AM Pacific Time
Member Webinar: Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
June 30, 2020 10:00 AM Pacific Time
What did you think of these articles? Did any of them interest you?
To be honest, there is some content I cannot digest at this stage, so I will use this aide-memoire and these links to catch up myself, too.
Bye now!!