SRE / DevOps / Kubernetes Weekly Collection #90 (Week 42, 2021)
- In this blog post series, I collect the following three weekly mailing lists I subscribe to, and leave some comments as an aide-memoire along with useful links. (KubeWeekly is off this week.)
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #564 October 17th, 2021
SRE Weekly Issue #292 October 17th, 2021
KubeWeekly #281 October 29th, 2021 ← off for 2 weeks due to KubeCon + CloudNativeCon NA 2021
DEVOPS WEEKLY ISSUE #564 October 17th, 2021
News
- The title is “FAILING FASTER WITH TERRAFORM”.
- As the Editor notes above, this article introduces Terraform validation from a Terraform beginner's perspective.
- The title is “Operations is not Developer IT”.
- An article that vividly describes the heavy load placed on operations teams by the adoption of DevOps, Docker, Kubernetes, and various vendor tools.
- The title is “Worst Case”.
- As the Editor notes above, it is an interesting thought experiment: what would happen if various incidents occurred in us-east-1, a region with an outsized influence on AWS as a whole?
- The title is “The long-term consequences of maintainers’ actions”.
- The good news is that OpenSSL 3 has landed in Alpine, but it also has a downside: the article explains the package dependencies that can become a pitfall.
- The title is “Your Terraform Module Needs an Opinion”.
- The author, who has strong opinions about what a Terraform module should be, explains that you should not build a Swiss Army knife, should not write a complicated wrapper, and so on.
- The title is “Terraform Module Patterns”.
- Another article on Terraform modules. It explains patterns for structuring modules, so it is worth reading together with the previous article.
- The title is “10 TRENDS IN REAL-WORLD CONTAINER USE”.
- As the title suggests, it explains the following 10 trends. Datadog presents them in an easy-to-read way, and each one is interesting.
- Nearly 90 percent of Kubernetes users leverage cloud-managed services
- Amazon ECS users are shifting to AWS Fargate
- The average number of pods per organization has doubled
- Host density is 3 times higher on Kubernetes than on Amazon ECS
- Pod auto-scaling is becoming more popular
- Organizations are deploying more stateful workloads on containers
- Organizations running container environments create more monitors
- Organizations are starting to replace Docker with containerd as their preferred runtime for Kubernetes
- OpenShift adoption is growing rapidly
- NGINX, Redis, and Postgres are the top three container images
- The title is “Kubernetes Co-founder Joe Beda: ‘Software development is a team sport’”.
- An interview full of interesting stories, covering topics such as his work on Internet Explorer at Microsoft, Kubernetes, and work-life balance.
Tools
- The GitHub page of “Kui”, a framework that enhances the CLI with graphics.
- The GitHub page of Panther, an event integration and management app that centralizes and manages events from IT systems, networks, and applications from a single console.
- As the Editor mentions above, this is the GitHub page of “age”, a simple, modern and secure file encryption tool, format, and Go library.
- The GitHub page of “Kdigger” (short for Kubernetes digger), a context detection tool for Kubernetes penetration testing.
- There is also an introductory blog post for the tool.
SRE Weekly Issue #292 October 17th, 2021
Articles
Four lessons every company should learn from the back-to-back Facebook outages
The lessons:
1. Acknowledge human error as a given and aim to compensate for it
2. Conduct blameless post-mortems
3. Avoid the “deadly embrace”
4. Favor decentralized IT architectures
There have been quite a few of these “lessons learned” articles that I’ve passed over, but I feel like this one is worth reading.
Anurag Gupta — Shoreline.io
Niall Murphy
- As the title and the Editor's comments above indicate, the article extracts and explains four lessons learned from Facebook’s recent outage.
- It is important to have a culture and organizational atmosphere where, when looking back on an outage, you can have conversations like the following:
- “We’ve already paid for this outage. What benefit can we get from that expenditure?”
Could us-east-1 go away? What might you do about it? Let’s catastrophize!
I love catastrophizing!
Tim Bray
- Since this was already covered in DEVOPS WEEKLY ISSUE #564 above, I will skip it.
What Managed Kubernetes Service is Best for SREs?
When evaluating options, this article focuses on reliability, both of the service itself and the options it provides for building reliable services on it.
Quentin Rousseau — Rootly
This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.
- It explores the five most popular managed Kubernetes services (Amazon EKS, Azure AKS, Google Cloud GKE, SUSE Rancher, Red Hat OpenShift) and gives an overview of how they stack up from a reliability engineering perspective.
This one answers the questions: what are failure domains, and how can we structure them to improve reliability?
brandon willett
- The first article in a short series named “SRE Toolkit”, each entry of which is a friendly introduction to one concept the author has consistently found useful in their quest to make software sturdier.
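As a rough illustration of why failure domains matter, here is a minimal Python sketch. It is not taken from the article; the function names and zone names are hypothetical. It spreads replicas round-robin across failure domains (for example, availability zones) and checks whether the service still has enough healthy replicas after losing any single domain.

```python
from collections import Counter
from typing import Dict, List


def spread_replicas(replicas: int, domains: List[str]) -> Dict[str, int]:
    """Hypothetical helper: place replicas round-robin across failure domains."""
    placement: Counter = Counter()
    for i in range(replicas):
        placement[domains[i % len(domains)]] += 1
    return dict(placement)


def survives_single_domain_loss(placement: Dict[str, int], min_healthy: int) -> bool:
    """True if losing any one domain still leaves at least min_healthy replicas."""
    total = sum(placement.values())
    return all(total - count >= min_healthy for count in placement.values())


if __name__ == "__main__":
    zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
    spread = spread_replicas(6, zones)  # 2 replicas in each zone
    packed = {"zone-a": 6}  # every replica in a single failure domain
    print(survives_single_domain_loss(spread, min_healthy=4))  # True
    print(survives_single_domain_loss(packed, min_healthy=4))  # False
```

The same six replicas survive a zone outage when spread out, but not when packed into one zone, which is the basic intuition behind structuring failure domains.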
SRE top interview questions to land an SRE role
It’s a great list of questions, and it covers a lot of ground. SREs wear many hats.
Opsera
- It’s a good list for preparing for an SRE role, and I think it is also useful for interviewers, or when you want to quickly check your understanding of your current SRE skill set.
How Time Series Databases Work — and Where They Don’t
I’ve always been curious about how Prometheus and similar time-series DBs compress metric data. Now I know!
Alex Vondrak — Honeycomb
- It details how time series databases (TSDBs) work and why Honeycomb did not want to be constrained by a TSDB-style implementation.
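To make the Editor's question about compressing metric data a bit more concrete: Prometheus's storage borrows from Facebook's Gorilla design, which stores timestamps as "deltas of deltas" (values get separate XOR-based compression). The Python below is a minimal sketch of only the timestamp idea, with hypothetical function names; it omits the variable-length bit packing that makes the small integers cheap to store in a real TSDB.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Hypothetical helper: encode as [first, first delta, delta-of-delta, ...]."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    if len(timestamps) == 1:
        return encoded
    prev_delta = timestamps[1] - timestamps[0]
    encoded.append(prev_delta)
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # 0 whenever the scrape interval is steady
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    if len(encoded) == 1:
        return timestamps
    delta = encoded[1]
    timestamps.append(timestamps[0] + delta)
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


if __name__ == "__main__":
    # Scrapes every 15 s, with one sample arriving 1 s late.
    ts = [1000, 1015, 1030, 1046, 1060]
    enc = delta_of_delta_encode(ts)
    print(enc)  # → [1000, 15, 0, 1, -2]
    assert delta_of_delta_decode(enc) == ts
```

Because regular scrapes produce mostly zeros after the first two entries, the encoded stream compresses far better than raw timestamps.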
An UPDATE without a WHERE, or something close to it
This one has some unconfirmed (but totally plausible!) deeper details about what might have gone wrong in the Facebook outage, sourced from rumors.
rachelbythebay
- It picks up and explains another rumor about the cause of Facebook’s outage, different from the ones it covered last week.
Turning Safety vs. Profits Into a Fair Fight
There’s a really intriguing discussion here about why organizations might justify a choice of profit at the expense of safety, and how the deck is stacked.
Rob Poston
- Starting from the question “Why does a powerful concept such as HRO (high reliability organization) fail to spread in hospitals?”, the article explains the steps and obstacles involved in improving safety.
KubeWeekly #281 October 29th, 2021 ← off for 2 weeks due to KubeCon + CloudNativeCon NA 2021
How about those articles? Did any of them catch your interest?
To be honest, there is some content I cannot fully digest at this stage, so I’ll also use this aide-memoire and these links to catch up myself.
Bye now!!