SRE / DevOps / Kubernetes Weekly Collection #43 (Week 48)

- In this blog post series, I collect the following three weekly mailing lists that I subscribe to, and add some comments as an aide-memoire along with useful links.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #517 November 22nd, 2020
SRE Weekly Issue #245 November 22nd, 2020
KubeWeekly #242 ←No Updates
DEVOPS WEEKLY ISSUE #517 November 22nd, 2020
News
- The title is “How could they be so stupid?”.
- An article commenting on the previously covered incident that originated from a Twitter account takeover. The article does not seem to have been updated since, so I will skip it.
- The title is “DevSecOops — Stories of DevSecOps Failures and Success”.
- A Speaker Deck slide deck that shows the trial and error involved in providing a Happy Path with high development productivity to the developers who are the author's customers.
- The title is “Learnings From Two Years of Kubernetes in Production”.
- Drawing on experience operating Kubernetes in production, it shares current knowledge and ideas. The author says he started using kops before EKS became GA in the AWS Singapore Region.
- The title is “KubeCon 2020 Recap — Maturity in Cloud Native”.
- Due to the time difference and other events, I could hardly participate, so I am grateful for the outline. I was surprised that the number of Certified K8s Distributions decreased. The market seems fiercely competitive, and some of the distributions have been consolidated.
- The title is “New courses on distributed systems and elliptic curve cryptography”.
- It introduces eight new lecture courses on distributed systems and tutorials on elliptic curve cryptography. I am very grateful for the notes and YouTube videos.
- The title is “Beyond the buzzword: BPF’s unexpected role in Kubernetes”.
- It comes from a presentation at KubeCon NA. I will set aside a day in my calendar for eBPF hands-on practice, because I have put off diving deep into it too many times lately.
- The title is “11 FACTS ABOUT REAL-WORLD CONTAINER USE”.
- There are many items, but each one is short and easy to read, so you can get through them quickly and use them as a reference.
- Kubernetes runs in half of container environments
- Nearly 90 percent of containers are orchestrated
- A majority of Kubernetes workloads are underutilizing CPU and memory
- GKE, AKS, and EKS dominate on their respective cloud platforms
- 1 in 3 AWS container environments runs Fargate
- Larger Kubernetes clusters contain larger nodes
- Networking technologies are prevalent among DaemonSets
- The most popular Kubernetes version is 17 months old
- Organizations are in the early stages of service mesh adoption
- Half of all containers are now managed by cloud provider and third-party registries
- NGINX, Redis, and Postgres are the most popular container images
- The title is “Series: Deploying ASP.NET Core applications to Kubernetes”.
- A series that shares what the author learned when deploying ASP.NET Core apps on Kubernetes. It currently runs up to Part 12, and each part is written to be easy to follow.
Tools
- The web page of the OSS DNS client CLI tool “dog”. Its output is colorful and easy to read.
- The GitHub page of “illuminatio”, a tool that automatically tests Kubernetes’ Network Policy.
- The GitHub page of Karpenter, a metric-driven autoscaler built for Kubernetes that can be run anywhere in any Kubernetes cluster. Still in the developer preview stage. I feel like I have covered it before, but I cannot find it.
SRE Weekly Issue #245 November 22nd, 2020
Articles
Trust Asia 2021 has produced inconsistent STHs
A Certificate Transparency (CT) log failed, resulting in its permanent retirement. The incident involved unintended effects from load testing being performed in a staging environment. I have a huge amount of admiration and respect for the transparency of certification authorities (CAs) when things go wrong.
Trust Asia
- A thread on the Google group “Certificate Transparency Policy” that follows the Trust Asia CT log failure from the initial inquiry through the investigation results and future countermeasures.
- The root cause was that a machine the test cluster used for elastic scaling was mistakenly connected to the production cluster, because the test cluster incorrectly used the etcd address of, and a test machine in, the production environment.
Knowing your systems and how they can fail: Twilio and AWS talk at Chaos Conf 2020
I like the idea that adding the ability to fail over to your system makes it much more complicated and thus more likely to fail.
Andre Newman — Gremlin
- The article picks up and explains two presentations from Chaos Conf 2020. The presentations are embedded in the web page.
- Both presentations provide answers to the following important questions:
- What are we aiming to accomplish with Chaos Engineering, and how do we do it thoughtfully?
Building for reliability at HelloSign
This one introduces some interesting concepts: the error kernel and property testing.
Kenneth Cross — HelloSign
- HelloSign's product and the concepts are introduced under the following headings.
- Kernel panic!
- A brief guide on how to save the day
- How we use property testing
- Conclusion
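As a reminder for myself of what property testing means, here is a minimal sketch. It is not HelloSign's code: the run-length encoder, the function names, and the random string generator are all made up for illustration. Instead of checking a few hand-picked inputs, the test generates many random inputs and checks a property that must always hold (here, that decoding an encoded string gives back the original). Libraries such as Hypothesis automate the input generation; this sketch uses only the standard library.

```python
import random


def run_length_encode(text):
    """Collapse runs of equal characters into (char, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def run_length_decode(pairs):
    """Expand (char, count) pairs back into a string."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials=500, seed=0):
    """Property: decode(encode(s)) == s for every generated string s."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)
        # A tiny alphabet makes repeated characters (the interesting case) likely.
        text = "".join(rng.choice("ab ") for _ in range(length))
        assert run_length_decode(run_length_encode(text)) == text, text
    return trials


check_round_trip()
```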
Tech Startup Dilemmas: Resilient Deployment vs. Exhaustive Tests
[…] to be resilient, we must test everything, which consumes time that we don’t spend innovating. A good trade-off is to test in production.
Xavier Grand — Algolia
- The article explains the need to find the right balance between resilience and innovation in order to respond to market changes and new needs.
- The perspective in the excerpt above, “A good trade-off is to test in production.”, was new to me.
8 Tips to Create an Accurate and Helpful Post-Mortem Incident Report
More useful tips as you develop your post-incident analysis process. I like their definition of “blameless”.
Zachary Flower — Splunk
- As the title suggests, the following eight tips are explained.
- Don’t assign blame
- Do take responsibility
- Don’t procrastinate
- Do gather information
- Don’t be vague
- Do define clear owners
- Don’t lose focus
- Do use a consistent template
Achieving exactly-once message processing with Ably
Exactly once delivery is hard to implement and requires explicit coordination at all levels, including the client. Ably explains how their flavor works.
Paddy Byers — Ably
- An article that clarifies what “exactly-once” means in the context of distributed pub/sub systems, so that readers can understand the “exactly-once” guarantees provided by Ably.
- In line with that theme, it starts from the types of messaging semantics and digs deeper from there.
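To keep the idea straight in my head, here is a minimal sketch of one common way to get exactly-once processing on top of at-least-once delivery: the consumer remembers message IDs and ignores duplicates. This is only an illustration under my own assumptions, not a description of how Ably implements its guarantees. The class name, the message IDs, and the in-memory set are hypothetical; a real system would need durable storage for the seen IDs.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen_ids = set()  # assumption: in-memory only, lost on restart
        self.total = 0

    def handle(self, message_id, amount):
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.total += amount
        return True


consumer = IdempotentConsumer()
# "m1" is delivered twice, as can happen when a sender retries after a lost ack.
deliveries = [("m1", 10), ("m2", 5), ("m1", 10)]
results = [consumer.handle(message_id, amount) for message_id, amount in deliveries]
```

With these deliveries, the second copy of "m1" is dropped, so the total reflects each message only once.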
Why you should frequently turn down ~30% of canary instances
The most effective (if scary) way to understand how your stateless service operates under load
Utsav Shah — Software at Scale
- It explains possible approaches for when you do not know where the scalability limits of a service lie.
- General approach: Start with a synthetic load via script
- Utilization DRT (Disaster Recovery Test)
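The arithmetic behind turning down about 30% of instances can be sketched as follows. This is my own back-of-the-envelope calculation, not code from the article, and it assumes a stateless service whose total traffic stays constant and is spread evenly over the remaining instances. The function names are made up for illustration.

```python
def load_factor_after_turndown(fraction_turned_down):
    """How much more load each remaining instance receives."""
    if not 0 <= fraction_turned_down < 1:
        raise ValueError("fraction_turned_down must be in [0, 1)")
    return 1.0 / (1.0 - fraction_turned_down)


def projected_utilization(current_utilization, fraction_turned_down):
    """Estimated per-instance utilization after the turndown (linear assumption)."""
    return current_utilization * load_factor_after_turndown(fraction_turned_down)


# Turning down 30% leaves 70% of the instances serving the same traffic.
factor = load_factor_after_turndown(0.3)
after = projected_utilization(0.5, 0.3)
```

Under these assumptions, each remaining instance takes roughly 1.43 times its previous load, so instances running at 50% utilization would move to roughly 71%.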
The Engineer’s Guide to Preparing for Black Friday 2020
Some good tips here — and a reminder that we may see even more traffic than normal due to social distancing.
- It opens by noting that flocking to stores on Black Friday is dangerous during COVID-19, and that, since Black Friday has increasingly become a digital event over the past few years, this shift is expected to accelerate this year as well.
- It explains the following points on how to handle a Black Friday that’s unlike any other we’ve seen thus far.
- How SLO-based alerts, runbooks, and other practices that drive preparation are critical to the success of the holiday season
Outages
- ASX (Australian Stock Exchange)
- Coinbase
- GoDaddy
GoDaddy’s statement took care to explicitly state that the outage was not a security incident. This may be because they appear to have had an unrelated security incident around the same time, and some customer domains were taken over.
- Nest
KubeWeekly #242 ←No Updates
What did you think of these articles? Did any of them catch your interest?
There is some content here that I cannot fully digest at this stage, so I will use this aide-memoire and these links to catch up myself too.
Bye now!!