SRE / DevOps / Kubernetes Weekly Collection #83 (Week 35, 2021)
- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #557 August 29th, 2021
SRE Weekly Issue #285 August 29th, 2021
KubeWeekly #275 September 3rd, 2021
DEVOPS WEEKLY ISSUE #557 August 29th, 2021
News
- The title is “HashiCorp State of Cloud Strategy Survey: Welcome to the Multi-Cloud Era”.
- As the title suggests, it shares the results of HashiCorp’s survey on cloud adoption, with numbers and diagrams.
- The title is “Observable Infrastructure as Code”.
- It explains how to use Pulumi and Honeycomb to simplify the observability of IaC.
- The title is “TLDs — Putting the ‘.fun’ in the top of the DNS”.
- It digs deeper into DNS top-level domains.
- The title is “Email Authenticity 101: DKIM, DMARC, and SPF”.
- It explains the elements in the title and provides information and practices you need to keep your domain’s email authentic and less vulnerable to spoofing.
- The title is “Improving TOFU With Transparency”.
- It explains when TOFU (Trust-On-First-Use) works and when it does not, and describes mitigations that use transparency logs.
- The title is “JSON Schema bundling finally formalised”.
- As the title suggests, it covers the following points.
○ Bundling has renewed importance
○ Existing solutions? New solutions!
○ Bundling fundamentals
○ Bundling Simple External Resources
○ OpenAPI Specification Example
○ But what about…
- The title is “Computers are the easy part”.
- It opens with “Controlled Flight Into Terrain (CFIT)”, an aviation safety term for an accident in which an aircraft under the pilot’s complete control is unintentionally flown into the ground without any mechanical failure. It then shares the lessons learned from an internal outage that lasted multiple days.
- The title is “You Do the Math: Reliability Issues Triggered by Math Errors”.
- As the title suggests, it describes the following four incidents or issues that were caused, at least in part, by mathematical errors.
○ NASA’s $125 million math mistake
○ Windows Calculator fails to calculate
○ The math bug that cost Intel $475 million
○ Y2K: The math bug that (mostly) wasn’t
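As a note to myself, the Y2K item in the list above comes down to arithmetic on two-digit years. The following is a minimal Python sketch of that class of bug, not code from the article. The function names and the pivot value of 70 are my own assumptions for illustration.

```python
def elapsed_years_naive(start_yy: int, end_yy: int) -> int:
    """Subtract two-digit years directly; this breaks across a century boundary."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 70) -> int:
    """Map a two-digit year to four digits using a fixed window.

    The pivot (70) is an assumed value: 70-99 become 1970-1999,
    and 00-69 become 2000-2069.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= pivot else 2000 + yy


def elapsed_years_windowed(start_yy: int, end_yy: int, pivot: int = 70) -> int:
    """Expand both years to four digits before subtracting."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)
```

For a record created in “99” and read in “01”, `elapsed_years_naive(99, 1)` gives -98, while `elapsed_years_windowed(99, 1)` gives 2. Windowing only postpones the problem to the edge of the window, so storing four-digit years is the more durable fix.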
Events
- The web page for “PREVAIL 2021”, a new virtual event to be held by IBM on October 19–21, 2021.
Tools
- The GitHub page of the CLI tool “Octopilot” that helps automate GitOps workflows by automatically creating and merging GitHub pull requests and updating specific content in the Git repository.
- An introductory article is also linked.
SRE Weekly Issue #285 August 29th, 2021
Articles
What’s so great about this incident write-up is the way that entrenched mental models hampered the incident response. There’s so much to learn here.
Ray Ashman — Mailchimp
- Since it is covered in DEVOPS WEEKLY ISSUE #557 above, I will skip it.
The parallels between this and the Mailchimp article are striking.
Will Gallego
- It covers the following points.
○ Akin to Root Cause
○ When do we decide what’s best?
○ Best Practices lack flexibility
○ Best Practice: Don’t use “Best Practice”..?
How to Improve Upon Google’s Four Golden Signals of Monitoring
This includes a review of the four golden signals and presents three areas to go further.
JJ Tang — Rootly
- Since it is covered in DEVOPS WEEKLY ISSUE #556 last week, I will skip it.
Root cause of failure, root cause of success
This one thoughtfully discusses why “root cause” is a flawed concept, approaching the idea from multiple directions.
Lorin Hochstein
- Since it is covered in DEVOPS WEEKLY ISSUE #556 last week, I will skip it.
IBM PREVAIL Conference: October 19–21, 2021
Check it out, a new SRE conference! This one’s virtual and the CFP is open until October 1.
Robert Barron — IBM
- An introductory article on the “IBM PREVAIL Conference” featured in DEVOPS WEEKLY ISSUE #557 above.
- In the issue above, the editor commented “The call for papers is open until 10th of September.”, but the page states “Submission deadline: October 1, 2021”. Was it updated?
Notes on the Perfidy of Dashboards
To be clear, this article is about static dashboards that just contain pre-set graphs of specific metrics.
every dashboard is an answer to some long-forgotten question
Charity Majors
- As the title suggests, it digs deep into what to watch out for with (static) dashboards and where they are useful.
○ STATIC VS DYNAMIC DASHBOARDS
○ DEBUGGING WITH DASHBOARDS: IT’S A TRAP
○ IF WE DID MATH LIKE WE DO DASHBOARDS
○ THE LIMITATIONS OF METRICS AND DASHBOARDS
○ OTHER COMPLAINTS ABOUT DASHBOARDS:
○ IN CONCLUSION
What makes public posts about incidents different from analysis write-ups
Public incident posts give us useful insight into how companies analyze their incidents, but it’s important to remember that they’re almost never the same as internal incident write-ups.
John Allspaw — Adaptive Capacity Labs
- As the title suggests, it explains why public posts published by companies about incidents differ from internal incident write-ups that represent effective incident analysis, and why this difference is important.
Heroku Incident #2300 Follow-Up
In this incident from July 7, front-line routing hosts exceeded their file descriptor limits, causing requests to be delayed and dropped.
Heroku
- A follow-up article on the Heroku incident that occurred on 2021/07/07.
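As an aide-memoire on file descriptor limits in general (not Heroku’s actual tooling), here is a minimal Python sketch of how a process can read its own limit and estimate how much of the budget is left. It assumes a Unix-like system, since the standard `resource` module is not available on Windows. The function names are my own.

```python
import resource
from typing import Optional


def soft_fd_limit() -> Optional[int]:
    """Return this process's soft limit on open file descriptors, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


def fd_headroom(open_fds: int, soft_limit: int) -> float:
    """Return the fraction of the descriptor budget still free, clamped to 0.0-1.0."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    if open_fds < 0:
        raise ValueError("open_fds must not be negative")
    return max(0.0, 1.0 - open_fds / soft_limit)
```

With 512 descriptors open against a limit of 1024, `fd_headroom(512, 1024)` is 0.5. Alerting when the headroom gets small is one way to notice this kind of exhaustion before requests start to be delayed or dropped.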
TLDs — Putting the ‘.fun’ in the top of the DNS
.io, assigned to the British Indian Ocean Territory is almost exclusively used by annoying startups for content completely unrelated to the islands.
Remember, it’s all fun and games until the random country you’ve attached your business to has an outage in their TLD DNS infrastructure.
Jan Schaumann
- Since it is covered in DEVOPS WEEKLY ISSUE #557 above, I will skip it.
Why Observability Requires a Distributed Column Store
If you’re curious about just what a columnar data store is like I was, this article is a good introduction.
Alex Vondrak — Honeycomb
- It explains what a Distributed Column Store is, its capabilities, and why a Distributed Column Store is a fundamental requirement for achieving observability.
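To remind myself what “columnar” means, here is a toy Python sketch of the same events stored row by row and column by column. It is only an in-memory illustration with made-up field names and values, not how Honeycomb’s store is implemented, and it assumes every row has the same fields.

```python
from typing import Any, Dict, List

# Row-oriented layout: one record per event (sample data is hypothetical).
ROWS: List[Dict[str, Any]] = [
    {"service": "api", "status": 200, "duration_ms": 10.0},
    {"service": "api", "status": 500, "duration_ms": 30.0},
    {"service": "web", "status": 200, "duration_ms": 20.0},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot rows into one list per field (assumes all rows share the same keys)."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def mean_of(columns: Dict[str, List[Any]], field: str) -> float:
    """Aggregate a single field; only that column's list is read."""
    values = columns[field]
    return sum(values) / len(values)
```

Here `mean_of(to_columns(ROWS), "duration_ms")` is 20.0, and computing it never touches the `service` or `status` values. That is the basic reason a column layout suits queries that aggregate a few fields over many events.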
KubeWeekly #275 September 3rd, 2021
The Headlines
Editor’s pick of the highlights from the past week.
From incubation to augmentation, how software projects grow
- It explains the topic in the title, using the CNCF OpenTelemetry project as an example.
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
Daniel Prizmant, Palo Alto Networks
- An approximately 33-minute session sharing threat investigations on Windows containers for cloud native apps.
Composing your way to a control plane powered future
Dan Mangum, Upbound
- An approximately 55-minute session that explains how to define your own cloud API just by writing YAML using Crossplane’s Composition.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
Enable seccomp for all workloads with a new v1.22 alpha feature
Sascha Grunert, Red Hat
- As the title suggests, it explains a new alpha feature introduced in Kubernetes v1.22.
Managing Kubernetes seccomp profiles with security profiles operator
- It explains the topic in the title through the following points.
○ Security Profile Operator features
○ Installation
○ Creating a seccomp profile
○ Using a seccomp profile
○ Profile inheritance using base profiles
○ Using profile bindings
○ Recording Profiles
○ Metrics and Log enrichment
○ Wrap
Shipwright — building container images in Kubernetes
Viktor Farcic
- A 21-minute video explaining “Shipwright”, an extensible framework for building container images in Kubernetes.
Distributed tracing with Knative, OpenTelemetry and Jaeger
Ben Moss, VMware
- It explains how to set up distributed tracing using Knative Eventing, how it can help you better understand your program, and how it works internally.
A Kubernetes engineer’s guide to mTLS
William Morgan, Buoyant
- It explains what mTLS is, how it relates to “normal” TLS, why it is relevant to Kubernetes, and the strengths and weaknesses of mTLS and its alternatives. It also shows how to use Linkerd to add mTLS to a Kubernetes cluster.
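As a memo on how mTLS differs from “normal” TLS, here is a minimal sketch using Python’s standard `ssl` module. It is not taken from the article and is not how Linkerd is configured. The function name is my own, and the certificate paths in the comments are hypothetical.

```python
import ssl


def server_context(mutual: bool) -> ssl.SSLContext:
    """Build a server-side TLS context.

    With mutual=False the server does not ask the client for a certificate
    ("normal" TLS, where only the server is authenticated).
    With mutual=True the handshake fails unless the client presents a
    certificate the server trusts (mTLS).
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED if mutual else ssl.CERT_NONE
    # A real server would also load its own identity and the CA it trusts
    # for client certificates (paths below are hypothetical):
    #   ctx.load_cert_chain("server.crt", "server.key")
    #   ctx.load_verify_locations("clients-ca.crt")
    return ctx
```

The only difference between the two contexts is `verify_mode`: `server_context(True).verify_mode` is `ssl.CERT_REQUIRED`, while `server_context(False).verify_mode` is `ssl.CERT_NONE`. Everything else about the TLS handshake stays the same.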
Service Mesh 101: The role of Envoy
Scott Lowe, Kong
- It describes what a service mesh is, what it does, and where Envoy fits into it. For more detail on the basics of Envoy configuration in a service mesh, see the follow-up article, “Service Mesh 102: Envoy Configuration”.
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Updates on Google’s continued collaboration with NIST to secure the software supply chain
Eric Brewer and Dan Lorenc, Google
- A report on Google’s participation in, and announcements at, the White House Cybersecurity Summit hosted by President Biden.
- It says that Google will collaborate with the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) to support and develop a new framework that will help to improve the security and integrity of the technology supply chain.
- They committed to invest $10 billion over the next five years to expand zero-trust programs, help secure the software supply chain, and enhance open-source security.
- A lot of information is introduced with links, so I would like to read each one.
Craig Box, Kubernetes Podcast from Google
- A Kubernetes podcast by Google employees. This time the host is Craig Box, with guest host Jimmy Moore.
- The guest is Daniel Megyesi, the maintainer and DevOps engineer of Unicron, the central platform for big data and machine learning at Adevinta. Below is an article by him introducing Unicron.
○ Introducing Unicron, our big data and Machine Learning platform
- The topics that interested me in the news of the week are as follows.
○ Google commits $10 billion to advance cybersecurity
○ ingress-nginx 1.0.0
○ VMware announces Tanzu Application Platform
How FinOps changed the way businesses approach the cloud
- A Guest post to CNCF. The original article is published in the “Virtasant blog” with the same title.
- FinOps keywords and reference materials such as “FinOps Foundation” and “State of FinOps Report 2021” are linked, so it seems to be a good introduction.
Docker is updating and extending our product subscriptions
- An article about the expansion of product subscriptions for large-scale and commercial use of Docker Desktop.
- The conditions for commercial use are as follows. Personal use remains free. The article clearly states which components remain free, so check it if you are uncertain.
○ Organizations with more than 250 employees or $10M/year in sales will need a paid subscription by the end of January 2022 at the latest.
- The new terms apply from 2021/08/31. The grace period runs until the end of January 2022.
A guide to spot-readiness in Kubernetes
Michael Dresser & Alex Thilen, Kubecost blog
- As the title suggests, it is a guide to using Spot Instances in Kubernetes environments. It explains why they are needed and introduces “Kubecost’s Spot-Readiness Checklist” and “Kubecost” through the following points.
○ What are spot instances and why use them?
○ What’s the customer challenge today?
○ Enter… Kubecost’s Spot-Readiness Checklist
○ Implement spot nodes in your cluster using Kubecost, for free!
Daniel Holbach, Flux
- The September update for Flux. It recaps August under the following headings.
○ Flux Project Facts
○ News in the Flux family
○ Upcoming events
○ In other news
○ Over and out
How Istio, Tempo, and Loki speed up debugging for microservices
Antonio Berben, Solo.io
- Published on the Grafana Labs blog, it introduces the tools in the title and walks through a hands-on with them and Grafana Cloud, from the perspective that “Having a diagram which displays all elements involved in a request through microservices increases the speed to find bugs or to understand what happened in your system when running a postmortem analysis.”
Why cloud native open source is critical for Twitter and Spotify
Alex Williams and B. Cameron Gain, The New Stack
- An approximately 31-minute podcast sponsored by CNCF, with an overview article. It is interesting to hear about both companies’ technical design processes and efforts on the topic in the title.
Upcoming CNCF Online Programs
Live Webinar
- September 7 at 10am PT: Kubernetes 1.22 release presented by Savitha Raghunathan, James Laverack & Jesse Butler, Kubernetes 1.22 Release Team — RSVP
Cloud Native Live
- September 8 at 9am PT: Kubernetes clusters need persistent data presented by Alex Chircop, StorageOS — RSVP
Looking for more great curated content? Visit our Online Programs playlist on YouTube.
Learn more about CNCF Online Program
How about these articles? Did any of them interest you?
There are some items I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself too.
Bye now!!