SRE / DevOps / Kubernetes Weekly Collection #86 (Week 38, 2021)
- In this blog post series, I collect the following three weekly mailing lists I subscribe to, adding some comments as an aide-memoire along with useful links.
- Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #560 September 19th, 2021
SRE Weekly Issue #288 September 19th, 2021
KubeWeekly #278 September 24th, 2021
DEVOPS WEEKLY ISSUE #560 September 19th, 2021
News
- The title is “Terraform is Not the Golden Hammer”.
- An article that looks back on the experience of the author’s company and explains where, when, and how to use Terraform, through the following points.
○ How we used Terraform
○ Problems facing
○ Advises and suggestion
○ Conclusion
- The title is “Infrastructure as SQL”.
- The idea in the title is explained through the following points.
○ Relations and Types Matter for Infrastructure
○ New Powers: Explore, Query, and Automate Your Infrastructure
○ You Don’t Need to Learn a New API (Probably)
○ You Can Test, Too
○ Recover With Ease
A discussion of the role of SREs in enabling true self service platforms and empowering developers.
- The title is “The Developer Experience and the Role of the SRE Are Changing, Here’s How”.
- Its conclusion is that “Developers should take the opportunity to share their pain points and also learn about tooling and best practices from SRE teams, with the goal of ‘paving the path’ to developer autonomy, self-service, and full service ownership.” The article builds up to it through the following points.
○ Two worlds colliding: The monolith and service-oriented architecture
○ Enabling developers to own the full application lifecycle
○ Understand the changing developer experience to support developer ownership
○ Conclusion: Developers should work with SREs as collaborators, not first responders
- The title is “Practical API Design at Netflix, Part 1: Using Protobuf Field Mask”.
- Part 1 of a series of posts. It explains how and why Netflix Studio Engineering uses FieldMask in APIs that read data.
- Part 2 will explain how to use FieldMask for update and delete operations.
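As an aide-memoire for the idea behind a field mask on a read API, here is a minimal sketch in plain Python. It is not Netflix's implementation and does not use the protobuf library; it only illustrates the general idea that the caller lists the fields it wants and the server returns just those. The function names `apply_field_mask` and `_lookup`, the dot-separated path format, and the sample `production` record are all my own assumptions for illustration.

```python
import copy
from typing import Any

_MISSING = object()  # sentinel meaning "this path is not present in the message"


def _lookup(message: dict[str, Any], parts: list[str]) -> Any:
    """Walk the nested dict along `parts`; return _MISSING if the path breaks."""
    node: Any = message
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def apply_field_mask(message: dict[str, Any], paths: list[str]) -> dict[str, Any]:
    """Return a new dict holding only the fields named in `paths`.

    Each path is dot-separated, e.g. "schedule.start_date".
    Paths that do not exist in the message are silently ignored.
    """
    result: dict[str, Any] = {}
    for path in paths:
        parts = path.split(".")
        value = _lookup(message, parts)
        if value is _MISSING:
            continue
        # Recreate the intermediate levels, then copy the selected value.
        target = result
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    production = {
        "title": "Example",
        "format": "series",
        "schedule": {"start_date": "2021-09-01", "end_date": "2021-12-01"},
    }
    print(apply_field_mask(production, ["title", "schedule.start_date"]))
```

In a real service the benefit comes from skipping the work needed to compute the unrequested fields, not just from trimming the response, so treat this only as a picture of the request/response shape.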
- The title is “Configuration as Data, GitOps, and Controllers: it’s not simple for multi-cluster”.
- The topic in the title is explained through the following points, using figures drawn in a handwritten style.
○ A basic example of declarative configuration and controllers
○ Extreme examples
○ Case study: multi-cluster GitOps with Istio
○ Federating a service mesh has unique challenges
○ Takeaways
A deep dive into Kubernetes ingress, with helpful diagrams showing how things work.
- As the description above says, the article is a deep dive into Kubernetes Ingress.
Tools
- The GitHub page of “kim (Kubernetes Image Manager)” which is a CLI for Kubernetes. Images can be built locally on the k3s cluster.
- As stated in “STATUS: EXPERIMENT — Let us know what you think”, it is still in the experimental stage.
- The GitHub page of “Kratix”, a framework for providing a platform.
- The GitHub page for BMC (BPF Memory Cache), an in-kernel cache for memcached.
- The GitHub page of the CLI app “kink” that makes it easy to run KinD clusters on Kubernetes pods and manages the entire life cycle of these clusters, including listing and deleting clusters.
SRE Weekly Issue #288 September 19th, 2021
Articles
Tammy Bryant Butow on SRE Apprentices
Faced with a difficult hiring market for SREs, they embarked on a well-designed, carefully thought out program to hire and train entry-level folks as SREs — and it worked!
Thomas Betts — InfoQ
- It discusses the theme of training for new SREs.
- Key Takeaways are below.
○ Hiring new site reliability engineers can be challenging. Dropbox decided to create a program to teach a cohort of students the skills necessary to be successful SREs.
○ A non-traditional approach to find engineers will naturally lead to a more diverse set of applicants. Bringing in people with different backgrounds can lead to new ways of looking at common problems.
○ Training should start with small tasks, letting the engineer learn by doing. Gradually these build from one-day tasks to longer, one-week, or one-month projects.
○ If your company creates a formal training program, it needs to be communicated to everyone, so there is understanding and proper expectations when the apprentices work with other employees.
○ In any new role, there is a need for understanding how to communicate with other people. Inviting junior employees to meetings allows them to see how senior members of the team interact to solve problems.
The things we find hardest in incident response
No matter how good your tooling is, how experienced you are, or how much you’ve prepared, incidents can still be hard.
Five people share about what they find hardest during incident response.
Chris Evans — incident.io
- As the title says, five people each comment on what they find hardest, covering the following points. Each keyword is highlighted.
○ Working out the most highly leveraged role to play
○ Getting up to speed without disrupting the flow
○ Making decisions quickly as an individual vs context sharing and consensus
○ Keeping track of threads (virtual, not Slack)
○ Striking a balance between trusting your gut and systematically gathering evidence
○ Recovering from bad assumptions
The Developer Experience and the Role of the SRE Are Changing, Here’s How
This one has a lot of ideas about how to guide developers toward full ownership of their services in production.
Ambassador
- Since it is covered in DEVOPS WEEKLY ISSUE #560 above, I will skip it.
In this post, I will cover the following modes of system resilience:
* Adaptive Response
* Superior Monitoring
* Coordinated Resilience
* Heterogeneous Systems
* Dynamic Repositioning
* Requisite Availability
Ash P — Cruform
- It starts by confirming the definition of system resilience and then explains the above six modes.
Useful knowledge and improvisation
Root cause of success: unpatched security vulnerability
TMW a security vulnerability allows you to break into your infrastructure, averting disaster during an incident.
Lorin Hochstein, with incident story by Eric Dobbs
- It considers the two elements in the title, both of which play an important role in incident response.
Heroku Incident #2347 Follow-Up
A migration didn’t go as planned, and customer traffic lost its way.
Heroku
- Follow-up information on the above Heroku incidents that occurred between 2021-08-24 00:00 UTC and 2021-08-26 19:10 UTC.
Transforming DevOps with Human-in-the-Loop Automation
I’m a big believer in human-in-the-loop automation. My favorite part of this article was this:
A further problem is that full automation — which aims to take the human out of the picture — requires a complete, nuanced understanding of a system and all potential outcomes, paradoxically resulting in heightened system complexity.
Tina Huang — Transposit
- The theme in the title is developed through the following points.
○ Debunking the myth of ‘automate everything’
○ Keeping humans in the loop is critical for effective automation
○ Human-in-the-loop automation in action
Outages
For some users, Assembled’s styling was not rendering and caused the application to be unusable.
“Root cause”: CSS
- Apple Store
- United Airlines
- TikTok
- Slack
- GCash
- Solana (Cryptocurrency)
They posted details in later tweets:
* thread 1
* thread 2
KubeWeekly #278 September 24th, 2021
The Headlines
Editor’s pick of the highlights from the past week.
What to expect from KubeCon + CloudNativeCon North America 2021
Adrian Bridgwater, Computer Weekly
Adrian Bridgwater of Computer Weekly outlines what to expect from KubeCon + CloudNativeCon North America 2021 happening October 11–15 in Los Angeles or virtually from anywhere in the world. Learn more about the 200+ sessions, 17 co-located events, and activities. Hope to see you there!
- An introductory article for KubeCon + CloudNativeCon North America 2021.
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
Introduction to APIClarity — A Wireshark for APIs
Zohar Kaufman & Alexei Kravtsov, Cisco
- An approximately 42-minute session explaining “APIClarity”, a new open source tool that acts as a Wireshark for APIs.
- The Webinar agenda and Key Discussion Points are below.
○ Understanding the need for, and benefits of, open API specification reconstruction
○ A survey of existing open source solutions for open API specification reconstruction
○ An API Clarity demo
○ Potential use cases of APIClarity for API security
Optimizing and securing Kubernetes workloads with Polaris and Goldilocks
Andy Suderman, Fairwinds
- An approximately 55-minute session that demonstrates how to use the open source tools Polaris and Goldilocks to scan Kubernetes workloads to improve resource utilization and security.
Kong Ingress Controller — Kubernetes Ingress on steroids
Viktor Gamov, Kong
- An approximately 45-minute session that explains how to enable security and API rate limiting declaratively, and how to add native gRPC support.
Enable stateful applications on AWS with persistent storage for Kubernetes
Ananth Vaidyanathan, AWS
- An approximately 25-minute session discussing different use cases, architectural techniques, and best practices for sharing and persisting data between K8s clusters using Amazon EFS serverless storage.
Operationalizing 300+ K8 clusters across the cloud
Niraj Amin, Rajarajan Pudupatti SJ, & David Botelho, Fidelity
- An approximately one-hour session explaining the challenges faced by the platform team during their journey and the approaches adopted to solve them.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
IAM roles for Kubernetes service accounts — deep dive
Maciej Jarosiewicz
- It shows you the nuts and bolts of how IAM and Kubernetes work together in harmony to provide you with a great experience of calling AWS services from your pods with no hassle, through the following points.
○ Introduction
○ IAM doesn’t trust service accounts, do you?
○ Let’s jot it down
○ Issues on top of issues
○ Federated identities
○ Swap That Swiftly
○ Making this work in your cluster
○ OIDC Identity Provider setup
○ IAM role setup
○ Off the hook
○ Summing up
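As a note to myself on the “IAM role setup” step, here is a minimal sketch that builds the kind of IAM trust policy used for IAM roles for service accounts. It is not taken from the article. The function name `build_irsa_trust_policy` is my own, and the account ID, OIDC provider host, namespace, and service account name in the example are hypothetical placeholders. The sketch assumes the cluster's OIDC identity provider is already registered in IAM.

```python
import json
from typing import Any


def build_irsa_trust_policy(
    account_id: str,
    oidc_provider: str,
    namespace: str,
    service_account: str,
) -> dict[str, Any]:
    """Build a trust policy letting one Kubernetes service account assume a role.

    `oidc_provider` is the cluster's OIDC issuer without the "https://" prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The role trusts the cluster's OIDC identity provider...
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                # ...but only for tokens issued to this one service account.
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders, not real identifiers.
    policy = build_irsa_trust_policy(
        account_id="111122223333",
        oidc_provider="oidc.eks.example.com/id/EXAMPLE",
        namespace="default",
        service_account="my-app",
    )
    print(json.dumps(policy, indent=2))
```

The `sub` condition is the part that narrows the trust from “any pod in the cluster” to a single service account, which is the point I want to remember from this article.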
StackRox office hours (E3): Kubernetes network policies
Mandar Darwatkar and Chris Short, Red Hat
- An approximately 65-minute session that starts with simple and practical steps to protect Kubernetes and then answers live questions.
KubeMQ is now available under open source license
KubeMQ
- The KubeMQ web page announcing that the community version of “KubeMQ” is now available as an open source project.
- The community version supports all messaging patterns, connectors, and bridges, can be deployed anywhere, and can run in production. Click here for the GitHub page.
APM with Prometheus and Grafana on Kubernetes Ingress
Joseph Caudle, Kong
- It explains how running a Kubernetes environment using the open source Kong Ingress Controller can simplify the seemingly difficult task of deploying a full application performance monitoring (APM) stack.
- A YouTube video of about 15 minutes is also embedded.
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
New Google cloud deploy automates deploys to GKE
Victor Szalvay and S. Bogdan, Google Cloud
- It introduces the release of Google Cloud Deploy, a managed, opinionated, continuous delivery service that makes continuous delivery to GKE easier, faster, and more reliable. A YouTube video of about two and a half minutes is also embedded.
Top open source CI/CD tools for Kubernetes to know
Michael Foster & Ajmal Kohgadai, Red Hat
- A list of CI/CD tools worth knowing in a Kubernetes environment, in no particular order. The following are covered, with pros, cons, and resources given for each.
○ Tekton
○ Argo Project
○ GitHub Actions
○ Jenkins X
○ OpenShift Pipelines
○ Spinnaker
○ CircleCI
○ GitLab
Ask an OpenShift admin (Ep 44): Kubernetes API deprecations
Andrew Sullivan, Chris Short, Rob Szumski, Camila Macedo, & Frederic Giloux, Red Hat
- Kubernetes v1.22 removed some APIs that had previously been marked as deprecated, and the hosts dig into which ones are no longer available. An approximately 65-minute session explaining the steps required to prepare for the removed API versions and upgrade to the new APIs.
Macquarie Bank looks to break free of IaaS
Ry Crozier, iTnews
- An article based on Macquarie Bank’s announcement at the Google Cloud summit. The company plans to move to a “No Ops” model for managing the public cloud, which will ultimately be the home of all its systems.
Bug Bash presented by CNCF + Sonatype
CNCF
- The registration page for the above event, scheduled to run from October 13, 2021 at 8:00 to October 14, 2021 at 18:00 (PDT). If you are interested, you can register.
Upcoming CNCF Online Programs
Live Webinar
- September 28 at 10am PT: Kanister — Application level data operations on Kubernetes presented by Michael Cade & Pavan Devaraj, Kasten by Veeam — RSVP
Cloud Native Live
- September 29 at 9am PT: Trace-based testing with OpenTelemetry presented by Michael Haberman, Aspecto — RSVP
On-demand Webinars
- September 30: Shifting security left-simplifying security for K8s & OpenShift environments presented by Jody Hunt, CyberArk — RSVP
- September 30: Redefining cloud native debugging presented by Not Goldman, Rookout — RSVP
- September 30: OpenEBS 3.0: What’s in it? presented by Kiran Mova, MayaData — RSVP
- September 30: The thing about your software supply chain… presented by Eylam Milner, Argon Security — RSVP
Looking for more great curated content? Visit our Online Programs playlist on YouTube.
Learn more about CNCF Online Programs
How about those articles? Did any of them catch your interest?
Actually, there is some content here that I cannot digest at this stage, so I will use this aide-memoire and these links to catch up myself too.
Bye now!!