SRE / DevOps / Kubernetes Weekly Collection #50 (Week 2, 2021)
- In this blog post series, I collect the following three weekly mailing lists I subscribe to, and leave some comments as an aide-memoire along with useful links.
- Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people who browse this kind of information.
DEVOPS WEEKLY ISSUE #524 January 10th, 2021
SRE Weekly Issue #252 January 10th, 2021
KubeWeekly #246 January 15th, 2021
DEVOPS WEEKLY ISSUE #524 January 10th, 2021
News
- The title is “Software is drowning the world”.
- Drawing on his experience working in many organizations, the author can see what they have in common, and he discusses “technical debt” from the following perspective.
○ “Every time you decide to solve a problem with code, you are committing part of your future capacity to maintaining and operating that code. Software is never done.”
- The title is “Campaigns”.
- The author proposes a tool called a “Campaign”.
- A Campaign is a tool / framework for coordinating large groups of people, holding those groups accountable, and ultimately succeeding at paying off technical debt, making architectural changes, improving the customer experience, reducing costs, and more. The article explains that a Campaign requires the following:
○ A Goal
○ Metrics toward that goal
○ Buy-in
○ Method of Accountability
○ A “Window”
○ A Target Date
- The title is “AWS as a Framework”.
- It develops the idea in the title from the following viewpoint, aiming to justify treating AWS as a framework and to show its unique potential when fully utilized.
○ AWS doesn’t sound like an “infrastructure” provider anymore, not even a “platform” provider. It sounds like a framework!
- The web page of “FOSDEM (Free and Open source Software Developers’ European Meeting)”, a two-day online event organized by volunteers to promote the use of free and open source software. The linked page introduces each track.
- Click here for the “Software Composition devroom”, which the editor is interested in.
○ It is usually held in Brussels, Belgium, and the site says “FOSDEM is widely recognized as the best such conference in Europe.”
- The title is “Coding in Perl? What support do you need?”.
- They are running a survey, which takes only a few minutes, to find out what support engineers want or need when moving to Perl or progressing within Perl.
- The survey runs throughout January, and the results will be announced at FOSDEM, mentioned above.
- The title is “CI / CD Workflow for AWS ECS via Terragrunt and GitHub Actions”.
- It walks through the topic in the following steps, using color so that the figures and code are easy to read.
○ Initial Setup
○ Workflow via GitHub Flow
○ Configure Infrastructure and Deployment Targets
○ Configure Container Environment and Secrets
○ Integration via GitHub Actions — Pytest
○ Deployment via GitHub Actions — Terragrunt
○ Conclusion
- As described above, this GitHub page of sketchnotes covers the main GCP services. It includes “Next 2020 Summary Announcements” as a topic, and it is nice to have a summary of the services announced at such an event.
A good reading list for anyone moving into more management roles in software.
- The title is “Recommended Engineering Management Books”.
- It introduces books recommended by the author, who has been an engineering manager for the past three and a half years.
- After more than 10 years as a professional software engineer, the author found growing into an engineering manager a whole new challenge. This is a carefully selected list of the books that helped and influenced them through that process, and that they highly recommend to other engineering managers.
- Below is the list of books. The author explains the strengths of each based on actual experience.
○ The Manager’s Path: A Guide for Tech Leaders Navigating Growth & Change by Camille Fournier
○ Thanks for the Feedback by Douglas Stone & Sheila Heen
○ The Hard Thing About Hard Things: Building a Business When There are No Easy Answers by Ben Horowitz
○ Accelerate: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, PhD, Jez Humble, and Gene Kim
○ Dare to Lead: Brave Work. Tough Conversations. Whole Hearts. by Brene Brown
○ Switch: How to Change Things When Change is Hard by Chip Heath & Dan Heath
○ Atomic Habits: An Easy & Proven Way to Build Good Habits by James Clear
- The title is “Run Kubernetes Production Environment on EC2 Spot Instances With Zero Downtime: A Complete Guide”.
- I will skip it because it was covered in KubeWeekly #245 last week.
Events
- It introduces a webinar on the theme of “Alert Fatigue”.
- It was scheduled for Thursday, January 14, at 11:00 CET (Central European Time).
SRE Weekly Issue #252 January 10th, 2021
Articles
Building On-Call Culture at GitHub
Their on-call started out as four 24-hour shifts per person interspersed throughout the year. Find out how they transitioned to a new approach in a process that spanned the start of the pandemic.
Mary Moore-Simmons — GitHub
- The article is organized around the following major sections.
○ Monolithic On-Call
○ New On-Call Culture
○ Continuing the Journey
- The expression “Monolithic On-Call” and the hurdles it created, seen from various perspectives, were interesting.
- I found the text a little dense; more line breaks would make it easier to read.
Google Cloud Issue Summary — Google Meet — 2020–12–14
A new Meet version had a higher storage usage requirement, and a backend system filled up.
- A summary of the Google Meet outage on 2020-12-14 from 08:20 to 11:36 AM (PST). The cause was a surge in storage usage after a new version was released, which exhausted the capacity of one data store. The measures to prevent recurrence are as follows.
○ Review alerting processes to improve detection of data store capacity issues
○ Adjust automated monitoring system logs to be more concise and exact to assist in troubleshooting
○ Evaluate existing troubleshooting processes to determine available improvements to mitigation and resolution times.
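To make the first prevention measure concrete, here is a minimal Python sketch of a capacity alert that fires on either a static usage threshold or a projected time-to-full. It is not Google's implementation: the `Sample` type, the `hours_until_full` and `should_alert` helpers, and the 80% / 6-hour thresholds are hypothetical, and it assumes you already collect periodic storage-usage samples. The projection matters because a sudden surge, like the one after the Meet release, can fill a store soon after a static threshold fires.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    hour: float     # time of the measurement, in hours
    used_gb: float  # storage used at that time, in GB


def hours_until_full(samples: list[Sample], capacity_gb: float) -> float | None:
    """Estimate hours until the store fills, using a least-squares linear fit.

    Returns None when usage is flat or shrinking (no projected exhaustion).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(s.hour for s in samples) / n
    mean_u = sum(s.used_gb for s in samples) / n
    cov = sum((s.hour - mean_t) * (s.used_gb - mean_u) for s in samples)
    var = sum((s.hour - mean_t) ** 2 for s in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    slope = cov / var  # growth rate in GB per hour
    if slope <= 0:
        return None
    latest = max(samples, key=lambda s: s.hour)
    return (capacity_gb - latest.used_gb) / slope


def should_alert(samples: list[Sample], capacity_gb: float,
                 threshold_pct: float = 80.0, min_hours_left: float = 6.0) -> bool:
    """Hypothetical policy: alert on high usage OR on a fast projected fill."""
    latest = max(samples, key=lambda s: s.hour)
    if latest.used_gb / capacity_gb * 100 >= threshold_pct:
        return True
    eta = hours_until_full(samples, capacity_gb)
    return eta is not None and eta < min_hours_left
```

With usage growing 5 GB/hour toward a 100 GB store, the store sits at 65% and is projected to fill in 7 hours, so no alert fires; at 10 GB/hour it would fill in 3 hours, so it does. A linear fit is the simplest choice here; a real system might weight recent samples more heavily.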
This is a webinar on alert fatigue, coming up on January 14.
Sarah Wells — Financial Times, Jamie Dobson — Container Solutions
- Since it is covered in DEVOPS WEEKLY ISSUE #524 above, I will skip it.
Announcing the Security Chaos Engineering Report
The chaos experiments you do for security purposes can often expose weak points in reliability as well.
Aaron Rinehart — Verica, Kelly Shortridge — Capsule8
- The first in a series of free O’Reilly reports.
- Opening with the lines below, the author outlines the report, touching on Security Chaos Engineering (SCE), SCE’s core tool “ChaoSlingr”, and more.
○ Hope isn’t a strategy. Likewise, perfection isn’t a plan.
Little Known Ways to Better Use Your Error Budgets
Here are four nifty outside-the-box ideas to use the data you may already have.
Emily Arnott — Blameless
- The article explains, in the following items, how error budgets can be useful to teams across the organization beyond engineering, such as QA, legal, and executives. It also touches on how engineers can use error budgets for more than development planning.
○ Legal teams can use error budgets as early warnings
○ Executives can use error budgets to take the pulse of development
○ Error budgets and SLOs elevate the role of QA
○ Error budgets provide objectivity for experimentation
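As a refresher on the arithmetic all four ideas rely on, here is a minimal Python sketch: an SLO of 99.9% over 1,000,000 requests leaves a budget of roughly 1,000 failed requests. The function names and the 25% “early warning” threshold are hypothetical illustrations, not taken from the Blameless article.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the window for an SLO such as 0.999."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    return (1 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once it is overspent)."""
    return 1 - failed_requests / error_budget(slo, total_requests)


def early_warning(remaining: float, warn_below: float = 0.25) -> bool:
    """Hypothetical signal a non-engineering team (e.g. legal) could watch."""
    return remaining < warn_below
```

For example, 250 failures against that SLO leave about 75% of the budget, while 900 failures leave about 10% and would trigger the hypothetical early warning.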
Lessons learned in incident management
Their custom incident management tool, DropSEV, can detect incident-worthy availability drops and file an incident automatically, obviating the need for an engineer to decide on severity level on the fly.
Joey Beyda and Ross Delinger — Dropbox
- The lessons learned at Dropbox in incident management are divided into the following six items and explained in detail.
○ Background
○ The SEV process
○ Detection
○ Diagnosis
○ Recovery
○ Continuous improvement
- The authors hope the article serves as a case study in how an organization can systematically understand its own incident response and evolve it to meet user needs.
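The idea behind DropSEV, filing an incident from a measured availability drop instead of having an engineer pick a severity level on the fly, can be sketched in Python as below. Dropbox's actual SEV definitions and thresholds are not described here, so the `SEV_THRESHOLDS` table, its SEV1 to SEV4 labels, and the `maybe_file_incident` helper are hypothetical.

```python
from __future__ import annotations

# Hypothetical thresholds, most severe first: the first threshold the measured
# availability falls below determines the severity.
SEV_THRESHOLDS = [
    (0.90, "SEV1"),
    (0.95, "SEV2"),
    (0.99, "SEV3"),
    (0.995, "SEV4"),
]


def classify(availability: float) -> str | None:
    """Map availability (0.0 to 1.0) to a severity label, or None if healthy."""
    for threshold, sev in SEV_THRESHOLDS:
        if availability < threshold:
            return sev
    return None


def maybe_file_incident(service: str, successes: int, total: int,
                        incidents: list[dict]) -> str | None:
    """Append an incident record automatically when availability drops."""
    if total <= 0:
        return None  # no traffic in the window, nothing to judge
    availability = successes / total
    sev = classify(availability)
    if sev is not None:
        incidents.append({"service": service, "severity": sev,
                          "availability": availability})
    return sev
```

Encoding the thresholds as data keeps the severity decision objective and reviewable, which is the point the article makes about removing on-the-fly judgment from the responder.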
GitHub Availability Report: December 2020
This one has some additional detail on a November outage involving MySQL replication lag.
Keith Ballinger — GitHub
- The December 2020 edition of GitHub’s monthly Availability Report, which I have covered several times on this blog.
- There were no incidents in December that caused service downtime, so the report instead gives an overview of the incident response and follow-up details for the incident described in the November report.
Outages
- Slack
My first couple hours of work this year were oddly quiet…
- Heroku
- Google Meet
This is different from the one above.
- FanDuel
- Twitch
- Coinbase
- Archive of Our Own
KubeWeekly #246 January 15th, 2021
The Headlines
Editor’s pick of the highlights from the past week.
CNCF Security Whitepaper Shows the Complexity of Securing Cloud Native Operations
Jack Wallen, The New Stack
Jack Wallen of The New Stack dives into CNCF’s Security whitepaper that focuses on the security of cloud native applications and highlights key learnings. The whitepaper discusses everything from cloud native layers, to the full lifecycle of development, to compliance (and everything in between).
- It introduces the “Security whitepaper” released by CNCF.
- It digs into the whitepaper from an administrator’s perspective, touching on the need to develop and manage applications amid complexity across multiple layers of the cloud native stack.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
Analyze Kubernetes files for errors with KubeLinter
Jessica Cherry, Opensource.com
- An article describing KubeLinter, an open source project released by StackRox that analyzes Kubernetes YAML files for security issues and misconfigurations. Separately, Red Hat has announced that it signed a definitive agreement to acquire StackRox.
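To illustrate the kind of static analysis KubeLinter performs, here is a minimal Python sketch that checks an already-parsed Deployment manifest (for example, the dict a YAML loader returns) for a few common problems. It is not KubeLinter's code, check list, or API; the `lint_deployment` function and the specific checks are illustrative assumptions.

```python
def lint_deployment(manifest: dict) -> list[str]:
    """Return human-readable findings for a parsed Deployment manifest."""
    findings = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    pod_sc = pod_spec.get("securityContext", {})
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        image = c.get("image", "")
        # An image with no tag, or with ":latest", makes deployments non-reproducible.
        last_part = image.rsplit("/", 1)[-1]
        if ":" not in last_part or last_part.endswith(":latest"):
            findings.append(f"{name}: image '{image}' uses an implicit or 'latest' tag")
        # Missing requests make scheduling and capacity planning unpredictable.
        requests = c.get("resources", {}).get("requests", {})
        for kind in ("cpu", "memory"):
            if kind not in requests:
                findings.append(f"{name}: no {kind} request set")
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            findings.append(f"{name}: runs as privileged")
        # A container-level setting overrides the pod-level one.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")):
            findings.append(f"{name}: runAsNonRoot is not set to true")
    return findings
```

In practice you would run KubeLinter itself, which ships many more checks and is configurable; the sketch only shows that such linting is a structured walk over the manifest.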
Cedric Clyburn, Red Hat
- As the title suggests, it explains how to get started with “Buildah”. A YouTube video of the interactive session is also embedded in the page.
Salman Iqbal
- Continuing from last week, this is another installment of Salman’s YouTube video series explaining how each Kubernetes component behaves. Each video runs about 10 minutes, which makes it easy to watch.
Build Your Kubernetes Operator With the Right Tool
Alex Handy, Red Hat
- It surveys the current options for building Kubernetes Operators and describes different approaches to make it easier to choose the right one for your use case.
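Whichever tool you choose, the heart of an Operator is a reconcile loop that compares desired state with observed state and acts to close the gap. The minimal Python sketch below shows one level-based reconcile pass; the in-memory `Cluster` class is a hypothetical stand-in for the Kubernetes API, whereas a real Operator would watch custom resources through its framework or client library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the Kubernetes API."""
    pods: dict[str, int] = field(default_factory=dict)  # app name -> replica count


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """One reconcile pass: converge observed state toward desired state."""
    actions = []
    for app, want in desired.items():
        have = cluster.pods.get(app, 0)
        if have < want:
            actions.append(f"scale up {app}: {have} -> {want}")
        elif have > want:
            actions.append(f"scale down {app}: {have} -> {want}")
        cluster.pods[app] = want
    # Anything observed but no longer desired is removed.
    for app in list(cluster.pods):
        if app not in desired:
            actions.append(f"delete {app}")
            del cluster.pods[app]
    return actions
```

Because each pass converges toward the desired state rather than replaying events, running it twice in a row yields no further actions, which is what makes reconcile loops safe to retry.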
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Sysdig 2021 container security and usage report: Shifting left is not enough
Aaron Newcomb, Sysdig
- This is Sysdig’s fourth annual container security and usage report. Beyond security, it details metrics usage, popular alerts, container density trends, and Kubernetes usage patterns.
- The numbers and proportions for each item are presented in an easy-to-understand way with figures and graphs.
Vertical Pod Autoscaling: The Definitive Guide
Povilas Versockas
- True to its “definitive / complete guide” billing, the article comprehensively explains vertical autoscaling for pods through the following sections. It is one I want to read again.
○ Why do we need Vertical Pod Autoscaling?
○ Kubernetes Resource Requirements Model
○ What is Vertical Pod Autoscaling?
○ Understanding Recommendations
○ When to use VPA?
○ VPA Limitations
○ Real-World Examples
○ How does VPA work?
○ VPA’s Recommendation model
○ Lots more
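As a rough illustration of VPA’s recommendation model, the Python sketch below recommends a CPU request as a high percentile of observed usage plus a safety margin. As I understand the VPA recommender’s defaults, it targets the 90th percentile with a 15% margin, but this sketch is a simplification: it uses a plain nearest-rank percentile instead of VPA’s exponentially decaying histogram and ignores the lower/upper bounds and memory handling. `percentile` and `recommend_cpu_request` are hypothetical names.

```python
import math


def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile of values, with p in (0, 1]."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered))  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def recommend_cpu_request(usage_millicores: list, target_pct: float = 0.9,
                          margin: float = 0.15) -> float:
    """Simplified VPA-style target: high percentile of usage plus a safety margin."""
    return percentile(usage_millicores, target_pct) * (1 + margin)
```

For ten samples of 100 to 1000 millicores, the 90th percentile is 900m, so the recommendation comes out at about 1035m.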
What’s Your Kubernetes Maturity?
Danielle Cook, Fairwinds
- It gives an end-to-end overview of the Kubernetes journey and the phases it passes through, along with the Kubernetes Maturity Model, which lays out the skills and activities you need to learn and perform in each phase.
- Click here to check the details of each phase. This article only gives a brief summary of each.
○ Phase 1 Prepare
○ Phase 2 Transform
○ Phase 3 Deploy
○ Phase 4 Build Confidence
○ Phase 5 Improve Operations
○ Phase 6 Measure & Control
○ Phase 7 Optimize & Automate
Upcoming CNCF Online Programs
We have expanded our webinar program to Online Programs! Visit our website for the latest updates.
- When I checked the link on January 16, 2021, “Upcoming webinars” showed “No Results Found”, so this year’s webinar schedule still seems to be awaiting updates.
What did you think of these articles? Did any of them catch your interest?
Honestly, there is some content I can’t fully digest yet, so I’ll use this aide-memoire and these links to catch up myself too.
Bye now!!