SRE / DevOps / Kubernetes Weekly Collection #67 (Week 19, 2021)
- In this blog post series, I collect items from the following 3 weekly mailing lists I subscribe to and add some comments and useful links as an aide-memoire.
- Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people who browse this kind of information.
DEVOPS WEEKLY ISSUE #541 May 9th, 2021
SRE Weekly Issue #269 May 9th, 2021
KubeWeekly # 262 May 21st, 2021 ← Due to KubeCon + CloudNativeCon Europe 2021, KubeWeekly is taking a 2-week break and will resume on May 21st.
DEVOPS WEEKLY ISSUE #541 May 9th, 2021
News
Are developer portals (like those powered by Backstage) an anti-pattern? This post argues yes.
- The title is “Developer Portals Are an Anti-Pattern”.
- After describing Spotify’s open source “Backstage”, which was covered last week, the author gives their opinion and explains why they feel this approach points in the “wrong direction”.
- A playlist published on YouTube for GitOps Con 2021.
- The title is “Hosting SQLite databases on Github Pages”.
- As the title says, the author shows how to use a SQLite database on a static website with a tool they wrote.
- The title is “Packaging and deploying AWS Lambda functions written in Java with AWS Cloud Development Kit”.
- As the title says, it explains how to use the AWS CDK to build and package a Lambda function that is written in Java and has external dependencies.
- The title is “Pyston v2.2: faster and open source”.
- It introduces Pyston v2.2, the latest version of Pyston, a faster implementation of Python.
Jobs
Gitpod.io is looking for senior engineers helping to build out our SRE team.
Want to work in open source and fully-remote with some of the world’s most talented K8s and developer tools engineers? You are obsessed with DevX and automating our workflows? We pioneered the concept of dev-environments-as-code and provision automated and ready-to-code development environments that blend in your existing workflow. We’d love to hear from you.
- As mentioned above, these are job listings for multiple senior engineer positions.
Tools
- The web page of “Pixie”, a tool that gives you instant visibility with access to metrics, events, traces, and logs without changing your code.
- Click here for the GitHub page.
- The GitHub page of “DazedAndConfused”, a tool to help you identify dependencies.
- The GitHub page of “kcp”, a minimal Kubernetes API server.
- The web page of “Pinniped”, an authentication service for Kubernetes clusters.
SRE Weekly Issue #269 May 9th, 2021
Articles
Edgar: Solving Mysteries Faster with Observability
We built Edgar to ease this burden, by empowering our users to troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata.
Kevin Lew, Maulik Pandey, Narayanan Arunachalam, Dustin Haffner, Andrei Ushakov, Seth Katz, Greg Burrell, Ram Vaithilingam, Mike Smith and Elizabeth Carretto — Netflix
- An article from 09/03/2020 that introduces “Edgar”, a self-service tool for troubleshooting Netflix’s distributed systems. At the time, a request to open source it received the following answer.
- Unfortunately, we don’t have any short-term plans to open source Edgar, but it’s on our radar as something to consider. A lot of Edgar is very Netflix-specific, and we’d have some work to do to make it abstract enough and consumable enough for open source. But maybe someday!
The Comprehensive Site Reliability Engineering (SRE) PDF
The PDF covers 5 main areas:
- Availability
- Performance
- Monitoring
- Incident Response
- Preparation
No account required or form to fill out to download the PDF.
Splunk/VictorOps
- Taken from the guidebook “Resilience First”, the explanation focuses on the “Core components of SRE”, which are the 5 main areas the editor lists above.
- Other reference materials are introduced in the “Additional SRE resources” section at the end.
What are MTTx Metrics Good For? Let’s Find Out.
This one’s especially interesting for the section about what MTTx metrics aren’t good for, and the following section on how to improve them.
Emily Arnott — Blameless
- As the title suggests, the article explains what MTTx metrics are good for through the following points. MTTx stands for “Mean Time To x”.
○ What are common MTTx metrics and why are they used?
○ What are some problems with relying on MTTx metrics?
○ How can I make MTTx metrics more helpful?
○ How do I move away from shallow metrics?
○ How better metrics help build a blameless culture
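As a note to myself on what “Mean Time To x” means in practice, here is a minimal Python sketch that computes a mean time to resolve. The incident timestamps and the function names (`resolution_minutes`, `mttr_minutes`) are made up for illustration and do not come from the article. It also prints the median, since a single long incident can pull the mean far away from the typical case.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical incident records: (detected_at, resolved_at).
# These values are invented for illustration only.
incidents = [
    (datetime(2021, 5, 3, 9, 0), datetime(2021, 5, 3, 9, 20)),
    (datetime(2021, 5, 4, 14, 0), datetime(2021, 5, 4, 14, 30)),
    (datetime(2021, 5, 6, 2, 0), datetime(2021, 5, 6, 7, 0)),
]


def resolution_minutes(records):
    """Return the time to resolve each incident, in minutes."""
    return [
        (resolved - detected).total_seconds() / 60
        for detected, resolved in records
    ]


def mttr_minutes(records):
    """Mean time to resolve: the arithmetic mean of the durations."""
    return mean(resolution_minutes(records))


durations = resolution_minutes(incidents)
mttr = mttr_minutes(incidents)
typical = median(durations)

print(f"durations (min): {durations}")
print(f"MTTR (min): {mttr:.1f}")
print(f"median (min): {typical:.1f}")
```

With this sample data, two of the three incidents are resolved within half an hour, but the one five-hour incident dominates the mean. That is one simple way to see why a single MTTx number can be a shallow metric.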
Resiliency and Disaster Recovery with Kafka
If you’re interested in deploying Kafka in a multi-region configuration, eBay has put quite a bit of thought into this and has a lot to share.
Engin Yoeyen — eBay
- It outlines technical scenarios that require ordered events, highlights some challenges, and describes possible solutions for multi-region Kafka setups.
What Chaos Engineering Is (and Isn’t)
Straight from someone who was there from the start. The “what chaos engineering is not” section is especially enlightening.
Casey Rosenthal — Verica
- In line with the title, it explains the historical background and key points of chaos engineering.
Heroku incident #2226 follow-up: Private Space apps experiencing domain to SSL cert mapping errors
The last paragraph regarding “unknown unknowns” is noteworthy.
Heroku
- As the editor comments above, the “unknown unknowns” part is the highlight of this article.
Failover Conf follow-up: Your team and culture questions answered!
There are some great questions in here on blamelessness and full service ownership.
James Thigpen — Gremlin
- As a follow-up to Failover Conf, it answers questions that the panelists could not get to due to time constraints and touches on the theme of “evolution.”
- The beautiful and colorful illustrations of each session are great.
Outages
- Google Cloud Platform us-west2 region
They posted a detailed follow-up at the above link. - SRE Weekly Issue #269 — SRE WEEKLY
- Network Solutions and Register.com
- Singapore Exchange (SGX)
- Speak
KubeWeekly # 262 May 21st, 2021 ← Due to KubeCon + CloudNativeCon Europe 2021, KubeWeekly is taking a 2-week break and will resume on May 21st.
What did you think of these articles? Did any of them catch your interest?
Actually, there is some content I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself too.
Bye now!!