SRE / DevOps / Kubernetes Weekly Collection #60 (Week 12, 2021)
- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add comments and useful links as an aide-memoire.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #534 March 21st, 2021
The State of Devops report is in its 10th year. This year’s survey is now open, focusing this year on how teams and work are organized, interaction between teams, feedback loops, self-service and more.
- I will skip it because it was covered in last week’s KubeWeekly #255.
- Click here for the survey.
- The title is “Why the World Needs a Software Bill Of Materials Now”.
- Using the “Sunburst” hack as a starting point, it explains software supply chain attacks, the bill of materials (BOM), and so on.
gRPC is a general-purpose RPC layer. Addressing a range of different types of services means it’s configurable. And configuration is often a source of errors. This post explains why, along with some examples to learn from.
- The title is “gRPC is easy to misconfigure”.
- It describes the following two annoying edge cases that the author recently encountered.
○ Client keepalive is dangerous: do not use it
○ Servers cannot return errors larger than 7 kiB
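To make the second edge case concrete, here is a minimal sketch, in Python with only the standard library, of trimming an error message to a byte budget before a server returns it. The 7 KiB figure is taken from the summary above; the function name `truncate_error_message` and the truncation marker are hypothetical and are not part of gRPC or of the article.

```python
MAX_ERROR_BYTES = 7 * 1024   # limit quoted in the article summary above
MARKER = "...[truncated]"    # hypothetical marker, not a gRPC convention


def truncate_error_message(message: str, limit: int = MAX_ERROR_BYTES) -> str:
    """Return message trimmed so its UTF-8 encoding fits within limit bytes."""
    encoded = message.encode("utf-8")
    if len(encoded) <= limit:
        return message
    budget = limit - len(MARKER.encode("utf-8"))
    if budget <= 0:
        raise ValueError("limit is too small to hold the truncation marker")
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    head = encoded[:budget].decode("utf-8", errors="ignore")
    return head + MARKER
```

The budget is counted in encoded bytes rather than characters, since a size limit on the wire applies to bytes and non-ASCII text can take several bytes per character.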
- As mentioned above, this article is part of a trilogy. The title of Part 1, linked above, is “Troubleshooting web apps issues: 6 recent cases from our SREs”.
- Click here for Part 2 “Recent troubleshooting cases from our SREs, part 2”.
- Click here for Part 3 “Recent troubleshooting cases from our SREs, part 3”.
- The title is “Closing CI/CD loop using Argoproj”.
- It describes the “ArgoCD Image Updater”, a tool that continuously monitors when new container images for apps are available and automatically updates image tags in Git repositories.
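The core idea of such a tool, picking the newest available tag and rewriting the image reference, can be sketched as follows. This is only an illustration under stated assumptions, not how ArgoCD Image Updater is implemented: the function names, the `name:tag` image format, and the semver-only update strategy are all hypothetical.

```python
import re

# Assumption: tags look like "1.2.3" or "v1.2.3"; anything else is ignored.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_semver(tag):
    """Return (major, minor, patch) as integers, or None if tag is not semver."""
    match = SEMVER.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def newest_tag(tags):
    """Return the tag with the highest semantic version, or None if none parse."""
    versions = [(parse_semver(tag), tag) for tag in tags]
    versions = [(version, tag) for version, tag in versions if version is not None]
    return max(versions)[1] if versions else None


def updated_image(image, available_tags):
    """Return image rewritten to the newest tag, or unchanged if not newer."""
    repo, sep, current = image.rpartition(":")
    if not sep or not repo:
        raise ValueError("expected an image reference of the form name:tag")
    latest = newest_tag(available_tags)
    if latest is None:
        return image
    current_version = parse_semver(current)
    if current_version is not None and parse_semver(latest) <= current_version:
        return image
    return f"{repo}:{latest}"
```

Versions are compared as integer tuples rather than strings, so `1.10.0` correctly sorts above `1.9.9`. A real tool would then commit the new reference to the Git repository, which this sketch leaves out.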
- The title is “The biggest challenges — and important role — of application security”.
- They share insights into application security (AppSec), its role in security organizations, and the challenges of AppSec professionals.
- The title is “runj: a new OCI Runtime for FreeBSD Jails”.
- It introduces “runj”, a new experimental, open-source proof-of-concept (POC) OCI-compatible runtime for FreeBSD jails.
SRE Weekly Issue #262 March 21st, 2021
Chaos Engineering isn’t adding chaos to your systems — it’s seeing the chaos that already exists in your systems.
Along with four prerequisites, this article also includes 3 myths about chaos engineering that might be making you feel hesitant about starting.
Courtney Nash — Verica
- It explains the basics a team needs in place before starting chaos engineering.
- It opens with a note that the article assumes some familiarity with chaos engineering, and recommends an introductory post, as quoted below.
○ Ed note: This post presumes you have some familiarity with Chaos Engineering, and are considering whether you can start experimenting with it at your organization. If you’re not familiar with Chaos Engineering, here’s a great post to get you up to speed.
This one’s from May of last year. Almost a year on, it’s interesting to see which of these we’ve already implemented.
Ashley Roof — Transposit
- Eric Mayers, an on-call veteran who has managed on-call engineering teams for 20 years at Google, YouTube, YikYak, and elsewhere, offers practical advice on building a successful remote on-call engineering organization from its early days.
An amusing parable illustrating why not to try to be too reliable.
Andrew Ford — Indeed
- Drawing on an experience over a weekend, the author explains some of the good practices Indeed applies in managing system reliability and the speed of new feature delivery.
In the Outages section of last week’s issue, you’ll find two unrelated events referenced in this article: one about Russian internet censorship gone awry and another about a major datacenter fire.
Eric Johansson — Verdict
- It describes the announcements by Google and the Russian authorities, as well as the fire at the data center of cloud service provider OVHcloud in France.
Along with what’s in the title, this article also covers the difference between an RCA and a contributing factors analysis.
Emily Arnott — Blameless
- Along with the topic in the title, it explains the following points.
○ The feature launch schedule doesn’t account for server update timings
○ No policy to scale up server availability for feature launches
○ Server architecture could be updated to support more traffic
○ Incident response team could be overworked with new feature launch, delaying backup server availability
Lots of detail on how LinkedIn is improving their traffic forecasts. Warning/enticement: math contained within.
Deepanshu Mehndiratta — LinkedIn
- It details how, a few years ago, an unprecedented increase in traffic broke LinkedIn’s load test model, and what the team did in response to the struggle to pass load tests across production data centers.
Everyone is testing in production; some organizations admit it and plan for it.
How to do it right, what can happen if it goes wrong, and how to limit the blast radius.
Heidi Waterhouse — LaunchDarkly
- As the title suggests, it explains testing in a production environment while answering questions such as “Does production testing replace other testing?” Two helpful YouTube videos are embedded at the end of the web page.
Remember when GitHub logged you out? Ah, I remember it like it was last week. I mean, the week before. Here’s GitHub’s troubleshooting story about what went wrong.
Dirkjan Bussink — GitHub
- It shares how GitHub responded to a security vulnerability on March 8.
- Google Cloud Platform
GCP had a major multi-region networking issue, due to a routing glitch. Click through for their followup post.
- US National Oceanic and Atmospheric Administration (NOAA)
This outage impaired NOAA’s tsunami early warning system.
- Facebook, Instagram, and WhatsApp
- Elevated error rates
- Microsoft Teams and other services
Click through for a highly detailed description of what went wrong. I can’t link directly to the incident in question, so you’ll have to scroll down to 3/15.
KubeWeekly #256 March 26th, 2021
Editor’s pick of the highlights from the past week.
KubeCon + CloudNativeCon Europe 2021 — Virtual is right around the corner (May 4–7) and what better way to extend your experience than adding on registration for a co-located event? These additional educational opportunities (additional registration and fee required) will take place on May 3 or 4, and we’re excited to share the recently published schedules for CNCF-hosted events. Find the details below:
Cloud Native Rust Day (May 3)
Cloud Native Security Day Europe (May 4)
Cloud Native Wasm Day (May 4)
Crossplane Community Day Europe (May 4)
FluentCon: Cloud Native Logging day with Fluent Bit & Fluentd (May 4)
Kubernetes AI Day (May 4)
Kubernetes on Edge Day (May 4)
Magma Day (May 3)
PromCon Online 2021 (May 3)
ServiceMeshCon Europe (May 4)
- It seems that the lineup of KubeCon + CloudNativeCon Europe 2021 co-located events has been finalized. I’m looking forward to having more choices.
Tutorials, tools, and more that take you on a deep dive into the code.
- An article that aims to help readers understand the most important concepts of modern continuous delivery and maintain a Kubernetes deployment with a complete continuous delivery workflow without having to write a single line of pipeline code. It introduces a demo using Keptn.
- A YouTube video explaining the process of setting up an Ingress Controller on a Kubernetes cluster on Docker Desktop.
- Along with the topic in the title, it explains the following points.
○ Vertically Scaling the Cluster
○ Horizontally Scaling the Cluster
○ Horizontally Scaling an Individual Microservice
○ Elastic Scaling for the Cluster
○ Elastic Scaling for an Individual Microservice
○ About the Book: Bootstrapping Microservices
○ Other Kubernetes Resources
Saiyam Pathak, Civo
- As the title suggests, a YouTube video that introduces Prometheus in about 5 minutes. Before I knew it, it became a series called “CNCF Minutes”.
- It introduces alpha support for OpenTelemetry access logs in Envoy, which implements access logs based on the OpenTelemetry Protocol 0.7.0 release.
A secure container runtime with OCI interface
- Quark Container’s GitHub page. It lists the following three features.
○ OCI compatible: Quark Container includes an Open Container Initiative (OCI) interface. Common Docker container images can run in Quark Container.
○ Secure: It provides Virtual Machine level workload isolation and security.
○ High Performance: Quark Container is built for high-performance container workload execution. It is developed in Rust.
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
Jürgen Etzlstorfer & Andi Grabner @Dynatrace
- Keptn’s Maintainers showcase the most common use cases, how to get started with your first project, and how to use Keptn to extend these practices to all your projects in your enterprise.
Stefan Prodan @Weaveworks
- It provides an overview of the “Flux” project, its evolution, the path to Flux 2, the meaning of Flux v1 maintenance mode, the best entry point to get started, and how to migrate.
Deepika Dixit & Onkar Bhat @Kasten by Veeam
- An overview of “Dex”, an open source identity service that uses OpenID Connect to facilitate authentication of other apps, and an effective way to adopt RBAC while covering most of the use cases.
YY Wan & Nick Lanham @Databricks
- It discusses why Databricks chose M3, how it was deployed, and the lessons learned in the process.
Robert Ross @FireHydrant
- It explains why giving top priority to APIs can be a business game changer, covering the following points.
○ The benefits of building your API first and how it can pay dividends in the long haul
○ The different types of APIs and which choice is the right choice
○ The importance of hosting API documentation
Viktor Farcic @CodeFresh
- A one-hour session introducing “Crossplane”.
Articles, announcements, and more that give you a high-level overview of challenges and features.
Craig Box, Kubernetes Podcast from Google
- Kubernetes Podcast by Google employees. The hosts for this episode are Craig Box and guest host Liz Rice. Liz Rice is the TOC Chair at CNCF and recently moved from Aqua Security to Isovalent.
- The guest is Grant Miller, co-founder and CEO of Replicated.
- The topics that interested me in the news of the week are as follows.
○ Mesh7 to be acquired by VMware
○ NetApp launches Spot Wave
○ Davanum Srinivas elected to the CNCF TOC
Cheryl Hung, CNCF
- As the title suggests, a Keynote slide from The DevOps Conference that introduces the following content and calls for participation in KubeCon + CloudNativeCon Virtual EU, May 4–7.
1. More Rust in Cloud Native
2. Cross Cloud becomes (more) real
3. Web Assembly and eBPF
4. Kubernetes on the Edge
5. GitOps grows significantly
6. Chaos Engineering practices
7. Rise of FinOps
8. Pluggable developer and operator experience
9. Service mesh consolidation
10. End user driven open source
- The first of three survey guides from this week’s CNCF.
- The second of three survey guides from this week’s CNCF.
- Since the original link was broken, I removed the unnecessary parts and listed the corrected link.
- The third of three survey guides from this week’s CNCF.
- It introduces Cloud Native Security Day (CNSD), which will be held as a co-located event during KubeCon EU 2021.
Upcoming CNCF Online Programs
Cloud Native Live: Application life-cycle orchestration with Keptn
Jürgen Etzlstorfer & Andi Grabner @Dynatrace
March 31, 2021 at 12pm PT
Introducing Kubestr: a new way to explore your Kubernetes Storage Options
Michael Cade @Kasten by Veeam
April 1, 2021
CNCF Online Programs Playlist on YouTube
Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.
- For more information, please visit our updated Online Programs page.
What did you think of these articles? Did any of them interest you?
There are some items I have not fully digested yet, so I will use this aide-memoire and these links to catch up myself, too.