- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and leave some comments and useful links as an aide-memoire.
- Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #531 February 28th, 2021
- The title is “How We Minimized the Overhead of Kubernetes in our Job System”.
- I will skip it because it was covered in KubeWeekly#252 last week.
- The title is “Security Logging in Cloud Environments — AWS”.
- This article, part of the blog post series “Continuous Visibility into Ephemeral Cloud Environments”, describes a design for a state-of-the-art multi-account security-related logging platform in AWS.
- Later posts of this series will cover a similar setup for both GCP and Kubernetes.
- The title is “An oral history of #hugops: How tech’s first responders built a culture of empathy”
- It explains how the engineers who keep the cloud running created their own culture of empathy, focusing on how the Twitter hashtag #hugops spread from stories about the hardships of operations engineers.
- The title is “Infrastructure as Code at Enterprise Scale: Identify the Right Approach for Your Organization”.
- It focuses on the two largest public clouds, AWS and Azure, and offers tools and detailed guidelines to help extend the IaC approach.
- It’s up to the reader how to define “enterprise” in the title. The author puts it as follows.
○ How you define “enterprise” is up to you: whether you’re a Fortune 500 company or a garage-based upstart, this guide is for you.
JSON comes in a surprisingly large number of formats, with subtle differences. Throw in different JSON parsers in different languages and there is the potential for vulnerabilities caused by interoperability issues.
- The title is “An Exploration of JSON Interoperability Vulnerabilities”.
- The TL;DR is as follows, and you can jump to the hands-on labs on GitHub from the link.
○ TL;DR The same JSON document can be parsed with different values across microservices, leading to a variety of potential security risks. If you prefer a hands-on approach, try the labs and when they scare you, come back and read on.
- The author explains JSON interoperability security risks in the following five categories.
- Inconsistent Duplicate Key Precedence
- Key Collision: Character truncation and Comments
- JSON Serialization Quirks
- Float and Integer Representation
- Permissive Parsing and Other Bugs
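The first category, inconsistent duplicate key precedence, can be sketched in a few lines. This is only an illustration using Python's standard json module, not code from the article or its labs; the helper names reject_duplicates and strict_loads and the sample document are hypothetical.

```python
import json

# A document with a repeated key. Parsers differ on which value wins,
# which is where the interoperability risk comes from.
doc = '{"qty": 1, "qty": -1}'

# Python's standard json module silently keeps the last value it sees.
print(json.loads(doc))


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of picking one."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text):
    """Parse JSON, refusing any object that contains duplicate keys."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

A service that validates with a first-key-wins parser and then forwards the raw document to a last-key-wins parser would see two different values for qty; rejecting duplicates outright is one way to avoid that disagreement.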
- The title is “Linux System Monitoring Fundamentals”.
- As the title says, it explains the fundamentals and introduces four Linux system monitoring tools as important and worth further investigation.
- The title is “Breaking down and fixing Kubernetes”.
- First of all, I was scared by the illustration at the beginning. It introduces the command rm -rf /etc/kubernetes and explains how to destroy a Kubernetes cluster, delete a certificate, and recover from it.
- There is also an etcd version of the article “Breaking down and fixing etcd cluster” by the same author, which is good for understanding the file structure and behavior of Kubernetes.
- The title is “Parameter Store vs Secrets Manager”.
- The illustration at the beginning of the web page is “Street Fighter II Ryu vs Ken”!
- As the title says, it compares the two services with the following structure.
○ Round 1: Key Value Store
○ Round 2: Storage Limitations
○ Round 3: Encryption
○ Round 4: Rotation
○ Round 5: Cost
○ The Verdict
- The title is “Seamless Multi-Container Live Debugging in VSCode | DevContainers on Steroid”.
- It explains remote live debugging of multi-container workspaces or monorepo-style workspaces for containerized apps.
- The source code can be found on this GitHub page.
- A GitHub page of “cloudquery”, a tool for pulling, normalizing, publishing and monitoring cloud infrastructure and SaaS apps as SQL or Graph (Neo4j) databases.
- A GitHub page of “murex”, a shell like bash, zsh, fish, etc.
- It follows the same syntax as a POSIX shell like Bash, but supports more advanced features than you would normally expect from a $SHELL.
SRE Weekly Issue #259 February 28th, 2021
This quarter’s Increment issue is about Reliability, and I haven’t had this much fun since their first issue about on-call. I’ll include a few of the articles here and more in later issues as I have a chance to review them.
- The theme of Issue 16 (February 2021) of “Increment”, a print and digital magazine about how teams build and operate software systems at scale, is “Reliability”. This time, the following three articles are picked up from this issue.
Accepting that imperfect things still work is fundamental to preventing failures from becoming catastrophes.
Understanding that no system is without errors is critical to building resilient systems.
- As the subtitle states, “Accepting that imperfect things still work is fundamental to preventing failures from becoming catastrophes.” It is explained through the following points.
○ Control is an illusion
○ Failure is inevitable
○ Responding to fragility
○ Designing against disasters
○ Accept imperfection, within limits
The very first sentence sets the tone, and I love it:
Resilience is a process: something you must actively perform, not something you check off a list once.
- As the subtitle states, “By encoding resilience into an organization’s culture, engineering teams can be better equipped to tackle the unknown and unexpected.” It explains how to build a growth-oriented culture that can keep learning, improving, and building resilience for years to come.
Most of all, having an incident commander only works if everyone believes in the role. Someone stepping in to address a crisis and saying “I’m Batman” doesn’t help unless people have bought into the idea of Batman.
The next time I’m incident commander, I am totally going to jump in and say, “I’m Batman!”.
This article is a great primer on what an IC is and how to adopt incident command at your organization.
- Through the following points, it explains that how you fight a fire affects how quickly an outage can be resolved, that appointing an incident commander can help, and that the reader can be one.
○ Enter incident command
○ The incident commander’s role
○ Making it work
○ You’ve got to believe
○ It’s your turn
After reading this blog post, you will have an understanding of the retry pattern used in microservices architecture, why it should be used, a few considerations while using the retry pattern, and how to use it in Python.
I love the W. C. Fields quote.
- The contents are as described above, and are explained with the following structure. The figures and code are written in an easy-to-understand manner.
○ Retry pattern
○ Adding delays between retries
○ Retrying only on certain exceptions
○ Few other considerations
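The first three points above can be sketched together as follows. This is a minimal illustration, not the article's own code; the function name retry, its parameters, and the default values are assumptions chosen for the example.

```python
import random
import time


def retry(func, *, attempts=3, base_delay=0.1,
          retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func(), retrying only on the exceptions listed in retry_on.

    The wait doubles after every failed attempt (exponential backoff),
    with a little random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

Any exception not listed in retry_on propagates immediately, since retrying a request that fails for a permanent reason only adds load. The sleep parameter is injectable so the function can be exercised in tests without actually waiting.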
It’s that time again! Be sure to fill out the survey, not only so they can gather useful data, but also because Catchpoint will donate $5 to charity.
DevOps Institute, Catchpoint, and VMWare Tanzu
- An introduction to the above survey by DevOps Institute, which will create a report from the survey results and publish it.
- The deadline is April 1, 2021, and the charity donation is made as mentioned above. You can take it from “Take the survey now”.
When considering the value of a QA test, SLIs can provide very valuable context.
SRE and QA can work hand in hand.
Emily Arnott — Blameless
- Citing an illustration from Alex Hidalgo’s “Implementing Service Level Objectives”, it explains that “When implementing SRE, almost every role within your IT organization will change. One of the biggest transformations will be in your Quality Assurance teams.”
This kind of thing keeps me up at night. Silent data corruption can destroy your reliability just as quickly as a backhoe on a non-redundant link.
Harish Dattatraya Dixit — Facebook
- Based on the paper, it describes best practices for detecting and remediating silent data corruptions at a scale of hundreds of thousands of machines.
- Click the link for the full version of the paper “Silent data corruptions at scale”.
Etsy experienced years of growth practically overnight in 2020 as quarantines set in. Here’s how they handled it.
Mike Adler — Etsy
- The content described by the editor above is explained in the following structure. It shows an organization in which the blameless post-mortem culture works.
○ The Challenge
○ Modulating Our Pace of Change
○ Adapting Our “Macro” Load Testing
○ Modeling History To Inform Capacity Planning
○ Cresting The Peak
- Let’s Encrypt
- Google Voice
This is Google’s analysis for the incident on February 16, caused by a TLS certificate management mishap.
- India’s National Stock Exchange (NSE)
- US Federal Reserve
The US Fed’s computer system was down, preventing transfers between banks from going through.
- Facebook and Instagram
KubeWeekly #253 March 5th, 2021
Editor’s pick of the highlights from the past week.
Schedule for KubeCon + CloudNativeCon Europe 2021 — Virtual is now available!
KubeCon + CloudNativeCon Europe 2021 Virtual is happening May 4–7, 2021 and the schedule is now available. Experts from organizations including Adobe, Apple, CERN, NVIDIA, and OVHcloud will deliver 100+ sessions, keynotes, lightning talks, and breakout sessions. There will also be more than 60 sessions hosted by project maintainers — spanning beginner-level introductions, end user case studies, and technical deep dives.
- As mentioned above, the schedule for KubeCon + CloudNativeCon Europe 2021 Virtual has been released. In Japan, it falls in the latter half of the Golden Week (GW) holidays, so I have time to gradually decide which sessions to watch.
- The article also introduces a community-curated schedule, and I will watch the session below.
○ The community-curated schedule will feature sessions from leading open source technologists, including:
■ “Your Path To Non-code Contribution In The Kubernetes Community” — Kaslin Fields, Google; Kat Cosgrove, JFrog; Matt Broberg, Red Hat; Kohei Ota, HPE
Tutorials, tools, and more that take you on a deep dive into the code.
Ajit Chelat, Logiq
- It specifically describes Kubernetes health metrics that should be monitored.
- The Table of Contents is below.
- Crash Loops
- Cluster State Metrics
- Disk and Memory Pressure
- Network Unavailable
- CPU Utilization
- Job Failures
- Monitoring Kubernetes Health Metrics
Yuri Grinshteyn, Reliability Engineer, Google Cloud
- The following two are explained.
○ We’ll walk through deploying a sample app to your cluster and configuring an alerting policy that will notify you if there are any container restarts observed.
○ From there, we’ll trigger the alert and explore how the new GKE dashboard makes it easy to identify the issue and determine exactly what’s going on with your workload or infrastructure that may be causing it.
- The video with the above title from the “The Stack Doctor (#stackdoctor)” series on YouTube’s Google Cloud Tech channel is also embedded in the web page.
Charles Pretzer, Buoyant
- The Linkerd 2.10 release adds a new feature, “Opaque Ports”. Because the Linkerd community has asked quite a few questions about this feature on Slack and GitHub, the article focuses on one of the most important underlying features that enables Linkerd to perform this feat: Protocol Detection.
Simone Busoli, NearForm
- I could not reach the linked web page (as of 2021/03/06 12:35 JST). I tried clicking the blog title from the top of the web page, but that did not work either. What happened?
Kevin Lefevre, CTO, Particle
- It explains the limitations of a Prometheus-only monitoring stack and why moving to a Thanos-based stack can improve metrics retention and also reduce overall infrastructure cost.
Josh van Leeuwen, Jetstack
- Based on the history and current situation, it shares what they have done and what they have learned. Working with the Security WG in the Istio community, as well as a number of our customers, Jetstack’s cert-manager team has built an integration that enables cert-manager to sign workload certificates in an Istio service mesh.
Bryan Boreham, WeaveWorks
- As the title suggests, it explains Kubernetes Events while showing examples of log output, and also gives the following warning.
○ Warning: ‘kubectl get events’ can spew out a lot of information, especially as your cluster gets busier. Sadly it does not list the events in timestamp order, so you either have to have some idea what you are looking for, or pipe the output to a file and analyze it with the Mk 1 eyeball.
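One way to avoid the eyeball step in the warning above is to sort the events after exporting them. The sketch below is my own illustration, not from the article: it assumes the input is the parsed output of kubectl get events -o json, the function name sort_events is hypothetical, and the sample events are made up.

```python
import json


def sort_events(event_list):
    """Return the events of a parsed `kubectl get events -o json` list, oldest first."""
    def timestamp(event):
        # Assumption: fall back to the creation time when lastTimestamp
        # is absent or null. Both are RFC 3339 UTC strings, so plain
        # string comparison orders them chronologically.
        return event.get("lastTimestamp") or event["metadata"]["creationTimestamp"]

    return sorted(event_list.get("items", []), key=timestamp)


# Made-up sample data shaped like the kubectl JSON output.
sample = json.loads("""
{"items": [
  {"metadata": {"creationTimestamp": "2021-03-05T10:05:00Z"},
   "lastTimestamp": "2021-03-05T10:07:00Z",
   "type": "Warning", "reason": "BackOff", "message": "Back-off restarting failed container"},
  {"metadata": {"creationTimestamp": "2021-03-05T10:01:00Z"},
   "lastTimestamp": null,
   "type": "Normal", "reason": "Scheduled", "message": "Successfully assigned pod"}
]}
""")

for event in sort_events(sample):
    when = event.get("lastTimestamp") or event["metadata"]["creationTimestamp"]
    print(when, event["type"], event["reason"], event["message"])
```

For a quick look without any script, kubectl get events --sort-by=.metadata.creationTimestamp also orders the output by creation time.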
- A 90-minute Webinar video with the above title. There is also a demo, and you can jump to the part you want to see with the chapter function on the right side.
Saiyam Pathak, Civo
- A 7-minute Webinar video with the above title. I thought it was nice that the comments section includes kind responses to viewers who want introductory content.
Sascha Haase, Kubermatic
- It explains why you need multi-cluster management, how Kubermatic Kubernetes Platform leverages Kubernetes Operators to automate cluster lifecycle management across multiple clusters, clouds, and regions, and how to get started today.
Sofia Parafina, Pulumi
- It explains how to use infrastructure as code to create basic Kubernetes objects and high-level abstractions built on them.
- Specifically, it describes how to use Pulumi to set up a Kubernetes cluster on AWS, Azure, and GCP. Creating a cluster depends on the cloud provider, but the process is generally the same.
- It is the first article in a series on Kubernetes infrastructure as code. The next article will explain basic Kubernetes objects such as Pod, Service, and Volume.
Aman Bisht, Infracloud
- It explains why one of their enterprise customers needed to switch to Jenkins’ multi-branch pipeline and how it made their lives easier.
○ Freestyle Vs Pipeline jobs
○ Why did we move to Multibranch Pipeline?
○ Sample Jenkinsfile Template
○ Benefits of Multi-branch Pipeline
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
Rethinking your Company’s Cloud Security in the Shadow of the SolarWinds Attack
Amir Kaushansky & Leonid Sandler @ARMO
- It analyzes the SolarWinds attack to give a deeper understanding of vulnerabilities in cloud-native environments such as Kubernetes, and then lists effective measures to eliminate or mitigate the risks inherent in cloud environments.
Thomas Graf @Isovalent
- It covers everything from the basics of Kubernetes network policy to more advanced concepts.
- It explains step by step from setting simple policies to finding and avoiding conflicting rules, checking for common mistakes, and addressing difficult questions such as investigating advanced real-world policy examples similar to those implemented by key Kubernetes users.
Articles, announcements, and more that give you a high-level overview of challenges and features.
Chris Short and Kirsten Newcomer, Red Hat
- An approximately one-hour session with explanations and discussion on themes such as the following, in line with the title.
○ Security isn’t just for Ops teams anymore — what do we need to do to make security a focal point of app dev as well? And why is security important for containers and Kubernetes?
Matthew Broberg, Red Hat
- The author shares what he learned about contributing to Kubernetes and hopes it helps readers find the focus and time to join in.
Gaurav Rishi, Kasten
- Here are seven reasons why Kubernetes native backup solutions are the best way to protect your expanding Kubernetes environment.
- It accommodates Kubernetes deployment patterns.
- It aligns with “Shift-left” development.
- It simplifies operations.
- It accommodates multi-cluster scalability.
- It closes protection gaps.
- It bolsters security.
- It integrates with the cloud native ecosystem.
- It is a case study article on Fidelity Investments. It addresses the following issue.
○ One issue that quickly arose was that Fidelity also had distributions of Kubernetes on-prem, as well as on other cloud providers. How could they introduce, for example, a new security process across 1,000 distributed applications?
- The web page has an embedded video “End User Panel: GITOPS in the Enterprise -Real World Experiences — Cheryl Hung” that shares case studies.
Upcoming CNCF Online Programs
This Week in Cloud Native (Livestream): Kubernetes Community Days: Ask me Anything
Bill Mulligan @CNCF
March 10, 2021
Deploying K3s at the Edge for Multiplayer Gaming
Marco Mancini @OpenNebula
March 11, 2021
CNCF Online Programs Playlist on YouTube
Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.
- For more information, please visit our updated Online Programs page.
How about those articles? Are you interested in any of them?
Actually, there is some content I cannot digest at this stage, so I will also use this aide-memoire and these links to catch up myself.