- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #478 February 23rd, 2020
- The title is “Lessons from the TSB failure: a perfect storm of waterfall failures”.
- The article walks through some of the key points in the executive summary of the independent report, carried out by Slaughter and May, into the TSB migration failure, which is likely to have prompted this outsourcing.
- The title is “Got Game? Secrets of Great Incident Management”.
- An on-call page came in at 2 o’clock one morning.
- The author handled the outage in the following sequence: escalation -> convene the incident response team -> division of roles -> investigate the cause -> solve the problem -> close the case.
- The incident was closed at 2:16. You would expect everyone to go back to bed and follow up in the morning, but it was actually mid-afternoon: the whole thing had been an exercise.
Devops conversations often turn to how organisational structure impacts the work we do. This post cleverly looks at organisational structure not through the org chart, but through how people actually work and influence others. When we say we ship the org chart, we need to ask which one.
- The title is “The Shadow Organizational Chart”.
- It is a blog post on Carta’s website. The CEO of Carta has long felt there is a shadow org chart, much like a shadow economy, where employees trade ideas, give direction, offer help, and spread culture.
- He wanted to map this shadow org chart and find employees who have disproportionate levels of influence relative to their hierarchical position. He also wanted to see the influence centers and decision makers, and the directional current between them and the rest of the company.
- He is using Innovisor to create Carta’s internal (human relations) network graph.
- The title is “Towards Operational Excellence: Part 2-On the importance of tools”.
- The author is Adrian Hornsby, Principal Evangelist, Architecture.
- Part 2 of a series on one of AWS’s best practices, “Operational Excellence.”
- Click here for Part 1.
- Of the three interconnected elements (culture, great tools, and process) that enable successful operation of the technology you build, it focuses on great tools.
- The title is “Templating YAML in Kubernetes with real code”.
- In the article, the author suggests using yq or kustomize to template YAML, instead of relying on tools that interpolate strings, such as Helm.
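To remind myself what "templating with real code" means, here is a minimal sketch in Python. It is not the approach from the article (which suggests yq or kustomize); it only illustrates the same idea of building the manifest as structured data rather than interpolating strings. The function names `make_deployment` and `render` and the example values are my own assumptions.

```python
import json


def make_deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Build a Deployment manifest as plain data instead of interpolated text."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def render(manifest: dict) -> str:
    # kubectl accepts JSON manifests, so no YAML library is needed here.
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    # Hypothetical example values.
    print(render(make_deployment("web", "nginx:1.17", replicas=3)))
```

Because the selector and the pod labels come from the same variable, they cannot drift apart, and quoting or indentation mistakes are impossible by construction. That is the kind of error string interpolation tends to produce.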
- The title is “The Complete Guide to Kubernetes Logging ~How is Logging in Kubernetes different, how it works, how to use it: use cases and best practices.~”.
- The goal of the article is to cover the topics in the title plus tools for managing logs, so that readers can aggregate logs from their Kubernetes cluster.
- The title is “AWS Kinesis Firehose throttling with transformation Lambda”.
- The author previously wrote an article, “Terraform AWS Kinesis Firehose + Elasticsearch module”, about a Terraform module which can be used to set up a logging pipeline with AWS Kinesis Firehose and AWS Elasticsearch.
- This article shows how the author used a Lambda transformation to coordinate the flow between AWS Kinesis Firehose and AWS Elasticsearch.
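As a note to myself on what a Firehose transformation Lambda looks like, here is a minimal sketch of the record contract: each incoming record carries a `recordId` and base64-encoded `data`, and the function returns one entry per record with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. This is not the author's throttling logic. The `transform` helper and the field names `message` and `level` are hypothetical.

```python
import base64
import json


def transform(payload: dict) -> dict:
    # Hypothetical transformation: keep only the fields we want to index.
    return {
        "message": payload.get("message", ""),
        "level": payload.get("level", "INFO"),
    }


def handler(event, context):
    """Return one result per incoming record, keyed by the original recordId."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            data = json.dumps(transform(payload)).encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, AttributeError):
            # Undecodable or non-object payloads are marked as failed
            # and passed back unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every input record must appear in the response with the same `recordId`, so the failure branch returns the record rather than skipping it.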
- The title is “Kotlin Conf 2019: Unlock Power of Kotlin DSL for Kubernetes by Fedor Korotkov”.
- YouTube video of the presentation at KotlinConf 2019.
- A link to a repository on GitHub for a tool “gops” that lists and diagnoses currently running Go language processes.
SRE Weekly Issue #208 February 23rd, 2020
There’s so much in this article:
- how to recognize when your system may be susceptible to cascading failure
- how to prevent it
- how to deal with it when it happens (and how hard that can be)
Laura Nolan — Slack
- Laura Nolan, author of Chapter 23 “Managing Critical State” of the SRE book and a contributor to “Seeking SRE”, gives a detailed analysis of cascading failures with 6 anti-patterns, and proposes ways to avoid them and to reduce the risk of experiencing one.
- It’s a great read, so it’s my personal favorite for this week and I bookmarked it.
It’s time for this year’s SRE Survey. Don’t forget that with each completed survey, Catchpoint donates $5 to charity.
This growing demand [for SREs] is not without growing pains as a skills gap problem has emerged due to the fact that SRE training requires a hands-on, interactive learning environment.
Peter Murray — Catchpoint
- Information on SRE Survey 2020, which ran until February 28. It was expected to take 20 to 25 minutes, and a $500 gift card was offered to respondents.
- The survey results will be released on March 23, in time for SRECON AMERICAS WEST, to be held in Santa Clara, California, USA on 3/24–3/26.
Both the summary and the original article are well worth reading. This stood out to me:
As much as we may think of incidents as taking place in all those technical parts of the system below the line, incidents actually take place above it
Thai Wood (summary)
Dr. Richard Cook (original article)
- Richard I. Cook’s article is featured in the 68th issue of “Resilience Roundup”, which publishes information about resilience on the Internet every week.
- If you would like to join the conversation with the Resilience Discussion Group, you can register here.
The EBS control plane data store resembles a “jellyfish” (actually a Physalia, a.k.a. Portuguese man-of-war).
Timothy Prickett Morgan — The Next Platform
- The author suggests that if you want inspiration for a hyperscale, resilient distributed block storage service, a jellyfish is apparently a good place to start looking for architectural features.
- It seems interesting, so I will dig deeper later.
Ideal: each team manages their microservice(s) in isolation.
Reality: microservices interact in unexpected ways and a broader system emerges that has remarkable similarities to running a monolith.
Ben Sigelman — LightStep
- It discusses the phenomenon of “deep systems”, which is newly emerging because of microservices.
- The number of developers who can work on a single app at the same time is limited, and in these architectures the infrastructure is operated independently in four or more layers.
- He argues that developers should be given new tools for observability so that they can spend their time on their actual job, such as improving software quality, rather than on troubleshooting.
This one discusses how to handle SRE for a monolith, and some examples of what often goes wrong.
Eric Harvieux — Google
- The title is “Making your monolith more reliable”.
- He touches on some of the most common problems with monolithic architectures, and on treating and scaling a monolith as a platform with SRE principles in mind.
The author blocked an unexpected Sunday deploy of untested code, and it turned out to be a good thing they did.
- This is one story from the author’s long history of battles with bad rollouts.
- The story begins at about 3:30 pm local time one Sunday. It is written from the admin’s point of view and shows her anger at the “recklessness” of the engineers in the company.
- The usual support was not available because it was a day off, and I found the depiction interesting: people leaning on authority to touch the production environment and trying to move ahead without considering reliability.
-> Linked is an interesting explanation from Cloudflare, posted as a comment on a GitHub issue.
- New Relic
-> Fidelity customers saw a $0 balance for their 401(k) [US retirement] accounts.
- Microsoft Office 365 & Outlook down — Users getting service unavailable error
- Heathrow Airport (London, UK)
-> Also this one.
KubeWeekly #205: February 28, 2020
Editor’s pick of the highlights from the past week.
The countdown to KubeCon + CloudNativeCon Europe is on!
Day-0 co-located events are a huge part of the event. This year, CNCF is hosting three co-located events in Amsterdam on Monday, March 30, providing the opportunity for attendees to deep-dive into these technology topics. We’re excited to share that the schedules are now available for these Day 0 events. Please find the details below.
- KubeCon + CloudNativeCon Europe was approaching at the end of that month, and at that point (as of the early morning of 3/1) there had been no major changes, such as to the schedule, due to COVID-19. (It was later rescheduled.)
Kim McMahon, CNCF
- Cloud Native Security Day, Serverless Practitioners Summit, and ServiceMeshCon were scheduled for March 30 at that time.
Jeffrey Sica, Red Hat and Amanda Katona, VMware
- The Kubernetes Contributor Summit schedule was announced. It was to be held on March 29 and 30 (but it was also postponed).
Tutorials, tools, and more that take you on a deep dive into the code.
Palak Bhatia, Product Manager and Janet Kuo, Software Engineer, Google Cloud
- Introduces Application Manager, a new feature of GKE on GCP (in beta at that time).
- Declarative configuration management according to GitOps principles.
- There are demo videos and tutorials. I want to get my hands dirty with these.
Toader Sebastian, Banzai Cloud
- Introduces the disaster recovery features of Apache Kafka and Banzai Cloud’s own product, Banzai Cloud Supertubes, which fills in the missing parts.
- Supertubes is a deployment tool that utilizes a cloud-native technology stack to set up and operate production-ready Kafka clusters on Kubernetes.
- Supertubes includes Zookeeper, Banzai Cloud Kafka operator, Envoy, Istio and many other components to operate the above environment.
An enhanced Horizontal Pod Autoscaler for Kubernetes
- A link to the GitHub page of “Pangolin”, an enhanced Horizontal Pod Autoscaler for Kubernetes.
- Written in Rust. The author’s preferences come through in the “Why Rust?” comment.
Ran Ribenzaft, Epsagon
- A guest article on the CNCF site, based on an article that Ran Ribenzaft of Epsagon wrote for his company’s site.
- It contrasts “logging in the good old days”, when administrators used ssh and tail on a bare-metal machine or VM, with what came afterwards: containers, easily disposable VMs, and PaaS environments that promise availability far beyond a single machine. The question becomes “How do I access the logs when I don’t know which machine a particular service is running on?” Seeing it from the administrator’s point of view helped me picture the differences in actual work and the issues involved.
Harold Rosenberg, VMware
- An article on the VMware blog introducing version 2.0 of “Weathervane”, a tool for benchmarking application-level performance on Kubernetes.
Jay Huang, NeuVector
- The subtitle is “Understanding the Real-time Characteristics of Linux Containers.”
- To create a container with high I/O optimization, it is necessary to have a deep understanding of CFS (Completely Fair Scheduler).
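As a small worked example for myself: container CPU limits on Linux are commonly enforced through CFS bandwidth control, where a container gets a quota of CPU time per period. The sketch below is an idealized model, not something taken from the article. The function names are hypothetical, and the 100 ms period is the common default that I am assuming here.

```python
CFS_PERIOD_US = 100_000  # assumed default period: 100 ms


def cfs_quota_us(cpu_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may consume in each period."""
    return cpu_millicores * period_us // 1000


def throttled_us_per_period(cpu_millicores: int, busy_threads: int,
                            period_us: int = CFS_PERIOD_US) -> int:
    """Idealized wall-clock time per period spent throttled when every
    thread is CPU-bound and runs on its own core."""
    quota = cfs_quota_us(cpu_millicores, period_us)
    # N busy threads burn the shared quota N times faster in wall-clock terms.
    running_wall_us = min(period_us, quota // busy_threads)
    return period_us - running_wall_us


if __name__ == "__main__":
    print(cfs_quota_us(500))                # quota for a 500m limit
    print(throttled_us_per_period(2000, 4)) # 4 busy threads with a 2-CPU limit
```

In this model, a container limited to 2 CPUs that runs 4 busy threads uses up its quota halfway through each period and then waits for the rest of it, which is the kind of latency effect that makes understanding CFS worthwhile.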
Connor Craven, SDxCentral
- Introduces 7 points and products relating to the advantages and security of using OSS.
- An article in The New Stack’s three-month series examining the challenges of Kubernetes in 2020.
- It examines the challenges of running stateful workloads on Kubernetes, touching on StatefulSet and CSI.
ICYMI: CNCF Webinars
Weekly recap of CNCF member and project webinars that you might have missed.
Ran Ribenzaft, Chief Technology Officer, Epsagon
- Webinar video on “Managing observability in modern apps” by the author of the above article, CNCF Tools Overview: Fluentd — Unified Logging Layer.
- It’s as easy to follow as the article. There are about 10 questions and answers.
Matt Farina, Helm Maintainer, Samsung SDS
Hayley Denbraver, Developer Advocate, Snyk
Raghavan “Rags” Srinivas, Lead Container Developer Advocate, Snyk
- Webinar video about Helm’s security.
- The de facto standard for running machine learning workflows on Kubernetes is Kubeflow.
- Webinar video on the theme of seamlessly migrating to Kubeflow Pipelines the machine learning code, experiments, and results that scientists visualize in Jupyter Notebook.
Articles, announcements, and more that give you a high-level overview of challenges and features.
Scott McCarty, Technical Product Manager, Red Hat
- An article by a Red Hat Technical Product Manager, published on New Year’s Day 2020, that looks back at 2019 and predicts five things that will happen in the Kubernetes ecosystem in 2020.
- It seems wise to keep in mind that there may be bias due to the site where it is posted and the author’s position.
- A Forbes article covering Kubernetes trends, the moves of individual companies, and general developments.
- I think it’s an article that an engineer can read quickly.
- It’s the source of the article “Security concerns hampering the adoption of containers and Kubernetes” I checked here last week.
- I will skip the details because the contents were already covered, but it is a report summarizing the current state of container and Kubernetes security. Almost all of the respondents, 94%, had experienced a security incident in a container environment in the last 12 months.
Jaafar Chraibi, Red Hat
- The article starts from the author’s view that the question “What’s the difference between Kubernetes and OpenShift?” is similar to asking “What’s the difference between an engine and a car?”.
- It made it easy to understand the difference between Kubernetes and OpenShift as Red Hat sees it, as well as the current situation, and some of the numbers made me think “I did not have that perspective.” It reminded me that strategy and marketing are important.
- It’s Part 1 of the series, so I’m looking forward to the rest.
Craig Box and Adam Glick, Kubernetes Podcast from Google
- The guest is Pramod Ramarao, Product Manager of NVIDIA.
- “News of the week” overlaps a lot with KubeWeekly, including last week’s issue, but there are still many other items, a lot of them GCP-related.
Emily Omier, The New Stack
- Like “Different Approaches for Building Stateful Kubernetes Applications” above, this is one article in The New Stack’s three-month series examining the challenges of Kubernetes in 2020.
- They interviewed Saad Ali, a Google software engineer and chair of CNCF’s Kubernetes Storage Special Interest Group, and talked about how to manage stateful workloads with Kubernetes, the key issues, what Storage SIG is now working on and the future.
Puja Abbassi, Giant Swarm
- Unlike the early days when Docker was dominant, there are now many tools other than Docker for building container images. As long as the built image conforms to the OCI specifications, it works, so you don’t have to worry about speculation or fragmentation.
Arvind Gupta, The New Stack
- To support a variety of use cases, early Kubernetes developers deliberately left gaps in the platform to give users flexibility. In other words, it is designed so that the environment can be extended with CRD, CSI, and CNI. This gives flexibility to both the infrastructure and app layers.
- When adopting Kubernetes in your organization, it’s important to consider infrastructure and app management that meet your overall requirements in a way that minimizes the time, effort, and cost required.
Jérôme Petazzoni, Ardan Labs
- The previous Part 1 article introduced multi-stage builds, static and dynamic linking, and briefly mentioned Alpine.
- This Part 2 article covers Go-specific details, and then Alpine. Finally, it looks at how this works with other languages such as Java, Node, Python, Ruby, and Rust.
Jerry Gamblin, Kenna Security
- Last summer, he launched vulnerablecontainers.org to reveal the number of vulnerabilities in the 1,000 most popular Docker Hub containers.
- Shortly after he launched the project, several people asked whether he could scan other public containers.
- He wanted to provide this feature, so he gave up sleep for two weeks and built the first API, which he published that day.
- scan.vulnerablecontainers.org is an open Python API built using Trivy, Flask, Gunicorn, and Nginx, and currently has two public endpoints (more endpoints and tools will be provided). From the beginning, it was designed to be easy to use in a browser or CLI for integration with CI/CD.
- It was an early beta version, so it was not meant to be used in production yet.
- He built it without sleeping, so the “Notice Something Boken?” note is endearing. He would love to hear feedback from anyone who is interested.
Kim McMahon, CNCF
- CNCF would participate as a sponsor and exhibitor of the 18th annual event SCaLE 18x, which was held from 3/5 to 3/8. The location was Pasadena, California.
- Kim McMahon would represent the CNCF and was looking forward to meeting community members.
- Kubernetes socks would be distributed for free at booth #311! Community members were also being recruited as booth volunteers.
Upcoming CNCF webinars
You can check some Recorded Webinars and Upcoming Webinars here. The following were listed as Upcoming CNCF webinars at that time.
Kubernetes Security Best Practices for DevOps
Frédéric Harper, Senior Developer Advocate @DigitalOcean
March 3, 2020 10:00 AM Pacific Time
Service Mess to #ServiceMesh
March 4, 2020 10:00 AM Pacific Time
What’s New in Linkerd 2.7
March 6, 2020 10:00 AM Pacific Time
Kubernetes Security Best Practices for DevOps
Connor Gorman, Principal Engineer @StackRox
March 11, 2020 10:00 AM Pacific Time
Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
March 13, 2020 10:00 AM Pacific Time
How to migrate a MySQL Database to Vitess
Liz van Dijk, @PlanetScale
March 20, 2020 10:00 AM Pacific Time
Argo CD, Flux CD and the GitOps Revolution
Jay Pipes, Principal Open Source Engineer @Amazon Web Services
March 24, 2020 10:00 AM Pacific Time
Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
April 8, 2020 10:00 AM Pacific Time
April 23, 2020 9:00 AM Pacific Time
Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
June 30, 2020 10:00 AM Pacific Time
What did you think of these articles? Did any of them catch your interest?
Admittedly, there is some content I cannot fully digest at this stage, so I will use this aide-memoire and these links to catch up myself too.