SRE / DevOps / Kubernetes Weekly Collection #56 (Week 8, 2021)
- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, adding some comments as an aide-memoire along with useful links.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #529 February 14th, 2021
SRE Weekly Issue #257 February 14th, 2021
KubeWeekly #251 February 19th, 2021
DEVOPS WEEKLY ISSUE #530 February 21st, 2021
News
- The title is “What to Expect from DevOps This Year: The Experts Weigh In”.
- In a special edition of the Software Delivery Leadership Forum, industry analysts were invited to discuss the topic in the title.
- The title is “RubyGems dependency confusion attack side of things”.
- Following the article “Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies”, this article was written to reassure users by presenting the RubyGems side of the story. To avoid misunderstanding, the intention is clearly stated in the “Note” at the beginning.
○ Note: This article is not to deprecate any of the findings and achievements of Alex Birsan. He did great work exploiting specific vulnerabilities and patterns. It is to present the RubyGems side of the story and to reassure you. We actively work to provide a healthy and safe ecosystem for our users.
- The title is “Learn to Code in Python, with Hany Farid”.
- As the editor notes, this is a free YouTube video lecture series from a university introductory Python course, and the accompanying code is also available.
○ Part 1: Introduction to Programming and Computation
○ Part 2: Introduction to Data Structures and Analytics
- The title is “CS615 — System Administration”. A course matching the editor’s comments and the title. Slides, video lectures, and transcripts are available, so you can work through the material systematically.
- The title is “What Do You Love Most About Working in IT?”.
- To commemorate Valentine’s Day, the article asks people who work in IT and IT Service Management (ITSM) roles what they like most about working in IT.
○ The People
○ The Constant Change
○ The Learning
○ The Opportunity to Solve Problems
○ The Varied Challenges
○ Making a Difference/Helping Others
○ Improving Things at an Industry Level
○ Working with the Technology
○ It Pays Well!
- The title is “Hyperscale SIG”.
- As the editor notes, this is the web page of the CentOS project’s Hyperscale SIG (special interest group) for CentOS Stream.
Tools
- The GitHub page of “Rclone”, a CLI tool that performs rsync-style synchronization between different cloud storage providers.
- Click here for the “Rclone” web page.
- The web page of “runson.cloud”, a convenient public online tool that lets you easily check which cloud provider a given website runs on.
The above runson.cloud service is based on a set of handy open source libraries that also have accompanying CLI tools that do the same.
SRE Weekly Issue #258 February 21st, 2021
Articles
Practiced Humility in Retrospectives
When acting as a retrospective facilitator, there’s a huge potential to color the discussion with our words and actions.
You’re there to position other folks to learn, not wear the badge.
Will Gallego
- The article raises the following questions and observations, which lead to its title.
○ Why wouldn’t we be able to simply apply the calculus to our knowledge and change things for the better?
○ This all speaks to a distinct lack of humility in what we do as a practice.
- The article is organized under the following headings, and “Humility In Practice” covers the attitudes and behaviors to put into practice.
○ Hubris as Facilitator
○ Top Down Misunderstanding of Retrospectives
○ Humility In Practice
GitHub Repo: upgundecha/howtheysre
upgundecha/howtheysre: A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
A huge thanks to the curator for the many awesome links in this repo! Some have been featured here in previous issues, and some are new to me. As I go through those, I’ll share my favorites here and tell you why I think you should read them.
Unmesh Gundecha
- This is very good. A GitHub page with links to resources on “SRE” practices published by tech-savvy companies around the world.
- It includes LinkedIn’s “School of SRE”, which I introduced the other day, and, among Japanese companies, resources from Mercari.
Engineering dependability and fault tolerance in a distributed system
In this article, we discuss the concepts of dependability and fault tolerance in detail and explain how the Ably platform is designed with fault tolerant approaches to uphold its dependability guarantees.
Paddy Byers — Ably
- It describes the concepts of dependability and fault tolerance and how Ably’s platform is designed with fault tolerant approaches to uphold its dependability guarantees, covering the following:
○ Architectural approaches to achieve reliability
○ Stateful role placement
○ Detect, hash, resume
○ Channel persistence layer
○ Implementation considerations
○ Consensus formation in globally-distributed systems
○ Health is not binary
○ Resource availability issues
○ Resource scalability issues
○ Conclusion
Phishing complaints cause Notion outage
More details on the Notion outage mentioned here last week. Complaints of phishing by a Notion user resulted in their registrar pulling their domain name out of DNS.
Peter Judge — Datacenter Dynamics
- As the title suggests, this article explains that the outage that took the collaboration app “Notion” offline for hours last week was caused by phishing complaints.
What Is True Resilience? (Hint: It’s Not About Managing Risk)
Google has three guiding principles for improving resiliency:
* Create maximum observability of the overall system
* Design for effectiveness, not perfection
* Learn and iterate as you go
Will Grannis — Google
- A Forbes article that explains the topic in the title through the following three points.
○ Create maximum observability of the overall system
○ Design for effectiveness, not perfection
○ Learn and iterate as you go
- The point of the title is emphasized in the following words.
○ True resilience isn’t about managing a particular instance of risk, but being ready for anything through the way you operate.
4 Things you Need to Know about Writing Better Production Readiness Checklists
This is an awesome guide to writing a production-ready checklist — and why you’d want one.
Emily Arnott — Blameless
- The article explains the following points in line with its title. Guidelines like these can help improve how such documents are written.
○ How to make a production checklist
○ Why production checklists are helpful
○ Keeping your checklist up to date
○ How Blameless can help integrate your checklists
Fix Fast for finding and fixing regressions
Facebook found that as a regression is discovered later, it will take much longer to deploy a fix. With a combination of heuristics and machine learning, they’re detecting regressions earlier and bringing them to the attention of folks that can fix them.
Jian Zhang and Brian Keller — Facebook
- It describes Facebook’s challenges and a cross-sectoral effort called “Fix Fast” that was launched in 2019 to address these challenges.
Outages
- Google Voice
- Kia
Kia had an outage in the internet-enabled features of some of their cars.
- Disney+
- Microsoft Teams
KubeWeekly #252 February 26th, 2021
The Headlines
Editor’s pick of the highlights from the past week.
CNCF Provides Insights into Secrets Management Tools with Latest End User Technology Radar
CNCF
This week, CNCF announced the findings of the fourth CNCF Technology Radar, a guide to a set of emerging technologies based on the experience of the CNCF End User Community. The theme of this edition was secrets management, which was identified by the consumers of cloud native technologies as an essential technology to consider in cloud distributions.
- It announces that CNCF has released its fourth CNCF End User Technology Radar, on “Secrets Management”. A YouTube video by the Radar team is also embedded, so check it out if you like.
- This edition of the Radar highlights the following four themes.
○ Vault has the broadest adoption across many companies and industries.
○ After Vault, groups tend to use the native solutions provided by their public cloud provider.
○ Certificate manager has become a popular choice in the Kubernetes ecosystem.
○ Other solutions in the space are fragmented across various levels of maturity and complexity.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
Kubernetes admission controllers in 5 minutes
Kaizhe Huang, Sysdig
- It describes the admission controllers built into Kubernetes and how to implement image scanning using admission webhooks.
- For those who want to dig deeper into this subject, Sysdig also has another article, “Shielding your Kubernetes runtime with an Admission Controller and image scanning”.
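As an aide-memoire for myself, here is a minimal sketch of the decision logic inside a validating admission webhook. It is my own illustration, not code from the Sysdig article. It assumes the `admission.k8s.io/v1` AdmissionReview format, and it replaces real image scanning with a simple registry-prefix check. `ALLOWED_REGISTRY` and `review_pod` are hypothetical names. A real webhook would also need to be served over HTTPS and registered with a ValidatingWebhookConfiguration.

```python
# Hypothetical policy: only images from this registry prefix are admitted.
# A real setup would ask an image scanner for a verdict instead.
ALLOWED_REGISTRY = "registry.example.com/"


def review_pod(admission_review: dict) -> dict:
    """Build an AdmissionReview response for a Pod admission request."""
    request = admission_review["request"]
    pod_spec = request["object"]["spec"]
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])

    # Collect every image that does not come from the allowed registry.
    rejected = [
        c["image"] for c in containers
        if not c["image"].startswith(ALLOWED_REGISTRY)
    ]

    # The response must echo the request uid and state allowed true/false.
    response = {"uid": request["uid"], "allowed": not rejected}
    if rejected:
        response["status"] = {
            "code": 403,
            "message": "images not allowed: " + ", ".join(rejected),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    sample = {
        "request": {
            "uid": "1234",
            "object": {"spec": {"containers": [{"image": "docker.io/nginx:latest"}]}},
        }
    }
    print(review_pod(sample)["response"])
```

The API server sends the AdmissionReview request as JSON and expects this response shape back, so the function can sit behind any small HTTPS server.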
How we minimized the overhead of Kubernetes in our job system
Lally Singh and Ashwin Venkatesan, Datadog
- As the title suggests, it describes how Datadog solved the performance degradation caused by overhead when migrating an existing job system to Kubernetes.
How we use Kubernetes at Asana
Tony Liang, Asana
- It explains the problems Asana encountered when using Kubernetes for the first time and how they addressed them by building the “KubeApp” framework, which standardizes the creation and maintenance of Kubernetes applications.
Cutting build time in half with Docker’s Buildx Kubernetes Driver
Jeremy Kreutzbender, Release
- In line with the title, it first explains what the original infrastructure looked like and how long builds took for the sample project, then the changes made to use “buildx” and the speed improvements observed.
Comparing Kubernetes operators for PostgreSQL
Nikolay Bogdanov, Flant
- It explains the outcome of their research with the most popular PostgreSQL operators, Stolon, Crunchy Data, Zalando, KubeDB, and StackGres, to meet client needs and implement managed solutions like RDS on Kubernetes.
Puja Lower
- To explain HPA (Horizontal Pod Autoscaling), it starts by defining the various types of scaling available via the API before diving into how Kubernetes autoscaling works.
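As a memo for myself, the core scaling rule documented for the Horizontal Pod Autoscaler can be sketched as below. This is my own simplified illustration and not code from the article. It assumes the documented formula desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue) and the default tolerance of 0.1. The function name `desired_replicas` is hypothetical, and the sketch leaves out min/max replica bounds, stabilization windows, and the handling of not-yet-ready pods.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale in proportion to the metric ratio."""
    ratio = current_metric / target_metric

    # If the ratio is close enough to 1.0, no scaling happens.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Round up so the target is not exceeded after scaling.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 pods averaging 200m CPU against a 100m target: scale out.
    print(desired_replicas(4, 200, 100))
    # 4 pods averaging 50m CPU against a 100m target: scale in.
    print(desired_replicas(4, 50, 100))
    # 4 pods averaging 105m CPU against a 100m target: within tolerance.
    print(desired_replicas(4, 105, 100))
```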
Sysdig contributes Falco’s kernel module, eBPF probe, and libraries to the CNCF
Loris Degioanni, Sysdig
- It announces and explains Sysdig’s contribution of Falco’s kernel module, eBPF probe, and libraries to the CNCF.
- From now on, all core components of the Falco stack will be part of the CNCF.
Manage Envoy Proxy using go-control-plane
Mahendra Bagul, Infracloud
- It explains how to manage Envoy using go-control-plane and what you need to do to gain a better understanding.
The road to adopting Kubernetes in development
Call Delnat
- A guide for macOS on taking the first step towards developing on a local Kubernetes cluster. It doesn’t explain how to set up Kubernetes manifests, but it will help you overcome the first hurdle.
Cloud development environments: Using Skaffold and Telepresence on Kubernetes for fast dev loops (link correction)
Peter O’Neill, Ambassador Labs
- This corrects the link to an article featured in KubeWeekly #250 two weeks ago. If the editors notice, please fix the original page as well. (When I checked the link on the #250 page, I still got a 404 error.)
- I noticed that the previous KubeWeekly editions below have been uploaded to the web page. My thanks to Saiyam Pathak, a CNCF Ambassador and one of the editors of KubeWeekly, who checked this for me.
○ 02/05/2021 — KubeWeekly # 249
○ 01/22/2021 — KubeWeekly # 247
○ 12/11/2020 — KubeWeekly # 243
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
CNCF End User Technology Radar, February 2021 — Secrets Management
- Information on the webinar for the Radar mentioned in “The Headlines”. This is the “Secrets Management” webinar video. I am including the link again because the video is hard to reach from the linked page.
The Container Security checklist
Liz Rice @Aqua Security
- It outlines the checklist included in Liz Rice’s new Container Security book and details some potential weaknesses that you really need to avoid.
This Week in Cloud Native: Fluent Bit updates and Stream Processing
Fluent Bit
- I couldn’t find a path from the above link to the video, but I found it uploaded to YouTube.
- It explains the performance updates in the latest release of Fluent Bit, as well as long-awaited features such as multi-workers, new crypto libraries, and GeoIP.
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
The smallest Kubernetes Cluster: scaling down to the edge
Sascha Haase, Kubermatic
- It introduces the open source Kubermatic Kubernetes Platform built by Kubermatic.
- It has in mind use cases where edge computing brings millions of clusters, and it raises the questions of how to manage them and how to keep them in sync, even when the control plane and worker nodes are running in different locations.
Our take on internet access for virtual machines
Christian Bianchi, Giant Swarm
- It considers several ways to provide Internet access to Azure virtual machines and explains that the author believes NAT gateways are the perfect balance between cost, flexibility, ease of deployment, and customer friendliness.
Introducing GKE Autopilot: a revolution in managed Kubernetes
Drew Bradstock, Group Product Manager, Google Cloud
- An article introducing GKE Autopilot following its GA. It looks good.
- The following keywords from the video embedded in the page made an impression on me.
○ Fully automated Kubernetes platform
○ Google is your node SRE
VIDEO: Is Kubernetes right for us?
Keith Townsend and Alex Ellis
- Approximately one hour of discussion of the title content by Alex Ellis (Founder of OpenFaaS, CNCF Ambassador) and Keith Townsend (Co-Founder of The CTO Advisor).
- It was interesting to see the different perspectives of the open source creator and the CTO advisor. Topics included hobby vs. business endeavor, demand spikes due to COVID-19, and local resource exhaustion at cloud providers.
Upcoming CNCF Online Programs
Rethinking your company’s Cloud Security in the shadow of the SolarWinds attack
Amir Kaushansky and Leonid Sandler @ARMO
March 4, 2021
Register Now
This Week in Cloud Native (Livestream): Demystifying Kubernetes network policy
Thomas Graf @Isovalent
March 3, 2021 at 12:00 pm PT
Register Now
CNCF Online Programs Playlist on YouTube
Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.
- For more information, please visit our updated Online Programs page.
How did you find those articles? Did any of them interest you?
There is some content I haven’t been able to digest at this stage, so I’ll also use this aide-memoire and these links to catch up myself.
Bye now!!