SRE / DevOps / Kubernetes Weekly Collection #85 (Week 37, 2021)
- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #559 September 12th, 2021
SRE Weekly Issue #287 September 12th, 2021
KubeWeekly #277 September 17th, 2021
DEVOPS WEEKLY ISSUE #559 September 12th, 2021
News
- The title is “(Over) Pay As You Go for Your Data store”.
- It outlines the pitfalls of “pay-as-you-go” and the guidelines they have come up with to design their “next gen” data store solution.
- The title is “ttl.sh and cosign: Signing an anonymous & ephemeral Docker image registry.”
- As the title and the editor’s comment indicate, it explains how to sign images on ttl.sh, an anonymous and ephemeral Docker image registry, using cosign.
- The title is “NSA & CISA Kubernetes Security Guidance — A Critical Review”.
- The guidance contained in the Cybersecurity Technical Report (CTR) above is explained in three points: “The Good,” “The Bad,” and “The Complex.”
- The title is “Restricting Docker Access With a Reverse Proxy”.
- As the title and the editor’s comment indicate, it explains how to filter access to Docker by request path through a reverse proxy, using “Caddy”.
- The title is “Security + Observability = Compliance”.
- It briefly explains the author’s thinking behind the concept in the title.
- The title is “How We Design Our APIs at Slack”.
- It describes the API design principles and the new API specification, review, and testing process.
- The following six items are listed under “Our design principles”.
- Do one thing and do it well
- Make it fast and easy to get started
- Strive for intuitive consistency
- Return meaningful errors
- Design for scale and performance
- Avoid breaking changes
- The following four steps are listed under “Design processes”.
- Write an API spec
- Internal API review
- Early partner feedback
- Beta testing
Tools
- The GitHub page of “SLO-Tracker”, which provides a simple way to track SLOs and error budgets. SLO-Tracker can be integrated with a few alerting tools via webhooks to receive SLO-violating incidents.
- Click here for the web page. Click here for a blog post introducing SLO-Tracker.
- The GitHub page of “Amazon EKS Anywhere (EKS-A)”, which has become GA. It is a CLI tool that extends the consistent cluster management experience of Amazon EKS (eksctl) to your on-premises Kubernetes clusters.
- It is worth keeping in mind that, despite the similar name, it is a completely different concept from ECS Anywhere, and that EKS Distro (EKS-D) and Amazon EKS Connector are part of the picture as well.
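To make the SLO and error-budget terms mentioned for SLO-Tracker above a little more concrete, here is a minimal Python sketch of the usual arithmetic. It does not use SLO-Tracker's own code or API; the function names and the example figures (a 99.9% target over 30 days) are assumptions chosen purely for illustration.

```python
from typing import Iterable


def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Allowed downtime in minutes for an availability SLO over a time window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo_target)


def remaining_budget_minutes(
    slo_target: float, window_days: int, incident_minutes: Iterable[float]
) -> float:
    """Budget left after subtracting the downtime of SLO-violating incidents."""
    return error_budget_minutes(slo_target, window_days) - sum(incident_minutes)


if __name__ == "__main__":
    # Hypothetical example: 99.9% availability over 30 days, two incidents.
    print(round(error_budget_minutes(0.999, 30), 1))
    print(round(remaining_budget_minutes(0.999, 30, [10, 5.5]), 1))
```

A negative remaining budget would mean the SLO has already been violated for the window.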
SRE Weekly Issue #287 September 12th, 2021
Articles
Industry Interviews: Colm Doyle, Incident Commander at Slack
Lots of details about how Slack does incident response in this one.
Stephen Whitworth — incident.io
- As the title suggests, it details how Colm Doyle became an Incident Commander (IC) at Slack, how incidents are handled, and what happens in the first five minutes after getting paged.
Five Ways Developers Can Help SREs
This list also gives an interesting insight into the way this company does SRE.
Mayank Gupta and Merlyn Shelley — Squadcast
- As the title suggests, it lists the following five best practices that developers can adopt to make SRE work easier.
- Scaling The Platform With The Concept Of A 12-factor App Method
- Sharing Performance Testing Data Insights
- Significance of Documentation and Configuration files
- AIOps Supported System Admin Functionalities
- Increasing Observability Of The System
Oh BGP, you rascally little routing protocol.
Alessandro Improta and Luca Sani — Catchpoint
- It analyzes and comments on the network failure “an outage hit Spectrum cable customers in the Midwest”, which occurred on September 7, 2021 at 16:36 UTC and was caused by BGP route announcements, from the viewpoint given in the title.
A comprehensive definition of SREs and Site Reliability Engineering, including what SREs do and what makes SREs different from other roles.
The article covers various facets of SRE and acknowledges that SREs can perform many roles.
JJ Tang — Rootly
- It addresses questions about technical roles and positions, among others, to provide a complete definition of SRE. It also provides tips on what SREs actually do and how to help the SREs in your organization be the best they can be.
The Atlantic GLIDER, Air Transat flight 236! Explained by Mentour Pilot
Another really excellent air accident story with lots of great talk about mental models and confirmation bias. The crew saw lots of disparate indications that each didn’t point to anything in particular and each wasn’t a huge problem on its own. That, coupled with confirmation bias, helped them miss what might seem obvious in hindsight.
Mentour Pilot
- A YouTube video that explains one of the most famous aviation accidents, “Air Transat flight 236”, covering the background to the incident, how the crew dealt with it, and the safety recommendations. The final report is here.
KubeWeekly #277 September 17th, 2021
The Headlines
Editor’s pick of the highlights from the past week.
Congratulations to Envoy on the 5 year anniversary of the project!
Matt Klein, Envoy
Congratulations to Envoy on their fifth anniversary of the project! Hear from Matt Klein (the project creator) on Envoy’s brief prehistory and history of the project, along with some of the lessons learned along the way.
- As mentioned above, project creator Matt Klein wrote this post to commemorate the 5th anniversary of the Envoy project. He talks about the lessons learned over time as it grew into a large-scale OSS project.
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
Kata and Arm, a secure alternative in the 5G space
Kiel Faller, Arm
- An approximately 45-minute session that demonstrates 5G O-RAN components on Arm infrastructure and their importance in the 5G space, and discusses the potential impact of using open source components, including cost savings and increased customizability.
Building an HA control plane for Tinkerbell with Kube-vip
Jason DeTiberus, Equinix
- An approximately 1-hour session that reviews updates to the Tinkerbell project and explains how the control plane was built and the role that kube-vip plays.
Moving from CLIs to control planes with Crossplane
Viktor Farcic, Upbound
- An approximately 30-minute session explaining the benefits of managing infrastructure, services, and apps using the Universal Control Plane (Crossplane).
Using CSI snapshots to backup and restore your data in Kubernetes
Michael Courcy, Kasten by Veeam
- A 20-minute session explaining the CSI snapshot feature and how it fits into the Kubernetes storage architecture.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
NSA & CISA Kubernetes security guidance — A critical review
Iain Smart, NCC Group
- Since it is covered in DEVOPS WEEKLY ISSUE #559 above, I will skip it.
Top 9 file integrity monitoring (FIM) best practices
Alejandro Villanueva, Sysdig
- As the title suggests, it describes four types of FIM (File Integrity Monitoring) focusing on host and container security, and the following nine best practices.
- Prepare an asset inventory
1: Scope which files and directories need to be monitored
2: Define appropriate permissions
3: Define a baseline
- Detect drift
4: Shift left with image scanning policies
5: Detect real-time threats with runtime policies
- Notify, investigate, and respond
6: Implement an automated alert and response mechanism
7: Gather forensics data for further investigation
- Compliance and Benchmarks
8: Stick to compliance requirements
9: Run automated benchmarks
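As a rough illustration of defining a baseline and detecting drift from the list above, the following Python sketch hashes the files under a directory and compares two snapshots. It is not how Sysdig or any particular FIM product works; the function names are hypothetical and only the standard library is used.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    base = Path(root)
    digests: Dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def detect_drift(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the files added, removed, or modified relative to the baseline."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        p for p in baseline.keys() & current.keys() if baseline[p] != current[p]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

In this sketch the monitored scope is simply the directory passed to `snapshot`; a real setup would also track permissions and ownership, and would alert on the result rather than just return it.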
DataRoaster is now open-sourced, why I created it
Kidong Lee, ITNEXT
- It announces the open-sourcing of “DataRoaster”, which provides a data platform that runs on Kubernetes.
- Click here for a 12-minute demo video of DataRoaster.
Why data scientists shouldn’t need to know Kubernetes
Chip Huyen
- As the title suggests, it argues that it is fine for data scientists to own the entire tech stack, but that, rather than wrestling with YAML files, they should be able to rely on good infrastructure abstraction tools that let them focus on actual data science without knowing Kubernetes.
Solving API authorization challenges in multi-cloud environments
Nima Moghadam, Kong
- It explains the topic in the title using figures and code. The bottom line is that OPA and declarative policies have become very popular, especially in API Ops, for the following reasons:
- Easy to integrate
- Declarative
- Extremely powerful and flexible
- Platform agnostic
Rate limiting with the HAProxy Kubernetes Ingress Controller
Jim O’Connell, HAProxy
- This article describes how to use an overall rate limit to mitigate the effects of events such as DDoS attacks.
- Beyond that, the HAProxy Kubernetes Ingress Controller offers more fine-grained control to fend off DDoS attacks, through several annotations that help you build a powerful first line of defense on an IP-by-IP basis.
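The article configures rate limiting through HAProxy annotations, which are not reproduced here. To illustrate the underlying idea of limiting requests on an IP-by-IP basis, here is a minimal Python sketch of a sliding-window counter. The class name, the limit, and the window length are assumptions for illustration, and this is not HAProxy's implementation.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class PerIpRateLimiter:
    """Allow at most `limit` requests per client IP within a sliding time window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True if the request is within the limit, False if it should be rejected."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not counted
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIpRateLimiter(limit=2, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0):
        print(t, limiter.allow("192.0.2.1", now=t))
```

The optional `now` argument only exists to make the behavior easy to test; in normal use the monotonic clock is used.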
Deploy OpenFaaS to Linode with K3sup
Alex Ellis, OpenFaaS
- As the title suggests, it explains how to deploy OpenFaaS to Linode using a virtual machine and K3sup, in the following sections.
○ Introduction
○ Tutorial
○ Create an account on Linode
○ Create a VM on Linode
○ Pre-reqs
○ Install K3s using K3sup
○ Install OpenFaaS
○ Configure an Ingress Controller and TLS certificate
○ Wrapping up
○ Getting in touch and supporting our work
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Prodfiler, with Thomas Dullien
Craig Box, Kubernetes Podcast from Google
- The Kubernetes Podcast, run by Google employees. This time the host is Craig Box, with guest host Jimmy Moore.
- They have Thomas Dullien, the co-creator of “Prodfiler”, as a guest.
- The topics that interested me in the news of the week are as follows.
○ Backup for GKE
○ Kubernetes multi-cluster panel on October 6
○ Tanzu Kubernetes Grid 1.4
Why we created the Prometheus Conformance Program
Richard Hartmann, Grafana Labs
- As the title suggests, it introduces the reasons for creating the Prometheus Conformance Program.
- You can learn more about the Conformance Program design, available test suites, current test results, and how to apply for the official Prometheus compatibility mark in the following session on October 14 at KubeCon + CloudNativeCon NA.
○ The Prometheus Conformance Program — Richard Hartmann, Grafana Labs
Crossplane is now a CNCF incubating project
Jared Watts, Crossplane blog
- As the title suggests, Crossplane announces that its maturity level has been promoted from CNCF sandbox to incubating, looking back on the following points and touching on the future.
○ A Consistent Vision
○ The Community Keeps Growing
○ First Major Milestone Ready for Production
○ Strong Partnerships with the Ecosystem
○ Production Adoption
○ Conformance in the Ecosystem
○ The Road Ahead
Google’s Sqlcommenter now extending the vision of OpenTelemetry to databases
Nimesh Bhagat, Google Cloud
- Since it was covered in last week’s KubeWeekly #276, I will skip it.
Cloud Native Chaos and Telcos — Enforcing reliability and availability for telcos
W.Watson, Vulk Coop & Karthik S., LitmusChaos
- The explanation is based on the keywords in the title. The conclusion is below.
○ Borrowing from the lessons learned when applying chaos testing to cloud native environments, we should use declarative chaos specifications to test telecommunication infrastructure in tandem with its development and deployment. The CI/CD tradition of “pull the pain forward” with a focus on MTTR will produce the type of highly available and reliable systems that cloud native telecommunication systems will need to be.
7 microservices best practices for developers
Michael Bogan, Kong
- As the title suggests, it explains the following seven points.
- Small Application Domain
- Separation of Data Storage
- Communication Channels
- Compatibility
- Orchestrating Microservices
- Microservices Security
- Metrics and Monitoring
NSA & CISA Kubernetes security guidance
Lars Larsson, Elastisys
- It summarizes the main takeaway messages of the Kubernetes Hardening Guidance and provides additional insights based on the author’s own experience with cloud security.
KubeCon + CloudNativeCon North America preview with Constance Caramanolis and Stephen Augustus
The CUBE
- As the title suggests, a 21-minute session in which the two co-chairs of KubeCon + CloudNativeCon North America are interviewed about the event and talk about its highlights.
Introducing the CNCF End User Journey Report: First up, Spotify
CNCF
- The CNCF End User Community has published its first End User Journey report, featuring Spotify, and this article outlines it.
- The End User Journey report focuses on active end user community members. It shows how these organizations have grown as technology leaders and have benefited from joining the CNCF end-user community.
Upcoming CNCF Online Programs
*edited as the Kubernetes 1.22 release webinar has been rescheduled
Live Webinar
- September 21 at 10am PT: Introduction to APIClarity — A Wireshark for APIs presented by Zohar Kaufman & Alexei Kravtsov, Cisco — RSVP
Cloud Native Live
- September 22 at 9am PT: Optimizing and securing Kubernetes workloads with Polaris and Goldilocks presented by Andy Suderman, Fairwinds — RSVP
On-demand
- September 23: Kong Ingress controller — Kubernetes Ingress on steroids presented by Viktor Gamov, Kong — RSVP
- September 23: Enable stateful applications on AWS with persistent storage for Kubernetes presented by Ananth Vaidyanathan, AWS — RSVP
CNCF End User Lounge Livestream
- September 23 at 9am PT : Operationalizing 300+ K8 clusters across the cloud presented by Niraj Amin, Rajarajan Pudupatti SJ, & David Botelho, Fidelity — RSVP
What did you think of these articles? Did any of them interest you?
To be honest, there is some content I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself, too.
Bye now!!