SRE / DevOps / Kubernetes Weekly Collection #22 (Week 27)

- In this blog post series, I collect the following three weekly mailing lists I subscribe to, and leave some comments and useful links as an aide-memoire.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people who browse this kind of information.
DEVOPS WEEKLY ISSUE #496 June 28th, 2020
SRE Weekly Issue #225 June 28th, 2020
KubeWeekly #223 July 2nd, 2020
DEVOPS WEEKLY ISSUE #496 June 28th, 2020
News
- The title is “The coming SMOKEstack: rethinking and retooling ‘multi-cloud’”.
- An article by RedMonk (an industry analyst firm focused on developers) that analyzes the current state of hybrid cloud and multi-cloud and explains SMOKEstack.
- The SMOKEstack acronym was proposed by Mark Hinkle; the author came across it while exploring the serverless community's “Serviceful” approach. It stands for:
○ Serviceful
○ Mashable
○ Open
○ K(C)omposable
○ Event-driven
A good introduction to Open Policy Agent, based on notes taken by a new user.
- The title is “First look at OPA (Open Policy Agent)”.
- An article that outlines OPA and walks through demos, how to write policies, and how to use OPA.
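As a memo for myself: a common way for an application to use OPA is to ask a running OPA server for a decision over its REST Data API (`POST /v1/data/<path>` with an `{"input": ...}` body; the answer comes back under `"result"`). The minimal sketch below only builds the request and interprets a response. The policy path `httpapi/authz/allow`, the input fields, and the local address on OPA's default port 8181 are assumptions for illustration.

```python
import json
import urllib.request


def build_opa_query(opa_url: str, policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to OPA's Data API at /v1/data/<policy_path>."""
    url = f"{opa_url.rstrip('/')}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(response_body: bytes) -> bool:
    """OPA wraps the decision in a top-level "result" key; treat a missing result as deny."""
    return json.loads(response_body).get("result", False) is True


# Hypothetical policy package "httpapi.authz" exposing an "allow" rule.
req = build_opa_query(
    "http://localhost:8181",
    "httpapi/authz/allow",
    {"method": "GET", "path": ["salary", "bob"], "user": "bob"},
)
```

Against a real server you would send it with `urllib.request.urlopen(req)` and pass the response bytes to `is_allowed`. Defaulting to deny when `"result"` is absent matters because OPA returns an empty object when the queried rule is undefined.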
- The title is “Why Doesn’t Your CI Pipeline Have Security Bug Testing?”
- Starting from the problem statement “While clearly vitally important, current AppSec models are broken. The traditional approaches to application security prioritize training over tooling and finding over fixing.”, the article explains the benefits of running application security tests in every build and how to get started.
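To make "security tests in every build" concrete, here is a minimal sketch of a CI gate step that fails the build when a scanner reports findings at or above a severity threshold. The findings format (`id`, `severity`, `title`) and the severity names are hypothetical, not any particular scanner's schema.

```python
import sys
from typing import Iterable, List, Mapping

# Severity ordering used by the gate; these names are an assumption, adjust to your scanner.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def blocking_findings(findings: Iterable[Mapping[str, str]], threshold: str = "high") -> List[Mapping[str, str]]:
    """Return the findings whose severity is at or above the threshold."""
    limit = SEVERITY_RANK[threshold]
    return [f for f in findings
            if SEVERITY_RANK.get(str(f.get("severity", "")).lower(), 0) >= limit]


def gate(findings: Iterable[Mapping[str, str]], threshold: str = "high") -> int:
    """Exit code for a CI step: 0 lets the build pass, 1 fails it."""
    blocking = blocking_findings(findings, threshold)
    for f in blocking:
        print(f"[{f['severity'].upper()}] {f.get('id', '?')}: {f.get('title', '')}", file=sys.stderr)
    return 1 if blocking else 0


# Hypothetical scanner output; a real pipeline would parse this from the scanner's report file.
sample_report = [
    {"id": "SQLI-1", "severity": "critical", "title": "SQL injection in /search"},
    {"id": "HDR-7", "severity": "low", "title": "Missing security header"},
]
exit_code = gate(sample_report)
```

In a pipeline, the last step would be `sys.exit(gate(findings))`. Starting with a high threshold and lowering it over time is one way to avoid blocking every build on day one.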
- The title is “For the Love of Serverless: Brian Le Roux of Begin”.
- A blog series that features serverless leaders as guests. Past themes have included “thriving community”, “rapid upward trajectory”, and “heightened accessibility”.
- With Brian Le Roux (CTO and Co-Founder at Begin) as the guest, the article touches on the industry’s competitive landscape, opportunities to improve onboarding, and why they are excited about the new experimental JavaScript runtime.
- The title is “Stack History: A Timeline of Slack’s Tech Stack Evolution.”
- As the title says, this article describes the evolution of Slack’s technology stack in chronological order. I had never used StackShare, so I created an account. It's nice to be able to see each company's technology stack and their thoughts on each tool.
A good outage report and investigation into a Cassandra cluster issue caused by counter columns.
- The title is “Cassandra counter columns: nice in theory, hazardous in practice”.
- An article that draws on the incidents and countermeasures at Ably Realtime to summarize precautions for using Cassandra counter columns at scale in production.
- I love this closing phrase: “Unlike the priestess Cassandra of greek myth, Ably Realtime is not in the business of making prophecies, whether they are believed or not. Just that expecting practice to always live up to theory can be hazardous. Oh, and don’t use Cassandra counter columns. Not even once.” May Apollo's wrath not fall on this Cassandra.
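One well-known general reason counters are tricky is that a counter update is not idempotent. If a write is applied but its acknowledgement times out, the client cannot safely retry. The toy simulation below shows that hazard. It is a model of non-idempotent retries in general, not a reproduction of the specific failures Ably investigated; `FlakyStore` and `with_retry` are hypothetical helpers.

```python
import uuid


class FlakyStore:
    """Toy store whose first write is applied but whose acknowledgement is 'lost'."""

    def __init__(self) -> None:
        self.counter = 0
        self.seen_ids = set()
        self.drop_next_ack = True

    def _ack(self) -> None:
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise TimeoutError("write applied, but the client never saw the ack")

    def increment(self, delta: int) -> None:
        # Non-idempotent: every delivery changes the value again.
        self.counter += delta
        self._ack()

    def add_event(self, event_id: str) -> None:
        # Idempotent: re-delivering the same id is a no-op.
        self.seen_ids.add(event_id)
        self._ack()


def with_retry(op, attempts: int = 3):
    """Retry on timeout, as a typical client driver would."""
    for _ in range(attempts):
        try:
            return op()
        except TimeoutError:
            continue
    raise TimeoutError("gave up after retries")


counter_store = FlakyStore()
with_retry(lambda: counter_store.increment(1))
print(counter_store.counter)      # 2: one logical increment, counted twice

set_store = FlakyStore()
event = str(uuid.uuid4())
with_retry(lambda: set_store.add_event(event))
print(len(set_store.seen_ids))    # 1: the retry was harmless
```

Recording events under unique ids and counting them later trades storage for retry safety.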
- The title is “Nigel Poulton on How Kubernetes Can Make or Break the Devops Workflow”.
- A Semaphore podcast episode with Nigel Poulton, author of The Kubernetes Book (Updated Feb 2020 Kindle Edition), as the guest, on the theme “How Kubernetes Can Make or Break the Devops Workflow”.
- He is a Kubernetes fan himself and speaks from the perspective of someone whose main job is teaching Kubernetes to others.
○ “Kubernetes is a monster of complexity”
○ “I see people struggling with it all the time, and almost being forced into deploying to Kubernetes without having enough sort of knowledge. A lot of people deploying to Kubernetes are deploying with just enough Kubernetes knowledge, and that worries me.”
○ “I feel like Kubernetes is not for everyone, but it’s been marketed as if it is for everyone”
- The title is “A Methodology for Penetration Testing Docker Systems”.
- Joren Vrancken’s bachelor’s thesis explains how to perform penetration tests on Docker-based systems. Its coverage of the prerequisite Docker knowledge is well organized and helpful.
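As a personal memo, here is a minimal sketch that flags a few container settings that are well-known paths from a container to its host during a Docker penetration test. It checks for `--privileged`, a mounted Docker socket, host namespaces, and an added `SYS_ADMIN` capability. It reads the JSON that `docker inspect` prints. The selection of checks is my own assumption, not the thesis's methodology. The sketch only looks at `HostConfig.Binds`, so sockets mounted with `--mount` (which appear under `HostConfig.Mounts`) are not covered.

```python
import json


def audit_container(inspect: dict) -> list:
    """Flag settings in one `docker inspect` entry that commonly let a container reach the host."""
    host = inspect.get("HostConfig") or {}
    findings = []
    if host.get("Privileged"):
        findings.append("runs with --privileged")
    for bind in host.get("Binds") or []:
        # Binds entries look like "/host/path:/container/path[:mode]".
        if bind.split(":")[0] in ("/var/run/docker.sock", "/run/docker.sock"):
            findings.append("Docker socket mounted: " + bind)
    for mode_key in ("PidMode", "NetworkMode", "IpcMode"):
        if host.get(mode_key) == "host":
            findings.append(f"{mode_key}=host shares the host namespace")
    if {"SYS_ADMIN", "CAP_SYS_ADMIN"} & set(host.get("CapAdd") or []):
        findings.append("SYS_ADMIN capability added")
    return findings


# `docker inspect` prints a JSON array; this sample is hypothetical.
sample = json.loads(
    '[{"Name": "/web", "HostConfig": {"Privileged": false,'
    ' "Binds": ["/srv/data:/data:ro"], "NetworkMode": "host"}}]'
)
for container in sample:
    print(container["Name"], audit_container(container))
```

On a real host, piping `docker inspect $(docker ps -q)` into `json.load(sys.stdin)` would feed every running container through the same function.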
Jobs
King is looking for new members for the infrastructure engineering teams to help develop, manage and expand our software based networking setup across datacenters and (Google) cloud. Please take a look at the open role for networking engineer. We’re also still looking for both database and streaming data engineers, if that is more your style.
- A continuation of the job posting from King (at that moment). The post appeared unchanged; they seemed to be looking for SREs, Database SREs, and Network SREs.
Tools
- The GitHub page of “awsls”, an OSS command that lists AWS resources. It covers 200 types of resources across 76 AWS services.
- The GitHub page of “Konstraint”, an OSS CLI tool that helps create and manage constraints for Gatekeeper.
- Web page of the new OSS testing tool “KUTTL” for Kubernetes clusters.
- Click here for the GitHub page.
SRE Weekly Issue #225 June 28th, 2020
Articles
Catchpoint’s SRE Report 2020 — The Highlights
This suggests an upcoming shift in our field:
50 percent of SREs believe they will be working remotely post COVID-19, as compared to only 20 percent prior to the pandemic.
Kameerath Kareem — Catchpoint
BONUS CONTENT: An outside take on the survey results is here (Mike Vizard — DevOps.com).
- An article that introduces Catchpoint’s SRE Report 2020 and highlights its key points.
- It says: “According to Google, there should be an upper bound goal of 50% ops work and 50% dev, but this 50/50 split may just be a pipe dream. Based on the survey results, most of the SRE work is dominated by operations-type activities.”
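To check where a team stands against that 50% upper bound, a simple share of logged hours is enough. The minimal sketch below computes it. The category names and the week's hours are hypothetical, and which activities count as "ops" is a team decision.

```python
def ops_share(hours_by_category: dict, ops_categories: set) -> float:
    """Fraction of logged hours spent on operations-type work."""
    total = sum(hours_by_category.values())
    if total == 0:
        return 0.0
    ops = sum(h for cat, h in hours_by_category.items() if cat in ops_categories)
    return ops / total


# Hypothetical categories and one hypothetical week of logged hours.
OPS = {"on-call", "tickets", "manual-deploys"}
week = {"on-call": 12, "tickets": 10, "manual-deploys": 4, "automation": 10, "design": 4}

share = ops_share(week, OPS)
print(f"ops share: {share:.0%}")  # 26 of 40 hours -> 65%
print("over the 50% cap" if share > 0.5 else "within the 50% cap")
```

Tracking this weekly is more useful than any single reading, because the trend shows whether automation work is actually paying down toil.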
No one person can (or should) know everything. How do we allocate expertise and build connections in order to maximize resilience and adaptive capacity?
Will Gallego
- An article in which the author, starting from an occasion when he called in a handyman to repair a water-related problem, leaves some thoughts and questions relevant to software engineering, such as the standardization of incident response.
Heroku Incident Follow-up: Incident #2038
A new feature was accidentally rolled out to too wide an audience, causing log message loss.
Heroku
- Follow-up information for the issue that occurred on Heroku from 02/06/2020 14:34 UTC to 06/20/2020 18:42 UTC.
The impact of slow NFS on data systems
[…] one slow block device can affect the performance of processes even when those processes don’t use the slow block device.
Kalyanasundaram Somasundaram — LinkedIn
- On LinkedIn’s engineering blog, the team describes how they found and solved a problem with Espresso, LinkedIn’s de facto NoSQL database solution.
- They clarified how the page cache is shared between block devices and found that one slow block device can degrade the performance of a process even when that process does not use the slow device.
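As a memo for myself: on Linux, the `Dirty` and `Writeback` counters in `/proc/meminfo`, like the `vm.dirty_*` thresholds, are system-wide rather than per device. That is one plausible way pages waiting on a slow device can throttle writers to other devices. I haven't checked whether this is exactly the article's root cause. The minimal sketch below parses those counters for observation. The sample values are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse lines such as 'Dirty:   1234 kB' into {'Dirty': 1234} (values in kB where a unit is given)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read the live counters (Linux only)."""
    return parse_meminfo(Path(path).read_text())


def dirty_summary(meminfo: dict) -> str:
    """System-wide dirty and under-writeback memory, which all block devices share."""
    dirty = meminfo.get("Dirty", 0)
    writeback = meminfo.get("Writeback", 0)
    total = meminfo.get("MemTotal", 0) or 1
    return f"Dirty={dirty} kB, Writeback={writeback} kB ({(dirty + writeback) / total:.2%} of RAM)"


# Hypothetical snapshot; on a real host use read_meminfo() instead.
SAMPLE = """MemTotal:       16000000 kB
Dirty:            160000 kB
Writeback:         40000 kB
HugePages_Total:       0
"""
print(dirty_summary(parse_meminfo(SAMPLE)))
```

Sampling this periodically alongside per-device latency (for example, from `iostat`) would show whether dirty memory grows while one device is slow.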
SRE error budgets and maintenance windows
Should you count scheduled maintenance against your error budget? It depends.
Jesus Climent — Google
- An article on the Google Cloud blog about how maintenance windows affect the error budget (SRE tips).
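The "it depends" becomes concrete once you do the arithmetic. A 99.9% SLO over 30 days leaves about 43.2 minutes of budget, so counting a maintenance window can use up most of it. The minimal sketch below compares the two choices. The 30-minute window and 10 minutes of unplanned downtime are hypothetical figures.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed unavailability, in minutes, for an availability SLO over the window."""
    return window_days * 24 * 60 * (1 - slo)


def budget_consumed(downtime_min: float, slo: float, window_days: int = 30) -> float:
    """Fraction of the error budget used by the given downtime."""
    return downtime_min / error_budget_minutes(slo, window_days)


budget = error_budget_minutes(0.999)              # 43200 min * 0.001, about 43.2 minutes
maintenance = 30                                  # hypothetical scheduled window
unplanned = 10                                    # hypothetical incident downtime
counted = budget_consumed(maintenance + unplanned, 0.999)
excluded = budget_consumed(unplanned, 0.999)
print(f"budget={budget:.1f} min, counted={counted:.0%}, excluded={excluded:.0%}")
# budget=43.2 min, counted=93%, excluded=23%
```

Whether to exclude the window is a question of whether users actually experience it as unavailability, which is why the answer depends on the service.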
Cassandra counter columns: nice in theory, hazardous in practice
An investigation in response to three incidents led to this stark conclusion about Cassandra’s “counter columns” feature:
In fact, they don’t appear to have any properties that make them a useful primitive for building predictable distributed systems.
Paddy Byers — Ably
- Omitted here because it is covered in the DevOps Weekly section above.
How to Be a Financially Conscious Site Reliability Engineer
This article explains why we should have cost data at our fingertips as we design cloud-based systems.
[…] a well-architected system is often a cost-efficient system.
CloudZero
- An article describing how bringing part of your cloud costs into your day-to-day work, within the SRE’s scope of responsibility, delivers many benefits beyond just controlling costs.
- Since cost is something to be aware of in day-to-day work, the following linked articles are also worth checking.
○ “Make These Three Architectural Changes to Optimize Cloud Costs”
○ “3 Things Finance Teams Should Understand About AWS (Straight from Engineering)”
○ “How to Empower DevOps to Make Better Cloud Cost Decisions”
○ “Cloud Management and Optimization: How to Do it Right”
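Having cost data at your fingertips while designing can be as simple as normalizing each option to a unit cost. The minimal sketch below compares two designs by cost per million requests. The design names and all figures are hypothetical, purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    monthly_cost_usd: float
    monthly_requests: int

    def cost_per_million(self) -> float:
        """Unit cost: dollars per one million requests served."""
        return self.monthly_cost_usd / self.monthly_requests * 1_000_000


# Hypothetical figures for two candidate designs serving the same traffic.
designs = [
    Design("always-on VMs", 1800.0, 300_000_000),
    Design("autoscaled containers", 1100.0, 300_000_000),
]
for d in sorted(designs, key=Design.cost_per_million):
    print(f"{d.name}: ${d.cost_per_million():.2f} per million requests")
```

A unit cost like this stays comparable as traffic grows, whereas a raw monthly bill does not.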
A Shared Pilot-Autopilot Control Architecture for Resilient Flight
This is a new concept to me, and I really like it:
Capacity for maneuver (CfM) is a measure of how much adaptability or room to respond to a new challenge that a given part of the system has, whether a person or autonomous agent.
Amir B. Farjadian, Benjamin Thomsen, Anuradha M. Annaswamy, and David D. Woods (original paper); Thai Wood, Resilience Roundup (summary)
- An article that explains the essence of the title paper from the perspective of resilience engineering.
Outages
An update to our nameservers has been rolled back. We are monitoring recovery.
- IBM Cloud
○ I saw several mentions of this outage in the media but IBM’s status page doesn’t seem to list it.
- Fastly
- Reddit
○ and this one
KubeWeekly #223 July 2nd, 2020
The Headlines
Editor’s pick of the highlights from the past week.
KubeCon + CloudNativeCon North America is Going Virtual + CFP extended!
We have some news — KubeCon + CloudNativeCon North America is going online!
Connecting and collaborating is in the DNA of the open source cloud native community. With over 92,000 contributors to CNCF projects, 11 new sandbox projects, and 30 new members in Q2 (bringing membership to 570 organizations!), KubeCon + CloudNativeCon has been a key place to keep conversations going and continue building cloud native’s momentum. Although we can’t wait for the day we’ll be able to get together in-person, our virtual events are essential in educating and keeping our community thriving.
This also means the CFP has been extended! The deadline to submit a talk is Sunday, July 12 at 11:59 pm PDT. Learn more about the CFP and other event details here. We look forward to bringing the community together soon!
- An article by CNCF announcing that KubeCon + CloudNativeCon North America would be held virtually and that the CFP deadline had been extended (at that moment).
- It describes virtual events as essential for educating the community and keeping it thriving, while expressing eagerness for the day the community can gather in person again.
KubeCon + CloudNativeCon EU Virtual Session Spotlight
The countdown to KubeCon + CloudNativeCon EU Virtual on August 17–20, 2020 is on! As we approach the event, we curated a few recommended sessions that we don’t want you to miss. Please see the feature for this week and be sure to register today!
The Beginners Guide to the CNCF TOC
Presented by: Liz Rice, VP of Open Source Engineering at Aqua Security
Who is the Technical Oversight Committee? What do its members do? How do projects get picked for adoption into the CNCF? Let’s shine a light on this group who determine which projects are adopted by the CNCF, set the future direction of the cloud native landscape, and are even responsible for the definition of the term “cloud native.”
This talk discusses the pros & cons of a project’s participation in the CNCF from the perspective of end users, vendors, contributors, and maintainers. It covers the lifecycle for a CNCF project, including:
○ why projects want to be in the CNCF
○ how the project adoption process works
○ the requirements that the CNCF has on projects at different phases of maturity
Attendees will leave this talk with insights into how the technical arm of the CNCF works, why it’s important, what the TOC wants to do next, and how they can get involved.
- The KubeCon + CloudNativeCon EU Virtual spotlight features the session “The Beginners Guide to the CNCF TOC”. Schedule: Tuesday, August 18, 16:57–17:13 CEST (Central European Summer Time).
- Liz Rice (VP of Open Source Engineering at Aqua Security and chair of the CNCF TOC) introduces the CNCF’s TOC (Technical Oversight Committee) and explains the pros and cons of participating in the CNCF from various viewpoints.
ICYMI: CNCF Webinars
Weekly recap of CNCF member and project webinars that you might have missed.
You can view all CNCF recorded and upcoming webinars here.
CNCF Ambassador Webinar: Commoditise Kubernetes with cluster-api
Gianluca Arbezzano, Senior Staff Software Engineer @Packet
- It includes a demonstration and explanation of using cluster-api on bare metal.
- For about 20 minutes, without slides, the presenter explains how Kubernetes works and the background that led up to it.
CNCF Member Webinar: Best Practices for Running and Implementing Kubernetes
Kendall Miller, President @Fairwinds and Robert Brennan, Director of Open Source @Fairwinds
- It explains points to consider when using Kubernetes, as well as common pitfalls.
- I liked how the two presenters went through the slides together, one explaining and the other supplementing. The slides listed common questions and concerns, each answered with ideas and best practices.
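Several of the commonly cited best practices can be checked mechanically on a manifest: pinned image tags, resource requests and limits, and readiness and liveness probes. The minimal sketch below does this on a Deployment parsed into a dict. It is in the spirit of tools like Fairwinds' Polaris, but it is not their implementation. Which checks to enforce, CPU limits in particular, is a policy choice.

```python
def check_container(c: dict) -> list:
    """Return best-practice issues for one container spec."""
    issues = []
    # An untagged image or ":latest" makes rollouts non-reproducible; digests ("@sha256:...") are fine.
    last = c.get("image", "").rsplit("/", 1)[-1]
    if "@" not in last and (":" not in last or last.endswith(":latest")):
        issues.append("image is untagged or uses :latest")
    resources = c.get("resources", {})
    for kind in ("requests", "limits"):
        for res in ("cpu", "memory"):
            if res not in resources.get(kind, {}):
                issues.append(f"missing {kind}.{res}")
    for probe in ("readinessProbe", "livenessProbe"):
        if probe not in c:
            issues.append(f"missing {probe}")
    return issues


def check_deployment(dep: dict) -> dict:
    """Map each container name in a Deployment to its list of issues."""
    containers = dep["spec"]["template"]["spec"]["containers"]
    return {c["name"]: check_container(c) for c in containers}


# Hypothetical Deployment, as it would look after yaml.safe_load().
deployment = {"spec": {"template": {"spec": {"containers": [
    {"name": "web", "image": "nginx"},
    {"name": "api", "image": "registry.example.com:5000/api:1.4.2",
     "resources": {"requests": {"cpu": "100m", "memory": "128Mi"},
                   "limits": {"cpu": "500m", "memory": "256Mi"}},
     "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}},
     "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}}},
]}}}}
report = check_deployment(deployment)
print(report)
```

Splitting on the last `/` before looking for `:` avoids mistaking a registry port (`:5000`) for an image tag.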
CNCF Member Webinar: 7 Critical Reasons for Kubernetes-Native Backup
Deepika Dixit, Member of Technical Staff @Kasten and Mark Severson, Member of Technical Staff @Kasten
- It contained demos using cloud native projects (Kubernetes, kind, CSI) and explained a cloud-native backup strategy and its benefits.
CNCF Member Webinar: Pivoting Your Pipeline from Legacy to Cloud Native
Nathan Martin, CEO @Sagecore Technologies and Tracy Ragan, CEO @DeployHub
- It explains why the pipeline approach needs to shift to a service-based one and how to handle that transition.
- The title of the slide was “Pivoting Your Pipeline for Microservices”
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
ConfigMaps in Kubernetes: how they work and what you should remember
Flant staff
- An article explaining how ConfigMaps work in Kubernetes and what you need to remember. It notes at the beginning: “Please note that this is not a complete guide, but rather a reminder/tips collection for those who already use ConfigMap in Kubernetes or are in the middle of preparing their applications to use it.”
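One thing worth remembering about ConfigMaps is that changes do not reach every consumer. Environment variables are read only at container start, and files mounted via `subPath` are not updated. A common workaround, familiar from Helm charts, is to put a hash of the ConfigMap into the pod template annotations, so that any config change triggers a rolling update. The minimal sketch below shows that idea. The annotation key `checksum/config` follows the usual convention, but it is just a name.

```python
import hashlib
import json


def config_checksum(configmap: dict) -> str:
    """Stable SHA-256 over the ConfigMap's data (key order does not matter)."""
    payload = json.dumps(
        {"data": configmap.get("data", {}), "binaryData": configmap.get("binaryData", {})},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stamp_deployment(deployment: dict, configmap: dict) -> dict:
    """Write the checksum into the pod template so a config change alters the template and rolls pods."""
    annotations = (deployment.setdefault("spec", {})
                   .setdefault("template", {})
                   .setdefault("metadata", {})
                   .setdefault("annotations", {}))
    annotations["checksum/config"] = config_checksum(configmap)
    return deployment


cm_v1 = {"data": {"LOG_LEVEL": "info"}}
cm_v2 = {"data": {"LOG_LEVEL": "debug"}}
dep = stamp_deployment({}, cm_v1)
print(dep["spec"]["template"]["metadata"]["annotations"])
```

Because only the pod template changes, Kubernetes performs its normal rolling update. No manual restart is needed.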
Docker and Kubernetes — root vs. privileged
Bryant Hagadorn
- An article that compares and explains root privileges on Unix-based macOS and Linux and Docker’s --privileged flag.
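In Kubernetes, the same two concerns map onto the `securityContext` fields: running as UID 0 (`runAsUser`, `runAsNonRoot`, settable at both pod and container level) and `privileged` (container level only). The minimal sketch below reports both for each container in a pod spec, with container-level values overriding pod-level ones. The "may run as root" rule is my own conservative approximation: when no UID is set, the image's user applies, and that is often root.

```python
def effective(container: dict, pod_spec: dict, key: str):
    """Container-level securityContext overrides the pod-level value for the same field."""
    c_ctx = container.get("securityContext") or {}
    p_ctx = pod_spec.get("securityContext") or {}
    return c_ctx.get(key, p_ctx.get(key))


def root_and_privileged_report(pod_spec: dict) -> dict:
    report = {}
    for c in pod_spec.get("containers", []):
        ctx = c.get("securityContext") or {}
        run_as_user = effective(c, pod_spec, "runAsUser")
        run_as_non_root = effective(c, pod_spec, "runAsNonRoot")
        report[c["name"]] = {
            # privileged: true grants broad access to host devices and capabilities.
            "privileged": bool(ctx.get("privileged", False)),
            # With no UID set and runAsNonRoot not enforced, the image's user (often root) is used.
            "may_run_as_root": run_as_user == 0 or (run_as_user is None and not run_as_non_root),
        }
    return report


# Hypothetical pod specs.
loose = {"containers": [{"name": "a", "securityContext": {"privileged": True}}, {"name": "b"}]}
strict = {"securityContext": {"runAsNonRoot": True, "runAsUser": 1000},
          "containers": [{"name": "c"}, {"name": "d", "securityContext": {"runAsUser": 0}}]}
print(root_and_privileged_report(loose))
print(root_and_privileged_report(strict))
```

Container `d` shows why the override order matters: a pod-level non-root setting does not protect a container that explicitly sets `runAsUser: 0`.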
Verify your Kubernetes Cluster Network Policies: From Faith to Proof
Jan Harrie
- An article that configures Kubernetes network policies, considers how to verify that they actually behave as intended, and then implements and explains that verification.
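Before proving policies in a live cluster, it helps to write down what you expect. The minimal sketch below models NetworkPolicy ingress semantics in a deliberately simplified way so that expected results can be listed as test cases. The model covers a single namespace, `matchLabels` only, and ignores ports, `namespaceSelector`, and `ipBlock`. It is a reasoning aid, not a substitute for actually probing connectivity as the article does.

```python
def selects(selector: dict, labels: dict) -> bool:
    """matchLabels-only selector; an empty selector {} matches every pod."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def ingress_allowed(policies: list, src_labels: dict, dst_labels: dict) -> bool:
    """Would traffic from src to dst be admitted under this simplified model?"""
    applicable = [p for p in policies
                  if "Ingress" in p.get("policyTypes", ["Ingress"])
                  and selects(p.get("podSelector", {}), dst_labels)]
    if not applicable:
        return True  # pods selected by no ingress policy accept all traffic
    for p in applicable:
        for rule in p.get("ingress", []):
            peers = rule.get("from")
            if not peers:
                return True  # a rule with an empty or missing "from" matches all sources
            # Only plain podSelector peers are modelled; other peer types are skipped.
            if any(set(peer) == {"podSelector"} and selects(peer["podSelector"], src_labels)
                   for peer in peers):
                return True
    return False


# Hypothetical policies: default-deny plus "frontend may call api".
deny_all = {"podSelector": {}, "policyTypes": ["Ingress"]}
allow_frontend = {
    "podSelector": {"matchLabels": {"app": "api"}},
    "policyTypes": ["Ingress"],
    "ingress": [{"from": [{"podSelector": {"matchLabels": {"app": "frontend"}}}]}],
}
policies = [deny_all, allow_frontend]
print(ingress_allowed(policies, {"app": "frontend"}, {"app": "api"}))  # expected: allowed
print(ingress_allowed(policies, {"app": "batch"}, {"app": "api"}))     # expected: denied
```

Each expectation can then be turned into a real connectivity probe between pods carrying those labels.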
Introducing Frigate: A documentation generation tool for Kubernetes Helm Charts
Jacob Tomlinson
- An article that introduces Frigate, a tool that automatically generates documentation for Helm charts.
User-defined Webhooks in Puppet Relay with Knative and Ambassador API Gateway
Noah Fontes, Puppet
- An article on Ambassador’s blog explaining how to set up a user-defined webhook for Puppet Relay with Knative and Ambassador API Gateway.
The Building Blocks of DX: K8s Evolution from CLI to GitOps
Katie Gamanji, American Express
- Focusing on how cluster DX (developer experience) has evolved over time, it introduces tools that contributed to the expansion of Kubernetes adoption.
This operator deletes stale feature branches in a Kubernetes cluster.
- The GitHub page of an operator that removes stale feature branches (feature-branch environments) from a Kubernetes cluster.
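To picture what such an operator decides on each pass, here is a minimal sketch of the selection step: among namespaces marked as feature-branch environments, pick those older than a maximum age. The label key `feature-branch` and the age-based rule are hypothetical. The real operator's criteria and configuration may differ.

```python
from datetime import datetime, timedelta, timezone


def stale_namespaces(namespaces: list, label_key: str, max_age: timedelta, now: datetime) -> list:
    """Names of namespaces carrying label_key whose age exceeds max_age."""
    stale = []
    for ns in namespaces:
        meta = ns["metadata"]
        if label_key not in meta.get("labels", {}):
            continue  # never touch namespaces that are not feature-branch environments
        created = datetime.strptime(meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ")
        if now - created.replace(tzinfo=timezone.utc) > max_age:
            stale.append(meta["name"])
    return stale


# Hypothetical namespace list, shaped like `kubectl get namespaces -o json` items.
namespaces = [
    {"metadata": {"name": "feature-login", "labels": {"feature-branch": "true"},
                  "creationTimestamp": "2020-06-01T09:00:00Z"}},
    {"metadata": {"name": "feature-search", "labels": {"feature-branch": "true"},
                  "creationTimestamp": "2020-06-29T09:00:00Z"}},
    {"metadata": {"name": "production", "labels": {},
                  "creationTimestamp": "2019-01-01T00:00:00Z"}},
]
now = datetime(2020, 7, 1, tzinfo=timezone.utc)
print(stale_namespaces(namespaces, "feature-branch", timedelta(days=7), now))
```

The actual deletion would be `kubectl delete namespace <name>` (or the equivalent API call) for each returned name. Requiring an explicit label keeps the blast radius small.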
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Mirantis and Docker Enterprise, with Adrian Ionel
Craig Box and Adam Glick, Kubernetes Podcast from Google
- Kubernetes Podcast by Google employees. The current co-hosts are Craig Box and Adam Glick.
- Adrian Ionel (Co-founder and CEO) of Mirantis was welcomed as a guest.
- They talk about the adoption of OpenStack (the NASA example, etc.), the Kubernetes community as seen through the OpenStack experience, and the acquisition of Docker Enterprise, following the story from the founding of Mirantis.
- The hosts prepared their questions well, and the conversation flowed smoothly.
- Asked “What have you learned from your experience with OpenStack?”, he answered: “Robustness and simplicity is very important, that’s the key lesson we learnt.” I’m also interested in the Airship project.
- The topics of interest from the News of the Week segment are listed below. Many of them were already covered in this article.
○ ACI and Docker integration now public
○ gRPC-Web for .NET now GA
○ Episode 94, with Richard Belleville
Introducing the Hewlett Packard Enterprise Ezmeral software portfolio
Kumar Sreekanti, Hewlett Packard Enterprise
- An article on HPE’s website introducing the Hewlett Packard Enterprise Ezmeral software portfolio.
- “Ezmeral” is derived from Spanish and means “emerald”. The name draws on the idea that “Emeralds have the mysterious power to strengthen intelligence, anticipate future events, relieve stress and boost immunity,” as well as on the image of helping customers with AI and data-driven innovation.
Building Cloud-Scale DBaaS with Kubernetes Operators
Benjamin Anderson, IBM Cloud Databases
- IBM blog. IBM Cloud runs several Database-as-a-Service (DBaaS) products directly on top of Kubernetes, building a control plane based on the Operator pattern.
- The article traces the history of stateful services to explain this approach, its motivation, and its implications.
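At the heart of the Operator pattern is a level-triggered reconcile loop. It compares the desired state (from a custom resource) with the observed state and emits the actions that close the gap. Running it again after those actions yields nothing to do. The minimal sketch below is a generic illustration of that loop for a hypothetical database cluster. It is not IBM's control plane, and the member naming (`db-N`) and action tuples are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Desired:
    replicas: int
    version: str


@dataclass
class Observed:
    members: dict = field(default_factory=dict)  # member name -> running version


def reconcile(desired: Desired, observed: Observed) -> list:
    """Actions that move observed state toward desired state; safe to re-run (level-triggered)."""
    actions = []
    names = sorted(observed.members)  # lexical order is good enough for this sketch
    for name in names[desired.replicas:]:
        actions.append(("delete", name))
    kept = names[:desired.replicas]
    missing, idx = desired.replicas - len(kept), 0
    while missing > 0:
        candidate = f"db-{idx}"
        if candidate not in observed.members:
            actions.append(("create", candidate, desired.version))
            missing -= 1
        idx += 1
    for name in kept:
        if observed.members[name] != desired.version:
            actions.append(("upgrade", name, desired.version))
    return actions


def apply(observed: Observed, actions: list) -> Observed:
    """Simulate executing the actions against the cluster."""
    members = dict(observed.members)
    for action in actions:
        if action[0] == "delete":
            members.pop(action[1])
        else:  # "create" or "upgrade" both set the member's version
            members[action[1]] = action[2]
    return Observed(members)


desired = Desired(replicas=2, version="12.3")
observed = Observed({"db-0": "12.1", "db-1": "12.1", "db-2": "12.1"})
plan = reconcile(desired, observed)
print(plan)
print(reconcile(desired, apply(observed, plan)))  # converged: nothing left to do
```

Because `reconcile` looks only at current state, a crashed or restarted controller can simply run it again. This is much of the appeal for running stateful services this way.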
I Found A Painless Way To Manage Secrets In Google Kubernetes Engine
Merlin, Hacker Noon
- An article that explains how to manage Secrets in GKE using the OSS tool Berglas.
Optimize the Kubernetes Developer Experience with Version 0
Richard Li, Ambassador
- An article that points out that microservices may not work well and introduces a “Version 0 Strategy” that helps integrate developer experience into an organization’s development workflow.
How Microservices facilitate Feature Teams’ work
Mia-Platform Team
- An article that describes what a “Feature Team” is and how microservices can facilitate that work.
- There are various approaches to naming teams and assigning roles, but I think it is important to give each team a clear role and perspective that cuts across functions, services, and organizations, which otherwise tend to become siloed.
Kubernetes static code analysis with Checkov
Jon Jozwiak, Bridgecrew
- An article introducing Checkov, an OSS infrastructure analysis tool. It scans Kubernetes manifests to identify security and configuration issues in Kubernetes workloads.
- It also covers security scanning of infrastructure as code for Terraform (AWS, Azure, GCP) and CloudFormation, catching misconfigurations and helping maintain cloud security best practices.
Upcoming CNCF webinars
You can check some recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at that moment.
Member Webinar: Stay on top of ongoing Kubernetes security hygiene
Zohar Kaufman, Co-Founder and VP R&D @Portshift.io
Ariel Shuper, VP Product @Portshift.io
July 2, 2020 10:00 AM Pacific Time
Member Webinar: Optimize your Kubernetes Clusters on Azure with Built-in Best Practices
Jorge Palma, Senior Program Manager @Microsoft
July 7, 2020 10:00 AM Pacific Time
Member Webinar: The Challenges and Countermeasures of Service Mesh Practice
裴斐 (Fei Pei), Cloud Computing Technology Expert and Architect, NetEase Hangzhou Research Institute @NetEase
This webinar will be delivered in Chinese.
July 8, 2020 10:00 AM China Standard Time
Project Webinar: What’s new in Linkerd 2.8 : Multi-cluster Kubernetes made simple and secure by default
Oliver Gould, Linkerd Project Lead, co-founder & CTO @Buoyant
July 8, 2020 10:00 AM Pacific Time
Member Webinar: Building Production-ready Services with Kubernetes and Serverless Architectures
Mike Metral, Software Architect and Engineer @Pulumi
Jason (Jay) Smith, App Modernization Specialist @Google Cloud
July 8, 2020 1:00 PM Pacific Time
Member Webinar: How to Adopt Service Mesh in Production: From Technology Selection to Practice
马若飞 (Ruofei Ma), Chief Engineer, FreeWheel Beijing R&D Center @FreeWheel
This webinar will be delivered in Chinese.
July 9, 2020 10:00 AM China Standard Time
Member Webinar: The top 10 most-useful Kubernetes APIs for comprehensive cloud-native observability
Caleb Hailey, Co-founder and CEO @Sensu
July 9, 2020 10:00 AM Pacific Time
Member Webinar: Securing and Accelerating the Kubernetes CNI Data Plane with Project Antrea and NVIDIA Mellanox ConnectX SmartNICs
Antonin Bas, Maintainer of Project Antrea and Staff Engineer @VMware
Moshe Levi, Sr. Staff Engineer @NVIDIA
July 14, 2020 10:00 AM Pacific Time
Member Webinar: Serving Millions of Customers with Cloud Native and DevSecOps
Chris Hollies, CTO, Oracle Practice @Capgemini
Akshai Parthasarathy, Principal Director, Cloud Native and DevOps @Oracle Cloud
July 15, 2020 7:00 AM Pacific Time
Member Webinar: Advancing image security and compliance through Container Image Encryption!
Brandon Lum, Senior Software Engineer @IBM
July 15, 2020 10:00 AM Pacific Time
Member Webinar: Kubernetes and storage. Kubernetes for storage. An overview.
Kiran Mova, Chief Architect at MayaData and core maintainer of OpenEBS @MayaData
July 16, 2020 10:00 AM Pacific Time
Member Webinar: Learn how to clean up your cloud-native “DevOps Dumping Ground”
Melissa Sussmann, Product Marketing Lead @Puppet
Kenaz Kwa Principal Product Manager @Puppet
July 17, 2020 10:00 AM Pacific Time
Member Webinar: Kubernetes Security Anatomy and the Recently Disclosed CVEs
Gadi Naor, CTO & Co-Founder @Alcide
July 21, 2020 10:00 AM Pacific Time
Member Webinar: Implementing Canary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Oleg Chunikhin, CTO @Kublr
July 22, 2020 1:00 PM Pacific Time
Member Webinar: Observability of multi-party computation with OpenTelemetry
Antoine Toulme, Engineering Manager @Splunk
Dave McAllister, Sr. Technical Evangelist @Splunk
July 23, 2020 10:00 AM Pacific Time
Member Webinar: Kubernetes Policies 101
Eran Leib, Founder, VP Product Management @Apolicy
Spenser Paul, Director of Sales, North America @DoiT International
July 28, 2020 10:00 AM Pacific Time
Member Webinar: Cluster API — Yesterday, Today, Tomorrow
Saad Malik CTO & Co-Founder @Spectro Cloud
Jun Zhou Chief Architect @Spectro Cloud
July 30, 2020 10:00 AM Pacific Time
Project Webinar: How We Doubled System Read Throughput with Only 26 Lines of Code
TiKV team
July 31, 2020 10:00 AM Pacific Time
Project Webinar: Kubernetes 1.19
Kubernetes release team
Aug 28, 2020 10:00 AM Pacific Time
Member Webinar: Getting started with container runtime security using Falco
Loris Degioanni, CTO and Founder @Sysdig
Sept 2, 2020 1:00 PM Pacific Time
How did you find these articles? Did any of them catch your interest?
Actually, there is some content I haven't been able to digest yet, so I'll use this aide-memoire and these links to catch up myself too.
Bye now!!