SRE / DevOps / Kubernetes Weekly Collection #7 (Week 12)

- In this blog post series, I collect the following three weekly mailing lists I subscribe to and add some comments and useful links as an aide-memoire.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people looking for this kind of information.
DEVOPS WEEKLY ISSUE #481 March 15th, 2020
SRE Weekly Issue #211 March 15th, 2020
KubeWeekly #208: March 20th, 2020
DEVOPS WEEKLY ISSUE #481 March 15th, 2020
News
- The title is “How cloud transforms IT security into AppSec”.
- Based on the view that “the cloud has made the infrastructure part of the app,” the author describes how security practices have changed between “before the cloud” and “after the cloud”.
- The title is “O’Reilly serverless survey 2019: Concerns, what works, and what to expect”.
- A November 2019 article. O’Reilly conducted its first “serverless adoption” survey and received responses from over 1,500 people across a broad range of locations, companies, and industries. Surveys with clearly stated assumptions, targets, and objectives are interesting and informative.
- The title is “Decentralized Calico Network Security Policy Deployment for GitOps — Part 2.”
- Part 2 of Tigera’s GitOps blog series. Click here for Part 1 “Enforcing Network Security Policies with Git Ops”. Part 3, the final part of the trilogy, does not seem to have appeared yet as of March 20th (Fri).
- It first defines the scope (how to enable distributed policy workflows), then the challenges (policy creation complexity, governance checks) of applying network security policies in Kubernetes using GitOps. It then builds an end-to-end policy workflow as an example.
- The title is “An Overview of the Azure CNAB Quickstarts Library”.
- Howard van Rooijen, the author, shares the inspirational results obtained through CNAB (Cloud-Native Application Bundle) and Porter at endjin, and introduces an overview of Azure CNAB, an intro video of the Azure CNAB Quickstarts Library, and an intro article.
- The title is “Interoperability of open-source tools: the emergence of interfaces”.
- From the perspective that “OSS interoperability supports scalability and innovation” in the development of Kubernetes, the article explains the Container Runtime Interface (CRI), Container Network Interface (CNI), Container Storage Interface (CSI), Service Mesh Interface (SMI), and Cluster API, along with their respective backgrounds and features.
- The title is “BUILDING GO APPLICATIONS WITH BAZEL”.
- A May 2018 article that opens with: “Dependency management and binary building are probably the most frustrating and least satisfying part of software development, and these frustrations get worse as the application grows.”
- It mentions that various tools have been developed and open sourced to tackle this complexity: Blaze, the build tool used internally by Google; Facebook’s Buck, a tool derived from it; and Foursquare’s Pants. It then introduces what Bazel, the main subject, brings, how to set it up for Go, and how to build and test an application.
A look at Cilium Cluster Mesh, including a deep-dive into eBPF and CNI networking stacks.
- The title is “Kubernetes Multi-Cluster Networking -Cilium Cluster Mesh”.
- Starting from the observation that “in a dynamically changing and very complex ecosystem of microservices, traditional IP address and port management tends to cause problems from a management and scale perspective,” the article introduces Cilium, BPF (Berkeley Packet Filter), and YugabyteDB.
- Both the diagrams and the output shown during the hands-on steps are explained carefully, so it looks like a good read.
Tools
- The web page of “Monitor”, a tool that monitors server status and visualizes it on a single screen. There is also a live demo where you can see the display.
- Click here for the GitHub page.
SRE Weekly Issue #211 March 16th, 2020
Articles
SRECon20 Asia/Pacific is rescheduled to September 7–9, 2020.
- Information about SRECon20 Asia/Pacific’s scheduled event in Sydney, Australia.
- At that time, the full program announcement and registration were scheduled to begin in May.
Business continuity at Slack: Keeping our customers up and running during COVID-19
This article has a definite marketing slant. It’s nonetheless interesting to see how Slack is handling the situation.
Cal Henderson and Robby Kwok, Slack
- Slack’s COVID-19 (new coronavirus infection) BCP (business continuity plan) announcement.
- It is available in multiple languages; the Japanese version is here.
- They state that they are continuing business under their BCP and pandemic plan. Even if all employees move to remote work, operations are not hindered, and they can handle increases in capacity and load, so customers need not worry.
Journey into Observability: Glitch’s journey
I love this gem:
I’m not surprised [that] companies that are far into their observability journey start advocating for testing in production — once you have the data and you can slice & dice it as you see fit, testing in production seems like a totally reasonable thing to do.
Mads Hartmann
- This article focuses on why Glitch started investing in tools to ensure observability, the current situation and how it got there, and finally what remains to be done.
Lessons in Distributed Communication From Incident Response
With many companies suddenly shifting into figuring out how to become distributed organizations overnight, we can learn many lessons by looking at incident response patterns.
George Miranda — PagerDuty
- With COVID-19 (new coronavirus infectious disease) rapidly pushing organizations to remote work, the article relates this to the earlier shift from on-site operations to distributed cloud operations, and explains it with reference to PagerDuty’s own organization and its COVID-19 response.
- It introduces GitLab’s “Communication Practices”, the blameless postmortem for post-incident response, PagerDuty’s blog post “Effective Remote Work”, and more.
When correlation (or lack of it) can be causation
Today’s post is a double header. I’ve chosen two papers from NSDI’20 that are both about correlation.
Paper #1 is a tool that helps identify when files A and B are often changed at the same time, and warns you if you forgot B.
Mehta et al. — NSDI’20 (original paper #1)
Paper #2 is a tool for finding correlated failure risks that threaten reliability.
Zhai et al. — NSDI’20 (original paper #2)
Adrian Colyer — The Morning Paper (summaries)
- A series by Adrian Colyer that takes a random walk through CS research.
- This time, it covers two papers on “correlation” from NSDI ’20 (Santa Clara, CA, February 25 to February 27), hosted by USENIX.
- The first title is “Rex: Preventing Bugs and Misconfiguration in Large Services Using Correlated Change Analysis”. Click here for PDF.
- The second title is “Check before You Change: Preventing Correlated Failures in Service Updates”. Click here for PDF. There is also a slide deck here.
- It introduces methods for preventing bugs and configuration mistakes when making correlated changes, using two tools, Rex and Cloud Canary.
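The idea behind Paper #1 (files A and B usually change together, so warn when B is missing) can be illustrated with a small sketch. This is not Rex's actual algorithm; it is a minimal co-change frequency rule, and the function names, thresholds (`min_support`, `min_confidence`), and file names are all assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_cochange_rules(commits, min_support=2, min_confidence=0.8):
    """Return {(a, b): confidence}, read as 'when a changes, b usually changes too'."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = set(files)
        file_counts.update(unique)
        # Count every ordered pair of files that appear in the same commit.
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    rules = {}
    for (a, b), together in pair_counts.items():
        confidence = together / file_counts[a]
        if together >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def missing_companions(changed_files, rules):
    """List files that usually change with this change set but are absent from it."""
    changed = set(changed_files)
    return sorted({b for (a, b) in rules if a in changed and b not in changed})


# Hypothetical commit history: each entry is the list of files in one commit.
history = [
    ["service.py", "service.yaml"],
    ["service.py", "service.yaml", "README.md"],
    ["service.py", "service.yaml"],
    ["README.md"],
]
rules = mine_cochange_rules(history)
print(missing_companions(["service.py"], rules))  # ['service.yaml']
```

With this toy history, changing only `service.py` triggers a warning about `service.yaml`, while `README.md` is ignored because it co-changed with the others only once.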
Great Incident Response Requires 3 Major Components
The components from the article are:
Ability to recognize how bad the situation really is, and prioritize it
Effective communication skills
Compassionate responses to mistakes and a learning mindset
Hannah Culver — Blameless
- As remote work becomes more familiar and distributed teams become the norm, troubleshooting becomes trickier.
- The three elements above are explained as essential for incident response regardless of where people work.
We’re pleased to announce Failover Conf, a conference focused on building resilient systems. The conference will be held online on April 21 and session submissions will be accepted through March 23.
CFP open through March 23.
Gremlin
- An announcement that Failover Conf will be held online on April 21 and that the CFP is open until March 23. Because of the unexpected situation affecting events planned offline, it was decided to hold this one online as a failover, as the name implies.
- I can feel the author’s pride as a practitioner in the phrase “Practicing resilience”.
Grow your blame-free culture with these postmortem best practices | FireHydrant
There are some good tips in here, especially if you’re new to this.
Mandy Mak
- A proposal for spreading a blame-free culture in your organization through three postmortem best practices.
1. The “blameless post-mortem” focuses on learning
→ allowing engineers to respond with better information.
2. Work on it as a group
→ Collect various viewpoints asynchronously. Give a voice to members outside the incident response itself, such as the CS team, and include information such as what customers reported. Keep everyone informed, including new members.
3. Tolerate mistakes.
→ Keep in mind that “all members did their best at the time and tried to make the best choice, regardless of the size of the incident”.
→ Postmortems encourage honesty and transparency and help ensure psychological safety.
How network automation helps Fastly support the world’s biggest live-streaming moments
Fastly’s APS tool (Auto Peer Slasher) detects when a link is nearing saturation and automatically reroutes traffic through a different interface.
Ryan Landry — Fastly
Full disclosure: Fastly is my employer.
- A story about how Fastly’s network automation and team of experts support live streaming of the Super Bowl, the big event that generates the most traffic in the US each year.
- They prepared thoroughly, including peering directly with many domestic ISPs so that traffic could be exchanged as close to the end users as possible.
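The behavior described for Auto Peer Slasher (detect a link nearing saturation, then move traffic to another interface) can be sketched roughly as follows. This is not Fastly's implementation; the `Link` class, the `plan_reroutes` function, the 90% and 80% thresholds, and the link names are all hypothetical and exist only to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    capacity_gbps: float
    traffic_gbps: float

    @property
    def utilization(self):
        return self.traffic_gbps / self.capacity_gbps


def plan_reroutes(links, threshold=0.9, target=0.8):
    """Return (src, dst, gbps) moves that bring links at or above `threshold` down to `target`."""
    moves = []
    for hot in links:
        if hot.utilization < threshold:
            continue
        excess = hot.traffic_gbps - target * hot.capacity_gbps
        # Prefer the links with the most spare room below the target level.
        candidates = sorted(
            (l for l in links if l is not hot and l.utilization < target),
            key=lambda l: target * l.capacity_gbps - l.traffic_gbps,
            reverse=True,
        )
        for cool in candidates:
            if excess <= 0:
                break
            headroom = target * cool.capacity_gbps - cool.traffic_gbps
            shift = min(excess, headroom)
            hot.traffic_gbps -= shift
            cool.traffic_gbps += shift
            excess -= shift
            moves.append((hot.name, cool.name, shift))
    return moves


# Hypothetical links: one peering link is at 95% utilization.
links = [
    Link("peer-a", 100, 95),
    Link("peer-b", 100, 40),
    Link("transit", 100, 70),
]
print(plan_reroutes(links))
```

Receiving links are only filled up to the target level, so a reroute never pushes another link over the saturation threshold.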
Outages
- US Dept. of Health site and COVID-19 tracker
- Is Twitter down today? What’s causing the issue and how can you fix it — Republic World
- Robinhood
- Snapchat
- Fastly
- Slack
- And this one.
- GlobalSign (CA)
- An Update from Robinhood’s Founders — Under the Hood
- PagerDuty
KubeWeekly #208: March 20th, 2020
The Headlines
Editor’s pick of the highlights from the past week.
Join SIG Scalability and Learn Kubernetes the Hard Way
Alex Handy, kubernetes.io
Contributing to SIG Scalability is a great way to learn Kubernetes in all its depth and breadth, and the team would love to have you join as a contributor. I took a look at the value of learning the hard way and interviewed the current SIG chairs to give you an idea of what contribution feels like.
- The post invites readers to contribute to a SIG (Special Interest Group) and introduces the value of learning the hard way. It recommends SIG Scalability as a realistic way to “Learn Kubernetes the Hard Way”.
- If you are interested, you can register here.
Kong Ingress Controller and Service Mesh: Setting up Ingress to Istio on Kubernetes
Kevin Chen, Kong
Kubernetes has become the de facto way to orchestrate containers and the services within services. But how do we give services outside our cluster access to what is within? Kubernetes comes with the Ingress API object that manages external access to services within a cluster.
- How to deploy Kong Ingress controller as Ingress layer for Istio mesh.
ICYMI: CNCF Webinars
Weekly recap of CNCF member and project webinars that you might have missed.
CNCF Member Webinar: Small Is Not Always Beautiful — Moving Enterprise Applications to the Cloud
Paul Jenkins, Product Manager @Oracle Cloud Infrastructure (OCI) Cloud Native Services, and Tony Vertenten, Co-Founder and CTO @Intris
- Webinar video in which Oracle Product Manager Paul Jenkins and Intris Co-Founder and CTO Tony Vertenten explain why “small is not always beautiful”, based on the cloud migration of Intris’s applications.
CNCF Member Webinar: Democratizing Analytics with Cloud Native Data Warehouses on Kubernetes
Robert Hodges, CEO @Altinity, and Vladislav Klimenko, Senior Software Engineer @Altinity
- Webinar video in which Altinity CEO Robert Hodges and Altinity Senior Software Engineer Vladislav Klimenko explain how to democratize analytics with cloud-native data warehouses on Kubernetes, covering ClickHouse, an OSS data warehouse, and the ClickHouse Kubernetes operator.
CNCF Member Webinar: Calico Networking with eBPF
Chris Hoge, Developer Advocate @Project Calico, and Shaun Crampton, Core Developer @Project Calico
- Webinar video in which Chris Hoge, Developer Advocate at Project Calico and Tigera, and Shaun Crampton, Project Calico Core Developer and Tigera Principal Engineer, explain Calico’s new eBPF (extended Berkeley Packet Filter) based data plane.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
On the state of Envoy Proxy control planes
Matt Klein
- A personal blog post by Matt Klein, software engineer at Lyft and creator of Envoy, on the state of Envoy Proxy control planes and his outlook for the next few years.
Introducing istiod: simplifying the mesh control plane
Craig Box, Google
- Craig Box of Google introduces istiod, which consolidates the control plane functions into a single binary as of Istio 1.5. This 3/19 article supplements the Istio 1.5 news I mentioned in last week’s post.
- It touches on the history of the Istio control plane, the cost of past complexity, the benefits of consolidating into istiod, extra information for experts, and the future.
Introducing the Calico eBPF dataplane
Shaun Crampton, Tigera
- Shaun Crampton, Tigera’s Principal Engineer, introduced Calico’s new eBPF (extended Berkeley Packet Filter) based data plane on 2/25.
- The content is the same as the webinar above, “CNCF Member Webinar: Calico Networking with eBPF”. The webinar video includes the latest Q&A, so I think it is worth watching.
Directing Kubernetes traffic with Traefik
Lee Carpenter
- An article that describes how to do two simple deployments, use Traefik to pass traffic from the outside to the internal cluster, and then remove the Kubernetes resources.
Your own Kubernetes controller — Laying out the work
Nicolas Fränkel
- Part 1 of the trilogy. This article describes how to get started implementing your own controller in languages other than Go.
Tutorial Cloud Native
- It explains the security risks of Tiller in Helm up to version 2, introduces the new features of version 3, and explains how to migrate from version 2 to 3, with notes on the migration.
Show Me Your Code with Walter Dal Mut: Extend Kubernetes in NodeJS
Gianluca Arbezzano
- YouTube video on “Kubernetes development on NodeJS” by Gianluca Arbezzano, SRE at InfluxData and CNCF Ambassador, and Walter Dal Mut, Solutions Architect at Corley. The opening part where Gianluca speaks is hard to hear, so it may be better to skip it.
5 tips for troubleshooting apps on Kubernetes
Alex Ellis
- 5 useful options for troubleshooting using the kubectl command and how to use them.
Our failure story with Redis operator for K8s (+ a brief look at Redis data analysis tools)
Flant staff
- An article sharing a failure story with a Redis operator at Flant and the lessons learned.
- It also touches on six OSS tools for analyzing Redis data.
Introduction to Security Contexts and SCCs
Alexandre Menezes, Red Hat
- An article that proposes setting security contexts and SCCs (Security Context Constraints) as a means to prevent the following scenario.
- On container platforms, created objects are protected by good RBAC practices, but the nodes themselves may not be protected.
Creating Workspaces with the HashiCorp Terraform Operator for Kubernetes
Rosemary Wang, Hashicorp
- Hashicorp’s article about the alpha release of HashiCorp Terraform Operator. A YouTube demo video is also embedded in the article.
Recommended Steps to Secure a DigitalOcean Kubernetes Cluster
Damaso Sanoja
- A DigitalOcean article on how to keep your DigitalOcean Kubernetes (DOKS) cluster secure.
Digital Ocean
- The multilingual (English, German, Spanish, Portuguese, Russian) tutorial page for developers and system administrators on DigitalOcean’s community site. The link is a keyword search for Kubernetes.
- However, for the keyword Kubernetes, nothing is available in German as of March 21 (Sat).
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Adam Glick and Craig Box, Kubernetes Podcast from Google
- Kubernetes Podcast by Google employees. The current co-hosts are Craig Box and Adam Glick.
- The guest is Xiang Li, who developed etcd during his internship at CoreOS, continues to maintain it, and currently works on infrastructure at Alibaba.
- The following three topics were of interest to me in News of the week.
- Hitachi Vantara acquires Containership’s assets
- KEDA and SMI join the CNCF Sandbox
- gVisor thread by Ian Lewis
Managing Harbor at cloud scale : The story behind Harbor Kubernetes Operator
Maxime Hurtrel
- The story of how OVHcloud built a Kubernetes operator for the Harbor project and open sourced it under the goharbor project at CNCF.
Securing Kubernetes Networking with Nicole Hubbard, Hashicorp
The New Stack Makers
- Podcast with Nicole Hubbard, Developer Advocate at HashiCorp.
- There is also a YouTube video.
- The YouTube video explains how to use Envoy and Consul Connect to keep communication secure between different Kubernetes clusters and microservices.
- I have always listened to “The New Stack” as a podcast, but this time I noticed that there was a video. The slides are easy to follow, so I recommend watching the video.
Day 2 for the Operator Ecosystem
Gerred Dillon and Matt Jarvis, DevOps.com
- Starting from the background of why operators are needed in Kubernetes, the article touches on the build tools Kubebuilder, Metacontroller, and KUDO, and explains how they differ, including in the expertise each requires.
- It also introduces Kuttl, a tool for writing tests that check whether an operator behaves correctly.
4 ways to manage Kubernetes resources
Tomasz Cholewa
- It explains four ways of managing Kubernetes resources, using the following tools, from the viewpoint of simplicity versus complexity.
1. kubectl command and yaml file
2. kustomize command
3. Helm chart
4. operator
Interoperability of open-source tools: the emergence of interfaces
Katie Gamanji
- I will skip it because it was covered in DEVOPS WEEKLY ISSUE #481 above.
Upcoming CNCF webinars
You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.
Lowering the Barrier to Kubernetes Proficiency — Navigating the Stormy Seas of Information
Chris Black, Sr. Solutions Engineer @CircleCI
Member webinar
March 25, 2020 10:00 AM Pacific Time
Continuous profiling Go application running in Kubernetes
Gianluca Arbezzano, Site reliability engineer @InfluxData
Ambassador webinar
March 27, 2020 10:00 AM Pacific Time
Container Security at Scale: Lessons Learned from the Front Lines with ABN AMRO and Palo Alto Networks
Wiebe de Roos, CI/CD Consultant @Flusso and ABN Amro
Keith Mokris,Technical Marketing Engineer @Palo Alto Networks
Member webinar
April 1, 2020 10:00 AM Pacific Time
Taming Your AI/ML Workloads with Kubeflow The Journey to Version 1.0
David Aronchick @Microsoft
Elvira Dzhureava, Technical Product Engineer AI/M @Cisco
Johnu George, Technical lead @Cisco Systems
Member webinar
April 2, 2020 9:00 AM Pacific Time
Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
Ambassador webinar
April 3, 2020 10:00 AM Pacific Time
Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
Buoyant
Member webinar
April 8, 2020 10:00 AM Pacific Time
Declarative Host Upgrades From Within Kubernetes
Adrian Goins,Director of Community and Evangelism @Rancher Labs
Dax McDonald,Software Engineer @Rancher Labs
Jacob Blain Christen, Principal Software Engineer @Rancher Labs
Member webinar
April 14, 2020 10:00 AM Pacific Time
如何让你的Windows应用运行在Kubernetes平台
杨雨 Alex Yang, 解决方案架构师 Solution Architect @Mirantis
张文墨Larry Zhang, 解决方案架构师 Solution Architect @Mirantis
Member webinar
This webinar will be delivered in Chinese
April 23, 2020 10:00 AM China Standard Time
Kubernetes 1.18
Kubernetes team
Project webinar
April 23, 2020 9:00 AM Pacific Time
Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
Member webinar
June 30, 2020 10:00 AM Pacific Time
What did you think of these articles? Did any of them interest you?
To be honest, there is some content I have not fully digested yet, so I will also use this aide-memoire and these links to catch up myself.
Bye now!!