SRE / DevOps / Kubernetes Weekly Collection #9 (Week 14)

- In this blog post series, I go through the following three weekly mailing lists I subscribe to and leave comments and useful links as an aide-memoire.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #483 March 29th, 2020
SRE Weekly Issue #213 March 29th, 2020
KubeWeekly #210: April 3rd, 2020
DEVOPS WEEKLY ISSUE #483 March 29th, 2020
News
Part 2 / Part 3 / Part 4 / Part 5
- The title is “NATS Messaging-Part 1-Part 5”.
- A five-part series by systems architect R.I. Pienaar introducing NATS, a messaging system and CNCF Incubating project.
- The commentary, diagrams, and demo videos are excellent. This is homework for me!
A detailed look at improving the performance of disk encryption in Linux.
- The title is “Speeding up Linux disk encryption”.
- Cloudflare’s Ignat Korchagin writes in a blog post that Cloudflare has more than doubled the performance of Linux disk encryption, both internally and for customers.
- Encryption layers, source code, processes, protocols, performance tests, and various other elements unfamiliar to me all come at once. This is also homework.
- The title is “A Quick Guide to Building a Custom Docker Image for CI”.
- An article by John Ruble, Software Consultant & Developer at Atomic Object, that explains “why Docker is such a great fit for CI and how it can be made even better with custom images”.
- There is also a simple hands-on walkthrough for creating a custom image and pushing it to Docker Hub.
- The title is “Tracing and Observability”.
- The everyday idea of “checking a parcel’s location, route, and expected arrival from its tracking information” also applies to software systems. An easy introductory text.
- The title is “How to detect outdated Kubernetes APIs”.
- An article introducing Deprek8 and Conftest as tools for detecting Kubernetes APIs that are no longer supported.
- Deprek8 is a set of OPA (Open Policy Agent) policies that check for unsupported API versions in your repository. The article walks through a Deprek8 policy written in the Rego query language as one example of how such policies are used.
- Conftest can be used with Deprek8 to apply Rego policies to any number of configuration files. It supports YAML, JSON, CUE, Dockerfile, HCL, HCL2 (Experimental), XML, etc.
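To make the idea concrete, here is a minimal Python sketch of the kind of check such a policy performs: look up each manifest's apiVersion/kind pair in a table of outdated APIs and report a replacement. This is not Deprek8's actual Rego code. The table is only an illustrative subset, and the names `DEPRECATED_APIS` and `find_outdated` are hypothetical. Manifests are assumed to be already parsed into dictionaries, for example by a YAML loader.

```python
# Illustrative subset of outdated (apiVersion, kind) pairs and their
# replacements. The real Deprek8 policies are written in Rego; this table
# and the function below are a hypothetical Python stand-in.
DEPRECATED_APIS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "Ingress"): "networking.k8s.io/v1",
}


def find_outdated(manifests):
    """Return one warning string per manifest that uses an outdated API."""
    warnings = []
    for manifest in manifests:
        api_version = manifest.get("apiVersion")
        kind = manifest.get("kind")
        replacement = DEPRECATED_APIS.get((api_version, kind))
        if replacement is not None:
            # Fall back to a placeholder when the manifest has no name.
            name = (manifest.get("metadata") or {}).get("name", "<unnamed>")
            warnings.append(
                f"{kind}/{name}: {api_version} is outdated, use {replacement}"
            )
    return warnings


# Example input: one outdated Deployment and one up-to-date Deployment.
manifests = [
    {"apiVersion": "extensions/v1beta1", "kind": "Deployment",
     "metadata": {"name": "web"}},
    {"apiVersion": "apps/v1", "kind": "Deployment",
     "metadata": {"name": "api"}},
]

for warning in find_outdated(manifests):
    print(warning)
```

In the workflow the article describes, Conftest plays the role of the loop here, applying the Rego policies to each configuration file.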
A post in praise of AWS S3. The influence of S3 is hard to argue against and this post explains why.
- The title is “IN PRAISE OF S3, THE GREATEST CLOUD SERVICE OF ALL TIME”.
- An article by A Cloud Guru praising AWS’s S3. The company’s courses helped me when I passed the AWS SAA exam in English.
- The GitHub page of Helmfile Starter Kit, a starter kit for deploying complex software projects to Kubernetes based on Helmfile.
Tools
- GitHub page for ssmsh, a handy shell tool for AWS EC2 Parameter Store.
- GitHub page of SSM Helpers, a set of helper tools for AWS Systems Manager, the service that “gives AWS infrastructure control and visibility”.
- A blog post from VictorOps, a sponsor of DevOps Weekly.
- The title is “The War Room for Major Incident Response and Remediation”.
- When a major failure occurs, a large team has to work together to identify the root cause and resolve it. The post proposes the “War Room” service, with a 14-day free trial, as a solution.
SRE Weekly Issue #213 March 29th, 2020
Articles
COVID-19: Why We Should All Wear Masks — There Is New Scientific Rationale
This is important, and well worth a read. Where’s the SRE connection? The article explains that the U.S. Surgeon General’s comment that masks are “not effective” led to a stigma against those that wear them here. That kind of unintended sociological effect is uncovered commonly in incident post-analysis.
Sui Huang
- An article that argues logically for why we should all wear masks as a countermeasure against COVID-19, picked up here from an SRE perspective.
Keeping the Internet “Always On” — the Pressure of COVID-19 on Incident Response Teams
Pagerduty ran the numbers and discovered an increase in incidents recently, especially in certain companies.
Rachel Obstler — PagerDuty
- According to PagerDuty’s research, the pressure of COVID-19 has led to an increase in incidents across companies using the PagerDuty platform, with an especially large increase at companies offering specific services such as online learning.
February service disruptions post-incident analysis
Here’s the scoop on all those GitHub incidents in February.
Keith Ballinger — GitHub
- GitHub’s post-incident analysis of four incidents in February that caused a total of 8 hours and 14 minutes of service disruption.
- Originally, SQL data was stored in a single cluster. As the service grew, groups of features were split out into new clusters and new features were built on new clusters, but many core datasets remained in the original cluster.
- They have consistently scaled their database to accommodate the increasing load from new users and products. In these incidents, unexpected database load caused a cluster to degrade and become inaccessible.
Embrace Resilience for Business Continuity in Times of Uncertainty
No, it won’t be possible to continue operating business-as-usual. For the unforeseeable future, teams across the world will be dealing with cutbacks, infrastructure instability, and more. However, with SRE best practices, your team can embrace resilience and adapt through this difficult time.
Hannah Culver — Blameless
- An article explaining that SREs can treat the current difficult situation (an uncertain outlook, team cutbacks around the world, unstable infrastructure, and so on) as an incident and respond to it with resilience.
- My takeaway: precisely because times are difficult, we should not be bound by the letter of rules derived from the principles, but stay grounded in the principles themselves and improve in line with reality.
- The author says that “learning resources will also need to be flexible” and introduces several resources and online events.
5 tips for incident management when you’re suddenly remote
I love the concept of “ephemeral information”, that is, discussions that happen out-of-band, making it much harder to analyze the incident after the fact.
Blake Thorne — Atlassian
- “I’ve suddenly become a remote worker, what should I do about incident management?” Atlassian’s Product Marketing Manager Blake Thorne gathered advice from many teams and shares five tips.
- Atlassian has practiced “remote-first incident management”, and its “Incident Management Handbook” is a useful reference. It is distributed free of charge if you fill in the required information (name/company name/title/email address).
Elastic Cloud January 18, 2019 Incident Report
Grey failure turned a seemingly reasonable auto-recovery mechanism into a DoS caused by a thundering herd.
Panagiotis Moustafellos, Uri Cohen, and Sylvain Wallez — Elastic
- As the title suggests, a report issued on January 31, 2019 about the incident on January 18, 2019.
- Customers using Elastic Cloud experienced an estimated 3 hours of severe access failures in their AWS eu-west-1 (Ireland) region deployments, and almost 20 minutes of inaccessibility across all deployments during the same time period.
- The report apologizes for the impact on the service, identifies the root cause, describes measures to prevent a recurrence, gives a contact point for further concerns or questions, and apologizes once more before reviewing the failure in detail.
Outages
- G Suite
- Google Cloud Platform
GCP had a major incident that caused the G Suite outage. GCP also had an (apparently) unrelated outage later in the day.
- BitBay (cryptocurrency exchange)
- Netflix
- Uber
- Fastly
Also this one. Full disclosure: Fastly is my employer.
- Discord
- Brightcove
- Zoom
- DoorDash
- Nest
- Canvas (remote learning tool)
KubeWeekly #210: April 3rd, 2020
The Headlines
Editor’s pick of the highlights from the past week.
Join us for Cloud Native Summit Online!
With the postponement of KubeCon + CloudNativeCon EU, and many of our other favorite face-to-face industry events, CNCF, GitLab, Kong, and NetApp are excited to announce the Cloud Native Summit Online as another event to get the community together!
Cloud native open source projects, SIGs, and working groups are fundamental to many of our jobs. As we adjust to working remotely and maintaining productivity, we are excited to bring together experts from the community to provide insights and support around cloud native technologies and CNCF projects.
The virtual event will take place on Tuesday, April 7 from 6:00 am — 2:00 pm PT / 15:00–23:00 CET. We hope you’ll join us next week!
- CNCF held Cloud Native Summit Online on Tuesday, April 7 local time. In Japan time, that is April 7 (Tue) 22:00 to April 8 (Wed) 08:00, according to Google Calendar.
- The contents are as follows.
* Updates from graduated CNCF projects: Kubernetes, Prometheus, Envoy, Jaeger, Fluentd, Containerd, CoreDNS, Vitess, TUF.
* Conversations with key SIG and WG contributors.
* CNCF updates on upcoming cloud native technologies.
* Something nice for remote work, such as gifts, jokes, and tips.
CNCF projects surpass one billion lines of code: A Q&A with DevStats creator Łukasz Gryglicki
CNCF Staff
In monitoring DevStats, the community came across an incredible milestone — all CNCF projects combined have surpassed one billion lines of code. That’s right, one billion!
To mark this achievement, we sat down with DevStats creator Łukasz Gryglicki to learn more about the tool, its history, and how our community can benefit from it. Read the blog here.
- According to DevStats, an OSS tool that collects and visualizes CNCF project data, the code of all CNCF projects combined exceeds 1 billion lines.
- CNCF interviewed DevStats creator Łukasz Gryglicki about what passing a billion lines of code means for DevStats and CNCF.
ICYMI: CNCF Webinars
Weekly recap of CNCF member and project webinars that you might have missed.
You can view all CNCF recorded and upcoming webinars here.
CNCF Ambassador Webinar: Continuous Profiling Go Application Running in Kubernetes
Gianluca Arbezzano, Site Reliability Engineer @InfluxData
- A webinar video in which InfluxData SRE and CNCF Ambassador Gianluca Arbezzano explains Profefe, an OSS profiling tool. This blog previously covered an article he wrote.
CNCF Member Webinar: MindSpore and Cloud Native Ecosystem
Zhipeng Huang, Open Source Community Manager @MindSpore and Yedong Liu, Open Source Engineer @Huawei
- A webinar video in which MindSpore’s Open Source Community Manager Zhipeng Huang and Huawei’s Open Source Engineer Yedong Liu explain MindSpore, a new OSS deep learning training and inference framework, and the cloud native ecosystem.
Wiebe de Roos, CI/CD Consultant @Flusso and ABN Amro and Keith Mokris, Technical Marketing Engineer @Palo Alto Networks
- A webinar video in which Wiebe de Roos, CI/CD Consultant at Flusso and ABN AMRO, and Keith Mokris, Technical Marketing Engineer at Palo Alto Networks, explain “Will Containers Secure DevOps Compliance in Large Scale Environments”.
- The explanations and illustrations are easy to understand, and the audio is easy to hear.
CNCF Member Webinar: Taming Your AI/ML Workloads with Kubeflow — The Journey to Version 1.0
Johnu George, Technical Lead @CPSG-AI at Cisco, David Aronchick, Head of Open Source Machine Learning Strategy @Microsoft and Elvira Dzhuraeva, Technical Product Manager AI/ML @ Cisco
- A webinar video in which Johnu George, Technical Lead of Cisco’s CPSG-AI team, Elvira Dzhuraeva, Technical Product Manager AI/ML at Cisco, and David Aronchick, Head of Open Source Machine Learning Strategy at Microsoft, explain how to master AI/ML workloads using Kubeflow.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
How to detect outdated Kubernetes APIs
Tyler Auerback, Red Hat
- I will skip this one because it was covered in DEVOPS WEEKLY ISSUE #483 above.
Agustin Romano, Caylent
- An article introducing an overview of GitOps, its benefits and best practices, and Flux, a CNCF Sandbox project, as a tool for implementing GitOps.
Evaluating Predictive Autoscaling in Kubernetes
Jamie Thompson, IBM
- Over the past six months, the author has developed CPA (Custom Pod Autoscaler), an OSS autoscaler for Kubernetes similar to HPA (Horizontal Pod Autoscaler), and built PHPA (Predictive Horizontal Pod Autoscaler) on top of it. PHPA, which has been pre-released, adds predictive autoscaling to HPA-style scaling by using statistical models. This article introduces how PHPA was tested and what the results were.
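As a rough illustration of what “predictive” means here, the Python sketch below combines the replica formula documented for the Kubernetes HPA, ceil(currentReplicas * currentMetric / targetMetric), with a forecast from past replica counts, and takes the larger of the two. It is not PHPA's actual implementation. A plain least-squares line is used as an assumed stand-in for the statistical model, and all function names are hypothetical.

```python
import math


def hpa_replicas(current_replicas, current_metric, target_metric):
    """Reactive replica count, following the documented HPA formula."""
    return math.ceil(current_replicas * current_metric / target_metric)


def predict_next(history):
    """Forecast the next value by fitting a least-squares line to history.

    Assumption: a linear trend stands in for whatever statistical model
    a real predictive autoscaler would use.
    """
    n = len(history)
    if n < 2:
        return history[-1]
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # Evaluate the fitted line one step past the last observation.
    return math.ceil(slope * n + intercept)


def predictive_replicas(history, current_replicas, current_metric, target_metric):
    """Scale to the larger of the reactive value and the forecast."""
    reactive = hpa_replicas(current_replicas, current_metric, target_metric)
    forecast = predict_next(history + [reactive])
    return max(reactive, forecast)


# Replica counts have been climbing, so the forecast scales up ahead of
# the reactive calculation.
print(predictive_replicas([2, 3, 4], current_replicas=4,
                          current_metric=75, target_metric=60))
```

Taking the maximum means the forecast can only scale up earlier and never pushes the count below what the reactive formula asks for. That is a conservative choice made for this sketch.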
How to Secure Your Kubernetes Cluster on GKE
Lewis Marshall, Appvia
- GKE is easy to use, but you have to manage security yourself, and the documentation shows that many features and changes depend on the Kubernetes version.
- The author recommends that “if you handle sensitive workloads in a production environment, configure your cluster along the lines described in this article”.
With Kubernetes Operators comes great responsibility
Jason Shepherd, Red Hat
- An article explaining how to run Operators with appropriate permission settings, using RBAC and Service Accounts.
Using UBI images to minimize container vulnerabilities
Rags Srinivas, Snyk
- An article introducing Red Hat Universal Base Images (UBI) announced at the 2019 Red Hat Summit as an image that minimizes container vulnerabilities.
Build a Kubernetes Operator in 10 minutes with Operator SDK
Manuel Dewald, Red Hat
- As the title says, an article that walks through building a Kubernetes Operator in 10 minutes with the Operator SDK CLI. It is easy to get started, which lowers the barrier to entry. It seems better to start here before reading the article above on Operator security.
Kpt: Packaging up your Kubernetes configuration with git and YAML since 2014
Phillip Wittrock, Google
- An article from Google’s Open Source Blog introducing Kpt, an OSS tool for managing Kubernetes YAML configuration.
Provisioning cloud resources (AWS, GCP, Azure) in Kubernetes
Daniele Polencic, LearnK8s
- An article that focuses on Service Catalog, Kubeform, Config Connector (GCP), and AWS Service Operator as ways of provisioning cloud resources from Kubernetes on three cloud providers (AWS/GCP/Azure). A masterpiece. To be honest, I haven’t read it all the way through. This is my bookmark article of the week.
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Edge Computing Requires Cloud Native Thinking Today
Bill Mulligan, Loodse
- An article arguing that operating Kubernetes and cloud native technology in edge computing requires a business model for operations, which is still in its infancy, while the market is expected to grow by about 30% per year through 2024. It calls on readers to join the community and the discussion.
Optimising UE4 Project Builds With Cloud Native Infrastructure And Containers
Jose Moreira
- An article by an author who, while struggling with a job change during COVID-19, contributes to the Unreal Containers community and has the ambition of freeing up resources so that freelance developers and small to medium-sized game companies can make great games.
Todd Campbell, Sensu
- An article focusing on the decisions readers will have to make when migrating to Kubernetes, while briefly touching on how it differs from other platforms.
14 Kubernetes interview questions: For hiring managers and job seekers
Kevin Casey, Red Hat
- The article offers 14 questions that hiring managers can use, and job seekers can prepare for, in interviews, on the premise that “filling Kubernetes jobs can be tricky because the technology is relatively young”.
- The phrase “good questions lead to good answers” has been on my mind recently, and I thought this list contained good questions for checking understanding of the key points.
Service Mesh Adds Security, Observability and Traffic Control to Kubernetes
Emily Omier, The New Stack
- The introductory article in a two-week series by The New Stack on the theme “The Value of Service Mesh for Kubernetes Deployment”.
- It explains the relationship between Kubernetes and service mesh, the relationship between Kubernetes and Istio, and the service mesh themes of security, observability, and routing.
BotKube can be integrated with multiple messaging platforms like — Slack, Mattermost to help you monitor your Kubernetes cluster(s), debug critical deployments and give recommendations for standard practices by running checks on the Kubernetes resources.
- The page for “BotKube”, a bot for monitoring, debugging, and running checks on Kubernetes.
- Being able to monitor through a chat bot and debug with CLI-style commands seems like a good UI.
MKIT — Managed Kubernetes Inspection Tool
Brad Geesaman, Darkbit
- Kubernetes io page of “MKIT (Managed Kubernetes Inspection Tool)” which is a tool for quick and easy check of misconfiguration. Click here for the GitHub page.
Adam Fitzgerald, HashiCorp
- HashiCorp is a member of the CNCF! Click here for the article.
Ansible for Kubernetes by Jeff Geerling Free until end of April
Jeff Geerling, Ansible
- Jeff Geerling, the author of “Ansible for DevOps” and “Ansible for Kubernetes”, had made the books free until the end of March “to help people who are self-isolated or lost their jobs gain automation skills”. The article announces that, thanks to sponsorship from Device42, the free distribution was extended until the end of April (at that moment).
- At the same time, he thanked those who made personal donations and those who sent him kind words.
Upcoming CNCF webinars
You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at that moment.
Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
Ambassador webinar
April 3, 2020 10:00 AM Pacific Time
Pravega: Rethinking storage for streams
Dell
Member webinar
April 7, 2020 10:00 AM Pacific Time
Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
Buoyant
Member webinar
April 8, 2020 10:00 AM Pacific Time
New thoughts on distributed file system in the cloud native era
JD.com
Member webinar
April 9, 2020 10:00 AM Pacific Time
Declarative Host Upgrades From Within Kubernetes
Adrian Goins, Director of Community and Evangelism @Rancher Labs
Dax McDonald, Software Engineer @Rancher Labs
Jacob Blain Christen, Principal Software Engineer @Rancher Labs
Member webinar
April 14, 2020 10:00 AM Pacific Time
Enabling Cloud Native Storage for the Enterprise
Chris Merz, Principal Technologist for DevOps @NetApp
George Tehrani, Product Manager for Kubernetes and Cloud Native Data @NetApp
Member webinar
April 16, 2020 10:00 AM Pacific Time
KubeCarrier: The Operator of Operators
Nico Schieder, Software Engineer @Loodse
Member webinar
April 22, 2020 10:00 AM Pacific Time
How to run your Windows applications on the Kubernetes platform (如何让你的Windows应用运行在Kubernetes平台)
Alex Yang (杨雨), Solution Architect @Mirantis
Larry Zhang (张文墨), Solution Architect @Mirantis
Member webinar
This webinar will be delivered in Chinese
April 23, 2020 10:00 AM China Standard Time
Kubernetes 1.18
Kubernetes team
Project webinar
April 23, 2020 9:00 AM Pacific Time
Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
Member webinar
June 30, 2020 10:00 AM Pacific Time
What did you think of these articles? Did any of them catch your interest?
To be honest, there is some content I cannot digest at this stage, so I’ll use this aide-memoire and these links to catch up myself too.
Bye now!!