SRE / DevOps / Kubernetes Weekly Collection #8 (Week 13)

- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and leave some comments as an aide-memoire along with useful links.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people looking for this kind of information.
DEVOPS WEEKLY ISSUE #482 March 22nd, 2020
SRE Weekly Issue #212 March 23rd, 2020
KubeWeekly #209: March 27th, 2020
DEVOPS WEEKLY ISSUE #482 March 22nd, 2020
News
- The title is “On the state of Envoy Proxy control planes”.
- A personal blog post by Matt Klein, Software Engineer at Lyft and creator of Envoy, which was also featured last week (KubeWeekly #208: March 20th, 2020). It surveys Envoy Proxy control planes and analyzes where they may head over the next few years.
- The title is “MANAGING YOUR K8S CLUSTER VIA DAEMON SETS”.
- A proposal for using DaemonSets to manage the software, systems, and configuration required to run a production environment.
- Personally, I was curious that the post was tagged “aikido”. When I opened the personal site of the author, James Hunt, a “Chrono Trigger” wallpaper greeted me.
- The title is “Abstractions and serverless”.
- Explains the significance of abstraction and digs deep into serverless in that context.
- Building systems with clear abstraction boundaries that can be understood, maintained, and evolved is a challenge that many IT systems need to solve.
- The title is “Involving Engineers in Incident Management: QCon London Q&A”.
- From the session Q&A with Samuel Parkinson, Principal Engineer at the Financial Times, at QCon London (held from March 2nd to March 6th), on encouraging engineers to get involved in incident management and the benefits of learning from past incidents.
- His comment that “everything that happened in the past still holds new discoveries, and when new members join, they bring a perspective that existing members had not noticed” sounds natural, but I felt that such a stance depends heavily on the character of the team members, the leadership of the leader, and the team’s history.
- The title is “Break that big ball of mud!”, an October 2017 article.
- The content was originally published on the “NDC 2016 Blog”. The author, a Star Wars fan, uses the Force, Yoda, the Death Star, and so on in his explanation.
- Referring to Yoda’s words from Star Wars, he explains that “every 15 years of coding experience dealing with legacy code causes a considerable proportion of fear, anger, hate, and pain.”
- The title is “Serverless in the wild: characterizing and optimising the serverless workload at a large cloud provider”.
- Part of Adrian Colyer’s series that takes a look at a random selection of CS research.
- This time, he picks up a paper from arXiv (pronounced “archive”) on characterizing and optimizing serverless workloads at a major cloud provider (Azure). Click here for the PDF version.
- The author took notice because Jonathan Mace tweeted about the original paper on Twitter.
- The explanation includes diagrams and graphs covering cold starts, pre-warming, keep-alive, idle time, resource management, and cost. Interesting.
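To make the keep-alive trade-off more concrete, here is a minimal sketch that simulates a fixed keep-alive window for a single function instance. This is only the simple baseline policy; the paper itself proposes a more adaptive one, which this sketch does not implement. The function name, the zero execution time, and the sample trace are hypothetical simplifications.

```python
from typing import List, Tuple


def count_cold_starts(invocations: List[float], keep_alive: float) -> Tuple[int, float]:
    """Simulate a fixed keep-alive policy for one function instance.

    invocations: sorted invocation timestamps in seconds (hypothetical trace).
    keep_alive: how long the instance stays warm after each invocation
                (execution time is treated as zero for simplicity).
    Returns (number of cold starts, total seconds the instance sat warm but idle).
    """
    cold_starts = 0
    idle_warm = 0.0
    last = None  # timestamp of the previous invocation
    for t in invocations:
        if last is None or t - last > keep_alive:
            # The instance had been unloaded: this invocation pays a cold start.
            cold_starts += 1
            if last is not None:
                # It stayed warm for the full window before being unloaded.
                idle_warm += keep_alive
        else:
            # Warm hit: the instance idled for the gap between invocations.
            idle_warm += t - last
        last = t
    if last is not None:
        # After the final invocation the instance idles for one full window.
        idle_warm += keep_alive
    return cold_starts, idle_warm


if __name__ == "__main__":
    trace = [0, 30, 60, 700, 720, 2000]
    for window in (60, 600, 1200):
        cold, idle = count_cold_starts(trace, window)
        print(f"keep-alive={window}s cold_starts={cold} idle_warm_seconds={idle}")
```

With this trace, a 60-second window gives 3 cold starts and 260 idle seconds, while a 1200-second window gives 2 cold starts but 3120 idle seconds: fewer cold starts are bought with more memory kept warm, which is the tension the paper’s policy tries to balance.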
- The title is “Egress Filtering in Serverless Applications”.
- It explains why filtering outbound traffic matters, how to do it, and examples of the risks involved; outbound filtering is often overlooked in serverless apps.
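As a small illustration of the allowlist idea behind egress filtering, here is a minimal application-level sketch, assuming a function should only talk to a few known HTTPS hosts. The hostnames and the `is_egress_allowed` helper are hypothetical and not taken from the article. An in-code check like this is only one layer: code running in the same process (for example a compromised dependency) can bypass it, which is why network-level controls matter as well.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only destinations this function is expected to call.
ALLOWED_HOSTS = {"api.example.com", "sqs.us-east-1.amazonaws.com"}
ALLOWED_SUFFIXES = (".internal.example.com",)


def is_egress_allowed(url: str) -> bool:
    """Return True only if the URL uses HTTPS and targets an allowlisted host."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False  # refuse plaintext and unusual schemes outright
    # .hostname strips any "user@" prefix and port, and lowercases the host.
    host = (parsed.hostname or "").lower()
    if host in ALLOWED_HOSTS:
        return True
    # Suffix match for a trusted internal domain; the leading dot prevents
    # "evilinternal.example.com" from matching.
    return any(host.endswith(suffix) for suffix in ALLOWED_SUFFIXES)
```

Note that `https://api.example.com@evil.com/` is rejected, because the real host after the `@` is `evil.com`.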
Tools
- The web page of “Backstage”, a tool that provides developers with a unified front-end portal.
- Click here for the GitHub page.
Docker released a useful new GitHub Action which makes building and publishing Docker images easier. Some nice touches like automatic tagging and building multiple tags.
With a surge of developers and IT practitioners working remotely, there’s also a surge of confusion and operational inefficiency. See how data and automation is improving the way DevOps and IT operations engineers build, release and maintain reliable services remotely:
- A blog post from VictorOps, a sponsor of DevOps Weekly.
- The title is “Using Data and Automation to Help Engineering Teams Work Remotely”.
- On “remote work”, which is drawing most engineers’ attention these days, the post refers to models such as the Network Operations Center (NOC) and touches on the automation and data integration that should be considered; as a solution, it offers 14 days of their own service.
SRE Weekly Issue #212 March 23rd, 2020
Articles
This very clearly written paper describes the Google G Suite team’s search for a meaningful availability metric: one that accurately reflected what their end users experienced, and that could be used by engineers to pinpoint issues and guide improvements.
Hauer et al. — NSDI’20 (original paper)
Adrian Colyer — The Morning Paper (summary)
- Again from Adrian Colyer’s series taking a random look at CS research, which was also featured in DevOps Weekly above.
- This time, he picks up a paper from NSDI ’20, hosted by USENIX (Santa Clara, CA, February 25 to 27), on the Google G Suite team’s search for meaningful availability metrics. Click here for the PDF.
- It was recommended to the newsletter by Damien Mathieu.
- The paper defines its keywords carefully. For example, “meaningful” means capturing what users actually experience.
- This is content worth reading repeatedly and discussing in depth. Personal homework.
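As I understand it, one idea in the paper is to look at the worst availability over every window of a given length, so a single profile shows both short, sharp outages and long-term trends. Here is a minimal sketch of that windowing idea under a simplifying assumption: it uses per-minute (good, total) request counts as a proxy, whereas the paper’s user-uptime metric is based on per-user time. The function names and the treatment of minutes with no traffic are hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple


def windowed_availability(minutes: Sequence[Tuple[int, int]], window: int) -> float:
    """Worst availability over all sliding windows of `window` minutes.

    minutes: per-minute (good, total) counts, e.g. successful vs. all user
             requests (a simplified, request-based proxy, not user-uptime).
    """
    if not 1 <= window <= len(minutes):
        raise ValueError("window must be between 1 and len(minutes)")
    good = sum(g for g, _ in minutes[:window])
    total = sum(t for _, t in minutes[:window])
    # Assumption: a window with no traffic counts as fully available.
    worst = good / total if total else 1.0
    for i in range(window, len(minutes)):
        # Slide the window by one minute: add the newest minute, drop the oldest.
        good += minutes[i][0] - minutes[i - window][0]
        total += minutes[i][1] - minutes[i - window][1]
        worst = min(worst, good / total if total else 1.0)
    return worst


def availability_profile(minutes: Sequence[Tuple[int, int]], windows: List[int]) -> Dict[int, float]:
    """Map each window size (in minutes) to its worst-case availability."""
    return {w: windowed_availability(minutes, w) for w in windows}
```

For an hour of traffic with a single fully failed minute, the 1-minute window reports 0.0 availability, the 10-minute window 0.9, and the 60-minute window about 0.983: the same incident looks very different depending on the time scale.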
Our Top 5 On-Call Practices — Blameless: Better Reliability Through SRE
Their top 5 are:
- Use Meaningful Severity Levels
- Create Detailed Runbooks
- Load Balance Through Qualitative Metrics
- Get Ahead of Incidents
- Cultivate a Culture of On-Call Empathy
Emily Arnott — Blameless
- Opening with “you may consider on-call a necessary evil,” the post suggests that the five best practices above will make your team more responsive, build a more resilient system, and minimize repeated interruptions.
NTP: Building a more accurate time service at Facebook scale
Synchronizing clocks can be critical in an HA system, and Facebook went to great lengths to ensure clock accuracy.
Zoe Talamantes and Oleg Obleukhov — Facebook
- An introduction to the importance of NTP (Network Time Protocol) and its accuracy at Facebook scale.
- The detailed comparison of chrony and ntpd is helpful.
- I had not looked into PTP (Precision Time Protocol) so far, so I would like to check it out as well.
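For reference, here is the basic on-wire offset and delay calculation from the NTP specification (RFC 5905); ntpd and chrony both layer filtering and clock discipline on top of it. This is a minimal sketch, and the function name is hypothetical.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Classic NTP on-wire calculation (RFC 5905).

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    Returns (offset, delay): offset is how far the server clock is ahead of
    the client clock; delay is the round-trip network time, excluding the
    server's processing time.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Times in milliseconds: the client clock is 5 ms behind the server,
    # each network leg takes 10 ms, and the server spends 1 ms processing.
    print(ntp_offset_and_delay(0, 15, 16, 21))
```

The offset formula assumes the outbound and return paths take equal time; if they do not, the estimate is off by half the asymmetry, which is one reason accuracy at large scale is hard.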
The Fallacy of Move Fast and Break Things
You might end up just breaking things.
Dawn Parzych — LaunchDarkly
- The content was originally published by the author on DevOps.com.
- Mark Zuckerberg’s Facebook motto, “move fast and break things,” has become the motto of many development teams, and many companies aspiring to become unicorns imitated it; across the industry, however, the author argues that it is not working out for every team.
- “High-performance teams have good systems and processes that help this idea work, and don’t take it at face value,” the author says, and suggests putting the right tools in place.
InSearch: LinkedIn’s new message search platform
LinkedIn’s message search system takes advantage of the fact that relatively few users actually search their messages. It only builds a search index the first time a user performs a search.
Suruchi Shah and Hari Shankar — LinkedIn
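The lazy-indexing idea described above can be sketched in a few lines. This is a minimal in-memory illustration, not LinkedIn’s implementation; the `MESSAGES` store, `INDEXES` cache, and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical in-memory stand-ins for the message store and the search indexes.
MESSAGES: Dict[str, List[str]] = {
    "alice": ["lunch at noon?", "quarterly report attached"],
    "bob": ["hello world"],
}
INDEXES: Dict[str, Dict[str, Set[int]]] = {}  # user -> term -> message ids


def build_index(user: str) -> Dict[str, Set[int]]:
    """Build an inverted index (term -> message ids) for one user's inbox."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for msg_id, text in enumerate(MESSAGES.get(user, [])):
        for term in text.lower().split():
            index[term].add(msg_id)
    return index


def search(user: str, term: str) -> List[str]:
    """Search a user's messages, building their index only on first use."""
    if user not in INDEXES:
        # First search by this user: pay the indexing cost now, once.
        INDEXES[user] = build_index(user)
    ids = INDEXES[user].get(term.lower(), set())
    return [MESSAGES[user][i] for i in sorted(ids)]
```

Users who never search never get an index, which saves storage and indexing work. The trade-off is a slower first search, and a real system would also have to update an existing index as new messages arrive, which this sketch ignores.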
This followup post from Bungie covers two related incidents in February that caused loss of user data.
Bungie
- A story of incidents and rollbacks in the game “Destiny 2”, developed and operated by Bungie.
Involving Engineers in Incident Management: QCon London Q&A
An interview about how one company got their developers to join the on-call rotation. It covers how they trained them to help them build confidence and what benefits they got by joining.
Ben Linders — InfoQ
- I will skip this one because it was already covered in DEVOPS WEEKLY ISSUE #482 above.
Outages
The text of this incident originally mentioned Heroku, and it lines up with the Heroku outage below.
They also had this unrelated outage.
Heroku suffered two short bouts of 85% request failure to applications hosted on their platform.
Separately, they recently posted a couple of followup reports for previous incidents:
* Incident #1961: logging outage
* Incident #1968: EU application errors
- Zoom
- MacStadium
- Hulu
- Bumble
- Microsoft Teams and Office 365
- Discord
- Discord posted this gem of a followup analysis just a few days after their outage last week.
- GoToMeeting
- Google Nest
- DoorDash
KubeWeekly #209: March 27th, 2020
The Headlines
Editor’s pick of the highlights from the past week.
Kubernetes 1.18 is the first release of 2020! Kubernetes 1.18 consists of 38 enhancements: 15 enhancements are moving to stable, 11 enhancements in beta, and 12 enhancements in alpha.
Kubernetes 1.18 is a “fit and finish” release. Significant work has gone into improving beta and stable features to ensure users have a better experience. An equal effort has gone into adding new developments and exciting new features that promise to enhance the user experience even more. Having almost as many enhancements in alpha, beta, and stable is a great achievement. It shows the tremendous effort made by the community on improving the reliability of Kubernetes as well as continuing to expand its existing functionality.
- Release information for Kubernetes 1.18. As mentioned above, it contains 38 enhancements (15 graduating to stable, 11 in beta, and 12 in alpha).
- Check out the release logo, major changes and release notes, the GitHub download page, and other essential information and links.
Kubernetes 1.18, with release team manager Jorge Alarcon
Adam Glick and Craig Box, Kubernetes Podcast from Google
Kubernetes 1.18 is out — almost! A bug has pushed it back a day. While you’re waiting, release team lead Jorge Alarcon will tell you all about the fit and finish you can expect in the release when it’s out tomorrow. Adam and Craig bring you the other community news of the week, as well as some podcast follow-up.
- The Kubernetes Podcast from Google, run by Google employees. The current co-hosts are Craig Box and Adam Glick.
- The guest is Jorge Alarcon, release team lead in the Kubernetes community and SRE at searchable.ai.
- From the news of the week, the following three topics interested me.
- CNCF SIG Contributor Strategy
- KubeCF becomes a Cloud Foundry Foundation incubation project
- Kubei, a new open source runtime vulnerability scanner by Portshift.
ICYMI: CNCF Webinars
Weekly recap of CNCF member and project webinars that you might have missed.
You can view all CNCF recorded and upcoming webinars here.
CNCF Project Webinar: How to Migrate a MySQL Database to Vitess
Liz van Dijk, Solution Architect and Field Operations @PlanetScale
- A webinar video explaining “How to migrate a MySQL database to Vitess” by Liz van Dijk, Solution Architect & Field Operations at PlanetScale.
- It includes a demo, which makes it easy to follow.
- Please note that the sound sometimes drops out, possibly due to the speaker’s network conditions during recording.
Angel Rivera, Developer Advocate @CircleCI
- CircleCI’s Developer Advocate Angel Rivera explains Kubernetes “to lower the barrier to improvement for new Kubernetes learners”.
- He carefully explains the background that made Kubernetes necessary, along with the abbreviations, resources, components, and more.
The Technical
Tutorials, tools, and more that take you on a deep dive into the code.
Anatomy of my Kubernetes Cluster
Antonin Stefanutti
- Antonin Stefanutti, a Software Engineer at Red Hat, dissects the “home Kubernetes” cluster he built to his own requirements. Great!
Writing Kubernetes network policies with Inspektor Gadget’s Network Policy Advisor
Alban Crequy, Kinvolk
- Alban Crequy, CTO & co-founder of Kinvolk, shows how to write network policies with the “Network Policy Advisor” of Inspektor Gadget, an open-source collection of gadgets for debugging and inspecting apps on Kubernetes.
- They are looking for contributors and invite people to join the discussion in the #inspektor-gadget channel on Kubernetes Slack.
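The general idea of a policy advisor (observe the real traffic, then emit a policy that allows only what was seen) can be sketched as follows. This is a minimal illustration, not Inspektor Gadget’s code or output format; the observed-connection tuples, the function name, and the assumption that all source pods live in the policy’s own namespace are hypothetical. The emitted fields follow the standard `networking.k8s.io/v1` NetworkPolicy schema.

```python
import json
from typing import Dict, Iterable, List, Tuple

# A hypothetical observed connection: (source pod labels, destination TCP port).
Observed = Tuple[Dict[str, str], int]


def suggest_ingress_policy(name: str, namespace: str, target_labels: Dict[str, str],
                           observed: Iterable[Observed]) -> dict:
    """Build a NetworkPolicy manifest that allows only the observed ingress traffic."""
    rules: List[dict] = []
    seen = set()
    for src_labels, port in observed:
        key = (tuple(sorted(src_labels.items())), port)
        if key in seen:
            continue  # de-duplicate repeated connections
        seen.add(key)
        rules.append({
            # A podSelector without a namespaceSelector matches pods in the
            # policy's own namespace.
            "from": [{"podSelector": {"matchLabels": dict(src_labels)}}],
            "ports": [{"protocol": "TCP", "port": port}],
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],  # ingress not matched by a rule is denied
            "ingress": rules,
        },
    }


if __name__ == "__main__":
    traffic = [({"app": "frontend"}, 8080), ({"app": "frontend"}, 8080), ({"app": "metrics"}, 9090)]
    policy = suggest_ingress_policy("backend-allow-observed", "demo", {"app": "backend"}, traffic)
    print(json.dumps(policy, indent=2))  # kubectl apply -f accepts JSON as well as YAML
```

A generated policy is only as good as the observation window: traffic that did not happen while observing (a monthly batch job, for example) would be blocked, so the output deserves a human review before it is applied.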
Okteto Push — Your Code to Kubernetes in Seconds
Pablo Chico de Guzman, Okteto
- Pablo Chico de Guzman, Founder & CTO of Okteto, introduces Okteto Push in a blog post.
- He describes Okteto Push as “the fastest way to push your code changes to Kubernetes”.
- They are looking for feedback via Okteto’s Twitter account or the #okteto channel on Kubernetes Slack.
Converting an Old MacBook Into an Always-On Personal Kubernetes Cluster
Sid Palas, DevOps Directive
- The author wanted an always-on Kubernetes cluster, so he describes turning an unused 2012 MacBook Air he had at hand into one.
Quality of Service and OOM in Kubernetes
Ciro S. Costa, OpsTips
- An article on the personal blog of Ciro S. Costa, a Software Engineer at VMware, Inc. (I checked his job title as updated on Twitter).
- He had been using Kubernetes resources for quite a long time, but had never personally dug deep into how they work at the node level, which is the subject of this article.
- Pod eviction across the three QoS (Quality of Service) classes, OOM scores, the cgroup tree, per-cgroup memory, and the kubelet are explained carefully.
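To connect QoS classes with OOM scores, here is a simplified sketch of how a pod’s QoS class is derived from CPU and memory requests/limits, and roughly how the kubelet turns it into a container’s `oom_score_adj`. It is not the article’s code: the constants (-997, 1000) and the Burstable clamp to the range 3..999 reflect the kubelet source as I recall it and may differ between versions, special cases such as node-critical pods and init containers are ignored, and the container dict format is a hypothetical, already-parsed representation (cores and bytes rather than "500m" or "256Mi").

```python
from typing import Dict, List

# Hypothetical container format, e.g.
# {"requests": {"cpu": 1, "memory": 256 * 2**20}, "limits": {"cpu": 1, "memory": 512 * 2**20}}
Container = Dict[str, Dict[str, int]]

QOS_RESOURCES = ("cpu", "memory")
GUARANTEED_OOM_SCORE_ADJ = -997
BESTEFFORT_OOM_SCORE_ADJ = 1000


def qos_class(containers: List[Container]) -> str:
    """Simplified pod QoS classification that only looks at CPU and memory."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = {r: v for r, v in c.get("limits", {}).items() if r in QOS_RESOURCES}
        requests = {r: v for r, v in c.get("requests", {}).items() if r in QOS_RESOURCES}
        for r, v in limits.items():
            requests.setdefault(r, v)  # an unset request defaults to the limit
        if requests or limits:
            any_set = True
        for r in QOS_RESOURCES:
            # Guaranteed needs CPU and memory limits on every container, with requests == limits.
            if r not in limits or requests[r] != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def oom_score_adj(qos: str, memory_request: int, node_memory: int) -> int:
    """Approximate the kubelet's per-container oom_score_adj heuristic."""
    if qos == "Guaranteed":
        return GUARANTEED_OOM_SCORE_ADJ
    if qos == "BestEffort":
        return BESTEFFORT_OOM_SCORE_ADJ
    # Burstable: the larger the memory request relative to node capacity,
    # the lower the score, i.e. the less likely the OOM killer picks it.
    adj = 1000 - (1000 * memory_request) // node_memory
    # Clamp so Burstable always stays strictly between Guaranteed and BestEffort.
    return max(1000 + GUARANTEED_OOM_SCORE_ADJ, min(adj, BESTEFFORT_OOM_SCORE_ADJ - 1))
```

For example, a Burstable container requesting 4 GiB on a 16 GiB node gets 1000 - 250 = 750, so under memory pressure it is killed before Guaranteed containers but after BestEffort ones.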
Ciro S. Costa, OpsTips
- The same author as the one above.
- An article that examines and explains how the kubelet makes Kubernetes Secrets available to processes on the node.
- Secret management is a theme I see a lot these days, but I don’t understand it well myself. This is also my homework.
Setting up a ProxySQL Sidecar Container
Jake Davis, Percona
- Jake Davis, a DBA (Database Administrator) at Percona, introduces how to set up ProxySQL as a sidecar container.
- Their customer Duolingo had hit Aurora’s maximum of 16,000 connections (the hard limit across all instance classes), but with ProxySQL sidecars the connection count is now kept at around 6,000 (at the peak on March 23, 2020).
OpenShift 4.4 OKD Bare Metal Install on VMWare Home Lab
Craig Robinson, East Carolina University
- OKD is the upstream, community-supported distribution of Red Hat’s OCP (OpenShift Container Platform).
- He walks through the setup so that you can test an OKD 4.4 cluster in your home environment.
- All you need is basic knowledge of virtualization platforms and Linux, plus the ability to search with Google.
- The screenshots and explanations are generous.
Building a TODO API in Golang with Kubernetes
Alex Ellis
- CNCF Ambassador Alex Ellis writes for Kubernetes beginners who want to write a practical Go API, then deploy and manage their to-do list on Kubernetes.
A Guide On The Installation Of Spinnaker in Kubernetes Cluster
Vikas Saini, Magalix
- An article explaining how to install Spinnaker on GKE using Halyard, a tool for installing, configuring, and updating Spinnaker.
A Primer: Continuous Integration and Continuous Delivery (CI/CD)
Catherine Paganini, Kublr
- A series of articles explaining IT concepts to business leaders.
- This time the theme is CI/CD, explained with key terms and easy-to-picture diagrams.
The Editorial
Articles, announcements, and more that give you a high-level overview of challenges and features.
Threading the Needle on Kubernetes Complexity with AI-Powered Observability
Andreas Grabner, DevOps.com
- The article discusses the complexity of Kubernetes and the large amount of data that AI-powered observability products need to handle. It does not discuss concrete tools.
A ‘No-BS’ Checklist for Kubernetes
Oleg Chunikhin, Kublr
- The author created and shared a “No-BS” (no-nonsense) checklist for identifying vendors and services that lack the requirements needed to run Kubernetes in an enterprise production environment. Desirable extras are also listed as “Nice-to-Haves”.
Upcoming CNCF webinars
You can check some Recorded Webinars and Upcoming Webinars here. The following were listed as upcoming CNCF webinars at the time of writing.
Container Security at Scale: Lessons Learned from the Front Lines with ABN AMRO and Palo Alto Networks
Wiebe de Roos, CI/CD Consultant @Flusso and ABN Amro
Keith Mokris, Technical Marketing Engineer @Palo Alto Networks
Member webinar
April 1, 2020 10:00 AM Pacific Time
Taming Your AI/ML Workloads with Kubeflow: The Journey to Version 1.0
David Aronchick @Microsoft
Elvira Dzhureava, Technical Product Engineer AI/ML @Cisco
Johnu George, Technical lead @Cisco Systems
Member webinar
April 2, 2020 9:00 AM Pacific Time
Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
Ambassador webinar
April 3, 2020 10:00 AM Pacific Time
Pravega: Rethinking storage for streams
Dell
Member webinar
April 7, 2020 10:00 AM Pacific Time
Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
Buoyant
Member webinar
April 8, 2020 10:00 AM Pacific Time
New thoughts on distributed file system in the cloud native era
JD.com
Member webinar
April 9, 2020 10:00 AM Pacific Time
Declarative Host Upgrades From Within Kubernetes
Adrian Goins, Director of Community and Evangelism @Rancher Labs
Dax McDonald, Software Engineer @Rancher Labs
Jacob Blain Christen, Principal Software Engineer @Rancher Labs
Member webinar
April 14, 2020 10:00 AM Pacific Time
How to Run Your Windows Applications on the Kubernetes Platform
杨雨 Alex Yang, Solution Architect @Mirantis
张文墨 Larry Zhang, Solution Architect @Mirantis
Member webinar
This webinar will be delivered in Chinese
April 23, 2020 10:00 AM China Standard Time
Kubernetes 1.18
Kubernetes team
Project webinar
April 23, 2020 9:00 AM Pacific Time
Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
Member webinar
June 30, 2020 10:00 AM Pacific Time
How about those articles? Did any of them catch your interest?
Actually, there is some content I cannot digest yet, so I’ll use this aide-memoire and these links to catch up myself, too.
Bye now!!