- In this blog post series, I collect items from the three weekly mailing lists I subscribe to, and add some comments as an aide-memoire along with useful links.
- Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #526 January 24th, 2021
- The title is “Four levels of maturity that bridge the AppSec / engineering divide”.
- It proposes integrating security work into continuous delivery (CD) as one of the most useful ways to successfully link security and engineering.
- It describes the following four typical maturity levels that security and engineering organizations pass through when building a pipeline of continuous integration (CI) and automation.
○ Level 1: Security finds problems; Engineering fixes them
○ Level 2: Security and Engineering collaborate to produce test cases and remediations
○ Level 3: After the issue is fixed, Security and Engineering collaborate to find systemic fixes and develop checks
○ Level 4: Security and Engineering now also proactively look for new classes of issues and create systemic checks before an actual problem occurs
- The title is “PostgreSQL on ARM-based AWS EC2 Instances: Is It Any Good?”.
- Following the announcement of AWS’s second-generation, Graviton2-based EC2 instances in May 2020, the authors tested PostgreSQL on ARM-based EC2 instances, as the title suggests.
- The title is “How We Improved Smashing Mag Performance”.
- This blog post details efforts to improve the performance of web pages running on the JAMstack with React. The work optimized web performance and improved Core Web Vitals metrics.
- Core Web Vitals is a subset of Web Vitals. Web Vitals, announced by Google in 2020, provides unified guidance on the high-quality signals that are essential to delivering a great user experience on the web.
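As a note to myself, here is a minimal sketch of how the three Core Web Vitals metrics (LCP, FID, CLS) could be rated. The thresholds are the ones Google published for these metrics as I understand them; the function name `rate` and the table layout are my own assumptions and are not taken from the Smashing Magazine article.

```python
# Thresholds as published by Google for Core Web Vitals (my understanding):
# (upper bound for "good", upper bound for "needs improvement").
# LCP is in seconds, FID in milliseconds, CLS is unitless.
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "FID": (100, 300),
    "CLS": (0.1, 0.25),
}


def rate(metric, value):
    """Return a rating for one metric value (hypothetical helper)."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    for metric, value in [("LCP", 2.0), ("FID", 250), ("CLS", 0.3)]:
        print(metric, value, rate(metric, value))
```

In practice the values would come from field data or a lab tool rather than being typed in by hand.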
Rust is picking up lots of interest recently, especially for systems work or low-level CLI tooling. But it might not be suitable, as a language or an ecosystem, yet for higher-level work like web development and APIs.
- The title is “Rust is a hard way to make a web API”.
- After touching on Rust’s strengths at the beginning, the author explains, based on his experience, the struggles that the title refers to.
- The title is “Design choices for a declarative installer”.
- If you focus on installing, upgrading, and removing a set of Kubernetes components, you can use the off-the-shelf software below, depending on your target environment, but you will need some tweaking to integrate multiple components. The post notes at the beginning that the amount of configuration can cause frustration, errors, and nightmares.
○ For Kubernetes apps there is Helm and Continuous Delivery systems like Argo that can manage applications lifecycle described simply in naked yaml.
○ For pure operators there’s Operator Lifecycle Manager (OLM).
○ For more general infrastructure there is Terraform.
- It presents the design choices that led them to their current approach on how to solve the above problems and create a hopefully better user experience.
I think we’re starting to see a new rise of distributions, several related tools that solve a larger problem when combined together. Konveyor is a new project focused on migration, with tools for moving between and to Kubernetes, moving virtual machines and more.
- As the Editor mentioned above, this is the web page of the “Konveyor” community, which focuses on tool-based migration. They are working on the following tools.
○ crane — Migrate namespaces between Kubernetes clusters.
○ forklift — Migrate virtual machines to KubeVirt.
○ move2kube — Migrate from Cloud Foundry or Docker Swarm to Kubernetes.
○ pelorus — Measure the four critical measures to software delivery performance.
○ windup — Analyze applications for modernization paths.
- A web page of project “Cinc” which has the following two goals.
○ Making Chef Software Inc’s open source products easily distributable, by anyone
○ Creating free distributions of Chef Software Inc’s open source products
- The phrase “CINC is not Chef” under the logo is reminiscent of “YAML Ain’t Markup Language”.
Web Assembly is a low level technology which is likely to have wide ranging influence. A good example of the kinds of innovation it makes possible are things like Artichoke, a new Ruby language which compiles to a WASM binary.
- The web page of “Artichoke”, a Ruby implementation written in Rust and Ruby.
- Click here for the GitHub page.
- A GitHub page of “Policy Hub CLI” that provides policy creators with a standard format to share policies to make them searchable.
SRE Weekly Issue #254 January 24th, 2021
This one’s juicy. At one point, the front-end was blocked up, so the back-end saw less traffic and scaled down. Then when the traffic came flooding back, the back-end was ill-prepared. We can all learn from this.
- As stated in the title, this is Coinbase’s post-mortem. It has been updated with the full version of the post-mortem. It details the causes of the downtime, how it was remediated, and the steps taken to prevent similar outages.
- The outage affected coinbase.com and the API used to serve mobile apps, but did not affect exchange trading through the API or the health of the underlying market.
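To help myself picture the failure mode the Editor describes (the back-end scales down while traffic is low, then is too small when traffic returns), here is a toy simulation. It is only a sketch under my own assumptions: the functions `desired_instances` and `dropped_on_surge`, the traffic numbers, and the idea of a scale-down floor are all hypothetical and do not describe Coinbase’s actual system or their remediation.

```python
import math


def desired_instances(requests_per_sec, per_instance_capacity, min_instances=1):
    """Instances a naive traffic-proportional autoscaler would ask for."""
    needed = math.ceil(requests_per_sec / per_instance_capacity)
    return max(needed, min_instances)


def dropped_on_surge(traffic, per_instance_capacity, min_instances):
    """Size the back-end from the previous tick's traffic and
    return the number of unserved requests for each tick."""
    instances = desired_instances(traffic[0], per_instance_capacity, min_instances)
    dropped = []
    for rps in traffic:
        capacity = instances * per_instance_capacity
        dropped.append(max(0, rps - capacity))
        # The scaler reacts only after the tick, based on what it just saw.
        instances = desired_instances(rps, per_instance_capacity, min_instances)
    return dropped


if __name__ == "__main__":
    # Hypothetical traffic: normal load, a blocked front-end, then the flood back.
    traffic = [1000, 100, 100, 1000, 1000]
    print(dropped_on_surge(traffic, per_instance_capacity=100, min_instances=1))
    print(dropped_on_surge(traffic, per_instance_capacity=100, min_instances=10))
```

With a floor of 1 instance the returning traffic finds a back-end that is far too small; with a floor of 10 the same surge is absorbed. A real autoscaler has warm-up times and many more signals, so this only illustrates the shape of the problem.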
Cloudflare has what amounts to a sophisticated staging environment for testing new code.
Yan Zhai — Cloudflare
- It describes simulation, one of the techniques used to fight software complexity.
- Cloudflare’s simulation system “SOAR”, which is also in the title, has the following environment.
○ Simply put, it’s a data center built specifically for simulations. It runs the same software stack as our production data centers, but without any production traffic. Within SOAR, there are end-user servers, product servers, and origin servers (Figure 2). The product servers behave exactly the same as servers in our production edge network, and they are the targets that we want to test.
Sometimes rolling back doesn’t actually get you back to a good state, especially when there’s pent-up demand.
Rachel By the Bay
- She shares the obstacles she experienced. The event itself is as described in the title and in the Editor’s comment above.
Here’s Google’s follow-up on a Google Meet outage earlier this month.
- A summary of the outage in the title. During the Google Meet outage, the landing page could not be accessed. With the release of the new landing page, a redirect had been set up between the old and new landing pages, and redirect loops occurred there.
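A redirect loop like this is easy to reproduce on paper. The following is a minimal sketch, assuming the redirect rules can be written as a simple mapping from one path to another; the paths `/old` and `/new` and the function `find_redirect_loop` are hypothetical and have nothing to do with Google’s real configuration.

```python
def find_redirect_loop(redirects, start, max_hops=20):
    """Follow redirects from `start` and return the looping chain,
    or None if the chain ends (or max_hops is reached) without a loop."""
    seen = []
    url = start
    while url in redirects and len(seen) < max_hops:
        if url in seen:
            # We came back to a URL we already visited: report the cycle.
            return seen[seen.index(url):] + [url]
        seen.append(url)
        url = redirects[url]
    return None


if __name__ == "__main__":
    # Hypothetical rules: each landing page redirects to the other one.
    looping = {"/old": "/new", "/new": "/old"}
    print(find_redirect_loop(looping, "/old"))

    # A healthy setup: only the old page redirects.
    healthy = {"/old": "/new"}
    print(find_redirect_loop(healthy, "/old"))
```

Running a check like this against the redirect rules before a release is one way such a loop might be caught early, although I do not know what checks Google actually uses.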
Those are some seriously big database servers.
Josh Aas and James Renken — Let’s Encrypt
- It explains that Let’s Encrypt has achieved satisfactory results with the database server upgrade that was carried out in late 2020.
A great general overview of all aspects of incident response, including definitions and best practices.
- The content of the title is explained along with the following items “5 parts of the incident management process” and “5 steps to a bulletproof incident management process”.
- 5 parts of the incident management process
- Best incident monitoring practices
- Best on-call practices
- Best incident alerting practices
- Best incident communication practices
- Best incident response practices
- 5 steps to a bulletproof incident management process
- Best incident monitoring practices
- Best on-call practices
- Best incident alerting practices
- Best incident communication practices
- Best incident response practices
Check out what happens when you unleash a generalized language model AI on some log messages related to an incident.
Larry Lancaster — Zebrium
- The author gives a glimpse of what they have done with OpenAI’s “GPT-3 language model” on log messages. It shares a couple of straightforward results obtained with basic prompts only.
The CRE team at VMware undertook a project to find and reduce toil. Note that “with VMware CRE” does not mean “with some product named VMware CRE™”.
Gustavo Franco — VMware
- VMware’s CRE (Customer Reliability Engineering) team describes the following results from a recently completed operational load assessment.
○ As a result, we significantly reduced that load, improved our team well-being, and increased the amount of spare time and energy we have to invest in reliability engineering projects to improve Tanzu.
This is Slack’s RCA for their outage earlier this month. This is a great example of a complex incident with many contributing factors — certainly no single “root cause” here.
- As the title suggests, this is the final version of Slack’s report on the outage.
- As the Editor commented above, there are many items in “Corrective Actions” because the outage was caused by multiple factors and had no single root cause. For some items, the corrective actions are carried out in cooperation with the cloud provider.
KubeWeekly #248 January 29th, 2021
Editor’s pick of the highlights from the past week.
This month’s spotlight focuses on Kevin Wang, a contributor in the CNCF community since its beginning, leader of the cloud native open source team at Huawei, and co-founder of the KubeEdge and Volcano projects. Read the blog to learn more about Kevin’s experience with the CNCF community over the past five years.
- Kevin Wang, who is in the spotlight this time, is also running in the CNCF TOC election. When I checked the GitHub page of TOC Elections for 2021, the schedule was as follows, so the results will come out soon.
○ Election closes Feb 1, announced at noon
Tutorials, tools, and more that take you on a deep dive into the code.
Taneem Ibrahim, Red Hat
- As the title suggests, it describes the steps required to test an Operator’s OLM (Operator Lifecycle Manager) integration. The demo uses a simple Operator that outputs test messages to the shell.
- The tools required for the local development environment used in this hands-on are as follows. It also offers the use of a free Red Hat Quay.io account.
○ Red Hat CodeReady Containers (CRC)
○ Podman, or a Docker daemon process running on the local machine
○ Operator SDK toolkit, v1.0.0 or higher (optional)
○ Operator Package Manager (OPM)
○ OpenShift Container Platform, cluster version 4.5 or higher
- It walks through the hands-on in the title by following the official instructions. The author had been recommended Flux CD, and there is a good reference project initiated by his colleague: k8s-gitops. Because he wanted to fully understand how to use Flux CD, he chose to start from scratch with the official instructions, and it did not take him long to fully enable GitOps on his cluster.
Saiyam Pathak, Civo
- A YouTube Webinar video explaining the content of the title.
- The speaker, Saiyam Pathak, is a CNCF Ambassador and Director of Technical Evangelism at @civocloud. He energetically publishes live-stream interview videos from events and webinar videos like this one, which make good references, so I now subscribe to his YouTube channel.
Zhongkai Liu, Software Dev Engineer II & Palash Agrawal, Principal Software Dev Engineer
- It explains the differences between the previous and current DB migration processes at Yahoo Sports. As a tool, they use Screwdriver, an open source CD (continuous delivery) platform.
- It clarifies that, in this article, the term “migration” denotes any changes made to the database, including but not limited to inserting or deleting tables, populating data into, or removing entries from the database.
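To make that broad definition of “migration” concrete for myself, here is a minimal sketch of a versioned migration runner using Python’s built-in `sqlite3` module. It is not Yahoo Sports’ process and does not involve Screwdriver; the `teams` table, the `schema_version` table, and the `migrate` function are all hypothetical names I made up for illustration.

```python
import sqlite3

# Each migration is (version, SQL statement). Note that both schema changes
# and data changes count as migrations under the article's definition.
MIGRATIONS = [
    (1, "CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "INSERT INTO teams (name) VALUES ('Example FC')"),
    (3, "ALTER TABLE teams ADD COLUMN city TEXT"),
]


def migrate(conn):
    """Apply every migration newer than the recorded version and
    return the list of versions that were applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    applied = []
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            applied.append(version)
    conn.commit()
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn))  # first run applies everything
    print(migrate(conn))  # second run finds nothing new to apply
```

Recording the applied version in the database itself is what lets the same script run safely on every deployment, which is the kind of step a CD pipeline could call.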
- The author explains the use of Firecracker from a more DIY, “I just want to run some VMs” perspective.
- When she first read about Firecracker being released, she thought it was just a tool for cloud providers to use, but the following points turned out to be true, as explained at the beginning.
○ Firecracker is relatively straightforward to use (or at least as straightforward as anything else that’s for running VMs)
○ The documentation and examples are pretty clear
○ You definitely don’t need to be a cloud provider to use it
○ As advertised, it starts VMs really fast!
- They share the findings of using Kubernetes as a very flexible platform to meet their researcher needs.
- The following two are listed as unsolved problems. Migration work seems to be underway for the metrics problems, and blog posts on the results are planned.
○ Pod network traffic shaping
Jim Armstrong, Snyk
- It introduces the Docker Vulnerability Scanning CLI cheatsheet to help you start scanning container images using Docker Desktop and Snyk.
- As the title suggests, it explains how to create unit tests for Helm charts in Golang to keep quality high and make changes with confidence.
- “The Upsides” and “The downsides?” are summarized, and the author has written a full example with a basic helm chart in this repo.
Theo “Bob” Massard, particle.io
- It mentions that AWS recently introduced a new solution to orchestrate federated EKS clusters, and starts its explanation from KubeFed (Kubernetes Cluster Federation), on which this solution is based.
Ritesh Patel, Nirmata
- It explains how to enable developer self-service backups in Velero using the new CNCF sandbox project, Kyverno.
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
Jeremy Rickard, VMware and Kirsten Garrison, Red Hat
- The release team details Kubernetes 1.20 with new features and important deprecations.
- Kubernetes 1.20 is one of the largest releases with over 40 different enhancements.
This Week in Cloud Native: Cloud Native Infrastructure in the Data Center with Cluster API & Tinkerbell (CAPT) (livestream)
Jason DeTiberus, Equinix and Manny Mendez, Equinix
- Based on the following challenges, they explain how to use the Cluster API and Tinkerbell to introduce real cloud-native infrastructure management to your data center.
○ Up until now managing Kubernetes infrastructure outside of cloud providers has been difficult, and while there have been attempts to ease management of Kubernetes clusters within the data center previously we feel those attempts have been focused mostly on trying to shoehorn the management of clusters into legacy practices.
Articles, announcements, and more that give you a high-level overview of challenges and features.
Craig Box, Kubernetes Podcast from Google
- The Kubernetes Podcast is run by Google employees. The current co-host is Craig Box, as Adam Glick has moved on to greener pastures. Past guests will be invited as guest hosts for several weeks.
- This week’s guest host is Jasmine Jaksic, a Staff TPM & Manager at Google who is in charge of Istio and Anthos.
- The guest is Joshua Bernstein, who recently joined Google Cloud as a director of infrastructure modernization solutions.
- The topics that interested me in the news of the week are as follows.
○ New Google Cloud Run networking features
○ Kubernetes honey tokens by Brad Geesaman
○ Bad pods: privilege escalation by Seth Art
○ The US Air Force are feeling supersonic
Tobias Mann, SDxCentral
- I learned a lot because I had no prior contact with the subject of this title. I found the following points from the article worth reading.
○ He explained that Nvidia’s work in this arena has been somewhat drowned out by webscale applications which have been and remain the primary use case for Kubernetes. However, Lamb argues there is a huge potential for GPU-accelerated Kubernetes clusters in artificial intelligence (AI) workloads, an arena where Nvidia has long dominated.
○ Looking to the future, Lamb expects GPUs will begin to move into the mainstream of Kubernetes, especially as “AI serving becomes a GPU-accelerated workload, which is just at the inflection point of taking off.”
○ “As things expand, I think most people are going to be able to just think about GPU accelerated as a fast button or an efficient button and not have to think about GPU development or programming,” he added.
Rick Rackow, Red Hat
- It describes closed box monitoring and how OpenShift Dedicated SREs use it to complement the observable stack.
- As the title suggests, this is a CNCF article that announces the release of Vitess 9.
- The article lists the following as Major Themes.
○ Compatibility (MySQL, frameworks)
- Click here for Release Notes.
- An article reporting that the author has completed, as a mentee, the Community Bridge Program of Keptn, a CNCF sandbox project.
- The program name has changed from “Community Bridge program” to “LFX Mentorship program”.
Electro Monkeys podcast
- The podcast is in French. They talk about what a project like Thanos brings to Prometheus, how it works, and what its features do.
- A 6-minute webinar video on YouTube that explains the principles of GitOps and its benefits.
Catherine Paganini, Buoyant and Jason Morgan, VMware
- This post is part of an ongoing series from the “Cloud Native Computing Foundation Business Value Subcommittee” co-chairs Catherine Paganini and Jason Morgan that explains each category of the cloud native landscape to a non-technical audience as well as engineers just getting started with cloud native computing.
- This post describes the Application Definition and the Development layer of cloud native landscape. The next article will focus on cloud-native platforms.
Chris Metinko, Crunchbase
- An article that describes investments and acquisitions in the Kubernetes ecosystem, which has already been in motion since the beginning of 2021, as well as future forecasts.
- A one-page survey on Google Forms. Check it out if you are using or interested in Linkerd.
Upcoming CNCF Online Programs
This Week in Cloud Native: Kubernetes Policies-as-code
Jim Bugwadia, Nirmata
February 3, 2021 at 11:00 am PT
CNCF On-demand Webinar: Policy as Code to Manage Security Risk in Kubernetes Before and After Deployment
Cesar Rodriguez, Accurics
February 4, 2021
For more information, please visit our updated Online Programs page.
- On the page linked above, the sections “Future events”, “Past events”, and “Organizer” have been created, and future events are now listed.
- Since a Group has been created in CNCF Online Programs, I have registered as the 8th member.
How about those articles? Did any of them interest you?
Actually, there is some content that I cannot digest at this stage, so I will use this aide-memoire and these links to catch up myself too.