- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments and useful links as an aide-memoire.
- I have already published the same content on my Japanese blog, and I am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #528 February 7th, 2021
- The title is “How the Bottlerocket build system works” from the AWS Open Source Blog.
- It explains in detail Bottlerocket, an open source OS from AWS created for running containers on VMs and bare metal.
- I felt nostalgic when I jumped to the Cargo page and saw the photo of palletized cardboard boxes. It reminds me of my days in the logistics industry, when I used to wrap and unwrap them.
- The title is “Defending software build pipelines from malicious attack”.
- Following last week’s article “Securing the NCSC’s web platform,” this is another article from the UK’s National Cyber Security Centre (NCSC).
- It explains why the build pipeline is one of the foundations of system security and why you should give it particular attention, along with the following:
○ The benefits of automation
○ Defend the pipeline
○ Protect builds from each other
○ Establish a chain of custody
○ Consider a managed service for your build pipelines
○ Hard work, but worth the effort
- The title is “A visual language for digital integration”.
- It explains how to visually capture the right information (and only the right information) on one page when designing the integration for digital systems.
- A future post will explore the process of determining components in more detail.
Dockerfiles are ubiquitous for building container images. But if you’re looking for something that provides a higher level interface and stronger opinions then buildpacks are worth a look. This post compares the two.
- The title is “Build packs vs Docker files”.
- It tells the story of the author’s development team migrating from Dockerfiles to buildpacks. The following six perspectives are explained as the factors behind the transition.
○ Developer Productivity
○ Kubernetes Support
Threat modelling is a useful tool for getting people thinking about the security of their systems. It’s also a great way of encouraging collaboration between development and security teams. This new manifesto is a good starting point.
- The web page of the “Threat Modeling Manifesto”.
- It explains threat modeling under the following items.
○ What is threat modeling?
- The title is “Putting a VIP in your Kubernetes Clusters”.
- It touches on Tim Hockin’s “Bringing Traffic Into Your Kubernetes Cluster” and discusses Services of type LoadBalancer (or, in most cases, a virtual IP address) from a different perspective.
- The title is “Analyzing gRPC messages using Wireshark”.
- It explains how to configure and use the protocol-specific components “Wireshark gRPC dissector” and “Protocol Buffers (Protobuf) dissector” that allow Wireshark to analyze gRPC messages.
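As a note to myself on what the gRPC dissector has to decode: each gRPC message on the wire is preceded by a 5-byte prefix, made up of a 1-byte compressed flag and a 4-byte big-endian message length. The following is a minimal Python sketch of that framing only. It is not taken from the article, and the function name `split_grpc_frames` and the sample bytes are my own assumptions for illustration.

```python
import struct


def split_grpc_frames(data: bytes):
    """Split concatenated length-prefixed gRPC messages into (compressed, payload) pairs."""
    frames = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 5:
            raise ValueError("truncated gRPC message prefix")
        # ">BI" = big-endian, 1-byte compressed flag + 4-byte unsigned length
        compressed, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated gRPC message payload")
        frames.append((bool(compressed), payload))
        offset += length
    return frames


# Hypothetical sample: one uncompressed 2-byte message followed by an empty one.
# b"\x08\x01" is a Protobuf encoding of field 1 (varint) with value 1.
sample = (
    b"\x00" + struct.pack(">I", 2) + b"\x08\x01"
    + b"\x00" + struct.pack(">I", 0)
)
frames = split_grpc_frames(sample)
print(frames)
```

The payload bytes are still Protobuf-encoded, which is why Wireshark needs the Protobuf dissector (and the .proto definitions) in addition to the gRPC one.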
- The title is “Building a Kubernetes CI / CD Pipeline with GitLab and Helm”.
- Since it was covered in KubeWeekly #249 last week, I will skip it.
WTF is SRE? Container Solutions presents a new WTFinar that tackles the beginning of understanding SRE. Join Nathen Harvey from Google to learn about service level indicators (SLIs) and service level objectives (SLOs) — components of error budgets. 9th February, 15:00 CET
- An introductory article on the event “WTF is SRE?” featured in last week’s SRE Weekly Issue #255.
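To remind myself how SLOs relate to error budgets, here is a small worked calculation in Python. It is only a sketch under my own assumptions: the function names, the 30-day window, and the sample request counts are hypothetical and do not come from the event.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability allowed in the window for an availability SLO such as 0.999."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left, with the SLI measured as good_events / total_events."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    allowed_bad = total_events * (1 - slo)
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


# A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))

# 500 failed requests out of 1,000,000 uses about half of a 99.9% budget.
print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

A negative result from `budget_remaining` would mean the budget for the window is already overspent.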
- A GitHub page of the operating system “Vorteil” for running cloud applications in micro virtual machines.
- A GitHub page of the mobile, desktop, and web app “kubenav” for managing Kubernetes clusters and getting an overview of resource status.
SRE Weekly Issue #256 February 7th, 2021
Here’s a blog post from Slack giving even more information about what went wrong on January 4. Bravo, Slack, there’s a lot in here for us to learn from.
Laura Nolan — Slack
- The article about this outage that I covered in SRE Weekly Issue #254 the other day was Slack’s incident report; this one is from Slack’s engineering blog.
- As corrective actions, AWS has assured Slack that it is reviewing the AWS Transit Gateway (TGW) scaling algorithms for large packet-per-second increases as part of its post-incident process, and Slack has set a reminder to request preemptive upscaling of its TGWs for the next holiday season, among other measures.
This academic paper from Facebook explains how they release code without disrupting active connections, even for a small number of users.
Usama Naseer, Luca Niccolini, Udip Pant, Alan Frindell, Ranjeeth Dasineni, and Theophilus A. Benson — Facebook
- The abstract page of the Facebook paper. You can download the paper from the link.
- It’s about Zero Downtime Release, a framework that leverages various components of the end-to-end network infrastructure to prevent or mask interruptions in the face of a release.
Another lesson we can learn from aviation: have one place where engineers can find out about temporary infrastructure changes that are important.
- It explores how to communicate the temporary state of each environment effectively to the entire team in a many-to-many situation, where SREs handle many environments together with many other team members. It uses the aviation term “NOTAM” (Notice to Airmen) as the keyword to explain the idea.
Coinbase posted this detailed analysis of their January 29th incident.
- It details the outage, explains what caused it, and describes changes to prevent similar failures in the future.
- They are working on changes such as reviewing monitoring, utilizing read-only replicas, and breaking down monolithic app servers into individual services.
Interesting thesis: a company moving into the cloud is in a unique position to adopt SRE practices — and better situated than cloud-first companies.
Tina Huang (CTO, transpose) — Forbes
- In line with the title, it explains two contrasting approaches that legacy software companies can adopt in their cloud strategies, which lead to significantly different SRE outcomes.
○ Adopt Cloud Services For Individual Services And Teams
○ Build A Cloud Platform Team
We need to push past surface-level mitigation of an incident and really dig in and learn.
Darrell Pappa — Blameless
- The author, who heard the line in the title from a customer service representative, suggests that SRE should take the customer’s perspective, as quoted below: problems should be viewed as systemic, and SRE best practices should be applied.
- Customers deserve better, and we should always be their biggest advocate. So, next time you find yourself saying, “Sorry, but I’m just doing my job,” try to shift your perspective to the customer. View these problems as systemic, use SRE best practices like SLOs and error budgets, and embrace a blameless culture to help make a change.
GitHub’s database failed in a manner that wasn’t detected by their automated failover system.
Keith Ballinger — GitHub
- It describes an incident in January that significantly impacted and reduced the availability of the GitHub Actions service, along with the countermeasures taken.
LinkedIn published their SRE training documentation in the form of a full curriculum covering a range of topics.
Akbar KM and Kalyanasundaram Somasundaram — LinkedIn
- Introducing the School of SRE, a curriculum curated for ambitious SREs published by LinkedIn on GitHub.
- This kind of material would be useful for improving skills as a team.
Your code may be designed to handle 64-bit integers, but what if a library (such as a JSON decoder) converts them to floating point numbers?
- It walks through hunting for the bug and experimenting with JSON.
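The precision loss described above is easy to reproduce. The following is a minimal Python sketch of my own, not code from the article. Python's `json` module keeps integers exact by default, so I pass `parse_int=float` to imitate a decoder that stores every number as a double, as JavaScript's `JSON.parse` does.

```python
import json

# 2**53 + 1 fits in a signed 64-bit integer but has no exact double representation.
big = 2**53 + 1
text = json.dumps({"id": big})

# Default behaviour: the integer survives the round trip unchanged.
exact = json.loads(text)["id"]

# Imitate a float-based decoder by parsing every JSON integer with float().
lossy = json.loads(text, parse_int=float)["id"]

print(exact == big)        # True
print(int(lossy) == big)   # False: the value was rounded to 2**53
```

A common workaround is to transmit such identifiers as JSON strings so that no decoder has to interpret them as numbers.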
KubeWeekly #250 February 12th, 2021
Editor’s pick of the highlights from the past week.
KubeCon + CloudNativeCon Europe 2021 Virtual is happening on May 4–7, 2021. Be sure to register for a full All Access Pass for just $10 through February 14 at 23:59 CEST! The price will increase to $75 on February 15, so act fast to take advantage of this great deal.
Don’t forget — the CFP deadline for KubeCon + CloudNativeCon Europe 2021 Virtual co-located events closes on February 19!
See the full list of co-located events below:
Cloud Native Rust Day– hosted by CNCF — May 3
Cloud Native Security Day Europe — May 4
Cloud Native Wasm Day — May 4
FluentCon Cloud Native Logging day with Fluent Bit & Fluentd — May 4
Kubernetes AI Day — May 4
Kubernetes on Edge Day — May 4
ServiceMeshCon Europe — May 4
- KubeCon + CloudNativeCon Europe 2021 Virtual and its co-located events are scheduled for May 4–7, 2021. The $10 All Access Pass is available through February 14. I have already registered, and I am considering which co-located events to sign up for.
Tutorials, tools, and more that take you on a deep dive into the code.
Pawan Shankar, Sysdig
- It describes what Kubernetes audit logs are, the information they provide, and how to integrate them with the open source runtime security tool “Falco” to detect suspicious activity in your cluster.
Ahmet Alp Balkan
- It explains how to build an OCI container image without using Docker by programmatically building the layers and image manifests using the go-containerregistry module.
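To get a feel for what “programmatically building the layers and image manifests” means, I wrote the rough sketch below. Note that the article uses the Go module go-containerregistry; this is my own stand-in using only the Python standard library, so the helper names (`build_layer`, `descriptor`) and the sample file are assumptions. It only builds the blobs and the manifest in memory and does not push anything to a registry.

```python
import gzip
import hashlib
import io
import json
import tarfile


def sha256_digest(data: bytes) -> str:
    """Return an OCI-style content digest for a blob."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_layer(files: dict) -> tuple:
    """Return (uncompressed_tar, gzipped_tar) for a mapping of path -> bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, content in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(content))
    raw = buf.getvalue()
    # mtime=0 keeps the gzip header stable across runs.
    return raw, gzip.compress(raw, mtime=0)


def descriptor(media_type: str, blob: bytes) -> dict:
    """Describe a blob by media type, digest, and size."""
    return {"mediaType": media_type, "digest": sha256_digest(blob), "size": len(blob)}


raw_layer, gz_layer = build_layer({"hello.txt": b"hello from a hand-built layer\n"})

# The image config lists layers by the digest of the UNCOMPRESSED tar (diff_ids).
config = json.dumps({
    "architecture": "amd64",
    "os": "linux",
    "rootfs": {"type": "layers", "diff_ids": [sha256_digest(raw_layer)]},
}).encode()

# The manifest references the config and the COMPRESSED layer blob.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": descriptor("application/vnd.oci.image.config.v1+json", config),
    "layers": [descriptor("application/vnd.oci.image.layer.v1.tar+gzip", gz_layer)],
}
print(json.dumps(manifest, indent=2))
```

The point I took away is that an image is just tarballs plus JSON documents addressed by their digests, which is why no Docker daemon is needed to produce one.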
Peter O’Neill, Ambassador Labs
- It explains how to use Skaffold to build and deploy a local environment, launch Telepresence, project the local services you are building into a remote cluster, and iterate on development.
Ninad Desai, InfraCloud
- It touches on the need for Zero Trust Architecture and introduces “Teleport” as a product that fits into the area of “Zero Trust Network” for cloud-native apps.
Levent Ogut, Loft
- Following up on a previous post that described readiness and liveness probes, it notes that the two behave differently and explains each probe’s components, configuration, and troubleshooting.
Saiyam Pathak, Civo
- It introduces open source hyper-converged infrastructure (HCI) software that runs on Kubernetes, presented as an open source alternative to products such as vSphere and Nutanix.
ICYMI: CNCF online programs this week
A weekly summary of CNCF online programs from this week.
Helen George and Joao Pereira @VMware
- It introduces Carvel, an open source project that provides a reliable, single-purpose, configurable set of tools to help you build, configure, and deploy your apps to Kubernetes.
- It shows how to take advantage of Carvel and explains how to use each tool individually or together.
Josh Hendrick @Rookout
- It describes the traditional challenges of debugging Kubernetes-based apps and how real-time debugging of production workloads can help solve them.
Articles, announcements, and more that give you a high-level overview of challenges and features.
Matt Jarvis, Snyk
- It introduces Snyk’s Cloud native application security (CNAS) 2021 survey and shares plans for its report.
- The first 500 respondents will get free coffee, and the survey results will be released to the community free of charge, so if you are interested, please check it out.
Thor Sigurdsson & Mike Winters, Garden
- It states that “Most of the problems developers run into CI are caused by a) discrepancies between dev and CI environments and b) insufficient, slow integration testing,” and that one possible approach to solving these problems is to use a consistent configuration for every pre-production environment, from development to testing to CI.
- In this context, it introduces “Garden,” an open source project that describes the entire stack (all services, tests, dependencies) and lets you launch an on-demand full-stack environment at every step of your development pipeline.
- At the beginning, it explains how developer experience suffers when development environments use a completely separate (and often pared-down) configuration compared to CI, which builds, tests, and deploys in a more production-like setting:
○ First, the discrepancy between development and CI environments leads to hard-to-predict errors in CI.
○ Second, developers have no idea if integration tests will pass when they push to CI.
○ Third, the process of troubleshooting CI is slow and tedious.
○ And fourth, even just writing integration tests takes a lot of time and effort.
Upcoming CNCF Online Programs
CNCF Live webinar: Toward Hybrid Cloud Serverless Transparency with Lithops Framework
presented by IBM
February 16, 2021 at 10:00 am PT
This Week in Cloud Native (Livestream): KCD El Salvador
February 17, 2021 at 12:00 pm PT
CNCF Online Programs Playlist on YouTube
Check out our playlist for more curated content you don’t want to miss! New content is added every Friday.
- For more information, please visit our updated Online Programs page.
How did you like these articles? Did any of them catch your interest?
There is some content that I could not fully digest at this stage, so I will also use this aide-memoire and these links to catch up myself.