SRE / DevOps / Kubernetes Weekly Collection #39 (Week 44)

Yoshiki Fujiwara
Nov 2, 2020 (14 min read)
  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and leave some comments and useful links as an aide-memoire.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a useful reference for people who browse this kind of information.

DEVOPS WEEKLY ISSUE #513 October 25th, 2020
SRE Weekly Issue #241 October 25th, 2020
KubeWeekly #239 October 31st, 2020

DEVOPS WEEKLY ISSUE #513 October 25th, 2020

News

A good introduction to ARM, and to the challenges and advantages that the wider availability of ARM-based architectures on laptops and servers will bring.

  • The title is “How to prepare for the coming CPU confusion”.
  • It describes what changes the arrival of ARM will bring to the software industry, why people are excited about it, why it is tricky, and, most importantly, how to get ready for it.
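As a small, concrete example of getting ready (my own illustration, not from the article): build and deploy scripts increasingly need to know which CPU architecture they run on. The sketch below maps the values that Python's `platform.machine()` typically reports to the architecture names used by multi-arch container images (`amd64`, `arm64`, `arm`). The alias table is my assumption and is not exhaustive.

```python
import platform
from typing import Optional

# Assumed mapping from platform.machine() values to the architecture names
# used in multi-arch container images (the same names as Go's GOARCH).
_ARCH_ALIASES = {
    "x86_64": "amd64",   # Linux and macOS on Intel/AMD
    "amd64": "amd64",    # Windows reports "AMD64"; input is lower-cased first
    "aarch64": "arm64",  # Linux on 64-bit ARM
    "arm64": "arm64",    # macOS on Apple silicon
    "armv7l": "arm",     # 32-bit ARM, e.g. older Raspberry Pi images
}


def image_arch(machine: Optional[str] = None) -> str:
    """Return the container image architecture for this (or a given) machine."""
    raw = (machine if machine is not None else platform.machine()).lower()
    try:
        return _ARCH_ALIASES[raw]
    except KeyError:
        raise ValueError(f"unsupported architecture: {raw!r}") from None
```

A script can then pick the matching image variant or fail early with a clear message instead of pulling an image that cannot run on the host.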

A great talk on security theatre. Good quotes on helping shift responsibilities earlier in the development process and good observations about the changing role of security teams.

  • The title is “EXIT STAGE LEFT: Eradicating Security Theater”.
  • The slides are beautiful, and it was fun to follow the scene-by-scene structure, with sections such as “Fisticuffs”, “The Duel”, “Judgment”, “Redemption”, and “The Grande Finale”.
  • The closing quote is also memorable:
    ○ “People don’t want their lives fixed. Nobody wants their problems solved. Their dramas. Their distractions. Their stories resolved. Their messes cleaned up. Because what would they have left? Just the big scary unknown.” (Chuck Palahniuk)

Slides from my talk last week at SnykCon, all about patterns for secure container base image management. Discussion of people, process and tools.

  • The title is “Patterns for secure container base image management”.
  • Slides from Gareth Rushgrove, the editor of this mailing list, for his talk at SnykCon. Because the talk is organized around people, processes, and tools, it is easy to picture how these patterns work in day-to-day operations.

A new Salt release is available, which means a super detailed post on the latest features and fixes from Salt Tips.

  • The title is “What’s New in Salt 3002 Magnesium”.
  • An unofficial summary of the new features in the Salt Magnesium release. For other changes and deprecations, the post recommends reading the official release notes and changelog.

An interesting paper on quality metrics for infrastructure as code. Looking specifically at Ansible, but intended to be generalised.

  • The title is “Towards a Catalog of Software Quality Metrics for Infrastructure Code”.
  • It proposes a catalog of 46 metrics for identifying IaC properties, and shows how to use them to analyze IaC scripts written for Ansible, one of the most popular IaC languages to date.
  • Every time I come across a paper of this length, I want to improve my reading speed and accuracy, but I suspect I can’t build that skill unless I read papers regularly and keep track of their key points. More and more often, I think a realistic option would be to put myself in a position where I have to read papers, which would make it easier to acquire skills directly related to my work, for example by going back to school for a degree and getting feedback.
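To get a feel for what a code-level IaC metric looks like, here is a deliberately crude sketch of my own. The two metrics (non-comment lines of code, and the number of plays and tasks that carry a `name:`) are illustrative and are not taken from the paper's catalog. A real implementation would parse the YAML instead of scanning lines, and this version misses unnamed tasks.

```python
def simple_metrics(playbook_text: str) -> dict:
    """Compute two rough size metrics for an Ansible playbook given as text."""
    lines_of_code = 0
    named_items = 0
    for line in playbook_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments and the YAML document start marker.
        if not stripped or stripped.startswith("#") or stripped == "---":
            continue
        lines_of_code += 1
        # Plays and tasks written as "- name: ..."; unnamed tasks are not counted.
        if stripped.startswith("- name:"):
            named_items += 1
    return {"lines_of_code": lines_of_code, "named_items": named_items}
```

Even metrics this simple make it possible to track how playbooks grow over time, which is the kind of analysis the paper generalizes.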

Lots of people have written about tracing, but this post looks at some of the implementation challenges, from libraries and sampling to data transfer and storage.

  • The title is “Building Netflix’s Distributed Tracing Infrastructure”.
  • It is from the Netflix Tech Blog. It describes how they designed the tracing infrastructure that powers Edgar, a troubleshooting tool for streaming sessions introduced earlier on the same blog.
  • It is interesting to see tools such as Mantis, which the company developed and open-sourced, and how their storage layer has been evolved and optimized year after year.
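Sampling is one of the challenges the post covers. As a generic illustration, not Netflix's actual implementation, here is a minimal sketch of consistent head-based sampling, assuming every service in a request path sees the same trace ID string. Because the decision is a pure function of the trace ID, all services agree on whether to keep a trace, so a sampled trace is recorded end to end rather than in fragments.

```python
import hashlib


def should_sample(trace_id: str, rate: float) -> bool:
    """Deterministically decide whether to keep the trace with this ID.

    Every service that evaluates the same trace_id with the same rate
    reaches the same decision.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    # Integer comparison avoids float rounding at the edges (rate 0 or 1).
    return bucket < int(rate * 2**64)
```

Raising the rate for a single session (for example, while debugging) would need a separate override, which this sketch does not model.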

A detailed walkthrough of using client certificates to authorize user access to a Kubernetes cluster.

  • The title is “Kubernetes Tips: Give Access To Your Cluster With A Client Certificate”.
  • It shows a simple way to authenticate a user to a Kubernetes cluster using a client certificate. The article itself was published on 2019/06/08, more than a year ago. The blog seems to have stopped posting for the past few months, but its past articles on topics such as Docker, Kubernetes, and NAT are approachable and helpful.
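A detail worth remembering here: with X509 client certificate authentication, Kubernetes takes the username from the certificate subject's Common Name (CN) and the user's groups from its Organization (O) fields. That is why the subject is chosen carefully when creating the CSR, e.g. `openssl req -new -key jane.key -subj "/CN=jane/O=dev"`. The helper below is a hypothetical sketch that shows this mapping for an OpenSSL-style subject string; it does not handle escaped slashes.

```python
from typing import List, Optional, Tuple


def identity_from_subject(subject: str) -> Tuple[str, List[str]]:
    """Return (username, groups) for a subject such as "/CN=jane/O=dev/O=ops".

    CN becomes the Kubernetes username; each O becomes a group.
    """
    username: Optional[str] = None
    groups: List[str] = []
    for part in subject.strip("/").split("/"):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed subject component: {part!r}")
        if key == "CN":
            username = value
        elif key == "O":
            groups.append(value)
    if not username:
        raise ValueError("subject has no CN, so there would be no username")
    return username, groups
```

RBAC RoleBindings can then refer to either the user (`jane`) or one of the groups (`dev`).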

As an alternative to using cron for recurring jobs on Linux machines you can also use Systemd Timers.

  • The title is “Schedule jobs with systemd timers, a cron alternative”.
  • The content is just what the title says. I often edit and check existing timers, but I want to build some hands-on by myself, so I bookmarked it. There are many things in the infrastructure layer I should dig into to deepen my understanding (I wish I had ReplicaSets of myself).
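As a reminder of what a minimal timer looks like, here is a sketch that generates the two unit files a cron-style job needs: a oneshot `.service` that runs the command, and a `.timer` with `OnCalendar=` that triggers it. By default a timer activates the service with the same name. The job name, command, and schedule in the usage example are hypothetical.

```python
def timer_units(name: str, command: str, on_calendar: str) -> dict:
    """Return {filename: contents} for a oneshot service and the timer that triggers it."""
    service = (
        "[Unit]\n"
        f"Description=Run {name}\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart={command}\n"
    )
    timer = (
        "[Unit]\n"
        f"Description=Schedule {name}\n"
        "\n"
        "[Timer]\n"
        f"OnCalendar={on_calendar}\n"
        # Run a missed job at the next boot if the machine was off at the scheduled time.
        "Persistent=true\n"
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )
    return {f"{name}.service": service, f"{name}.timer": timer}
```

For example, `timer_units("backup", "/usr/local/bin/backup.sh", "*-*-* 02:00:00")` yields `backup.service` and `backup.timer`. After writing them to `/etc/systemd/system/`, run `systemctl daemon-reload` and `systemctl enable --now backup.timer`, and check the schedule expression with `systemd-analyze calendar "*-*-* 02:00:00"`.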

Events

The Software Circus Virtual Fest is back next week, Thursday 29th October, this time with a spooky twist! Join the Circus spirits on a 10-hour journey through your worst Cloud Native nightmares, wear your scariest costume and learn how to bring your projects back from the dead.

This is a free Community event, with a chilling line-up that includes Ian Coldwater, Kris Nova, Bryan Cantrill, Joe Beda and many more monsters. Check out the schedule and register today!

  • Information on “The Software Circus Virtual Fest”, held on October 29th. Like the previous edition, it was a free event.
  • As last time, it was fun to see the interesting costumes, both virtual and real.

Tools

Akri lets you easily expose heterogeneous devices (such as IP cameras and USB devices) as resources in a Kubernetes cluster, while also supporting the exposure of embedded hardware resources such as GPUs and FPGAs.

  • The GitHub page of Akri, a tool that continuously detects nodes that can access different types of leaf devices (such as IP cameras and USB devices) and schedules workloads based on them.
  • It easily exposes different types of leaf devices as resources within a Kubernetes cluster, and also supports publishing embedded hardware resources such as GPUs and FPGAs.

SRE Weekly Issue #241 October 25th, 2020

Articles

Addendum

A quick note on last week’s issue: Google posted an updated version of their Google Chat incident summary with the “confidential” language removed. They also updated the content at the original link.

  • Last week, the editor mentioned that Google had published a document marked “Google Customer Confidential” and “Not for publication or distribution”. This note tells us that the “confidential” wording has been removed and that the content at the original link was updated as well.

June 15, 2020 T-Mobile Network Outage Report

T-Mobile, one of the main mobile phone carriers in the US, had a major outage earlier this year. This report is essentially a retrospective performed by the US FCC (Federal Communications Commission). The report details the satisfyingly complex interplay of contributing factors in the incident.

US Federal Communications Commission

  • A report on a failure of T-Mobile’s wireless network at noon on 06/15/2020. The failure lasted more than 12 hours and affected nationwide calling and text messaging services, including 911 services.
  • The explanation and analysis of the architecture are well written. When OSPF came up as one of the protocols involved, my past work experience at an ISP made me study the diagrams with extra care.

Failing over without falling over

How can you be sure your failover plan will actually work? Hint: it’s almost certainly not going to work properly the first time you try it.

Adrian Cockcroft

  • Drawing on his own work experience, achievements, and knowledge, the author discusses the topic in the title through the lens of “resilient systems”.
  • The following passage, “The characteristics of a resilient system can be divided into four layers”, is key to understanding the overall argument, so I excerpt it below:
  1. Experienced staff — Use “game days” to understand how the system behaves when it’s managing failures, and know how to quickly observe and control problems.
  2. Robust applications — Have been tested using fault injection and chaos testing tools.
  3. Dependable switching fabric — An application framework that compensates for faults by routing around them
  4. Redundant service foundation — Redundant automated services that carefully maintain isolation so that failures are independent
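Layers 2 and 3 can be illustrated together in a few lines. Below is a minimal sketch of my own (not from the article): a client that routes around failing endpoints, exercised by injecting a fault into one of them. The endpoint names and the `call` function are hypothetical stand-ins for a real client.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Sequence[str], call: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result.

    `call` performs the request against one endpoint and raises on failure.
    """
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except Exception as exc:  # route around the failing endpoint
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


def flaky(endpoint: str) -> str:
    """Fault injection: pretend zone-a is down."""
    if endpoint == "zone-a":
        raise ConnectionError("injected fault")
    return f"ok from {endpoint}"


print(call_with_failover(["zone-a", "zone-b"], flaky))  # prints "ok from zone-b"
```

In keeping with the article's warning that failover rarely works the first time, the point is to run this kind of injected failure regularly (as in a game day) rather than trusting the code path untested.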

3 Ways SRE Can Boost your Business Value

In this blog post, we’ll look at the business value of SRE through customer focus, observability, and efficiency.

Emily Arnott — Blameless

  • The post opens by noting that, for an organization to gain support for moving into the world of SRE, it needs to show the value of SRE and cite concrete details proving that SRE pays off. It then explains the business value of SRE in the following three points, covering customer focus, observability, and efficiency:
  1. SRE transforms how we understand customer satisfaction
  2. SRE makes business value more observable
  3. SRE minimizes value lost from incidents

Building Netflix’s Distributed Tracing Infrastructure

Netflix has some interesting ideas around sampling, performance, and storage for their tracing system.

Maulik Pandey — Netflix

  • I mentioned it in DEVOPS WEEKLY ISSUE #513 above, so I will skip it.

10 Days of Errors

Oh, I do love reading stories of systems failing in interesting ways. This first installment contains five of the 10.

Summer Grahame — LaunchDarkly

  • A series that tells stories of errors over 10 days in the chilling style of a ghost story. It was Halloween season, after all.

Preparing for peak holiday shopping in 2020: War rooms go virtual

Black Friday is coming. Here are some ideas on how to deal with the rush — and how to analyze how you dealt with it when it’s over.

Nelly Wilson — Google

  • It introduces Google Cloud’s efforts around Black Friday / Cyber Monday (BFCM), customers who have adopted them, and best practices for these important peak events. The explanation is organized into the following three steps, each with its own items:
    ○ Step 1: Preparing for the event
    ○ Step 2: During the event
    ○ Step 3: Post event

The Chaos Engineering Book

Two of my favorite authors/speakers have conspired to create a book on one of my favorite topics. Take my money! Oh wait, they’re giving it away, too?!

Nora Jones and Casey Rosenthal

  • Click here for the free e-book (you can also get to it from the link above). After you enter the required information (name, affiliation, job title, email address), a link to the download page is sent to you automatically by email.
  • The key point of chaos engineering is quoted below. I have covered both authors several times on this blog as well, through their podcasts and articles.
  • The point of Chaos Engineering isn’t to create chaos; it’s to chart a path of confidence through the chaos.

Outages

KubeWeekly #239

The Headlines

Editor’s pick of the highlights from the past week.

Service mesh is still hard

Service mesh is more mature than it was one or two years ago, but it’s still hard for users. In this post, Lin Sun from IBM outlines the reasons she thinks that service mesh is still difficult, and how they can be mitigated. She looks forward to observing innovations across all mesh projects as their authors work hard to make service mesh as boring but useful as possible.

  • A guest post on the CNCF blog by a sponsor of KubeCon + CloudNativeCon NA Virtual. It explains the topic of her talk with the same title at ServiceMeshCon EU in August in the following five points:
  1. Lack of clear guidance on whether you need service mesh
  2. Your service may break immediately after a sidecar is injected
  3. Your service may have odd behavior at start or stop time
  4. Zero configuration for your service is possible but zero code change is not
  5. Service owners need to understand nuances of client and service side configurations
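One common source of the start-up oddities in point 3 is an application that sends traffic before its sidecar proxy is ready. A typical workaround is to wait for the proxy's readiness endpoint before starting the real work. Below is a minimal sketch under stated assumptions: the URL reflects the readiness endpoint that recent Istio sidecars expose on port 15021, so verify it for your mesh and version; the polling helper itself is generic.

```python
import time
import urllib.request
from typing import Callable

# Assumption: recent Istio sidecars serve readiness on this address.
SIDECAR_READY_URL = "http://127.0.0.1:15021/healthz/ready"


def http_probe(url: str = SIDECAR_READY_URL) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=1) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or an HTTP error status
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 30.0,
                     interval: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds of polling pass.

    Elapsed time is counted in polling intervals, so `sleep` can be replaced in tests.
    """
    waited = 0.0
    while waited < timeout:
        if probe():
            return True
        sleep(interval)
        waited += interval
    return False
```

An application entrypoint could call `wait_until_ready(http_probe)` before opening outbound connections and exit with an error if it returns False. Mesh-level settings that delay the app container until the proxy is up are an alternative worth checking in your mesh's documentation.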

Join us for KubeCon + CloudNativeCon North America Virtual 2020!

The countdown to KubeCon + CloudNativeCon North America is on. Have you reserved your spot?

Based on the community feedback from KubeCon + CloudNativeCon Europe 2020 — Virtual, we are bringing back the 101 track designed for first-time attendees — now expanded to include more sessions and tutorials! The 101 track is perfect for beginners to learn something new, share best practices, and catch a glimpse of interesting use cases.

Don’t forget that special pricing is available through October 31, 2020, a savings of $25 off registration. Don’t delay — act fast!

  • As last week, they remind us that KubeCon + CloudNativeCon North America is approaching and that the registration discount was available only until the end of October.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member webinar: The truth about the service mesh data plane

Denis Jannot, Director of Field Engineering @Solo.io

  • To answer the following important questions that come up when evaluating service meshes, it describes the role of the data plane and how to choose the right components for your problem context:
    ○ What data plane should I use?
    ○ How does this tie in with my existing API infrastructure?
    ○ What kind of overhead do sidecar proxies demand?

CNCF Member webinar: Admission controllers: one part of your Kubernetes security and governance toolkit

Gunjan Patel, Cloud Architect @Palo Alto Networks & Robert Haynes, Cloud Security Evangelist @Palo Alto Networks

  • It outlines the Kubernetes admission controller architecture and, in particular, describes the validating features of admission controllers together with the relevant Open Policy Agent and Rego language components, so that by the end you understand the following:
    ○ An overview of object creation in Kubernetes
    ○ The basics of the Rego language (for writing admission controller policies)
    ○ Sample admission controller policies for security and IT governance
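The webinar writes its policies in Rego. To show the request/response contract of a validating admission webhook without OPA, here is a sketch of the same kind of decision in Python. It takes an `admission.k8s.io/v1` AdmissionReview and returns one with `response.uid` echoing the request and `allowed` set. The policy (reject privileged containers) is a hypothetical example, and the HTTPS server that Kubernetes would call is omitted.

```python
from typing import Any, Dict, List


def review(admission_review: Dict[str, Any]) -> Dict[str, Any]:
    """Build an AdmissionReview response for a Pod creation request.

    Hypothetical policy: reject Pods that contain privileged containers.
    """
    request = admission_review["request"]
    pod = request.get("object", {})
    violations: List[str] = []
    for container in pod.get("spec", {}).get("containers", []):
        security = container.get("securityContext") or {}
        if security.get("privileged"):
            violations.append(f"container {container.get('name')!r} is privileged")
    # The response must echo the request uid so the API server can match it.
    response: Dict[str, Any] = {"uid": request["uid"], "allowed": not violations}
    if violations:
        response["status"] = {"code": 403, "message": "; ".join(violations)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

With OPA Gatekeeper or kube-mgmt, the same rule would live in Rego and OPA would produce this response for you.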

CNCF Member webinar: Event-driven architecture with Knative events

Nicolás López, Senior Software Engineer @Google & Bryan Zimmerman, Product Manager @Google

  • It looks back on the evolution from monoliths to microservices to event-driven architecture.
  • It explains CloudEvents, how to use Knative Eventing as an intermediary for events, the Knative components, and the extensibility of the operator model (Sources, Brokers).
  • It includes a demo showing autoscaling with EventSourcing, custom events, and Serving.
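In Knative Eventing, events travel as CloudEvents over HTTP. As a sketch of what a producer might POST to a broker, here is the CloudEvents 1.0 HTTP "binary content mode": the event attributes go into `ce-` prefixed headers and the payload is the request body. The event type and source used in the example are hypothetical, and the broker URL is not shown.

```python
import json
import uuid
from typing import Any, Dict, Optional, Tuple


def to_binary_http(event_type: str, source: str, data: Any,
                   event_id: Optional[str] = None) -> Tuple[Dict[str, str], bytes]:
    """Encode a CloudEvent in HTTP binary content mode.

    Required attributes (specversion, type, source, id) become ce-* headers;
    the JSON-encoded data becomes the body.
    """
    headers = {
        "ce-specversion": "1.0",
        "ce-type": event_type,
        "ce-source": source,
        "ce-id": event_id or str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    return headers, json.dumps(data).encode("utf-8")
```

For example, `to_binary_http("com.example.order.created", "/orders", {"id": 1})` produces headers and a body that any HTTP client can send. Triggers then filter on attributes such as `type`.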

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Service mesh era: Building modern apps with YugabyteDB and Istio

Chirag Narang, Yugabyte

  • It begins: “One of the most common security approaches is to set up mTLS. While this is an important security tool, it’s often difficult and time consuming to manage.”
  • To address this, the author walks through a tutorial on deploying YugabyteDB with Istio mTLS to protect communication between services.
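For reference, a common way to require mTLS for all workloads in a namespace in Istio is a `PeerAuthentication` resource with mode `STRICT`. I have not checked that the tutorial uses exactly this resource, and the namespace name below is hypothetical. The sketch builds the manifest in Python and prints it as JSON, which `kubectl apply -f -` also accepts.

```python
import json
from typing import Any, Dict


def strict_mtls_policy(namespace: str) -> Dict[str, Any]:
    """Return an Istio PeerAuthentication requiring mTLS for every workload in `namespace`."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        # A policy named "default" applies to the whole namespace.
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    # Usage: python policy.py | kubectl apply -f -
    print(json.dumps(strict_mtls_policy("yb-demo"), indent=2))
```

With `STRICT`, plaintext connections from clients outside the mesh are rejected, so it is usually rolled out after every client has a sidecar, often after a `PERMISSIVE` phase.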

Container networking is simple

Ivan Velichko

  • I covered it in the same section last week, so I will skip it.

Building Kubernetes native SaaS applications: iterating quickly by deploying in-cluster data planes

Pixie Labs

  • This is the first post in a series of posts discussing techniques and best practices for effectively building Kubernetes native apps.
  • This time, it explains the trade-offs between using an air-gapped deployment that exists completely within the cluster and using a system that splits the control plane and data plane between the cloud and the cluster, respectively.

Building Kubernetes Operator from scratch using operator-sdk(1.1.0)

Saiyam Pathak, Civo Ramiro Berelleza, Oct.

  • A YouTube video that explains why you need Kubernetes Operators, following the title. The video’s title on YouTube differs slightly from the one above. I found it refreshing to see Kubernetes Operators abbreviated as “KO”.
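Operators built with operator-sdk are written in Go on top of controller-runtime, so the following is not operator-sdk code. It is a language-neutral sketch in Python of the core idea behind a reconcile loop: compare desired state (from the custom resource's spec) with observed state and return only the actions needed to converge. All names here are hypothetical.

```python
from typing import List


def reconcile(name: str, desired_replicas: int, existing_pods: List[str]) -> List[str]:
    """Return the actions that move the observed pods toward the desired count.

    Level-based: the result depends only on the current state, so running it
    again after the actions succeed returns an empty list.
    """
    if desired_replicas < 0:
        raise ValueError("desired_replicas must be non-negative")
    diff = desired_replicas - len(existing_pods)
    if diff > 0:
        return [f"create pod for {name}"] * diff
    if diff < 0:
        # Remove the surplus pods at the end of the list.
        return [f"delete {pod}" for pod in existing_pods[desired_replicas:]]
    return []
```

Because the function is idempotent, a real controller can safely call it on every watch event or periodic resync without tracking what changed.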

GitOps using Red Hat OpenShift pipelines (Tekton) and Red Hat Advanced Cluster Management to deploy on multiple clusters

Red Hat, Giovanni Fontana

  • Part of a series of articles introducing Red Hat’s recently released Advanced Cluster Management tool, which aims to help organizations overcome the challenge of deploying apps to multiple clusters and clouds.
  • Click here for the previous article. It explained how to use Tekton and Red Hat Advanced Cluster Management to deploy applications to multiple namespaces, one for each lifecycle environment (Dev, QA, and Prod), using a single OpenShift managed cluster.
  • This article extends that use case to deploy the same app as last time across three different clusters.

Introducing KubeLinter — an open source linter for Kubernetes

Viswajith Venugopal, StackRox

  • StackRox introduces KubeLinter, the company’s new open source tool. KubeLinter is a static analysis tool that checks Kubernetes YAML files and Helm charts to ensure that the apps they describe follow best practices. A demo video of about 5 minutes is embedded.
  • Click here for the GitHub page.
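To show what a check in a tool like this does, here is a toy linter of my own, loosely in the spirit of KubeLinter's built-in checks but not its code or check set. It inspects a Deployment that has already been parsed from YAML into a dict and reports unpinned images, missing resource requests/limits, and a writable root filesystem.

```python
from typing import Any, Dict, List


def lint_deployment(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for a parsed Deployment manifest."""
    findings: List[str] = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        image = c.get("image", "")
        # Only the last path segment can carry a tag ("localhost:5000/app" has none).
        last_segment = image.rsplit("/", 1)[-1]
        untagged = ":" not in last_segment or last_segment.endswith(":latest")
        if "@" not in image and untagged:
            findings.append(f"{name}: image {image!r} is not pinned to a tag or digest")
        resources = c.get("resources") or {}
        for kind in ("requests", "limits"):
            if not resources.get(kind):
                findings.append(f"{name}: no resource {kind} set")
        if not (c.get("securityContext") or {}).get("readOnlyRootFilesystem"):
            findings.append(f"{name}: root filesystem is writable")
    return findings
```

Running checks like these in CI, before anything reaches the cluster, is the static counterpart to the admission-time policies discussed in the webinar above.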

Helm Project Update: New Location For Stable and Incubator Charts

Matt Farina, Helm maintainer

  • Following up on a recent announcement, this article confirms that the Helm project’s “stable” and “incubator” chart repositories have moved to a new location, and explains how to use them.
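When updating CI scripts, the change mostly comes down to swapping the old Google Cloud Storage URLs for the new `charts.helm.sh` ones. The mapping below reflects the new locations as I understand them, so double-check it against the post. In the Helm CLI itself, this means re-adding the repo with `helm repo add stable https://charts.helm.sh/stable`; newer Helm 3 releases may require `--force-update` to overwrite an existing entry.

```python
# Old Google Cloud Storage locations mapped to the new charts.helm.sh locations.
MOVED_REPOS = {
    "https://kubernetes-charts.storage.googleapis.com": "https://charts.helm.sh/stable",
    "https://kubernetes-charts-incubator.storage.googleapis.com": "https://charts.helm.sh/incubator",
}


def migrate_repo_url(url: str) -> str:
    """Return the new location for a moved Helm repo URL, or the URL unchanged."""
    return MOVED_REPOS.get(url.rstrip("/"), url)
```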

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Pop Punk to Pods, with David Pait

Craig Box and Adam Glick, Kubernetes Podcast from Google

  • The Kubernetes Podcast from Google, hosted by Google employees. The current co-hosts are Craig Box and Adam Glick.
  • The guest is David Pait, formerly a touring bassist for the pop punk band “Sparks The Rescue”, now an SRE at the ad tech company Netsertive and a Helm contributor. The hosts ask him many things, such as “How did you get there?” and “If you’re looking to change careers, how might you?”
  • It was interesting to hear about their adoption of Kubernetes, their use of Velero and Rancher, and their migration to EKS.
  • The items from the week’s news that interested me were:
    ○ kube-secret-syncer from Contentful
    ○ OpenTelemetry Tracing Spec RC by Morgan McLean
    ○ AWS Distro for OpenTelemetry
    ○ Verizon Business adds Kubernetes, powered by Rafay

Web scraping that just works with OpenFaaS with Puppeteer

Alex Ellis

  • Alex Ellis, founder of OpenFaaS and a CNCF Ambassador, introduces Puppeteer and explains how to use it with OpenFaaS functions to automate and scrape websites.

Preparing Google Cloud deployments for Docker Hub pull request limits

Michael Winser and Dhaivat Pandit, Google Cloud

  • Touching on the pull rate limits that Docker announced for Docker Hub “Free Plan” users, it explains how to scan your codebase and workloads for container image dependencies on third-party registries such as Docker Hub.
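A core step in such a scan is deciding which image references actually resolve to Docker Hub, because short names like `nginx` or `bitnami/redis` do not mention a registry at all. The sketch below follows Docker's reference normalization rules as I understand them: the first path component counts as a registry host only if it contains `.` or `:` or is `localhost`; otherwise the image comes from Docker Hub.

```python
def is_docker_hub(image: str) -> bool:
    """Return True if an image reference would be pulled from Docker Hub."""
    first, sep, _ = image.partition("/")
    if not sep:
        return True  # e.g. "nginx:1.19" -> docker.io/library/nginx:1.19
    # The first component is a registry host only if it looks like one.
    if "." in first or ":" in first or first == "localhost":
        return first in ("docker.io", "index.docker.io", "registry-1.docker.io")
    return True  # e.g. "bitnami/redis" -> docker.io/bitnami/redis
```

Feeding it the image strings collected from your manifests gives a first list of workloads that could be affected by the rate limits.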

Announcing the Linkerd Community Anchor Program

Thomas Rampelberg

  • A CNCF article introducing the Linkerd Community Anchor program.
  • The program aims to promote stories and experiences from the Linkerd community. The Linkerd team will support people in sharing complex use cases, “aha” moments, stories about combining Linkerd with new tools, and the process behind them. If you are interested, please check it out.

How to integrate virtual machines into Istio service mesh

Jimmy Song, Tetrate

  • After an overview of Istio, the following sections explain why and how Istio integrates with virtual machines:
    ○ Why Should Istio Support Virtual Machines?
    ○ What Is Needed to Add VMs to the Mesh?
    ○ How Does Istio Support Virtual Machines?

Announcing Vitess 8

Vitess maintainers

  • A CNCF article announcing the release of Vitess 8.
  • The following points are highlighted. Click here for release notes.
    ○ Compatibility (MySQL, frameworks)
    ○ Migration
    ○ Usability
    ○ Innovation

Kubernetes project survey

Lero

  • Probably a wrong link or a permissions issue: error messages such as “The file cannot be opened at this time.” and “Please check the address and try again.” were displayed (in Japanese, in my browser). I reported it to their Twitter account, so I hope it will be fixed soon.

Upcoming CNCF webinars

You can check the recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing:

Member Webinar: Managing your policies and standards
Ahmed Badran, Chief Technology Officer @Magalix
Nov 4, 2020 7:00 AM Pacific Time

Member Webinar: Security in the world of service meshes
John A. Joyce, Principal Engineer @Cisco
Nov 4, 2020 1:00 PM Pacific Time

Member Webinar: Building edge as a service
Dr. Bin Ni, CTO @Wangsu Science & Technology / CDNetworks
Nov 5, 2020 10:00 AM Pacific Time

Member Webinar: Developer-friendly platforms with Kubernetes and infrastructure as code
Lee Briggs, Staff Software Engineer @Pulumi
Nov 6, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes in the context of on-premises edge and network edge computing
Amr Mokhtar, Network Software Engineer @Intel Corporation
Nov 10, 2020 10:00 AM Pacific Time

Member Webinar: MicroK8s HA under the hood: Kubernetes with Dqlite
Konstantinos Tsakalozos, Senior Software Engineer @Canonical
Nov 11, 2020 7:00 AM Pacific Time

Member Webinar: The what and why of distributed tracing
Dave McAllister, Sr. Technical Evangelist @Splunk
Nov 13, 2020 10:00 AM Pacific Time

Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time

What did you think of these articles? Did any of them interest you?

There is some content I can’t fully digest at this stage, so I’ll use this aide-memoire and these links to catch up myself, too.
Bye now!!

Yoshiki Fujiwara


・Cloud Solutions Architect - AWS@NetApp in Tokyo, Japan. #AWS Certified Solution Architect&DevOps Professional, #Kubernetes, ・Opinions are my own.