SRE / DevOps / Kubernetes Weekly Collection #39 (Week 44)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments and useful links as an aide-memoire.

DEVOPS WEEKLY ISSUE #513 October 25th, 2020
SRE Weekly Issue #241 October 25th, 2020
KubeWeekly #239 October 31st, 2020

DEVOPS WEEKLY ISSUE #513 October 25th, 2020

News

A good introduction to ARM and the coming challenges, and advantages, of wider availability on laptops and servers of ARM-based architectures.

  • The title is “How to prepare for the coming CPU confusion”.

A great talk on security theatre. Good quotes on helping shift responsibilities earlier in the development process and good observations about the changing role of security teams.

  • The title is “EXIT STAGE LEFT: Eradicating Security Theater”.

Slides from my talk last week at SnykCon, all about patterns for secure container base image management. Discussion of people, process and tools.

  • The title is “Patterns for secure container base image management”.

A new Salt release is available, which means a super detailed post on the latest features and fixes from Salt Tips.

  • The title is “What’s New in Salt 3002 Magnesium”.

An interesting paper on quality metrics for infrastructure as code. Looking specifically at Ansible, but intended to be generalised.

  • The title is “Towards a Catalog of Software Quality Metrics for Infrastructure Code”.

Lots of people have written about tracing, but this post looks at some of the implementation challenges, from libraries and sampling to data transfer and storage.

  • The title is “Building Netflix’s Distributed Tracing Infrastructure”.

A detailed walkthrough of using client certificates to authorize user access to a Kubernetes cluster.

  • The title is “Kubernetes Tips: Give Access To Your Cluster With A Client Certificate”.

As an alternative to using cron for recurring jobs on Linux machines you can also use Systemd Timers.

  • The title is “Schedule jobs with systemd timers, a cron alternative”.

Events

The Software Circus Virtual Fest is back next week, Thursday 29th October, this time with a spooky twist! Join the Circus spirits on a 10-hour journey through your worst Cloud Native nightmares, wear your scariest costume and learn how to bring your projects back from the dead.

This is a free Community event, with a chilling line-up that includes Ian Coldwater, Kris Nova, Bryan Cantrill, Joe Beda and many more monsters. Check out the schedule and register today!

  • Information on “The Software Circus Virtual Fest”, held on October 29th. As last time, it is a free event.

Tools

Akri lets you easily expose heterogeneous devices (such as IP cameras and USB devices) as resources in a Kubernetes cluster, while also supporting the exposure of embedded hardware resources such as GPUs and FPGAs.

  • The GitHub page of Akri, a tool that continuously detects nodes that can access different types of leaf devices (such as IP cameras and USB devices) and schedules workloads based on them.

SRE Weekly Issue #241 October 25th, 2020

Articles

Addendum

A quick note on last week’s issue: Google posted an updated version of their Google Chat incident summary with the “confidential” language removed. They also updated the content at the original link.

  • The editor mentioned in last week’s issue that Google had published a document marked “Google Customer Confidential” and not for publication or distribution. This note tells us that the “confidential” wording has since been removed and that the content at the original link was also updated.

June 15, 2020 T-Mobile Network Outage Report

T-Mobile, one of the main mobile phone carriers in the US, had a major outage earlier this year. This report is essentially a retrospective performed by the US FCC (Federal Communications Commission). The report details the satisfyingly complex interplay of contributing factors in the incident.

US Federal Communications Commission

  • Report of a failure on T-Mobile’s wireless network at noon on 06/15/2020. The failure lasted more than 12 hours, affecting national calling and text messaging services, including 911 services.

Failing over without falling over

How can you be sure your failover plan will actually work? Hint: it’s almost certainly not going to work properly the first time you try it.

Adrian Cockcroft

  • Drawing on his own work experience, achievements, and knowledge, the author explains how to build resilient systems whose failover actually works, as the title suggests.
  1. Experienced staff — Use “game days” to understand how the system behaves when it’s managing failures, and know how to quickly observe and control problems.

3 Ways SRE Can Boost your Business Value

In this blog post, we’ll look at the business value of SRE through customer focus, observability, and efficiency.

Emily Arnott — Blameless

  • The post opens by noting that, to win organizational support for moving into the world of SRE, you need to show its value and cite details proving that SRE pays off. It then explains the business value of SRE under three headings: customer focus, observability, and efficiency.
  1. SRE transforms how we understand customer satisfaction

Building Netflix’s Distributed Tracing Infrastructure

Netflix has some interesting ideas around sampling, performance, and storage for their tracing system.

Maulik Pandey — Netflix

  • I mentioned it in DEVOPS WEEKLY ISSUE #513 above, so I will skip it.

10 Days of Errors

Oh, I do love reading stories of systems failing in interesting ways. This first installment contains five of the 10.

Summer Grahame — LaunchDarkly

  • A series that recounts 10 days of errors in the style of ghost stories, for a chill down the spine. Fitting, since it was the Halloween season.

Preparing for peak holiday shopping in 2020: War rooms go virtual

Black Friday is coming. Here are some ideas on how to deal with the rush — and how to analyze how you dealt with it when it’s over.

Nelly Wilson — Google

  • Introduces GCP’s commitment to Black Friday / Cyber Monday (BFCM), customers who rely on it, and best practices for these important peak events. The explanation follows three steps, each with its own items.
    ○ Step 1: Preparing for the event
    ○ Step 2: During the event
    ○ Step 3: Post event

The Chaos Engineering Book

Two of my favorite authors/speakers have conspired to create a book on one of my favorite topics. Take my money! Oh wait, they’re giving it away, too?!

Nora Jones and Casey Rosenthal

  • The e-book is free. You can also reach it from the link above. After you enter the required information (name, affiliation, job title, email address), an automatic email sends you the download page.

Outages

KubeWeekly #239

The Headlines

Editor’s pick of the highlights from the past week.

Service mesh is still hard

Service mesh is more mature than it was one or two years ago, but it’s still hard for users. In this post, Lin Sun from IBM outlines the reasons she thinks that service mesh is still difficult, and how they can be mitigated. She looks forward to observing innovations across all mesh projects as their authors work hard to make service mesh as boring but useful as possible.

  • A guest post on the CNCF blog from a sponsor of KubeCon + CloudNativeCon NA Virtual. The talk of the same title, given at ServiceMeshCon EU in August, is explained in the following five points.
  1. Lack of clear guidance on whether you need service mesh

Join us for KubeCon + CloudNativeCon North America Virtual 2020!

The countdown to KubeCon + CloudNativeCon North America is on. Have you reserved your spot?

Based on the community feedback from KubeCon + CloudNativeCon Europe 2020 — Virtual, we are bringing back the 101 track designed for first-time attendees — now expanded to include more sessions and tutorials! The 101 track is perfect for beginners to learn something new, share best practices, and catch a glimpse of interesting use cases.

Don’t forget that special pricing is available through October 31, 2020, a savings of $25 off registration. Don’t delay — act fast!

  • As last week, a reminder that KubeCon + CloudNativeCon North America is approaching. The discount on paid tickets ran until the end of October, so they reminded us again.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Member webinar: The truth about the service mesh data plane

Denis Jannot, Director of Field Engineering @Solo.io

  • To answer the following serious questions that arise when investigating service meshes, it describes the role of the data plane and how to choose the right components for the context of the problem.
    ○ What data plane should I use?
    ○ How does this tie in with my existing API infrastructure?
    ○ What kind of overhead do sidecar proxies demand?

CNCF Member webinar: Admission controllers: one part of your Kubernetes security and governance toolkit

Gunjan Patel, Cloud Architect @Palo Alto Networks & Robert Haynes, Cloud Security Evangelist @Palo Alto Networks

  • The webinar outlines the Kubernetes admission controller architecture and, together with the relevant Open Policy Agent and Rego language components, describes the admission controller’s validation features. By the end, you should understand the following:
    ○ An overview of object creation in Kubernetes
    ○ The basics of the Rego language (for writing admission controller policies)
    ○ Sample admission controller policies for security and IT governance
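
As a rough illustration of the validation step mentioned above, here is a minimal sketch in Python. The webinar itself covers Open Policy Agent and Rego, so this is only a stand-in for the same idea and not the presenters' code. The function name `review_pod` and the "no privileged containers" rule are hypothetical choices for illustration. The request and response fields follow the `admission.k8s.io/v1` `AdmissionReview` format.

```python
from typing import Any, Dict


def review_pod(admission_review: Dict[str, Any]) -> Dict[str, Any]:
    """Build an AdmissionReview response that rejects privileged containers.

    Hypothetical example policy; a real webhook would serve this over HTTPS.
    """
    request = admission_review["request"]
    pod_spec = request["object"].get("spec", {})

    # Collect the names of containers that ask for privileged mode.
    offenders = [
        c["name"]
        for c in pod_spec.get("containers", [])
        if (c.get("securityContext") or {}).get("privileged", False)
    ]

    # The response must echo the request uid and state whether it is allowed.
    response: Dict[str, Any] = {"uid": request["uid"], "allowed": not offenders}
    if offenders:
        response["status"] = {
            "message": "privileged containers are not allowed: " + ", ".join(offenders)
        }

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

A Rego policy would express the same check declaratively. The Python version just makes the input and output of a validating decision visible.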

CNCF Member webinar: Event-driven architecture with Knative events

Nicolás López, Senior Software Engineer @Google & Bryan Zimmerman, Product Manager @Google

  • Looking back on the evolution from monoliths to microservices to event-driven architecture.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Service mesh era: Building modern apps with YugabyteDB and Istio

Chirag Narang, Yugabyte

  • The post opens with: “One of the most common security approaches is to set up mTLS. While this is an important security tool, it’s often difficult and time consuming to manage.”

Container networking is simple

Ivan Velichko

  • I covered it in the same place last week, so I will skip it.

Building Kubernetes native SaaS applications: iterating quickly by deploying in-cluster data planes

Pixie Labs

  • This is the first post in a series of posts discussing techniques and best practices for effectively building Kubernetes native apps.

Building Kubernetes Operator from scratch using operator-sdk(1.1.0)

Saiyam Pathak, Civo Ramiro Berelleza, Oct.

  • A YouTube video that explains why Kubernetes Operators are needed, along with the topic in the title. The title above differs slightly from the YouTube title. Seeing Kubernetes Operators abbreviated as KO was new to me.

GitOps using Red Hat OpenShift pipelines (Tekton) and Red Hat Advanced Cluster Management to deploy on multiple clusters

Red Hat, Giovanni Fontana

  • A series of articles introducing Red Hat’s recently released “Advanced Cluster Management tool” aimed at overcoming the challenge of organizations deploying apps to multiple clusters and clouds.

Introducing KubeLinter — an open source linter for Kubernetes

Viswajith Venugopal, StackRox

  • A StackRox web page introducing the company’s new open source tool, KubeLinter. KubeLinter is a static analysis tool that checks Kubernetes YAML files and Helm charts to ensure that the apps represented in them comply with best practices. A demo video of about 5 minutes is embedded.

Helm Project Update: New Location For Stable and Incubator Charts

Matt Farina, Helm maintainer

  • An article that follows up on a recent announcement, confirms that the Helm project’s “Stable” and “Incubator” chart repositories have moved to a new location, and explains how to use them.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Pop Punk to Pods, with David Pait

Craig Box and Adam Glick, Kubernetes Podcast from Google

  • Kubernetes Podcast by Google employees. The current Co-hosts are Craig Box and Adam Glick.

Web scraping that just works with OpenFaaS with Puppeteer

Alex Ellis

  • Alex Ellis, founder of OpenFaaS and a CNCF Ambassador, introduces Puppeteer and explains how to use it to automate and scrape websites using OpenFaaS functions.

Preparing Google Cloud deployments for Docker Hub pull request limits

Michael Winser and Dhaivat Pandit, Google Cloud

  • It explains how to scan your codebase and workloads for container image dependencies on third-party container registries such as Docker Hub, in light of the rate limits Docker announced on image pull requests by “Free Plan” users.
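
One small piece of such a scan is deciding whether an image reference points at Docker Hub at all. The following is a minimal sketch of that check, not taken from the article. The function names `is_docker_hub_image` and `docker_hub_images` are hypothetical, and the host detection is a simplified version of the usual Docker naming convention.

```python
from typing import Iterable, List

# Host names that all refer to Docker Hub.
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io", "registry-1.docker.io"}


def is_docker_hub_image(image: str) -> bool:
    """Return True if an image reference resolves to Docker Hub."""
    # Drop a digest, which plays no part in choosing the registry.
    name = image.split("@", 1)[0]
    first, sep, _rest = name.partition("/")
    if not sep:
        # No slash at all, e.g. "nginx" or "nginx:1.19": a Docker Hub image.
        return True
    # Simplified convention: treat the first path component as a registry
    # host if it contains a dot or a colon, or is exactly "localhost".
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if not looks_like_host:
        return True  # e.g. "bitnami/redis" is a Docker Hub user repository
    return first in DOCKER_HUB_HOSTS


def docker_hub_images(images: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated references that pull from Docker Hub."""
    return sorted({img for img in images if is_docker_hub_image(img)})
```

Feeding it the image names collected from your manifests gives a list of workloads that may be affected by the pull limits.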

Announcing the Linkerd Community Anchor Program

Thomas Rampelberg

  • A CNCF article introducing the Linkerd Community Anchor program.

How to integrate virtual machines into Istio service mesh

Jimmy Song, Tetrate

  • After an overview of Istio, the following sections explain why Istio needs to be integrated with virtual machines and how.
    ○ Why Should Istio Support Virtual Machines?
    ○ What Is Needed to Add VMs to the Mesh?
    ○ How Does Istio Support Virtual Machines?

Announcing Vitess 8

Vitess maintainers

  • A CNCF article announcing the release of Vitess 8.

Kubernetes project survey

Lero

  • Probably a wrong link or a permissions issue. Error messages such as “The file cannot be opened at this time.” and “Please check the address and try again.” were displayed (in Japanese in my browser). I reported it to the Twitter account, so I hope it will be fixed soon.

Upcoming CNCF webinars

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: Managing your policies and standards
Ahmed Badran, Chief Technology Officer @Magalix
Nov 4, 2020 7:00 AM Pacific Time

Member Webinar: Security in the world of service meshes
John A. Joyce, Principal Engineer @Cisco
Nov 4, 2020 1:00 PM Pacific Time

Member Webinar: Building edge as a service
Dr. Bin Ni, CTO @Wangsu Science & Technology / CDNetworks
Nov 5, 2020 10:00 AM Pacific Time

Member Webinar: Developer-friendly platforms with Kubernetes and infrastructure as code
Lee Briggs, Staff Software Engineer @Pulumi
Nov 6, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes in the context of on-premises edge and network edge computing
Amr Mokhtar, Network Software Engineer @Intel Corporation
Nov 10, 2020 10:00 AM Pacific Time

Member Webinar: MicroK8s HA under the hood: Kubernetes with Dqlite
Konstantinos Tsakalozos, Senior Software Engineer @Canonical
Nov 11, 2020 7:00 AM Pacific Time

Member Webinar: The what and why of distributed tracing
Dave McAllister, Sr. Technical Evangelist @Splunk
Nov 13, 2020 10:00 AM Pacific Time

Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time

What did you think of these articles? Did any of them catch your interest?

There is some content here that I cannot fully digest yet, so I will use this aide-memoire and the links to catch up myself.
Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #GCP, #Certified AWS SAP
