SRE / DevOps / Kubernetes Weekly Collection #18 (Week 23)

Yoshiki Fujiwara
16 min read · Jul 21, 2020
  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and leave some comments and useful links as an aide-memoire.
  • Actually, I have already published the same content on my Japanese blog, and I am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #492 May 31st, 2020
SRE Weekly Issue #221 May 31st, 2020
KubeWeekly #219 June 5th, 2020

DEVOPS WEEKLY ISSUE #492 May 31st, 2020

News

Threat modelling is a powerful technique for understanding how your service might be attacked. This detailed post serves as a great introduction for software developers who might not have a strong security background.

  • The title is “A Guide to Threat Modeling for Developers”.
  • An article that provides clear and easy steps for teams who want to adopt threat modeling, a type of risk management.
  • It gives a concrete example of how to run a session and tips on methods such as brainstorming. Because it is written so comprehensively, there is too much to read quickly, but I expect to appreciate that level of detail when putting it into practice.

A lot can happen in a large cloud environment, that’s where audit logs come in. This post explains how to configure, collect and use audit logs in GCP.

  • The title is “Best practices for monitoring GCP audit logs”.
  • An article in which Datadog introduces best practices for monitoring GCP audit logs.
  • A good article that explains the value of GCP’s audit logs and how to use them, and also guides you to the added value provided by the company’s own service. Every time I pick one up here, I find Datadog’s articles easy to read and full of information.

Slides from a talk I gave this last week at DockerCon, talking about reusable Dockerfiles and using GitHub Actions to create a fast packaging pipeline for lots of images.

  • The title is “Building a Docker Image Packaging Pipeline Using GitHub Actions”.
  • Slides, uploaded to Speaker Deck, from the author’s DockerCon talk about creating a fast packaging pipeline for a large number of images with reusable Dockerfiles and GitHub Actions. They are very easy to read, and you can learn how to work with Dockerfiles.
  • The author closes the slides with the following three messages.
  1. Learn Dockerfile
  2. Bring automation closer to developers
  3. Focus on maintenance

A look at building a deployment process around Kubernetes, using Helm and ArgoCD.

  • The title is “Git Ops Deployment and Kubernetes”.
  • I’ll skip this one because I covered it last week.

A review of Rejoiner, a tool looking to bridge gRPC APIs with GraphQL. Good discussion of both pros and cons.

  • The title is “Review of Rejoiner”.
  • An article reviewing what the author sees as the good and bad points of Rejoiner, a tool developed by Google that generates a unified GraphQL schema from gRPC microservices.

A useful checklist for backend applications, covering networking, monitoring, logging, backups, secrets and more.

  • The title is “Production readiness checklist for backend applications”.
  • The author has published a checklist he created from his experience of releasing applications to production and from books that describe best practices.

A quick introduction to enforcing policy in a Kubernetes cluster using Open Policy Agent.

  • The title is “Kubernetes Pod Security Policies with Open Policy Agent”.
  • After an overview of containers and security and a discussion of the Pod Security Policy admission controller, the author explains how to implement Pod security policies with OPA (Open Policy Agent).
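
To give a feel for the kind of rule such a policy expresses, here is a minimal sketch in Python rather than OPA’s own Rego language. It is not taken from the article: the function names `find_privileged_containers` and `admit` are hypothetical, and the only assumption about Kubernetes is the standard Pod layout, where each container may carry `securityContext.privileged`.

```python
from typing import Any, Dict, List, Tuple


def find_privileged_containers(pod: Dict[str, Any]) -> List[str]:
    """Return the names of containers in a Pod manifest that request privileged mode."""
    spec = pod.get("spec", {})
    # Check both regular containers and init containers.
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    violations = []
    for container in containers:
        # securityContext may be missing or null in the manifest.
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged", False):
            violations.append(container.get("name", "<unnamed>"))
    return violations


def admit(pod: Dict[str, Any]) -> Tuple[bool, str]:
    """Hypothetical admission decision: reject any Pod with a privileged container."""
    violations = find_privileged_containers(pod)
    if violations:
        return False, "privileged containers are not allowed: " + ", ".join(violations)
    return True, ""


if __name__ == "__main__":
    example_pod = {
        "metadata": {"name": "demo"},
        "spec": {
            "containers": [
                {"name": "app", "securityContext": {"privileged": True}},
                {"name": "sidecar"},
            ]
        },
    }
    print(admit(example_pod))
```

In a real cluster the same check would be written as a Rego rule and evaluated by OPA acting as an admission controller; the sketch only shows the shape of the decision.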

Jobs

King is looking for new members for the infrastructure engineering teams to help develop, manage and expand our software based networking setup across datacenters and (Google) cloud. Please take a look at the open role for networking engineer. We’re also still looking for both database and streaming data engineers, if that is more your style.

  • More job information from King. At that moment, the posts seemed unchanged from the week before last. They appeared to be looking for an SRE, a Database SRE, and a Network SRE.

Tools

With the release of Server Side Apply in Kubernetes 1.16 we’re likely to see more client tools, and the new Terraform provider for Kubernetes looks like a good start. Some tooling as well for converting YAML into HCL to make migration easier.

SRE Weekly Issue #221 May 31st, 2020

Articles

Chaos Engineering and Continuous Verification in Production

Casey Rosenthal tips over a herd of sacred cows with this talk that opens with 6 myths about reliable systems.

Casey Rosenthal — Verica

  • Full text and embedded video from the event “TIP on Twitch-Chaos Engineering and Continuous Verification in Production”, hosted by the meetup group “Bay Area Test in Production Meetup” on April 30th.
  • He gave an easy-to-understand and satisfying explanation of topics such as “The myths of reliability”.
  • I thought the English “Chaos Engineering” e-book was being given away for free, but there were conditions. After registering your name, affiliation, and email address, “the first 100 people who have a 15-minute video chat” get it. I gave up because I was not confident that I could have a meaningful video chat with the interviewer. If you have any useful feedback, please click here.

How to Talk About Software at Scale

This is written as talking about scale during a job interview, and it’s a pretty good read even if you’re not interviewing right now.

Denise Yu

  • It explains, in four points, how senior-level engineers who have no development experience in large-scale environments such as Netflix, Amazon, Shopify, or GitHub can handle the non-coding interviews at such large companies.

Asking the right “why” questions

John Allspaw says we should ask “how”, not “why”. Hollnagel and Woods say we should find out why a joint cognitive system does what it does rather than how. Who’s right?

Lorin Hochstein

  • An article titled “Asking the right “why” questions”, with a theme similar to “Why you can’t just ask “why””, which was introduced on this blog the other day.
  • The comparison between CSE (cognitive systems engineering) and RCA (root cause analysis) was easy to understand.

May 2020 | The Post-Incident Review

Yay, another issue! This one revolves around learning from incidents from organizations in other fields (Bose and NASA).

Jaime Woo and Emil Stolarsky — Incident Labs

  • The main articles focus on Bose and on curiosity at NASA. Reading the NASA article, I felt that an environment where curiosity is put to good use suits me, and I feel fulfilled because I have been doing that kind of work, including this blog. I still feel overwhelmingly lacking in skills, so I will work on filling the gap.
  • The illustration is as cute as ever.

Google Cloud Issue Summary — Hangouts Classic — 2020–05–19

This is a followup analysis of a Google Hangouts outage from last month.

Google

  • A summary of the issue with the Google Hangouts chat function that occurred on 5/19, 08:10 ~ 09:23 (PDT).
  • It mainly describes the root cause, recovery measures, and recurrence prevention measures.
  • Personally, I find ACL (Access Control List) work scary, especially because the effect of a change cannot be confirmed before the policy is applied. (If my memory is correct, even when the difference could be checked in the config, I could not see what would happen to real traffic until the changed policy was actually applied.)
  • If the difference could be checked, I doubt that Google’s systems and engineers would apply a wrong change by mistake, so I wonder whether no such tool exists for ACLs or whether it is simply not widely used. (Because applying a policy change is itself a CPU-intensive task, evaluating traffic against both the new and the existing policy may demand more capability and resources from the network device.)

Outages

KubeWeekly #219 June 5th, 2020

The Headlines

Editor’s pick of the highlights from the past week.

Welcome, Priyanka Sharma to CNCF!

Priyanka Sharma takes over the leadership of CNCF from Dan Kohn, who will be launching a new Linux Foundation initiative to help public health authorities use open source software to fight COVID-19 and other epidemics. Dan will continue to participate in several CNCF activities and CNCF’s CTO Chris Aniszczyk will continue in his role. We are so excited to welcome Priyanka!

From Priyanka: “As an early member of the cloud native community, I have witnessed first hand the profound impact of CNCF. As the engine for the multi-billion dollar DevOps movement, it enables organizations to ship software faster and with greater resiliency.”

  • As an expert in cloud native and observability, Priyanka Sharma has been active as a contributor to and speaker on a number of CNCF projects, and has worked with CNCF executives and community leaders over the last few years.
  • Check out the podcast with Priyanka Sharma as a guest, recorded on this news: “Meet the New Boss of the Cloud Native Computing Foundation”.
  • Dan Kohn will launch a new Linux Foundation initiative, LF Public Health, as its General Manager, to help public health organizations fight COVID-19 and other infectious diseases using OSS. He will continue to participate in some CNCF activities.

New Linux Foundation/CNCF Cloud Engineer Bootcamp

This new program will prepare an absolute beginner to learn the most in-demand cloud computing skills in about 6 months with a mix of online training, live instructor support, and performance-based certification exams, including the CKA. Introductory pricing saves $400 through June 17!

  • CNCF announces a new training program. A program in which a complete beginner can learn cloud computing in 6 months. In addition to online self-study, you can receive support through forums and video chats. CKA (Certified Kubernetes Administrator) and LFCS (Linux Foundation Certified Sysadmin) exams are also included.
  • At that time, you could buy it for $400 (until 6/17). The regular price is $599.
  • It’s interesting that the copy touts the market value of American engineers: “Launch a new, successful career as a cloud engineer, with a median salary close to $150,000”.

Day of Podcasting: Rescheduled from KubeCon + CloudNativeCon EU

Join The New Stack on June 9 for a virtual day of conversations with technologists, developers and IT managers, sponsored by Dell Technologies, where we’ll discuss:
*Standing up a cloud native stack in light of tech debt and distributed infrastructure.
*Data protection with Kubernetes.
*A Scandinavian look at DevOps adoption and government interest in container technologies.
*Self-service storage: what developers need to know.
*Key considerations when adopting production-ready Kubernetes.

  • The Eventbrite page for the podcast event hosted by The New Stack. Dell Technologies is the sponsor, and members of its group companies take part in each session.

ICYMI: CNCF Webinars

Weekly recap of CNCF member and project webinars that you might have missed.

You can view all CNCF recorded and upcoming webinars here.

CNCF Member Webinar: Securing Service Mesh with Kubernetes, Consul and Vault

Nicole Hubbard, Developer Advocate @HashiCorp and Justin Weissig, Technical Product Marketing Manager @HashiCorp

  • A webinar video that explains how to manage Secrets and secure communications between apps running in multiple clusters as your infrastructure continues to grow.
  • The demo was explained calmly, and the Q&A time was used to the full.

CNCF Member Webinar: 20,000 Upgrades Later, Lessons From a Year of Managed Kubernetes Upgrades

Adam Wolfe Gordon, Senior Software Engineer @DigitalOcean

  • It provides tips and tricks to keep your workload running smoothly through this disruptive process.
  • I appreciated the advice and the explanations of specific examples from both the operator’s and the developer’s perspective. I want to revisit it before carrying out an upgrade.

CNCF Member Webinar: Trivy Open Source Scanner for Container Images — Just Download and Run!

Teppei Fukuda, Open Source Engineer @Aqua Security

  • According to the video, the tool’s positioning as a “container image scanner”, as in the title, has changed with the new version: it has evolved (!!).
  • I learned a lot. The explanation was easy to understand and the diagrams were easy to read. Although I am not familiar with the tool, the concept came through. What especially stayed with me was vulnerability detection per layer ID, the shift from a container image vulnerability scanner to an artifact vulnerability scanner, OPA integration, and so on. The appearance of Starboard was also good.
  • Online demos are tense, and I feel that many things do not go as smoothly as they do offline. Perhaps because things were running slowly, the presenter quickly switched to a pre-recorded demo, and I thought “Oh!!”

CNCF Member Webinar: Kubernetes: Zero to Hero Deployments and Management

Anthony Ramirez, Director of Consulting @Nebulaworks

  • The flow of Borg → Ω (Omega) → Kubernetes is explained in the context of orchestration.
  • The Terraform demo was good because the explanations and intentions of the commands were clear.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Google Vulnerability Rewards Program expands to include all the critical open-source dependencies of GKE

Eduardo Vela, Google

  • Google’s security blog announced that the Google Vulnerability Rewards Program (VRP) will be extended to cover the critical OSS dependencies of GKE.
  • A CTF (Capture-the-Flag) environment called kCTF has been prepared, alongside the CNCF’s Kubernetes bounty program. Participants exploit a vulnerability to read and submit secret flags, which change regularly. You get a bounty if you can capture the two secret flags prepared in GKE’s lab environment. In addition, if the bug is 100% in Google code you get a Google VRP reward, and if the bug is 100% in Kubernetes code you get a CNCF Kubernetes reward, but please check the original text for details.

Announcing k8s-image-availability-exporter to monitor missing images in Kubernetes

Andrey Klimentyev, Flant

  • Introducing a new OSS tool, “k8s-image-availability-exporter”, a Prometheus exporter that proactively alerts on images that are defined in Kubernetes objects but are not available in the container registry.
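
The core idea can be sketched as a set difference between the images referenced by workloads and the images a registry can serve. The following Python sketch is only an illustration under stated assumptions: it is not the tool’s code, the function names are hypothetical, the registry is replaced by an in-memory set, and the real exporter queries the registry and exposes the result as Prometheus metrics.

```python
from typing import Any, Dict, Iterable, List, Set


def images_in_workload(workload: Dict[str, Any]) -> Set[str]:
    """Collect image references from a Deployment-like manifest (spec.template.spec)."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return {c["image"] for c in containers if "image" in c}


def missing_images(
    workloads: Iterable[Dict[str, Any]], available: Set[str]
) -> Dict[str, List[str]]:
    """Map each workload name to the images it references that are not available.

    `available` stands in for a registry lookup; it is an assumption of this sketch.
    """
    report: Dict[str, List[str]] = {}
    for workload in workloads:
        name = workload.get("metadata", {}).get("name", "<unnamed>")
        missing = sorted(images_in_workload(workload) - available)
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    deployment = {
        "metadata": {"name": "web"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "web", "image": "registry.example.com/web:1.0"},
                        {"name": "proxy", "image": "registry.example.com/proxy:2.1"},
                    ]
                }
            }
        },
    }
    print(missing_images([deployment], {"registry.example.com/web:1.0"}))
```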

Service Mesh on Kubernetes with Istio and Spring Boot

Piotr Mińkowski

  • It demonstrates how to build applications using Istio and Spring Boot and provide communication between them over HTTP on Kubernetes.

Using SSL certificates from Let’s Encrypt in your Kubernetes Ingress via cert-manager

Oleg Saprykin, Flant

  • It describes the process of automating the issuance and renewal of certificates provided by Let’s Encrypt (and other services) for Kubernetes Ingress using the cert-manager add-on.

Dynamic DNS and LoadBalancing without cloud provider

Kevin Lefevre, Particle.io

  • Under the theme of “So how can you get the automation benefits of Cloud Native environment when running on bare metal or VMs”, this article explains how to implement dynamic DNS and load balancing without using a cloud provider’s services.
  • Click here for the manifest used in this article.

Kubecost- cluster turndown

The most common use case is a nightly turndown of a cluster running non-prod environments, typically dev, staging, or test.

  • An article introducing “Cluster Turndown”, an alpha feature of Kubecost.
  • As its name suggests, it is the ability to scale Kubernetes clusters down and up based on custom schedules and set criteria. As of June 6, 2020, GKE, EKS, and kops on AWS are supported.
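
A schedule-based turndown boils down to deciding whether the current time falls inside a turndown window. Below is a minimal Python sketch of that decision. It does not use Kubecost’s API or schedule format; the function name and the default 20:00 to 07:00 window are assumptions made for illustration.

```python
from datetime import datetime, time


def should_be_turned_down(
    now: datetime,
    start: time = time(20, 0),  # hypothetical nightly turndown start
    end: time = time(7, 0),     # hypothetical morning scale-up time
) -> bool:
    """Return True if `now` falls inside the turndown window [start, end).

    The window may cross midnight, as a nightly window usually does.
    """
    current = now.time()
    if start <= end:
        # Window within a single day, e.g. 01:00 to 05:00.
        return start <= current < end
    # Window crossing midnight, e.g. 20:00 to 07:00.
    return current >= start or current < end


if __name__ == "__main__":
    for moment in (datetime(2020, 6, 8, 22, 0), datetime(2020, 6, 8, 12, 0)):
        print(moment.isoformat(), should_be_turned_down(moment))
```

A controller built on this would scale the cluster’s node pools to zero while the function returns True and restore them afterwards; that part is provider-specific and left out here.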

Introducing Aquayman

Christoph Mewes, Loodse

  • Introductory article of the new OSS tool “ Aquayman “.
  • Loodse uses Quay.io. For GitHub there is “Peribolos” (named after a wall that encloses altars in Greek/Roman architecture), which manages a GitHub organization declaratively, but there was no equivalent for Quay.io, so Aquayman was developed.
  • The name is simply a shortening of “A Quay Manager”.
  • It manages accounts for teams, members, and robots in an organization by editing a single YAML file.

Agones 1.6

The latest release of Agones includes Player Tracking, Kubernetes 1.15, Node.js SDK Updates.

  • Release notes for version 1.6.0 of the open source project “Agones”, which is jointly developed by Ubisoft and Google to support building a game server on Kubernetes.
  • In addition to Player Tracking, which the editor of KubeWeekly has excerpted above, many changes and improvements have been made.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

CoreDNS, with John Belamaric

Craig Box and Adam Glick, Kubernetes Podcast from Google

  • Kubernetes Podcast by Google employees. The current co-hosts are Craig Box and Adam Glick.
  • The guest is John Belamaric, a Senior SWE at Google, co-chair of Kubernetes SIG Architecture, a core maintainer of the CoreDNS project, and author of the O’Reilly book “Learning CoreDNS: Configuring DNS for Cloud Native Environments”.
  • The podcast usually doesn’t cover wider world events, but right at the start it mentions the “death of George Floyd”, perhaps because the two co-hosts have ties to the place where the incident took place.
  • There was a lot of news this week, and I had already checked much of it. The topics in News of the week that interested me are:
    * AWS encrypts Fargate ephemeral disks in v1.4
    * PlanetScale open sources a Vitess operator
    * NIST deployment guidelines for proxy-based Service Mesh by Ramaswamy Chandramouli of NIST and Zack Butcher of Tetrate

LOTE #8: Michael Hausenblas on Customer Focus, Building App Platforms, and Kubernetes Access Control

Daniel Bryant, Datawire

  • A transcript of the “Livin’ on the Edge” podcast. This week’s guest is Michael Hausenblas (Product Developer Advocate@Container Service Team) from AWS. Topics include why engineers should seek to understand the essential parts of the business context, RBAC, OPA, etc.
  • The guests are impressive every time. The previous guest, covered here last week, was Kelsey Hightower of Google.

How To Jumpstart Your Hybrid And Multi-Cloud Strategy With Microservices And Kubernetes

Chandra Gundlapalli, Forbes

  • A Forbes article that describes how emerging Kubernetes technology can be leveraged to drive rapidly evolving hybrid and multi-cloud strategies. My impression is that it is aimed at management rather than engineers.

Gartner’s 6 Best Practices for Containers, Kubernetes

Bill Doerrfeld, Container Journal

Mirantis’ Docker Enterprise 3.1 Adds Windows Support, Enterprise SLAs

Mike Melanson, The New Stack

  • An article from The New Stack that describes the release of Docker Enterprise 3.1 by Mirantis.
  • It is said to improve not only functionality, such as Windows Server support and GPU orchestration, but also the support structure, such as 24x7 support and SLAs. A webinar was also held to explain the update.

Lin Sun and Neeraj Poddar on Istio, Wasm, and the Future of Service Mesh

InfoQ Podcast

  • InfoQ podcast on the theme of “The evolution of service mesh data and control planes, new Istio 1.5 architecture, Istio WebAssembly extended support, and future of service mesh technology”.

CI/CD with Docker and Kubernetes (open source book)

Marko Anastasov

  • Free e-book repository published by Semaphore on GitHub.
  • If you enter your name, email address, and affiliation here, a link to the PDF will be emailed to you.

How Migrate for Anthos streamlines legacy Java app modernization

Ady Degany and Tom Nikl, Google Cloud

  • An article that explains how GCP’s “Migrate for Anthos” streamlines the modernization of legacy Java applications, with an example that runs it on a sample application.

Upcoming CNCF webinars

You can check some Recorded Webinars and Upcoming Webinars here. The following are posted as Upcoming CNCF webinars at that moment.

Member Webinar: Self-service of Cloud Services for Kubernetes Application
Lewis Marshall, Cloud-Native delivery advocate @Appvia
June 9, 2020 10:00 AM Pacific Time

Member Webinar: Things to consider to operate a Multi-Tenant Kubernetes Cluster // Multi-Tenant Kubernetes Cluster를 운영하기 위해 고려할 사항
Han Sol Park // 박한솔, Senior Engineer @Samsung SDS
Kyle Choi // 최규황, Principal Engineer @Samsung SDS
This webinar will be delivered in Korean.
June 10, 2020 10:00 AM Korean Standard Time

Member Webinar: Develop your Cloud Native use cases at the Edge with K3s
Pranay Bakre, Staff Technical Marketing Engineer @Arm
Julio Suarez, Staff Engineer @Arm
June 10, 2020 7:00 AM Pacific Time

Member Webinar: Hybrid Cloud Kubernetes with Nodeless
Madhuri Yechuri, Founder @Elotl
June 10, 2020 10:00 AM Pacific Time

Community Webinar: Cluster API (CAPI) — A Kubernetes Subproject to simplify cluster lifecycle management
Katie Gamanji, Cloud Platform Engineer @American Express
Naadir Jeewa, Senior Member of Technical Staff @VMware
June 11, 2020 8:00 AM Pacific Time

Member Webinar: The Definitive Checklist for Delivering Reliable Kubernetes-based Applications
Brandon Groves, Senior Software Engineer @OverOps
Ben Morrise, Software Engineer @OverOps
June 11, 2020 10:00 AM Pacific Time

Community Webinar: What end users really recommend for Continuous Delivery
Cheryl Hung, Director of Ecosystem @CNCF
June 12, 2020 8:00 AM Pacific Time

Project Webinar: Charting Your Voyage To Helm 3
Matt Farina, Lead Engineer @Samsung SDS
Martin Hickey, Senior Software Engineer @IBM
Adam Reese, Senior Engineer @Microsoft
June 12, 2020 10:00 AM Pacific Time

Member Webinar: Multi-Cluster Service Mesh Operations and Extensibility with WebAssembly
Idit Levine, Founder and CEO @Solo.io
Christian Posta, Global Field CTO @Solo.io
June 16, 2020 10:00 AM Pacific Time

Member Webinar: Multitenancy Webinar: Better walls make better tenants
Adrian Ludwin, Senior Engineer @Google
June 17, 2020 8:00 AM Pacific Time

Member Webinar: Learning from the visible past to accelerate the observable future
Curtis Hrischuk, Technical Product Manager @Instana
June 17, 2020 10:00 AM Pacific Time

Member Webinar: How to Promote the use of Best Practices and Automate Security Policies Using Tools Like OPA and Kubernetes
Gary Duan, CTO and Co-Founder @NeuVector
June 18, 2020 10:00 AM Pacific Time

Member Webinar: Cloud Infrastructure for Network Functions — Requirements and testing
Dana Nehama, Director, Product Management Network Cloud @Intel Corporation
Petar Torre, Principal Engineer @Intel Corporation
June 24, 2020 7:00 AM Pacific Time

Member webinar: Kubernetes Cost Allocation Done Right
Webb Brown, Co-founder and CEO @Kubecost
June 24, 2020 10:00 AM Pacific Time

Member Webinar: Monitoring Kubernetes clusters by “chatting” with them
Prasad Ghangal, Creator of BotKube and Software geek @InfraCloud
Vishal Biyani, CTO @InfraCloud
Hrishikesh Deodhar, Director of Engineering @InfraCloud
June 25, 2020 10:00 AM Pacific Time

Ambassador Webinar: Commoditise Kubernetes with cluster-api
Gianluca Arbezzano, Senior Staff Software Engineer @Packet
June 26, 2020 10:00 AM Pacific Time

Member Webinar: Best Practices for Running and Implementing Kubernetes
Kendall Miller, President @Fairwinds
Robert Brenna, Director of Open Source @Fairwinds
June 30, 2020 10:00 AM Pacific Time

Member Webinar: 7 Critical Reasons for Kubernetes-Native Backup
Niraj Tolia, CEO and Co-Founder @Kasten
Mark Severson, Member of Technical Staff @Kasten
July 1, 2020 7:00 AM Pacific Time

Member Webinar: Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
July 1, 2020 1:00 PM Pacific Time

Member Webinar: Stay on top of ongoing Kubernetes security hygiene
Zohar Kaufman, Co-Founder and VP R&D @Portshift.io
Ariel Shuper, VP Product @Portshift.io
July 2, 2020 10:00 AM Pacific Time

Project Webinar: What’s new in Linkerd 2.8 : Multi-cluster Kubernetes made simple and secure by default
Oliver Gould, Linkerd Project Lead, co-founder & CTO @Buoyant
July 8, 2020 10:00 AM Pacific Time

Member Webinar: 如何落地 Service Mesh — 从技术选型到实践
马若飞 FreeWheel 北京研发中心首席工程师 @FreeWheel
This webinar will be delivered in Chinese.
July 9, 2020 10:00 AM China Standard Time

How about those articles? Did any of them interest you?

Actually, there is some content I have not been able to digest at this stage, so I’ll use this aide-memoire and these links to catch up myself too.

Bye now!!

Yoshiki Fujiwara

・Cloud Solutions Architect - AWS@NetApp in Tokyo, Japan. #AWS Certified Solution Architect&DevOps Professional, #Kubernetes, ・Opinions are my own.