SRE / DevOps / Kubernetes Weekly Collection #21 (Week 26)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, adding brief comments as an aide-memoire along with useful links.
  • Actually, I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a useful reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #495 June 21st, 2020
SRE Weekly Issue #224 June 21st, 2020
KubeWeekly #222 June 26th, 2020

DEVOPS WEEKLY ISSUE #495 June 21st, 2020

A fantastically detailed post on Amazon’s adoption of continuous delivery. Lots of information about automated testing of successful deployments and scaling pipeline configuration.

  • The title is “Automating safe, hands-off deployments”.
  • An article on the AWS website, classified as “LEVEL 300” under “SOFTWARE DELIVERY AND OPERATIONS” in “The Amazon Builders’ Library”. I did not know about this library, so I bookmarked it immediately.

A blog post series around bringing cloud-like automation to on-premise hardware environments, looking at the new Tinkerbell project in particular

Adopting devops practices often means new roles and team structures. This post covers some of those roles and tips for them being effective in an existing organisation.

  • The title is “Building Highly Effective DevOps Teams: Structure, Roles & Responsibilities You Need to Succeed”.
  • An interesting read that explains the five main roles a DevOps team structure should include, how to build an extremely efficient team, and more.

A good list of considerations for those adopting serverless platforms. Logging, monitoring, self-service, elastic scaling and more.

  • The title is “Top DevOps Considerations For Serverless”.
  • An article that explains “What DevOps developers who use FaaS should consider” in terms of the following points:
    ○ Self Service
    ○ Memory Size
    ○ Elastic Scaling
    ○ Execution Time
    ○ Cold Start
    ○ Operational Overhead
    ○ Integrated logging and monitoring

A look at standing up a full application and infrastructure stack as part of a pull request workflow. Discusses automation and different testing approaches.

  • The title is “Turbocharge your team’s development workflow with this strategy that provides quick feedback in a collaborative, no-risk environment”.
  • An article on the SingleStone website. When the company set out to build its first SaaS service, it decided that “every branch of code we developed would get built and deployed — in its own environment — before it was merged”. The article explains the benefits of this approach and how it was done.

A recent podcast recording and notes of a conversation focused on developer productivity, the importance of platforms and building a culture of continuous improvement.

  • The title is “LOTE #9: Gene Kim on Developer Productivity, the “Five Ideals”, and Platforms”.
  • A transcript of the “Livin’ on the Edge” podcast. This week’s guest is Gene Kim, co-author of “The Phoenix Project” and author of “The Unicorn Project”.
  • They talked about “The importance of developer productivity within the larger context of DevOps, and explores the “five ideals””, “How the platform that engineers deploy onto should codify operational best practices that promote flow when testing, deploying, and releasing functionality”, and so on.

Notes on the latest Salt release. Includes details for anyone contributing to Salt (like changes to the test runner) and lots of user benefits, including performance improvements and lots of new features.

  • The title is “What’s New in Salt 3001 Sodium”.
  • An unofficial summary of the new features in “Salt 3001 Sodium”. Due to time constraints, the author could not cover as much detail as originally planned, so he points readers elsewhere: “If you want to read about other changes and deprecations, then go read the official release notes and the changelog.”

A look at adopting automated code review and testing tools for Terraform. From “works on my machine” to Atlantis and Conftest and GitHub integration.

  • The title is “Terraform Code Reviews: Supercharged with Conftest”.
  • It is a slide deck uploaded to SlideShare.
  • Using an “Agility vs Stability” matrix, the author explains the three typical stages in the evolution of Terraform code review and introduces Conftest as an approach that achieves both agility and stability.
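
To give a feel for what an automated policy check on a Terraform plan might look like, here is a minimal sketch in Python. It is only an analogy: Conftest itself evaluates policies written in Rego, and this is not taken from the slides. The rule (every S3 bucket must carry an "owner" tag), the function name `find_violations`, and the sample data are all hypothetical. The field names `resource_changes`, `change.actions`, and `change.after` are assumed to follow the JSON plan representation printed by `terraform show -json`.

```python
import json
from typing import List

# Hypothetical policy: every aws_s3_bucket that is created or updated
# must carry this tag. The tag name is an assumption for illustration.
REQUIRED_TAG = "owner"


def find_violations(plan: dict) -> List[str]:
    """Return the addresses of S3 buckets in the plan that lack REQUIRED_TAG."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_s3_bucket":
            continue
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # Ignore resources that are only read, deleted, or left unchanged.
        if "create" not in actions and "update" not in actions:
            continue
        after = change.get("after") or {}
        tags = after.get("tags") or {}  # tags may be null in the plan JSON
        if REQUIRED_TAG not in tags:
            violations.append(rc.get("address", "<unknown>"))
    return violations


# Hand-written sample in the assumed shape of a JSON plan (not real output).
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"],
                "after": {"bucket": "logs", "tags": {"owner": "sre"}}}},
    {"address": "aws_s3_bucket.tmp", "type": "aws_s3_bucket",
     "change": {"actions": ["create"],
                "after": {"bucket": "tmp", "tags": null}}},
    {"address": "aws_instance.web", "type": "aws_instance",
     "change": {"actions": ["create"], "after": {}}}
  ]
}
""")

if __name__ == "__main__":
    for address in find_violations(SAMPLE_PLAN):
        print("policy violation: missing tag on", address)
```

Running such a check in CI on every pull request is one way a review process could keep its speed while still catching risky changes before they are applied.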

King is looking for new members for the infrastructure engineering teams to help develop, manage and expand our software based networking setup across datacenters and (Google) cloud. Please take a look at the open role for networking engineers. We’re also still looking for both database and streaming data engineers, if that is more your style.

  • Job postings from King again. The openings do not seem to have changed: at that moment, they appear to be looking for an SRE, a Database SRE, and a Network SRE.

Resgate is a realtime API gateway which uses NATS under the hood to make building REST, real time, or RPC API easier where all your clients are synchronized seamlessly.

Reviewdog is an automated code review tool that integrates with lots of different code analysis tools as well as most popular CI systems and source control systems.

  • The GitHub page of the OSS tool “Reviewdog”, which automatically posts review comments to GitHub and similar services.

Taskcat is a tool for testing AWS CloudFormation templates. It stands up the stacks across multiple AWS regions and runs tests to check everything is working as expected.

  • The project web page for the OSS tool “taskcat”, which tests AWS CloudFormation templates.
  • It deploys an AWS CloudFormation template to multiple AWS Regions and generates a report with pass/fail assessments for each region.
  • Click here for the GitHub page.

SRE Weekly Issue #224 June 21st, 2020

How diversity, inclusion, and belonging looks in the tech industry

Diversity and inclusion make our companies stronger and more effective. This article has lots of links with evidence of why diversity matters and how to get your company on the road to improvement.

Sara Kassabian — GitLab

  • The second article in a three-part series on “Diversity, Inclusion, and Belonging” on the GitLab website.
  • It covers the business value of diversity, the underrepresentation of minorities in tech, strategies for improvement, and the situation at other companies.

Is Your Team Culture Ready to Accelerate Innovation and Build Resiliency with Chaos Engineering?

Starting on the road to chaos engineering is about more than just figuring out what experiments to run. Spreading knowledge and gaining buy-in before you start is critical.

Deven Samant — Business 2 Community

  • An introductory article on how to bring chaos engineering into an organization. It says that “Chaos engineering is much more than a set of tools and rules. It involves adopting a culture in which teams trust each other and collaborate to build resiliency, advance innovation, and launch products and services.” and explains how to implement measures as part of your chaos engineering journey.

What happens when you update your DNS?

DNS propagation and inconsistent resolver behavior has bitten me so many times in my career.

Julia Evans

  • An article explaining what happens behind the scenes when you update DNS records.
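
One general reason a DNS change seems to take time is that resolvers cache answers until the record's TTL expires. The following is a minimal sketch of that idea in Python, not code from the article: the class name `CachingResolver`, the dictionary standing in for the authoritative server, and the explicit `now` argument (a simulated clock in seconds) are all assumptions made for illustration.

```python
class CachingResolver:
    """Toy resolver that caches answers until their TTL runs out."""

    def __init__(self, authoritative):
        # authoritative: dict mapping name -> (ip, ttl_seconds);
        # it stands in for the authoritative DNS server.
        self.authoritative = authoritative
        self.cache = {}  # name -> (ip, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            # Cache hit: the answer may be stale if the record has changed.
            return cached[0]
        # Cache miss or expired entry: ask the authoritative source again.
        ip, ttl = self.authoritative[name]
        self.cache[name] = (ip, now + ttl)
        return ip


# Simulated timeline with documentation-only addresses.
records = {"example.com": ("192.0.2.1", 300)}
resolver = CachingResolver(records)

first = resolver.resolve("example.com", now=0)      # fetched and cached
records["example.com"] = ("192.0.2.2", 300)         # the record is updated
stale = resolver.resolve("example.com", now=100)    # still served from cache
fresh = resolver.resolve("example.com", now=300)    # TTL expired, refetched
```

In this simulation the resolver keeps returning the old address until 300 seconds have passed, which is why lowering the TTL well before a planned change is a common precaution. Real resolvers can behave less predictably than this sketch.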

Post-Incident Reviews With Jaime Woo & Emil Stolarsky

I don’t often have enough time to listen to podcasts, but when it’s these two, I had to. Jaime and Emil talk about post-incident reviews, geeking out about incidents, and their philosophy on publishing a zine.

Scott McAllister — Page It To the Limit Podcast (PagerDuty)

  • An episode of the podcast “Page It to the Limit” that covers post-incident reviews (postmortems).

Heroku Incident #2042 Follow-up

As so often happens, their attempts to fix a problem caused other problems. Has that happened to you? I’d love to read your story about it!

  • Follow-up information on the Heroku incident that lasted from 2020/06/09 21:20 UTC to 2020/06/10 07:17 UTC.

Being Kind

This article opens with a great story about how to help someone feel better when they are a contributing factor in an outage.

Tanya Reilly

  • An article that encourages readers to be kind, touching on the events surrounding the author’s first failure.

KubeWeekly #222 June 26th, 2020

Editor’s pick of the highlights from the past week.

Congratulations, Harbor on CNCF Graduation!

This week, CNCF announced that the Harbor project has reached graduated status. The project is the eleventh to graduate. To move from the maturity level of incubation to graduation Harbor has demonstrated growing adoption, an open governance process, feature maturity, and a strong commitment to community, sustainability, and inclusivity.

Harbor is an open-source registry that secures artifacts with policies and role-based access control, ensures images are scanned and free from vulnerabilities, and signs images as trusted. We encourage you to learn more about the project and this exciting milestone here.

  • The CNCF announcement and a Container Journal article reporting that “Harbor” has become the 11th CNCF project to reach “Graduation”.

SPIFFE/SPIRE move to CNCF Incubation-level hosted projects

In other project news this week, SPIFFE/SPIRE is now an Incubation-level hosted project. The SPIFFE (Secure Production Identity Framework For Everyone) specification defines a standard to authenticate software services in cloud native environments through the use of platform-agnostic, cryptographic identities. SPIRE (the SPIFFE Runtime Environment) is the code that implements the SPIFFE specification on a wide variety of platforms and enforces multi-factor attestation for the issuance of identities. In practice, this reduces the reliance on hard-coded secrets when authenticating application services.

Joining CNCF incubation-level projects like OpenTracing, gRPC, CNI, Notary, NATS, Linkerd, Rook, etcd, OPA, CRI-O, TiKV, CloudEvents, Falco, Argo, and Dragonfly, SPIFFE and SPIRE are part of a neutral foundation aligned with its technical interests, as well as the larger Linux Foundation, which provides governance, marketing support, and community outreach.

To learn more about SPIFFE/SPIRE, visit

  • The CNCF announcement and an article from The New Stack reporting that “SPIFFE” and “SPIRE” have been accepted as “Incubation”-level CNCF projects.
  • Bloomberg, Bytedance, Pinterest, Square, Uber, and Yahoo Japan are listed as companies that have adopted them.

Weekly recap of CNCF member and project webinars that you might have missed.

You can view all CNCF recorded and upcoming webinars here.

CNCF Member Webinar: Fast packet processing with KubeVirt

David Vossel, Principal Software Engineer @Red Hat and Petr Horacek, Senior Software Engineer @Red Hat

  • A webinar video that covers the HCO (Hyperconverged Operator) architecture, SR-IOV (Single Root I/O Virtualization), and how to configure a VM to use network devices.
  • Digging deep into VMs and KubeVirt would probably expose the gaps in my understanding of some areas of infrastructure, so I’ll keep this as homework.

CNCF Member Webinar: Kubernetes Cost Allocation Done Right

Webb Brown, Co-founder and CEO @Kubecost and Ajay Tripathy, CTO @Kubecost

  • It explains why Kubernetes cost allocation is difficult and presents “Kubecost” as a solution.
  • Both speakers previously worked at Google for a long time, as an infrastructure engineer and a product manager on cloud products and on performance monitoring for Borg, Firebase, and other internal systems, so they have many years of experience with this subject.

CNCF Member Webinar: Cloud Infrastructure for Network Functions — Requirements and testing

Dana Nehama, Director, Product Management Network Cloud @Intel Corporation and Petar Torre, Principal Engineer @Intel Corporation

  • It demonstrates how to build a Kubernetes cluster and how to use methodologies and tools to characterize its performance.

CNCF Member Webinar: Introduction to Cloud Provider Sub Sig BaiduCloud // 介绍SIG Cloud Provider子项目BaiduCloud

Ti Zhou 周倜, Senior Architect 高级架构师 @Baidu 百度, Zichao Ye 叶子超, Senior Software Engineer 高级软件工程师 @Baidu 百度 and Tianyuan Sun 孙天元, Senior Software Engineer 高级软件工程师 @Baidu 百度

  • This webinar was delivered in Chinese.
  • Under the title “SIG Cloud Provider Baidu Cloud”, the team behind the project, which made it into the top 10 contributions to Kubernetes in the Kubernetes ecosystem in 2019, shared their experiences, lessons learned, and future roadmap around Kubernetes.

CNCF Member Webinar: Monitoring Kubernetes clusters by “chatting” with them

Prasad Ghangal, Creator of BotKube and Software geek @InfraCloud, Vishal Biyani, CTO @InfraCloud and Hrishikesh Deodhar, Director of Engineering @InfraCloud

  • It demonstrates a variety of use cases and describes how to enable the team and operations to work with it.

Tutorials, tools, and more that take you on a deep dive into the code.

Exploiting an Envoy heap vulnerability

Harvey Tuch, Google

  • Google is committed to enhancing the security and reliability of the Envoy proxy. The article explains past vulnerabilities and countermeasures, with examples that target data plane operation, memory allocation, and so on.

Deploying Istio with restricted Pod Security Policies

Laszlo Bence Nagy, Banzai Cloud

  • It explains how to use the restricted PSP (Pod Security Policies) and the open source Banzai Cloud Istio operator to run the control plane components of Istio with as few privileges as possible.

Cross-Cluster Traffic Mirroring with Istio

Mert Acikportali, Trivago

Manage your Kubernetes cluster with Lens

Chris Collins, Red Hat

  • An article that describes the OSS tool Lens, a convenient UI for operating Kubernetes clusters that bills itself as “the Kubernetes IDE”.

Articles, announcements, and more that give you a high-level overview of challenges and features.

Kubermatic, with Sebastian Scheele

Adam Glick and Craig Box, Kubernetes Podcast from Google

Bayer Crop Science seeds the future with 15000-node GKE clusters

Rob Long and Maciek Różacki, Google Cloud

  • A case study of Bayer Crop Science (BCS), which runs at the overwhelming scale of 15,000 nodes on Google Cloud’s GKE.
  • As a result of this collaboration, this capacity will be made available to all GKE users this year (!?).

Tsunami: An extensible network scanning engine for detecting high severity vulnerabilities with high confidence

Guoli Ma, Claudio Criscione and Sebastian Lekies, Google Open Source

  • The release announcement for “Tsunami”, a new OSS network scanning engine from Google.
  • I realize that naming products is difficult, as is stating opinions frankly and raising questions. I respect the decisions and actions of the people who built this ahead of others. (This is just my personal feeling about the name.)

Kubernetes: 4 ways to save IT budget with automation

Kevin Casey, Red Hat

  • Again this week, an easy-to-understand article by Kevin Casey of Red Hat, explaining a number of points.

Architecting Kubernetes clusters — choosing a cluster size

Daniel Weibel, ITNEXT

  • It describes the pros and cons of having many small clusters or a few large clusters to run a particular set of apps.

Service Mesh Comparison: Istio vs Linkerd


  • An article that looks at the architecture and moving parts of Istio and Linkerd, two of the main service mesh products, and compares them to support informed decision making.

You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at that time.

Ambassador Webinar: Commoditise Kubernetes with cluster-api
Gianluca Arbezzano, Senior Staff Software Engineer @Packet
June 26, 2020 10:00 AM Pacific Time

Member Webinar: Best Practices for Running and Implementing Kubernetes
Kendall Miller, President @Fairwinds
Robert Brenna, Director of Open Source @Fairwinds
June 30, 2020 10:00 AM Pacific Time

Member Webinar: 7 Critical Reasons for Kubernetes-Native Backup
Niraj Tolia, CEO and Co-Founder @Kasten
Mark Severson, Member of Technical Staff @Kasten
July 1, 2020 7:00 AM Pacific Time

Member Webinar: Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
July 1, 2020 1:00 PM Pacific Time

Member Webinar: Stay on top of ongoing Kubernetes security hygiene
Zohar Kaufman, Co-Founder and VP R&D
Ariel Shuper, VP Product
July 2, 2020 10:00 AM Pacific Time

Member Webinar: Optimize your Kubernetes Clusters on Azure with Built-in Best Practices
Jorge Palma, Senior Program Manager @Microsoft
July 7, 2020 10:00 AM Pacific Time

Member Webinar: The Challenges and Countermeasures of Service Mesh Practice
Fei Pei (裴斐), Cloud Computing Technology Expert and Architect, NetEase Hangzhou Research Institute @NetEase (网易)
This webinar will be delivered in Chinese.
July 8, 2020 10:00 AM China Standard Time

Project Webinar: What’s new in Linkerd 2.8 : Multi-cluster Kubernetes made simple and secure by default
Oliver Gould, Linkerd Project Lead, co-founder & CTO @Buoyant
July 8, 2020 10:00 AM Pacific Time

Member Webinar: Building Production-ready Services with Kubernetes and Serverless Architectures
Mike Metral, Software Architect and Engineer @Pulumi
Jason (Jay) Smith, App Modernization Specialist @Google Cloud
July 8, 2020 1:00 PM Pacific Time

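Member Webinar: How to Land Service Mesh: From Technology Selection to Practice // 如何落地 Service Mesh — 从技术选型到实践
Ma Ruofei (马若飞), Chief Engineer, FreeWheel Beijing R&D Center @FreeWheel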
This webinar will be delivered in Chinese.
July 9, 2020 10:00 AM China Standard Time

Member Webinar: The top 10 most-useful Kubernetes APIs for comprehensive cloud-native observability
Caleb Hailey, Co-founder and CEO @Sensu
July 9, 2020 10:00 AM Pacific Time

Member Webinar: Securing and Accelerating the Kubernetes CNI Data Plane with Project Antrea and NVIDIA Mellanox ConnectX SmartNICs
Antonin Bas, Maintainer of Project Antrea and Staff Engineer @VMware
Moshe Levi, Sr. Staff Engineer @NVIDIA
July 14, 2020 10:00 AM Pacific Time

Member Webinar: Serving Millions of Customers with Cloud Native and DevSecOps
Chris Hollies, CTO, Oracle Practice @Capgemini
Akshai Parthasarathy, Principal Director, Cloud Native and DevOps @Oracle Cloud
July 15, 2020 7:00 AM Pacific Time

Member Webinar: Advancing image security and compliance through Container Image Encryption!
Brandon Lum, Senior Software Engineer @IBM
July 15, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes and storage. Kubernetes for storage. An overview.
Kiran Mova, Chief Architect at MayaData and core maintainer of OpenEBS @MayaData
July 16, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes Security Anatomy and the Recently Disclosed CVEs
Gadi Naor, CTO & Co-Founder @Alcide
July 21, 2020 10:00 AM Pacific Time

Member Webinar: Implementing Canary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Oleg Chunikhin, CTO @Kublr
July 22, 2020 1:00 PM Pacific Time

Member Webinar: Observability of multi-party computation with OpenTelemetry
Antoine Toulme, Engineering Manager @Splunk
Dave McAllister, Sr. Technical Evangelist @Splunk
July 23, 2020 10:00 AM Pacific Time

Member Webinar: Kubernetes Policies 101
Eran Leib, Founder, VP Product Management @Apolicy
Spenser Paul, Director of Sales, North America @DoiT International
July 28, 2020 10:00 AM Pacific Time

Project Webinar: How We Doubled System Read Throughput with Only 26 Lines of Code
TiKV team
July 31, 2020 10:00 AM Pacific Time

How did you find these articles? Did any of them catch your interest?

To be honest, there is some content I have not been able to digest yet, so I will use this aide-memoire and these links to catch up myself as well.

Bye now!!

Yoshiki Fujiwara

Written by

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
