SRE / DevOps / Kubernetes Weekly Collection #27 (Week 32)

  • In this blog post series, I collect articles from the following three weekly mailing lists I subscribe to, and leave some comments and useful links as an aide-memoire.
  • I have already published the same content on my Japanese blog and am catching up in English in this series.
  • I hope it serves as a reference for people browsing this kind of information.

DEVOPS WEEKLY ISSUE #501 August 2nd, 2020
SRE Weekly Issue #230 August 2nd, 2020
KubeWeekly #228 August 7th, 2020

DEVOPS WEEKLY ISSUE #501 August 2nd, 2020

How do you get the benefits of microservice architectures at scale while minimising operational complexity? This post looks at applying domain-driven design concepts and provides a useful case study.

  • The title is “Introducing Domain-Oriented Microservice Architecture”.
  • I will skip it since it was covered in KubeWeekly last week.

The adoption of devops practices has gone hand-in-hand with organisations embracing digital transformation. Both phrases risk overuse but these posts discuss some useful mental models to help focus conversations.

  1. Gaining Speed
  2. Using Digital Technology
  3. Interacting Digitally
  4. Becoming Customer-Centric
  5. Being Data-Driven
  6. Increasing Resilience
  7. Becoming Future Ready
  8. Building the Enterprise of the Future
  • What impressed me most was the idea that “speed improves security.” I had tended to see speed and security as a trade-off, so this was an opportunity to change my perspective.
    ○ Speed improves security because we can respond more quickly to changes in the threat landscape, and we can repeatedly, quickly, and with low overhead test our systems as they are being built and after they are deployed.

What properties of developer platforms lead to adoption? The following post is specifically about a large scale edge platform, but it’s a great read for anyone building platforms of all kinds for developers, including those doing so in internal platform teams.

  • The title is “The Edge Computing Opportunity: It’s Not What You Think”.

The article opens by announcing that a series of extensions to the edge computing platform “Cloudflare Workers” will be unveiled next week, during what Cloudflare calls “Serverless Week.” Workers was announced nearly three years ago and has been generally available for about two years. This article was published on 2020/07/26.

  1. Matthew’s Hierarchy of Developers’ Needs
  2. Speed As the Killer Feature?
  3. Speed Alone Is Niche
  4. Consistency
  5. Zero Nanosecond Cold Starts
  6. Cost
  7. More Efficient Architecture Means Lower Costs
  8. From Limits to Limitless
  9. Ease of Use
  10. The Bezos Rule
  11. Compliance
  12. The Coming Era of Data Sovereignty
  13. Edge Computing to the Rescue
  14. Serverless Week

When first embracing devops practices and cloud services it’s common in large organisations to build a centre of excellence. There are some traps to avoid when taking this approach that the following post discusses.

  • The title is “Building a Cloud Centre of Excellence in 2020: 13 Pitfalls and Practical Steps”.
  • Personally, this was the topic I was most interested in this week. The article starts from the premise that “as more and more organizations are looking to expand their cloud journey, many companies have set up a Cloud Center of Excellence (CCoE) to manage the initial embedding and launch of cloud adoption within their business.”
  • The article mainly covers “6 Common Traps to AVOID When Establishing a CCoE” and “Practical Next Steps: Overcoming 7 Common CCoE Challenges”.
  • 6 Common Traps to AVOID When Establishing a CCoE
  1. Getting hung up on the term ‘centre’
  2. Controlling consumption of cloud services and blocking innovation opportunities
  3. Ignoring developer experience
  4. Applying traditional architecture standards and processes
  5. Confusing guardrails with blockers
  6. Catering for a single cloud maturity level
  • Practical Next Steps: Overcoming 7 Common CCoE Challenges
    Challenge #1: Building your CCoE — who should be part of it?
    Challenge #2: Getting started — How big should it be?
    Challenge #3: How much and what involvement should the business have in the CCoE?
    Challenge #4: Instilling Modern Ways of working
    Challenge #5: How should your CCoE evolve over time?
    Challenge #6: Should the CCoE be in charge of the cloud costs being incurred?
    Challenge #7: How can the CCoE successfully scale cloud adoption across the organisation?

The videos from the recent DevSecCon online conference are all available, with a range of interesting topics covered including infrastructure as code security, continuous audit compliance, supply chain attacks and more.

  • The title is “DevSecCon24–24hr virtual conference”.
  • The summary page of the virtual conference “DevSecCon”, held on June 15th across three time zones (APAC/EU/AMERICAS).
  • I appreciate that the playlists of the sessions are linked. I had registered as well, but I could not watch any of it.
  • Sessions without a link apparently have not had their videos published yet. As of August 3, 2020, the following sessions had no link, so they seem to be unpublished:
    ○ APAC PANEL: The Future of DevSecOps — (Moderator) Mohammed A. Imran, Mohan Yelnadu, Jerome Walter, Stefan Streichsbier & Sarah Young
    ○ EU KEYNOTE: Design Thinking for Secure Development — Wolfgang Goerlich
    ○ The Container Security Checklist — Liz Rice
    ○ GitOps Progressive Security For Kubernetes — Gadi Naor
    ○ Threat Modeling the Death Star — Mario Areias
    ○ IGNITES: Unquantified Serendipity: Diversity in Development — Quintessence Anx

A nice discussion of tradeoffs between serverless architectures and monolithic applications, mainly focused on smaller scale apps.

  • The title is “My monolith doesn’t fit in your serverless”
  • The author opens with “I have come to the conclusion that the problems I’m tasked to solve are tricky to get right using a serverless approach. Here’s my take on why not serverless-all-the-things.” and then explains his reasoning.
  • At the bottom of the article, he also answers some questions he received.

A deep-dive technical post on new features in the Linux kernel which should make unprivileged containers more popular. Good introduction to the details of seccomp as well.

  • The title is “The Seccomp Notifier — New Frontiers in Unprivileged Container Development”.
  • I will skip it since it was covered in KubeWeekly last week.

Role-based access control plays an important role in securing Kubernetes. This handy site collects together articles and tools and the official documentation in one place.

  • A web page with the following description:
  • Advocacy site for Kubernetes RBAC
  • A site dedicated to good practices and tooling around Kubernetes RBAC. Both pull requests and issues are welcome.
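To make the RBAC model a little more concrete, here is a minimal Python sketch of how a request is checked against a Role’s rules. It is my own simplified model, not the Kubernetes authorizer: it only looks at apiGroups, resources, and verbs (no namespaces, resourceNames, or subresources), and PolicyRule, is_allowed, and the “pod-reader” rules are hypothetical names. The property it illustrates is that RBAC is purely additive: a request is allowed if any rule matches, and there are no deny rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Simplified RBAC rule: only apiGroups, resources and verbs are modeled."""
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]


def _matches(allowed: tuple[str, ...], value: str) -> bool:
    # "*" acts as a wildcard, as it does in real RBAC rules.
    return "*" in allowed or value in allowed


def is_allowed(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    # RBAC is additive: one matching rule is enough, and no rule can deny.
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


# Hypothetical "pod-reader" rules: read-only access to pods in the core ("") API group.
pod_reader = [PolicyRule(api_groups=("",), resources=("pods",), verbs=("get", "list", "watch"))]

print(is_allowed(pod_reader, "", "pods", "list"))            # True
print(is_allowed(pod_reader, "", "pods", "delete"))          # False
print(is_allowed(pod_reader, "apps", "deployments", "get"))  # False
```

Because nothing can subtract permissions, the practical way to restrict access is to grant fewer rules in the first place.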

An interesting survey for anyone using Serverless technologies to take, from a wide range of companies in the space. I’ll look forward to the results when they are published.

  • The title is “STATE OF SERVERLESS SURVEY 2020”.
  • It seems to take about 7 to 8 minutes to complete. The target audience and reward are as follows:
  • This survey is for tech leaders, engineering managers and developers.
  • You will receive a free report and one of four Amazon eGift cards (each worth $20).

SRE Weekly Issue #230 August 2nd, 2020


LaunchDarkly’s Evolution from Polling to Streaming

LaunchDarkly started off with a polling-based architecture and ultimately migrated to pushing deltas out to clients.

Dawn Parzych — LaunchDarkly

  • It describes the move from a polling architecture to a streaming architecture and how they addressed the “build vs. buy” question.
  • When they first introduced the streaming architecture, they chose “buy” and partnered with a third party, but as they grew, issues began to arise, so they switched to “build” and developed it in-house.
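The difference between the two architectures can be sketched roughly as follows. This is a hypothetical Python model, not LaunchDarkly’s SDK or wire protocol: Flag, FlagStore, and the per-flag version field are my own assumptions. Polling replaces the whole cache on every request, while streaming applies only the pushed change, and a version check discards stale events that arrive late.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    key: str
    value: bool
    version: int


class FlagStore:
    """Client-side flag cache (hypothetical model, not LaunchDarkly's SDK)."""

    def __init__(self) -> None:
        self.flags: dict[str, Flag] = {}

    def replace_all(self, snapshot: list[Flag]) -> None:
        # Polling: each poll downloads the full flag set and replaces the cache.
        self.flags = {f.key: f for f in snapshot}

    def apply_delta(self, flag: Flag) -> bool:
        # Streaming: the server pushes only the flag that changed.
        # Events with an older or equal version are stale and ignored.
        current = self.flags.get(flag.key)
        if current is not None and current.version >= flag.version:
            return False
        self.flags[flag.key] = flag
        return True


store = FlagStore()
store.replace_all([Flag("new-checkout", False, 1), Flag("dark-mode", True, 3)])
print(store.apply_delta(Flag("new-checkout", True, 2)))   # True: newer version, applied
print(store.apply_delta(Flag("new-checkout", False, 1)))  # False: stale event, ignored
print(store.flags["new-checkout"].value)                  # True
```

With polling, a flag change is only picked up on the next poll, whereas a pushed delta is applied as soon as it arrives.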

A simpler alternative to distributed tracing for troubleshooting

A brief overview of some problems with distributed tracing, along with a suggestion of another way involving AI.

Larry Lancaster — Zebrium

  • It points out two issues with distributed tracing and proposes a simpler alternative:
    ○ Work required to yield results
    ○ Inadequacy of those results

Google Cloud Issue Summary Classroom — 2020–07–07

This is Google’s post-incident report for their Google Classroom incident on July 7.

  • The post-incident report for the Google Classroom outage on Google Cloud. Some users of the iOS and Android apps (20% at peak) could not access Classroom.

Introducing Domain-Oriented Microservice Architecture

Uber has long been a champion of microservices. Now, with several years of experience, they share the lessons they’ve learned and how they deal with some of the pitfalls.

Adam Gluck — Uber

  • It appears again here, but I will skip it since it was covered in KubeWeekly last week.
  • It was featured in all three newsletters I follow in this blog, which shows how much attention this article has received.

Keeping PagerDuty Always On With Remote Incident Response

This article opens with an interesting description of what the Cloudflare outage looked like from PagerDuty’s perspective.

Dave Bresci — PagerDuty

  • The author cites the most recent large-scale outage caused by a router misconfiguration (the Cloudflare one, presumably) and describes how PagerDuty responds when a large-scale failure occurs. He also introduces the company’s documentation, which centers on Slack integration.

Safe by design?

This post reflects on two distinct philosophies of safety:

  • the engineering design should ensure that the system is safe
  • design alone cannot ensure that the system is safe

Lorin Hochstein

  • In an interesting way, the author contrasts the views of Nancy Leveson with those of many others in the resilience engineering community, to clarify his thoughts on the view of system safety that Leveson presented at the MIT STAMP workshop. Both views are important to me, so I am noting them here.
    ○ Leveson believes that depending on human adaptation in the system is itself dangerous. If we’re depending on human adaptation to achieve system safety, then the design engineers have not done their jobs properly in controlling hazards.
    ○ The resilience engineering folks believe that depending on human adaptation is inevitable, because of the messy nature of complex systems.

All we can do is find problems

You can’t use availability metrics to inform you about whether your system is reliable enough, because they can only tell you if you have a problem.

Lorin Hochstein

  • The same author also attended Nancy Leveson’s next session at the MIT STAMP workshop, “Safety Assurance (Safety Case): Is it Possible? Feasible?”, and summarizes and comments on it.
  • Leveson is skeptical about assessing the safety of a system after the fact. Instead, she argues that safety should be designed in, by focusing on generating safety requirements at the design stage rather than performing post-design evaluation.
  • I agree with her on some points regarding availability. However, the idea is hard to apply once the operation phase has already started. Perhaps I am missing some key ideas of the author and Leveson. Should we understand that anything not covered during operation goes back to the design phase and restarts from there?
  • The three closing points of her slides:
  1. If you are using hazard analysis to prove your system is safe, then you are using it wrong and your goal is futile
  2. Hazard analysis (using any method) can only help you find problems, it cannot prove that no problems exist
  3. The general problem is in setting the right psychological goal. It should not be “confirmation,” but exploration.

KubeWeekly #228 August 7th, 2020

Editor’s pick of the highlights from the past week.

4 Tips for Maximizing Your Virtual KubeCon Experience

Amanda Katona, VMware

With hundreds of hours of programmed content for KubeCon + CloudNativeCon EU 2020 Virtual, it risks becoming an overwhelming (and occasionally numbing) experience. Don’t worry — the CNCF community is here to help! As you start to prepare for the event, Amanda shares her advice for making the most of the virtual experience. Read the blog here.

  • The title says 4 tips, but it actually offers the following 6. The last one is to eat the way people do in Amsterdam, where the event was originally planned, so I would like to enjoy that atmosphere at home.
  1. Build your agenda
  2. Prepare questions for your top 5 sessions
  3. Take incredible notes
  4. Get your swag on
  5. Be extra social and extra positive
  6. Eat french fries with mayonnaise

KubeCon + CloudNativeCon EU Virtual Session Spotlight

The countdown to KubeCon + CloudNativeCon EU Virtual on August 17–20, 2020 is on! As we approach the event, we curated a few recommended sessions that we don’t want you to miss. Please see the feature for this week and be sure to register today!

Keynote: How to Love K8s and Not Wreck the Planet

Presented by Holly Cummins, Worldwide IBM Garage Developer Lead, IBM

The past five years have been the warmest since records began. Human activity, including the IT industry, is driving worrying climate change. Data centers alone consume 3% of the world’s energy, and more and more of that energy is being used by Kubernetes and workloads running on Kubernetes. Is k8s helping, or making things worse?

The beauty of the cloud is that it makes it easy to run code, virtualized, and scheduled for efficiency… but it doesn’t provide any guarantee that what’s running is useful. Even when the workload is high-value and efficient, Kube sprawl can lead to low utilization, unsatisfactory elasticity, and high costs — but mega-mono-clusters have their own problems around isolation, security, and management. How should these competing requirements be balanced? This talk discusses some of the trade-offs and provides a roadmap to figuring out the right thing.

Register now!

  • KubeCon + CloudNativeCon EU Virtual 8/19 Keynote. Schedule: Wednesday, August 19th 15:58–16:14 CEST (Central European Summer Time).
  • The session seems to cover the following themes:
    ○ “The beauty of the cloud is that it makes it easy to run code, virtualized and scheduled for efficiency… but it doesn’t provide any guarantee that what’s running is useful.”
    ○ “Even when the workload is high-value and efficient, Kube sprawl can lead to low utilisation, unsatisfactory elasticity, and high costs — but mega-mono-clusters have their own problems around isolation, security, and management.”
    ○ “How should these competing requirements be balanced?”
    ○ “This talk discusses some of the trade-offs and provides a roadmap to figuring out the right thing.”

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Project Webinar: How We Doubled System Read Throughput with Only 26 Lines of Code

Minghua Tang, Infrastructure Engineer @PingCAP

  • It describes the TiKV architecture, the reason for introducing “Follower Read”, and how to implement it.
  • TiKV is a strongly consistent key-value database built on the Raft algorithm.
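My understanding is that Follower Read builds on the Raft ReadIndex idea: a follower asks the leader for its current commit index, waits until it has applied the log up to that index, and then serves the read locally. Below is a minimal single-process Python sketch of that flow, assuming that approach; Leader, Follower, and follower_read are hypothetical names, and leadership confirmation, networking, and log replication are left out.

```python
from __future__ import annotations

import threading


class Leader:
    """Hypothetical Raft leader that only exposes its commit index."""

    def __init__(self) -> None:
        self.commit_index = 0

    def read_index(self) -> int:
        # A real leader would first confirm it is still the leader
        # (e.g. via a heartbeat round) before answering.
        return self.commit_index


class Follower:
    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.applied_index = 0
        self.data: dict[str, str] = {}
        self._cond = threading.Condition()

    def apply(self, index: int, key: str, value: str) -> None:
        # Called as committed log entries are applied to the local state machine.
        with self._cond:
            self.data[key] = value
            self.applied_index = index
            self._cond.notify_all()

    def follower_read(self, key: str, timeout: float = 1.0) -> str | None:
        # 1. Ask the leader for its commit index (the "read index").
        read_index = self.leader.read_index()
        with self._cond:
            # 2. Wait until this follower has applied at least up to that index.
            ok = self._cond.wait_for(lambda: self.applied_index >= read_index, timeout)
            if not ok:
                raise TimeoutError("follower is lagging behind the read index")
            # 3. Serve the read locally.
            return self.data.get(key)


leader = Leader()
follower = Follower(leader)

# Entry 1 is committed on the leader and already applied on the follower.
leader.commit_index = 1
follower.apply(1, "region", "tokyo")
print(follower.follower_read("region"))  # tokyo

# Entry 2 is committed but applied 0.1 s later; the read waits for it.
leader.commit_index = 2
threading.Timer(0.1, follower.apply, args=(2, "region", "athens")).start()
print(follower.follower_read("region"))  # athens
```

Because the follower only answers once its applied index has caught up with the leader’s commit index, the read still observes every write committed before it started, which is what allows reads to be spread across replicas without weakening consistency.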

CNCF Member Webinar: Comparing eBPF and Istio/Envoy for Monitoring Microservice Interactions

Roko Kruze, Solutions Engineer @Flowmill and Jonathan Perry, CEO @Flowmill

  • It compares and contrasts two ways of monitoring traffic between microservices: a service mesh such as Istio/Envoy, and eBPF.
  • It also discusses the types of visibility that each approach can provide, compares their performance implications, and describes how to deploy them in complementary ways.

CNCF Member Webinar: Making Data Work for Developers with Kubernetes & Cassandra

Chris Splinter, Sr. Product Manager — Developer Solutions @DataStax and Patrick McFadin, VP of Developer Relations @DataStax

  • It describes how Kubernetes and Apache Cassandra work together to solve the following two issues:
  1. Modern application stacks require that the data serving infrastructure be as flexible as all other layers with minimal tradeoffs.
  2. Companies need to quickly build and deliver their next app.
  • The topics are as follows.
  1. Data considerations when modernizing your stack with Kubernetes and microservices
  2. Examples of best practices that users are deploying to deal with these complexities
  3. Our experiences of building and using a Kubernetes Operator for Cassandra at scale

CNCF Member Webinar: Debugging your debugging tools; What to do when your service mesh goes down in production?

Neeraj Poddar, Co-founder and Chief Architect @Aspen Mesh and John Howard, Software Engineer @Google

  • Debugging in production with Istio focuses on the following topics:
    ○ How to debug and diagnose issues with your sidecar proxy Envoy
    ○ How to monitor and debug the Istio control plane
    ○ How to use operational tools like “istioctl” to understand issues with your configuration
    ○ Using profiling to identify bottlenecks
    ○ Recommendations for a production ready secure Istio deployment

CNCF Member Webinar: Maximizing M3 — Pushing performance boundaries in a distributed metrics engine at global scale

Ryan Allen, Senior Software Engineer @Chronosphere

  • It introduces “M3”, an open source distributed metrics engine, and details some of its performance optimizations.
  • It focuses on how contributors worked to identify bottlenecks, investigate potential solutions, benchmark the results, and test for defects.

Tutorials, tools, and more that take you on a deep dive into the code.

Providing Persistent Storage to Windows Containers

Alina Ryan, Red Hat and Mohammad Saif Shaikh, Red Hat

  • It describes how stateful applications can run on the various cloud platforms supported by OpenShift.
  • It dynamically provisions storage and uses persistent volumes across mixed-node (Windows and Linux) clusters. I want to try it hands-on, so I am bookmarking it.

Why I use Ingress Controllers to expose Kubernetes services

Kevin Crawley, Containous

  • Before explaining ingress controllers, the author notes that “Out of the complexities that developers of cloud-native applications face, strategically utilizing Kubernetes ingress controllers is among the most difficult components to understand — and among the most important.” He then starts from the question “Why is the network important for the development workflow?”
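As a rough illustration of what an ingress controller does with Ingress rules, here is a simplified Python sketch of host and path routing. It borrows the Ingress pathType values Prefix and Exact and the rule that the longest matching path wins (with Exact beating Prefix on a tie), but Route and route are hypothetical names, and real controllers differ in details such as host precedence, regex paths, and default backends.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    host: str       # "" means "any host"
    path: str
    path_type: str  # "Prefix" or "Exact", mirroring Ingress pathType values
    service: str


def _prefix_match(prefix: str, path: str) -> bool:
    # Prefix matching is element-wise on "/"-separated segments:
    # "/api" matches "/api" and "/api/orders" but not "/apiary".
    want = [s for s in prefix.split("/") if s]
    got = [s for s in path.split("/") if s]
    return got[: len(want)] == want


def route(routes: list[Route], host: str, path: str) -> str | None:
    candidates = []
    for r in routes:
        if r.host and r.host != host:
            continue
        if r.path_type == "Exact" and r.path == path:
            candidates.append((len(r.path), 1, r.service))
        elif r.path_type == "Prefix" and _prefix_match(r.path, path):
            candidates.append((len(r.path), 0, r.service))
    if not candidates:
        return None  # a real controller would use its default backend here
    # Longest path wins; on a tie, Exact (1) beats Prefix (0).
    return max(candidates)[2]


routes = [
    Route("shop.example.com", "/", "Prefix", "web"),
    Route("shop.example.com", "/api", "Prefix", "api"),
    Route("", "/healthz", "Exact", "health"),
]
print(route(routes, "shop.example.com", "/api/orders"))  # api
print(route(routes, "shop.example.com", "/apiary"))      # web
print(route(routes, "other.example.com", "/healthz"))    # health
print(route(routes, "other.example.com", "/"))           # None
```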

Sandboxing and Workload Isolation

  • It explains sandboxing and workload isolation. I will read this again later.
  • The author writes, “It seems to me like, for new designs, the basic menu of mainstream options today is:” and lists the following options:
    ○ Jailing otherwise-unmanaged Unix programs with nsjail or something like it.
    ○ Running unprivileged Docker containers, perhaps with a tighter seccomp profile than the default.
    ○ Going full gVisor.
    ○ Running Firecracker, either directly or, in a K8s environment, with something like Kata.
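For the second option, a “tighter seccomp profile than the default” is a JSON file handed to Docker. The Python sketch below generates a deny-by-default profile in Docker’s seccomp JSON format (defaultAction, architectures, syscalls). The syscall allowlist is a hypothetical placeholder that is far too small for a real workload; in practice you would start from Docker’s default profile, or trace the program to find which syscalls it actually needs.

```python
import json


def build_allowlist_profile(allowed_syscalls):
    """Return a deny-by-default seccomp profile in Docker's JSON profile format."""
    return {
        # Any syscall that is not explicitly allowed fails with an errno.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Hypothetical, deliberately tiny allowlist for illustration only.
# execve is listed because the entrypoint itself is exec'd under the filter.
ALLOWED = [
    "execve", "read", "write", "exit", "exit_group", "rt_sigreturn",
    "brk", "mmap", "munmap", "futex",
]

profile = build_allowlist_profile(ALLOWED)
print(json.dumps(profile, indent=2))
```

Saved to a file, such a profile could be applied with docker run --security-opt seccomp=tight-profile.json <image> (the file name is just an example). With an allowlist this small most images would fail to start, which is exactly the trade-off between tightness and compatibility.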

How To Set Up and Secure an etcd Cluster with Ansible on Ubuntu 18.04

DigitalOcean

  • The first half describes how to set up a 3-node etcd cluster on Ubuntu 18.04 servers. The second half focuses on securing the cluster using TLS.
  • In addition to the components in the title, you will learn the following tools as well.
  • It comes with extensive commentary, CLI commands, and options. There are 14 steps, so it is quite long.
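Since the tutorial builds a 3-node cluster, it helps to recall why odd cluster sizes are the usual choice. etcd uses Raft, which needs a majority (quorum) of members to elect a leader and commit writes. The small Python calculation below shows the quorum size and failure tolerance for each cluster size; the function names are my own.

```python
def quorum(members: int) -> int:
    # Raft needs a strict majority of members to elect a leader and commit writes.
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    # How many members can fail while a majority is still reachable.
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} member(s): quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

A 3-node cluster tolerates one failure. Adding a fourth node raises the quorum to 3 without tolerating any additional failure, which is why 3 or 5 members are the common choices.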

Kubernetes in Production with Jessica Deen at swampUP 2020

Jagdish Mirani and Adi Atzmony, JFrog

  • A summary of the session at swampUP 2020, with the video embedded.
  • A session for those who want to take their Kubernetes implementation to the next level: enterprise grade.

How we migrated Dropbox from Nginx to Envoy

Alexey Ivanov and Oleg Guba, Dropbox

  • It describes the old Nginx-based traffic infrastructure, its problems, and the benefits of migrating to Envoy.

  • While reading it, I came across other interesting articles, such as the one on Bandaid, so I will read it again.

The Easiest And Fastest Way To Deploy An OKD 4.5 Cluster In A Libvirt/KVM Host

Carlos Camacho, Red Hat

  • It uses KubeInit to deploy an OKD 4.5 cluster in about 30 minutes with a single command.
  • He deploys 3 controllers, 1–10 workers, 1 service, and 1 bootstrap node.


Debugging tool for Kubernetes which tests and displays connectivity between nodes in the cluster.

  • The GitHub page of Goldpinger, an OSS debugging tool for Kubernetes.
  • It runs as a DaemonSet on Kubernetes and generates Prometheus metrics that can be scraped, visualized, and alerted on.
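The general idea behind a tool like Goldpinger, where every node checks every other node and the results are exported as Prometheus metrics, can be sketched in a few lines of Python. This is not Goldpinger’s code: check_mesh, to_prometheus_text, the node_to_node_reachable metric name, and the fake probe are hypothetical, and only the output layout follows the Prometheus text exposition format.

```python
from itertools import permutations


def check_mesh(nodes, probe):
    """Probe every ordered (source, target) pair; probe is a caller-supplied function."""
    return {(src, dst): probe(src, dst) for src, dst in permutations(nodes, 2)}


def to_prometheus_text(results, metric="node_to_node_reachable"):
    # Render the results in the Prometheus text exposition format (gauge: 1 = reachable).
    lines = [
        f"# HELP {metric} Whether the source node could reach the target node.",
        f"# TYPE {metric} gauge",
    ]
    for (src, dst), ok in sorted(results.items()):
        lines.append(f'{metric}{{source="{src}",target="{dst}"}} {int(ok)}')
    return "\n".join(lines) + "\n"


# Hypothetical probe for illustration: pretend node-c cannot be reached.
def fake_probe(src, dst):
    return dst != "node-c"


print(to_prometheus_text(check_mesh(["node-a", "node-b", "node-c"], fake_probe)), end="")
```

In a real deployment the probe would be a network call to an agent on the target node, and a failing pair such as node-a to node-c would show up as a 0 that you can alert on.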

Public Sector on Air: Managed Openshift on AWS w/ Tres Vance

Tres Vance and Erik Jacobs, Red Hat

Deconstructing Kubernetes Networking

Emanuel Evans

  • The first article in a series. It sets up a very basic, somewhat functional Kubernetes cluster on a single node.
  • The next post will set up a multi-node cluster and get it up and running.

Articles, announcements, and more that give you a high-level overview of challenges and features.

Craig Box and Adam Glick, Kubernetes Podcast from Google

LOTE #14: Katie Gamanji on Kubernetes Tooling DX, GitOps, and the Cluster API

Ambassador Podcast

  • A transcript of the “Livin’ on the Edge” podcast. This week’s guest is Katie Gamanji (Cloud Platform Engineer at American Express and TOC member of the CNCF).
  • She explains the developer experience components involved in interacting with a Kubernetes cluster.
  • They discuss tools ranging from kubectl to UI-driven ones such as k9s and Octant, and the evolution of tooling from ApplicationOps to GitOps.
  • I think the guests on this podcast are great every time.

The community hat rules the company hat in open source

Matt Asay, AWS

  • It focuses on Lili Cosic, a maintainer of “kube-state-metrics” and a software engineer at Red Hat, and how she balances wearing her company hat and her open source hat.

3 reasons to use an enterprise Kubernetes platform

Ernest Jones, The Enterprisers Project

  • It answers the question “Why choose an enterprise Kubernetes platform, as opposed to assembling open source Kubernetes tools yourself?” with the following three points:
  1. Portability
  2. Time savings / Time to value
  3. Stability/Security

The Ups and Downs of Box’s Kubernetes Journey

Alex Williams and B. Cameron Gain, The New Stack

  • A podcast summary article from The New Stack, with the podcast embedded. The guest is Kunal Parmar, Director of Engineering at Box.
  • As a case study, he describes the long and winding road of Box’s Kubernetes journey; Box was one of the first companies to adopt Kubernetes.

A Guide to Untangling the CNCF Cross-Community Relationships

Diane Mueller, Red Hat

  • A CNCF article that explains how to analyze the relationships spanning the CNCF communities.
  • It focuses on the participants who act as “connectors” between the communities. I became interested in its approaches to quantification and visualization. It also links to related information you may find interesting.

Upcoming CNCF webinars

You can check the recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at the time of writing.

Member Webinar: Hardware for Kubernetes, Peeling Back the Layers
Erik Reidel, SVP Compute & Storage Solutions @ITRenew
Aug 11, 2020 10:00 AM Pacific Time

Member Webinar: The Open-Source Observability Playbook
Hen Peretz, Head of Solutions Engineering @Epsagon
Aug 12, 2020 7:00 AM Pacific Time

Member Webinar: Migrating Real-Time Communication Applications to Kubernetes at Scale: Learnings from 8×8’s Experience
Michael Laws, Sr. Site Reliability Engineer/DevOps at 8×8
Pankaj Gupta, Sr. Director at Citrix
Aug 12, 2020 1:00 PM Pacific Time

Ambassador Webinar: Navigating the service mesh ecosystem
Lachie Evenson, Principal Program Manager @Azure & CNCF Ambassador
Aug 14, 2020 10:00 AM Pacific Time

Member Webinar: MLOps automation with Git Based CI/CD for ML
Yaron Haviv, Co-Founder and CTO, Iguazio
Aug 26, 2020 1:00 PM Pacific Time

Project Webinar: Kubernetes 1.19
Kubernetes release team
Aug 28, 2020 10:00 AM Pacific Time

Member Webinar: Getting started with container runtime security using Falco
Loris Degioanni, CTO and Founder @Sysdig
Sept 2, 2020 1:00 PM Pacific Time

What did you think of these articles? Did any of them catch your interest?

There is some content I cannot fully digest at this stage, so I will use this aide-memoire and these links to catch up myself, too.

Bye now!!

Yoshiki Fujiwara


An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
