SRE / DevOps / Kubernetes Weekly Collection #1 (Week 06)

  • In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add short comments as an aide-memoire along with useful links.
  • I have already published the same content on my Japanese blog, and I am catching up in English in this series.
  • I hope it serves as a useful reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #475 February 2nd, 2020
SRE Weekly Issue #205 February 2nd, 2020
KubeWeekly #202 February 7th, 2020

Disclaimers:

  • If you have any questions or comments, please feel free to contact me.
  • I would really appreciate it if you also checked the original resources at the linked URLs.
  • I may have misunderstood some points or phrased things vaguely, since I am not an expert on every topic covered here. Please do leave a comment if you notice anything.
  • Since there is a lot of information, I pick out only text and links (not images).
  • Some resources contain information from before 2019; the newsletter authors do not pick only brand-new items.

DEVOPS WEEKLY ISSUE #475, 2nd February 2020

News

For lots of people in large organisations, the CIO is the top role on the IT side. But the expectations, and the environment, are changing. This post explores the changing role of the CIO in response.

  • It explains what modern business requires of CIOs as a new kind of tech leader, based on 3 overall vectors of transformation and 5 traits of CIOs who drive innovation.

A good talk on Kubernetes security, dipping into several tools but mainly showing the nature of a few specific exploits and giving tips to avoid common problems.

  • It explains secure (and insecure) containerization with many memorable images.

The start of a series of posts implementing a container manager/runtime, in order to better understand the relationship between high-level tools like Kubernetes and the containers that run on the operating system.

  • It starts a series on implementing a container manager (the author's term for a higher-level component that controls multiple OCI runtime instances), explained through conman.
  • This article was written on October 6, 2019, and you can find the follow-up articles on the website.

A collection of 30 of the best technical talks from last year. Topics range from building reliable services to low-level linux features and global traffic management to testing in production.

  • The author/editor personally selected the 30 best technical talks from last year.
  • It links to each presentation video, so you can browse through them casually.
  • Each presentation is of very high quality, and I was fascinated even by topics I am not very familiar with. It works well as a reference.

A look at some of the patterns of container evolution, in particular exploring the rise of microVMs, unikernels and container sandboxes.

  • It discusses the current limitations of containers and where they are heading, introducing microVMs, unikernels, and container sandboxes.

A nice walkthrough of installing Cloud Foundry on Kubernetes, looking at the different tools available to do the job.

  • It introduces cf-operator, Cloud Foundry Quarks, Eirini, and kubecf as tools for running Cloud Foundry on Kubernetes. The author describes the demo as imperfect, because the setup still leaves a lot to be desired for production use.

Jobs

Hiring DevOps engineers in Berlin & remotely, Arweave is a permanent information storage network, built on a new type of blockchain called a blockweave. They’re working to solve the problem of the ‘memory hole’, as formulated by George Orwell in Nineteen Eighty-Four, by building the first data storage medium that truly never forgets. Arweave works by rewarding network participants for contributing hard disk space to the network, in a similar fashion to proof of work in Bitcoin and other cryptocurrencies. Contact jobs@arweave.org or apply at:

  • The link pointed to the job posting on Stack Overflow at that time. Arweave was looking for a DevOps professional in Berlin.

Tools

Rode looks like an interesting software supply chain tool. It provides for the collection, attestation and enforcement of policies, supported by Grafeas and Open Policy Agent.

  • An introduction to Rode, a software supply chain tool. It handles the collection, attestation, and enforcement of policies, supported by OPA (Open Policy Agent) and Grafeas. The diagram in README.md is easy to follow.

Service resilience requires both real-time incident response software and a robust incident management and IT ticketing tool. These common techniques and tools can help you enhance your VictorOps and ServiceNow integration — making incident management suck less:

  • An introduction to the integration between VictorOps and ServiceNow. My impression was that it is more of a “collaboration” than an “integration”. It may be easiest to read it as an introduction to how VictorOps, a Splunk group product, works together with SNOW (ServiceNow).

SRE Weekly Issue #205 2nd February 2020

Articles

The Myth of the Blameless Retrospective

This article hints at the fact that blame and sanction (punishment) are two different things.
Bonus content: Dr. Richard Cook on blameless vs sanctionless retrospectives
Bob Reselman

  • The article title, “The Myth of the Blameless Retrospective”, stands in contrast to the “Blameless Postmortem” described in the SRE book.
  • It says that “Newer companies such as Google, Etsy, and Airbnb are, in many ways, poster children for the Agile and DevOps sensibilities that champion the value of the blameless retrospective”.
  • And “But, most IT Departments live in old-school business sectors such as insurance, banking, defense, medicine, retail, entertainment (think Boston Red Sox and Landmark Cinemas), and manufacturing (Maytag, Mack Truck, and Pioneer Seed). These companies are sitting on a pile of legacy code and legacy processes, as well as a pretty entrenched legacy business culture”.
  • I take it as a caution for people who try to adopt another organization's culture blindly.

(A few) Ops Lessons We All Learn The Hard Way

Here we have a few lessons in operations that we all (eventually) (have to) learn; often the hard way.

Jan Schaumann

  • It lists 88 operational lessons, each with a lot of depth behind it.

What are Service Level Objectives (SLOs)? Lessons Learned

I especially like the emphasis on reducing pager fatigue through thoughtfully selected SLOs.

Emily Arnott — Blameless

  • It describes the benefits of well-chosen SLOs (Service-Level Objectives).
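
To make the idea of an SLO a little more concrete for myself, here is a minimal sketch of the error-budget arithmetic that usually goes with an availability SLO. The 99.9% target, the 30-day window, and the function names are my own assumptions for illustration and are not taken from the article.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


# Hypothetical example: a 99.9% SLO over 30 days.
print(round(error_budget_minutes(0.999, 30), 1))    # 43.2
print(round(budget_remaining(0.999, 30, 10.8), 2))  # 0.75
```

One common way to use numbers like these is to page only when the budget is being spent quickly, which may be one reason a well-chosen SLO helps with pager fatigue.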

Resilience Roundup — Four concepts for resilience and the implications for the future of resilience engineering

The four concepts, drawn from a paper by Dr. David Woods, are:

  • Rebound
  • Robustness
  • Graceful extensibility
  • Sustained adaptability

Thai Wood — Resilience Roundup

  • The article summarizes and explains Dr. Woods's paper.
  • The term “resilience” has come to mean different things in different contexts to different people.
  • It explains resilience using the four concepts above.

How an Alleged “Space Strike” Beautifully Demonstrates Work-As-Imagined Versus Work-As-Done

Understanding the difference between work-as-imagined and work-as-done is critical to the reliability of a complex system.

Jaime Woo and Emil Stolarsky — The Morning Mind-Meld

  • It discusses Work-As-Imagined versus Work-As-Done through an alleged “Space Strike”, a 90-minute silence that occurred during a space mission.
  • When I first read it, I could not make sense of the story, but @inductor explained it to me in Japanese (if the following makes no sense, it is due to my poor translation).
  • “To operate complex systems like that, you need to design a realistic way of working, or the operation might collapse”, “Personally, I think it offers many important insights for maintaining reliability”. → After receiving this comment, I read the article again and understood the context better. Thank you!!

Tracking toil with SRE principles

There’s a useful survey in here if you’re trying to measure or track toil in your organization.

Eric Harvieux — Google

  • An introduction to defining toil according to SRE principles and to ways of tracking it.
  • Do not count technical and organizational complexity as toil.

Site Wide Memory Leak: An On-Call Story

A nice little debugging story hinging on a bug in an upstream library.

Sanket Patel

  • The story starts with frequent alerts from different hosts over one particular weekend.
  • The author found that a memory leak was the root cause behind the “Memory Usage Threshold Over” alerts, and was satisfied to have solved the mystery.

Outages

Pinterest
Microsoft Office 365 Sharepoint Online
TD Bank
Google Drive, Docs, Sheets, and Slides
Facebook and Instagram
Gandi
They posted a quite candid analysis, concluding that they’re not sure what went wrong.

  • It listed outage information for the companies above.

KubeWeekly #202: February 7, 2020

The Headlines

Editor’s pick of the highlights from the past week.

Congratulations to the newest TOC members!

Please help us in welcoming the newest members of the CNCF TOC including Katie Gamanji, Liz Rice, Saad Ali, Sheng Liang, and Justin Cormack.

Announcing the containerd Project Journey Report

CNCF

CNCF just released the containerd Project Journey Report, the fourth such report issued for CNCF graduated projects. This report attempts to objectively assess the state of the containerd project and how CNCF has impacted the progress and growth of containerd.

  • The descriptions above already cover what I would say about these two items.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

How to Develop and Debug Python Applications in Kubernetes with Okteto

Ramiro Berrelleza, Okteto

  • It explains how to develop and debug Python-based applications on Kubernetes using Okteto.
  • Okteto was open-sourced in 2019. It also has an enterprise version. If you are interested, you can find the #Okteto channel in the Kubernetes Slack.

DNS Lookups in Kubernetes

Karan Sharma, Zerodha

  • It explains how DNS works in Kubernetes, based on CoreDNS, the default DNS server in current Kubernetes versions.
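
As a note to myself on how a lookup from inside a Pod behaves, here is a small sketch of the search-list expansion a glibc-style resolver performs before queries ever reach CoreDNS. It is a simplified model, not code from the article. The `default` namespace, the `cluster.local` cluster domain, and the `ndots:5` option are assumed values of the kind typically found in a Pod's /etc/resolv.conf, and `candidate_queries` is a hypothetical helper name.

```python
def candidate_queries(name, search_domains, ndots=5):
    """Return the fully qualified names a resolver would try, in order.

    Simplified model: a real resolver stops at the first successful answer.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as given.
    return expanded + [absolute]


# Assumed search list for a Pod in the "default" namespace.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_queries("example.com", search):
    print(query)
```

Under these assumptions, an external name such as example.com is tried against all three cluster suffixes before the plain name, which is why adding a trailing dot can reduce the number of queries.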

Continuous Profiling Go applications running in Kubernetes

Gianluca Arbezzano, InfluxDB

A bit of Istio before tea-time

Alex Ellis, OpenFaaS

  • It walks through a very short Istio demo, using a public IP address and your own laptop, that you can finish before tea-time.

Latest Jepsen Results against etcd 3.4.3

Xiang Li, Alibaba Group

  • Jepsen, a research company, tested and analyzed etcd 3.4.3, and the etcd community team received good results and useful feedback.
  • If you want to read the full report, you can click here.

Konveyor: Open Source, Migration Assistance for Kubernetes

Konveyor Project

  • It introduces Konveyor, an open-source project for migrating existing applications to Kubernetes.
  • It includes links to GitHub, Form, Slack, and Get Started. At that time, “Get Started” was incomplete.

Troubleshoot Kubernetes with the power of tmux and kubectl

Abhishek Tamrakar, Opensource.com

  • It shows how to troubleshoot Kubernetes using kubectl and tmux.
  • It also suggests using aliases. If you want simple, useful aliases for kubectl and combinations of its options, you can check this repository.

Load balancing and scaling long-lived connections in Kubernetes

Daniele Polencic, LearnK8s

  • It suggests ways to scale and load balance long-lived connections, for which Kubernetes offers no built-in mechanism.
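
To remind myself why long-lived connections are a problem, here is a toy simulation, not code from the article. It assumes that a Service picks a backend only once, when the connection is opened, and compares that with a client that rotates over the backends itself (client-side round-robin). The pod names and function names are hypothetical, and picking the first backend is just a deterministic stand-in for that one-time choice.

```python
from collections import Counter
from itertools import cycle


def per_connection(backends, requests):
    """All requests reuse one connection, so one backend receives them all."""
    chosen = backends[0]  # stand-in for the one-time pick at connect time
    return Counter(chosen for _ in range(requests))


def client_side_round_robin(backends, requests):
    """The client keeps a connection per backend and rotates between them."""
    rotation = cycle(backends)
    return Counter(next(rotation) for _ in range(requests))


pods = ["pod-a", "pod-b", "pod-c"]  # hypothetical backend Pods
print(per_connection(pods, 9))
print(client_side_round_robin(pods, 9))
```

In this toy model, the single reused connection sends all nine requests to one Pod, while the rotating client spreads them evenly across the three.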

Emit Datadog monitors based on Kubernetes state

Astro is an operator that emits Datadog monitors based on Kubernetes state.

  • A link to the GitHub repository of Astro, a Kubernetes operator that simplifies Datadog monitor administration.
  • For more details, please check README.md.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

GitLab, with Marin Jankovski

Craig Box and Adam Glick, Kubernetes Podcast from Google

  • An episode of the weekly Kubernetes Podcast, hosted by community members from Google.
  • Marin Jankovski, Engineering Manager at GitLab, is the guest.
  • Many of the items in “News of the week” are also covered in KubeWeekly.

HPE acquires zero-trust networking, security firm Scytale

Charlie Osborne, ZDNet

  • News that HPE (Hewlett Packard Enterprise) acquired Scytale.
  • You can see HPE's release here.
  • I attended the SPIFFE Meetup in Tokyo twice, so this news interests me. Zero-trust networking is an area I want to understand better.

Run Windows Server Containers on GKE

Tim Anderson, The Register

  • It explains the beta support for Windows Server containers on GKE (at that time) and Config Connector, and analyzes Google's aims for Kubernetes and the surrounding tech companies.

Kubestone — Kubernetes & OpenShift performance benchmarking

Kubestone is a benchmarking Operator that can evaluate the performance of Kubernetes installations.

  • It introduces Kubestone, an operator used for benchmarking the performance of Kubernetes installations.
  • Given the title, I expected OpenShift to be mentioned alongside Kubernetes. However, I could not find anything about OpenShift in the body of the article or its references.

Kubernetes’ Inevitable Takeover of the Data Center

Scott Fulton III, DataCenter Knowledge

  • This series regards Kubernetes as the most disruptive technology in IT, one that is taking over the data center. DCK (Data Center Knowledge) carefully analyzes its rise and its development over the past few years.

If You’ve Got It, Flaunt It — Kubernetes Experience, That Is

Sydney Sawaya, SDxCentral

  • It explains the market gap between the demand for and the supply of engineers with Kubernetes experience, and then briefly introduces the CNCF training offerings.

Kubernetes Operators: 4 facts to know

Kevin Casey, The Enterprisers Project

  • It opens with “Without real automation, you won’t realize the full potential of containers. That’s where Kubernetes Operators play a growing role” and lists 4 facts about Kubernetes Operators that IT leaders should know.

Register Now: KubeCon + CloudNativeCon EU Day Zero Events

Kim McMahon, CNCF

  • It introduced the co-hosted Day Zero events of KubeCon + CloudNativeCon at that time. The conference is now planned to be held as a virtual event from August 17th to 20th.

How Frame.io Built a Full Security Program Around Its Video Cloud with Falco

CNCF

  • As the title says, it explains how Frame.io built a full security program around its video cloud with Falco, for customers ranging from Netflix to Fox Sports and Vice, some of the most prominent creators of video and film content.
  • For more details on the use case, check here.

How about those articles? Did any of them catch your interest?
To be honest, there is some content I cannot fully digest at this stage, so I will also use this aide-memoire and its links to catch up myself.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece(1986–1992). #Network, #Kubernetes, #GCP, #AWS SAP, #National Tour Guide for English
