SRE / DevOps / Kubernetes Weekly Collection #43 (Week 48)

  • In this blog post series, I collect articles from the following three weekly mailing lists I subscribe to, and add short comments and useful links as an aide-memoire.

DEVOPS WEEKLY ISSUE #517 November 22nd, 2020
SRE Weekly Issue #245 November 22nd, 2020
KubeWeekly #242 ←No Updates

DEVOPS WEEKLY ISSUE #517 November 22nd, 2020

News

Workarounds might be inevitable in complex people and computer systems, how can we identify and remove the need for workarounds that lead to security problems?

A presentation stepping through improving application security by supporting developers with security expertise and services. Secure API design, threat modelling and more good tips.

  • The title is “DevSecOops — Stories of DevSecOps Failures and Success”.

A long post telling the story of a long-running migration to Kubernetes. Lots of details about the why, monitoring, automation, governance and more.

  • The title is “Learnings From Two Years of Kubernetes in Production”.

Some observations from last week’s KubeCon event, looking at the content presented and what it might mean about the maturity of the community and project.

  • The title is “KubeCon 2020 Recap — Maturity in Cloud Native”.

Distributed systems have lots of interesting properties that warrant detailed study, and this up-to-date set of course material (notes, slides and videos) is a great start for anyone seeking more in-depth knowledge.

  • The title is “New courses on distributed systems and elliptic curve cryptography”.

An interesting look at the scale of the growing BPF ecosystem. Lots of tools at lots of different layers of the stack, with a focus on Kubernetes use cases.

  • The title is “Beyond the buzzword: BPF’s unexpected role in Kubernetes”.

A new report on container adoption. Lots of interesting aggregate data on cloud provider Kubernetes adoption, popular stateful container applications, container registry usage and more.

  • The title is “11 FACTS ABOUT REAL-WORLD CONTAINER USE”.
  1. Kubernetes runs in half of container environments

A 12-part series on running .NET applications on Kubernetes. Everything from Helm charts to health checks and database migrations to rolling deployments.

  • The title is “Series: Deploying ASP.NET Core applications to Kubernetes”.

Tools

Ever found yourself wanting to quickly look up DNS information? Dog is a CLI tool with a nice user interface and the ability to output JSON.

  • The web page for “dog”, an open-source command-line DNS client. Its output is colorful and easy to read.

illuminatio is a tool for automatically testing kubernetes network policies. Simply execute illuminatio clean run and illuminatio will scan your kubernetes cluster for network policies, build test cases accordingly and execute them to determine if the policies are in effect.

  • The GitHub page of “illuminatio”, a tool that automatically tests Kubernetes network policies.

Karpenter is a metrics-driven autoscaler built for Kubernetes and can run in any Kubernetes cluster anywhere. It’s performant, extensible, and can autoscale anything that implements the Kubernetes scale subresource.

  • The GitHub page of Karpenter, a metrics-driven autoscaler built for Kubernetes that can run in any Kubernetes cluster. It is still in the developer preview stage. I feel like I have covered it before, but I cannot find the post.
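
To make “metrics-driven” concrete, here is a minimal sketch of the proportional scaling rule documented for the Kubernetes Horizontal Pod Autoscaler. It is only an illustration of the general idea. I have not checked whether Karpenter uses the same rule, and the function and parameter names are my own.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Proportional rule: scale replicas by the ratio of observed to target metric.

    This mirrors the formula in the Kubernetes HPA documentation:
    desired = ceil(current_replicas * current_metric / target_metric).
    Whether Karpenter applies this exact rule is an assumption.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # 4 replicas at 90% average utilization against a 60% target.
    print(desired_replicas(4, 90.0, 60.0))
```

The ceiling means the rule rounds up, so it errs on the side of having slightly too much capacity rather than too little.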

SRE Weekly Issue #245 November 22nd, 2020

Articles

Trust Asia 2021 has produced inconsistent STHs

A Certificate Transparency (CT) log failed, resulting in its permanent retirement. The incident involved unintended effects from load testing being performed in a staging environment. I have a huge amount of admiration and respect for the transparency of certification authorities (CAs) when things go wrong.

Trust Asia

  • A thread in the Google group “Certificate Transparency Policy” that follows the incident from the initial inquiry through the investigation results to the planned countermeasures after the Trust Asia CT log failure.

Knowing your systems and how they can fail: Twilio and AWS talk at Chaos Conf 2020

I like the idea that adding the ability to fail over to your system makes it much more complicated and thus more likely to fail.

Andre Newman — Gremlin

  • The article picks up and explains two presentations from Chaos Conf 2020. The talks are embedded in the web page.

Building for reliability at HelloSign

This one introduces some interesting concepts: the error kernel and property testing.

Kenneth Cross — HelloSign

  • The article introduces HelloSign’s products and these concepts under the following headings.
  1. Kernel panic!
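
As a note to myself on property testing: instead of a few hand-picked cases, you state an invariant and check it against many randomly generated inputs. Below is a minimal sketch in plain Python. The function under test, `normalize_email`, and the generator are hypothetical examples of mine and do not come from the HelloSign article.

```python
import random
import string


def normalize_email(address: str) -> str:
    """Hypothetical function under test: trim whitespace and lowercase."""
    return address.strip().lower()


def random_email(rng: random.Random) -> str:
    """Generate a random email-like string with mixed case and space padding."""
    letters = string.ascii_letters + string.digits
    local = "".join(rng.choice(letters) for _ in range(rng.randint(1, 10)))
    domain = "".join(rng.choice(string.ascii_letters) for _ in range(rng.randint(1, 10)))
    padding = " " * rng.randint(0, 3)
    return f"{padding}{local}@{domain}.com{padding}"


def check_idempotent(trials: int = 1000, seed: int = 0) -> int:
    """Property: normalizing twice gives the same result as normalizing once."""
    rng = random.Random(seed)
    for _ in range(trials):
        sample = random_email(rng)
        once = normalize_email(sample)
        assert normalize_email(once) == once, f"property failed for {sample!r}"
    return trials


if __name__ == "__main__":
    print(f"property held for {check_idempotent()} random inputs")
```

Libraries such as Hypothesis automate the input generation and also shrink a failing input to a smaller one, so the hand-rolled loop above is only meant to show the idea.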

Tech Startup Dilemmas: Resilient Deployment vs. Exhaustive Tests

[…] to be resilient, we must test everything, which consumes time that we don’t spend innovating. A good trade-off is to test in production.

Xavier Grand — Algolia

  • The article explains the need to find the right balance between resilience and innovation in order to respond to market changes and new needs.

8 Tips to Create an Accurate and Helpful Post-Mortem Incident Report

More useful tips as you develop your post-incident analysis process. I like their definition of “blameless”.

Zachary Flower — Splunk

  • As the title suggests, the following eight tips are explained.
  1. Don’t assign blame

Achieving exactly-once message processing with Ably

Exactly once delivery is hard to implement and requires explicit coordination at all levels, including the client. Ably explains how their flavor works.

Paddy Byers — Ably

  • An article that clarifies what “exactly-once” means in the context of distributed pub/sub systems and what the “exactly-once” guarantees provided by Ably actually cover.
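
One common building block for this kind of guarantee is an idempotent consumer: the transport may deliver a message more than once, and the receiver deduplicates by message ID so the effect happens only once. The sketch below is a generic illustration under that assumption. It is not Ably's API, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Message:
    message_id: str  # unique ID assigned by the publisher (assumed for this sketch)
    payload: str


@dataclass
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered repeatedly."""

    seen_ids: Set[str] = field(default_factory=set)
    processed: List[str] = field(default_factory=list)

    def handle(self, message: Message) -> bool:
        """Process the message; return False if it was a duplicate delivery."""
        if message.message_id in self.seen_ids:
            return False
        self.processed.append(message.payload)
        self.seen_ids.add(message.message_id)
        return True


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    # The second delivery of "m1" simulates a retry after a lost acknowledgement.
    deliveries = [Message("m1", "charge card"), Message("m1", "charge card"), Message("m2", "send receipt")]
    for delivery in deliveries:
        consumer.handle(delivery)
    print(consumer.processed)
```

In a real system the record of seen IDs and the side effect would have to be committed atomically and the ID set bounded in size, which is part of why the coordination has to reach all the way to the client.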

Why you should frequently turn down ~30% of canary instances

The most effective (if scary) way to understand how your stateless service operates under load

Utsav Shah — Software at Scale

  • It explains approaches for finding a service’s scalability limits when they are not well understood.
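
The arithmetic behind turning down about 30% of instances can be sketched as follows. I am assuming that traffic is spread evenly across the remaining instances, which is my simplification and not a claim from the article. The function name and the numbers are also my own.

```python
def per_instance_load(total_rps: float, instances: int, turned_down_fraction: float) -> float:
    """Requests per second each remaining instance sees after a turn-down.

    Assumes the load balancer spreads traffic evenly (a simplification).
    """
    if not 0 <= turned_down_fraction < 1:
        raise ValueError("turned_down_fraction must be in [0, 1)")
    remaining = instances - round(instances * turned_down_fraction)
    if remaining <= 0:
        raise ValueError("no instances would remain")
    return total_rps / remaining


if __name__ == "__main__":
    before = per_instance_load(1000.0, 10, 0.0)
    after = per_instance_load(1000.0, 10, 0.3)
    print(f"before: {before:.1f} rps, after: {after:.1f} rps, ratio: {after / before:.2f}x")
```

With 10 instances, removing 3 leaves 7, so each remaining instance carries 10/7 (roughly 1.43 times) of its previous load. Watching how the service behaves at that level shows how much headroom it really has.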

The Engineer’s Guide to Preparing for Black Friday 2020

Some good tips here — and a reminder that we may see even more traffic than normal due to social distancing.

  • The article opens by noting that crowding into stores is dangerous during COVID-19. Black Friday has increasingly become a digital event over the past few years, and that shift is expected to accelerate this year.

Outages

KubeWeekly #242 ←No Updates

What did you think of these articles? Did any of them catch your interest?

To be honest, there is some content here that I cannot digest yet, so I will also use this aide-memoire and its links to catch up myself.

Bye now!!

Yoshiki Fujiwara

An infra engineer in Tokyo, Japan. Grew up in Athens, Greece (1986–1992). #Network, #Kubernetes, #GCP, #Certified AWS SAP
