SRE / DevOps / Kubernetes Weekly Collection #48 (Week 53)

  • In this blog post series, I go through the three weekly mailing lists below that I subscribe to and leave comments and useful links as an aide-memoire.
  • I have already published the same content on my Japanese blog and am catching up in English with this series.
  • I hope it serves as a reference for people looking for this kind of information.

DEVOPS WEEKLY ISSUE #522 December 27th, 2020


  • The title is “How to sell SLOs to Engineering Directors”.
  • It shares a redacted internal memo that aims to familiarize its audience with SLOs, explain the value of an SLO culture, and describe how the team would implement and roll them out.
  • The title is “How Shopify Uses WebAssembly Outside of the Browser”.
  • It explains why Shopify chose WebAssembly, a universal format that ensures code is performant, secure, and flexible, from the perspectives of security, performance, flexibility, being community-driven, architecture, and more. The article puts it as follows.
    ○ We want Partners to focus on using their domain knowledge to solve problems, and not on managing scalable web services. To make this a reality we’re keeping the flexibility of untrusted Partner code, but executing it on our own infrastructure. We choose a universal format for that code that ensures it’s performant, secure, and flexible: WebAssembly.
  • The title is “Why I’ve Been Merging Microservices Back Into The Monolith At InVision”.
  • A story about merging a legacy microservice owned by the author's team back into the monolith in order to “right size” it.
  • He makes his position clear at the very beginning:
    ○ “To be very clear, I wanted to start this post off by stating unequivocally that I am not anti-microservices. My merging of services back into the monolith is not some crusade to get microservices out of my life. This quest is intended to “right size” the monolith. What I am doing is solving a pain-point for my team. If it weren’t reducing friction, I wouldn’t spend so much time (and opportunity cost) lifting, shifting, and refactoring old code.”
  • It was very helpful to learn which problems microservices solve, how the company introduced them, and how the author would approach it if it were done again.
    ○ “In short, all the benefits of Conway’s Law for the organization have become liabilities over time for my “legacy” team. And so, we’ve been trying to “right size” our domain of responsibility, bringing balance back to Conway’s Law. Or, in other words, we’re trying to alter our service boundaries to match our team boundary. Which means, merging microservices back into the monolith.”
    ○ “A far more helpful term would have been, “right sized”. Microservices were never intended to be “small services”, they were intended to be “right sized services.””
    ○ “For my team, “right sized” means fewer repositories, fewer deployment queues, fewer languages, and fewer operational dashboards. For my rather small team, “right sized” is more about “People” than it is about “Technology”. So, in the same way that InVision originally introduced microservices to solve “People problems”, my team is now destroying those very same microservices in order to solve “People problems”.”
  • The title is “Uber’s Real-Time Push Platform”.
  • Uber built its own push platform for updating its apps, migrating from polling to a gRPC-based bi-directional streaming protocol.
  • I saw the item “Eliminating polling, introducing RAMEN” twice. RAMEN is an abbreviation for Realtime Asynchronous MEssaging Network.
  • I got hungry when I saw the phrases “RAMEN Server” and “Scaling RAMEN globally”. Will Uber Eats deliver ramen noodles to me?
  • The title is “How GitOps Improves the Security of Your Development Pipelines”.
  • A summary article of a session from the virtual event “GitOps Days 2020”. The YouTube video of the session is embedded.
  • It explains GitOps through the following three points, noting that you can control and view changes from a single source.
  1. Config as Code
  2. Changes are auditable
  3. Production matches the desired state kept in Git
  • The title is “Compiling Qt with Docker multi-stage and multi-platform”.
  • An article about building the cross-platform development framework Qt with Docker multi-stage and multi-platform builds.
  • For embedded devices, compiling is not easy, and compiling Qt (and QtWebEngine) is a very heavy operation. The author therefore precompiles and distributes Qt so that a Dockerfile can download it and include it in the build process (rather than compiling it as part of the installation process).
  • The title is “Open Telemetry Java: All you need to know”.
  • As a tutorial, it explains how to attach the OpenTelemetry Java Agent, Trace methods, Span methods, and so on.
  • Click here for the GitHub page.
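
As a side note on the GitOps item above, the third point (production matches the desired state kept in Git) boils down to comparing two descriptions of the system and acting on the difference. The following is only a minimal sketch of that comparison in Python, not how any particular GitOps tool works. The function name `diff_state` and the example keys (`image`, `replicas`) are hypothetical, and real tools compare full manifests instead of flat dictionaries.

```python
from typing import Any, Dict, Tuple


def diff_state(desired: Dict[str, Any],
               actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (desired_value, actual_value)} for every key that differs.

    A key missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: the state declared in Git vs. the state observed in production.
desired_in_git = {"image": "web:1.4.2", "replicas": 3}
observed_in_prod = {"image": "web:1.4.2", "replicas": 5}

drift = diff_state(desired_in_git, observed_in_prod)
print(drift)  # {'replicas': (3, 5)}
```

An empty result means production matches Git. Anything else is drift that should either be reverted or committed, which is also what makes every change auditable.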


  • The GitHub page of Tobs (The Observability Stack for Kubernetes), a tool that makes it as easy as possible to install a full observability stack on a Kubernetes cluster.
  • It provides CLI tools that make deployment and operation easy, and also provides Helm charts that can be used directly or as subcharts for other projects.
  • As mentioned above, the GitHub page of the OSS tool “Singer” for ETL. It sends data between databases, web APIs, files, queues, and anything else you can think of.
  • Click here for the GitHub page.
  • As the name implies, the GitHub page of “grafana-sync”, a tool for synchronizing Grafana dashboards.

SRE Weekly Issue #250 December 27th, 2020


  • A postmortem, dated 2020/05/05, of the Salt-related incident on 2020/05/03.
  • The vulnerability “CVE-2020-11651” in the Salt configuration management tool was exploited to attack Algolia’s infrastructure, allowing two types of malware to infiltrate Algolia’s configuration manager.
  • This article is for those interviewing for an SRE position and is divided into the following three sections. I think the contents will be helpful for employers as well.
    ○ What is a site reliability engineer? (SRE)
    ○ Primary roles and responsibilities of an SRE
    ○ Questions to expect in a site reliability engineer interview
  • As the title suggests, this Halloween-themed article features six CTOs, each sharing their experience of outages to better prevent them from happening in the future.
  • It shares the timeline, response details, and lessons learned from a failed RDS Multi-AZ failover that the company experienced in 2019.
  • As mentioned above, it’s a comic-style article. I felt that SREs themselves and those around them need to understand the role and avoid relying on personal, ad hoc measures, so that SREs can focus on “engineering” and “observing” their platforms.
    ○ “SREs should’ve been engineering and observing the bridge, but instead they became the bridge.”
  • As the title suggests, they investigated 502 errors that were occasionally returned for API requests. The cause was that the TCP backlog length was set to 1 instead of the default value of 128.
  • The editor picked up the updated postmortem of the incident mentioned under Outages last week, because it added the following correction.
    ○ “The following is a correction to the previously posted ISSUE SUMMARY, which after further research we determined needed an amendment. All services that require sign-in via a Google Account were affected with varying impact. Some operations with Cloud service accounts experienced elevated error rates on requests to the following endpoints: or Impact varied based on the Cloud Service and service account. Please open a support case if you were impacted and have further questions.”
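
As a side note on the 502 item above: the backlog is the value a server passes when it starts listening, and it bounds how many pending connections can queue up before the application accepts them. The article's actual server stack is not described here, so the following is only a minimal sketch using Python's standard `socket` module to show where such a value is set. The helper name `open_listener` and its defaults are hypothetical, and the operating system may cap or otherwise adjust the value it is given.

```python
import socket


def open_listener(backlog: int = 128,
                  host: str = "127.0.0.1",
                  port: int = 0) -> socket.socket:
    """Open a TCP listening socket with an explicit backlog.

    port=0 asks the OS for any free port. The backlog is the requested
    length of the queue of connections waiting to be accepted.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(backlog)  # a very small value here leaves little room for bursts
    return server


server = open_listener(backlog=128)
print("listening on port", server.getsockname()[1])
server.close()
```

With a backlog of 1, a short burst of incoming connections can exceed the queue before the application calls `accept()`, which is consistent with errors that appear only occasionally.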

KubeWeekly #245 ← No Updates



Yoshiki Fujiwara

・Cloud Solutions Architect - AWS@NetApp in Tokyo, Japan. #AWS Certified Solution Architect&DevOps Professional, #Kubernetes, ・Opinions are my own.