- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add comments and useful links as an aide-memoire.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a useful reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #520 December 13th, 2020
Databases have limits that if you build a popular service and run it for a long time you’ll undoubtedly hit and need to plan for. This post talks about one such case, migrating a single table with 70 billion records and growing at more than 100 million rows a week.
- The title is “The Boring Option”.
- Introducing a migration case study by the Strava Foundation Engineering team.
- They recently completed a multi-year project to handle the rollover of activity IDs into a 64-bit integer format. The final, year-long phase moved read and write access to segment effort data out of their monolithic Rails app and behind a service, a necessary prerequisite for safely and easily changing the underlying segment efforts storage system to be compatible with 64-bit activity ID values.
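As a rough sketch of my own (not taken from the article), the arithmetic behind this kind of ID rollover can be estimated as follows. The function names and the numbers in the usage section are hypothetical and are not Strava's real figures; the sketch only assumes a signed 32-bit column.

```python
# Largest values a signed 32-bit and a signed 64-bit integer column can hold.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def fits_int32(value):
    """Return True if the value can be stored in a signed 32-bit column."""
    return -2**31 <= value <= INT32_MAX


def weeks_until_exhaustion(current_max_id, new_ids_per_week, limit=INT32_MAX):
    """Estimate how many full weeks remain before IDs exceed the limit.

    Assumes IDs are handed out sequentially at a constant weekly rate,
    which is a simplification (real growth is rarely constant).
    """
    if new_ids_per_week <= 0:
        raise ValueError("new_ids_per_week must be positive")
    remaining = limit - current_max_id
    if remaining <= 0:
        return 0
    return remaining // new_ids_per_week


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    weeks = weeks_until_exhaustion(current_max_id=2_000_000_000,
                                   new_ids_per_week=10_000_000)
    print(f"Weeks left on a 32-bit ID: {weeks}")
    print(f"Next ID after the limit fits in 32 bits: {fits_int32(INT32_MAX + 1)}")
```

Running the same estimate with limit=INT64_MAX shows why a move to 64-bit values is normally treated as a permanent fix.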
- The title is “A QUICK INTRODUCTION TO SOFTWARE BILL OF MATERIALS AND CYCLONE DX”.
- The author opens with the complexity of the open source supply chain, cites the Wikipedia definition of “bill of materials (BOM)” below, and explains the software bill of materials (SBOM). The idea is not new, but there is some interesting recent activity around it.
○ A list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an end product.
- It introduces “CYCLONE DX” as one of the projects working to create a standard that covers all of the different languages and domains that make up a typical application. CYCLONE DX was designed as part of the work on OWASP Dependency-Track and provides both XML and JSON schemas that define formats for describing simple and complex configurations of software components.
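To make the JSON format a little more concrete, here is a minimal sketch of my own that builds a small SBOM document. The top-level keys (bomFormat, specVersion, version, components) and the per-component keys follow the CycloneDX JSON schema as I understand it; the helper name and the component list are hypothetical examples, not taken from the article.

```python
import json


def build_minimal_sbom(components):
    """Build a minimal CycloneDX-style SBOM as a Python dict.

    `components` is a list of (name, version, purl) tuples, where purl is a
    package URL such as "pkg:pypi/requests@2.25.0".
    """
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.2",
        "version": 1,
        "components": [
            {"type": "library", "name": name, "version": version, "purl": purl}
            for name, version, purl in components
        ],
    }


if __name__ == "__main__":
    # Hypothetical dependency list, for illustration only.
    sbom = build_minimal_sbom([
        ("requests", "2.25.0", "pkg:pypi/requests@2.25.0"),
        ("flask", "1.1.2", "pkg:pypi/flask@1.1.2"),
    ])
    print(json.dumps(sbom, indent=2))
```

In practice you would generate this with one of the CycloneDX tools for your language rather than by hand; the sketch is only meant to show the shape of the document.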
Lots of teams have small home-grown monitoring services that sometimes see less testing and automation than the services they monitor. Sometimes changes to those services can lead to unexpected downside like with this interesting incident report.
- The title is “It’s Just a Monitoring Change”.
- Warning against the dismissive word “just” in the title, it explains with a real example that “even though it might be just a monitoring change, it can still take down your primary database and render your products unusable.”
Large organisations are rapidly changing how they work, adopting lots of devops practices and better integrating previously separate business units. This post summarises some of that towards a new operating model.
- The title is “The New Operating Model Is Upon Us”.
- A commentary by the author who has been working on major research projects on “new operating models”. The organizations they discussed had PMO-led annual portfolio governance and the following common challenges:
○ How to balance product team autonomy with the need for specialist expertise
○ How to ensure necessary coordination without flow-inhibiting bureaucracy
○ How to engage with organizational risk management professionals and their valid concerns
○ How to structure product-team-based organizations (in particular, the pros and cons of the so-called “Spotify model”)
○ The evolution of functional silos (think I&O) into true internal “product teams” (increasingly called “platform teams”)
Terraform, or other infrastructure as code tools, provide a programming language. But how often do we apply patterns and practices learned from other programming languages? This post takes us through a nice refactoring exercise to make the point.
- The title is “Infrastructure-as-code-as-Software”.
- It argues that applying software engineering principles to make infrastructure as code more robust and reliable is a learnable skill, and explains the following eight steps to help you learn it.
○ 1/8 📒Readability
○ 2/8 🍴Separation of Concern
○ 3/8 🔗Loose Coupling
○ 4/8 🤼 Conway’s law and Service Oriented Design
○ 5/8 📡Inter Layer Communication
○ 6/8 ♻️Reusability and Abstraction
○ 7/8 📄Static Code Checking
○ 8/8 🔒 Dependency Locking
- The title is “The Role-Profiles Pattern Across Infrastructure as Code”.
- It starts with how Puppet Labs has used the Role-Profiles pattern to help with many complex deployments, and describes how to apply it with Terraform and Terragrunt.
How do we relate conversations about digital transformation at the business level to devops practices and to agile software development? This post takes a run at providing an answer, and discusses why this is relevant to leaders at different levels of an organisation.
- The title is “Accelerating Digital Transformation: What Every CEO Needs to Know About Software Delivery Automation”.
- It defines the term as “The DT (Digital Transformation) becomes the all-encompassing term used by CIOs and C-level execs for any sort of technological evolution.” and focuses on the following four factors that executives (C-level execs or CxO) should consider when trying to understand the software function of their organization.
○ Software delivery success ultimately depends on decisions made by the CEO of an organization
○ Software metrics don’t focus on business results
○ Organizational silos slow down processes
○ Operational excellence is key
- The title is “Kubernetes RBAC Security Pitfalls”.
- It covers some common mistakes and vulnerabilities worth knowing about when designing, configuring, or auditing Kubernetes RBAC authorization. It carries the following disclaimer.
○ This post is not a complete guide to Kubernetes or RBAC security and only covers a few specific aspects.
- The title is “containerd development with multipass”.
- When the author started a project to develop directly for containerd 18 months ago, Docker and Kubernetes on Mac were not enough and he needed a Linux environment, so he started using containerd and Multipass, as the title suggests. The post explains the background and method.
Workplace culture often gets relegated because it’s so intangible, but it will make or break your Cloud Native transformation. Join Holly Cummins and Jamie Dobson for insights, conversations and of course, industry gossip. Sign up for Container Solutions’ last WhatTheFinar of the year: Tuesday 15th Dec, 11am CET.
- The event “WTF Is Cloud Native Culture?” by Container Solutions was featured.
- The guest this time was Holly Cummins from IBM. The introduction gives her title as “Worldwide Development Leader, IBM Garage”, but as far as I could see from her LinkedIn profile, it changed this month to “Innovation Leader, Corporate Strategy SPEED”.
- As mentioned above, the 90-minute session was held on Tuesday, December 15th at 11:00 CET (Central European Time).
- The GitHub page of the Domain Specific Language (DSL) project “Bicep” that deploys Azure resources declaratively.
- The page opens with the note below, so be careful when using it and keep an eye on how the project evolves.
○ Note: Bicep is currently an experimental language and we expect to ship breaking changes in future releases. It is not yet recommended for production usage. Please take a look at the known limitations prior to opening any issues.
- It is a transparent abstraction over ARM and ARM templates: everything you can do with ARM templates can be done with Bicep (except for temporary known limitations).
- The GitHub page of “localizer”, a new CLI tool for simple local development in Kubernetes-based developer environments.
- It is built on the following premise and problem statement: “Tools such as; Telepresence, Skaffold, and others all attempt to solve the problem of getting users used to using Kubernetes. This is a pretty big task given that Kubernetes has a gigantic surface area. From my experience (keyword: my experience), developers have no interest in what platform they are deploying to.”
- The GitHub page of “digaws”, a dig-like tool for AWS administrators that looks up AWS-owned IP addresses and displays their region and other information.
- The example given is the EU region where Netflix’s EC2 instances are running.
SRE Weekly Issue #248 December 13th, 2020
It’s really easy to get an “uptime” SLO wrong, and a lying SLO can give you a false sense of security.
Piyush Verma — Last9
- Touching on Prometheus and Operations (formerly known as Stackdriver), it discusses the following three options for measuring service downtime.
○ Option 1: SDK (Measure at each caller)
○ Option 2: Uptime (actually Downtime) Monitors
○ Option 3: State-based Monitors
- The conclusion is below.
○ There is not one single SLO. They are formed at layers, and uptime SLO of one could be error SLO of another.
○ The uptime number is massively aggregated, and always approximate.
○ As the uptime reaches the higher 9s, the support structure and the mindset needs to shift towards proactive efforts, since waiting on an outage and then reacting to bring it up will not always work.
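To get a feel for what “the higher 9s” mean in practice, here is a small sketch of my own (not from the article) that converts an uptime SLO into a downtime budget and aggregates probe results into an uptime percentage, roughly in the spirit of Option 2 above. The function names and the 30-day window are assumptions for illustration.

```python
def allowed_downtime_minutes(slo_percent, window_days=30):
    """Downtime budget, in minutes, that an uptime SLO leaves over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


def measured_uptime_percent(probe_results):
    """Uptime as the share of successful probes (True means the probe succeeded).

    This is an aggregate and therefore approximate: an outage that falls
    between two probes is invisible to it.
    """
    if not probe_results:
        raise ValueError("no probe results to aggregate")
    return 100 * sum(probe_results) / len(probe_results)


if __name__ == "__main__":
    for slo in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(slo)
        print(f"{slo}% over 30 days leaves about {budget:.2f} minutes of downtime")
```

Each extra 9 cuts the budget by a factor of ten, which is one way to see why reacting after an outage stops being a workable strategy at the higher levels.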
I love this quote. I feel like this is the “root cause” of every incident:
As for the underlying cause of the incident (or the “root cause” if you insist on using such language), that has to be the fact that our assumptions as teams or individuals are ultimately formed by our past experiences.
Oliver Leaver-Smith — Sky Betting & Gaming
- Since it was mentioned in DEVOPS WEEKLY ISSUE #520 above, I will skip it here.
I really love the concept of requisite complexity. This article has me thinking about a big project I’m working on in a new light.
- Since it was mentioned in the previous post, “SRE / DevOps / Kubernetes Weekly Collection #14 (Week 19)”, I will skip it here.
They expected to max out an integer primary key column sometime in 2021. Then the pandemic hit and their timetable suddenly accelerated along with their traffic.
Jeff Pollard — Awesome
- I will skip it because it is covered in DEVOPS WEEKLY ISSUE #520 above.
I shouldn’t enjoy reading these so much… got any of your own to share?
- Gremlin’s recent Twitter hashtag challenge “#talesfromtheNOC” invited people to share their scary sysadmin stories, and this post shares some of them.
The idea of borrowing expertise makes me think of Bainbridge’s Ironies of Automation.
Bath Walls — PagerDuty
- In line with the title, it is organized into the following four sections.
○ What Is Runbook Automation?
○ Borrow Expertise From Your Experts
○ The Benefits of Automation
○ Learn More About Runbooks and Automation
Heroku’s report explains how their service was impacted as a result of the big Amazon Kinesis outage a couple weeks back.
- As mentioned above, the failure of Kinesis, a service provided by the upstream provider (AWS), affected Heroku users. Heroku considers that this should not have happened and presents a remediation plan (self-healing recovery rather than the manual recovery used this time).
This primer focuses on ensuring that your SLOs actually match up with business objectives.
Irving Popovetsky — Honeycomb
- It opens by noting that it is the “season for setting goals for 2021”, and gives practical examples of how to use (and not use) SLOs when setting annual goals, organized around the following points.
○ Aligning business goals and engineering work
○ The common language of SLOs
○ Getting started with SLOs in the real world
○ Gathering data for your own SLOs
○ Setting your company goals this year
An interesting Twitter thread about a router near San Francisco, California, USA that was flipping bits in packets for weeks. Folks took to Twitter to try to get AT&T’s attention, and they finally fixed it.
- Facebook Messenger & Instagram
- Microsoft stuff
- Office 365
KubeWeekly #244 December 18th, 2020
Editor’s pick of the highlights from the past week.
Today, CNCF announced that Google Cloud has recommitted $3 million for another year in cloud credits to maintain its support of the Kubernetes project. This grant is a continuation of Google Cloud’s $3 million per year investment in Kubernetes development and distribution, which started back in 2018. The grant has primarily gone to — and will continue to support — scalability testing and maintenance of the infrastructure required to run Kubernetes development, which is indispensable for ensuring Kubernetes remains battle-tested and enterprise-ready.
- News of Google Cloud’s continued investment of $3 million a year in CNCF.
- As the article mentions, after Istio went to the Open Usage Commons there were concerns about whether Google would keep committing to CNCF, but this time the support continues. What will happen in the future?
- I also feel the presence of AWS, both as a DIAMOND sponsor of KubeCon NA and through its own event, re:Invent, which is currently being held. I am looking forward to seeing what happens next year.
ICYMI: CNCF Webinars
You can view all CNCF recorded and upcoming webinars here.
Webb Brown, CEO @Kubecost Niko Kovacevic, Founding Engineer @Kubecost
- The Kubecost team provides hands-on examples and best practices for reducing spending without sacrificing performance or reliability.
Mason Choi(Moonhyuk Choi), Senior Engineer @Samsung SDS Kangsub Song(Kangseop Song), Senior Engineer @Samsung SDS
- The page says “This webinar has passed.” and there is no video. The English slides can be viewed on the linked page, so check them out if you are interested.
Al Kemner, Principal Software Engineer and Architect @New Relic Daniel Jimbel, Staff Engineer @New Relic Caleb Troughton, Product Manager, Telemetry Data Platform @NewRelic
- It explains how to use ArgoCD and Rollouts to manage the canary deployment process in Kubernetes.
Larry Lancaster, Founder and CTO @Zebrium
- It describes machine learning techniques for logs and metrics and shows what they actually look like.
Tutorials, tools, and more that take you on a deep dive into the code.
- It gives an overview of Red Hat Actions, outlines some of the workflows that can be simplified, and shows how to get your application up and running quickly and easily with OpenShift.
Priyanka Jiandani, Red Hat
- It demonstrates how to deploy a stateful application using a Kubernetes Operator.
- The Operator is built with the operator-sdk project and deploys WordPress with a SQL database using custom resources.
Mike Calizo, Red Hat
- The following is explained along with the title.
○ What are resource quotas?
○ Set up a resource quota
○ Deploy the pods
○ Clean up
○ Planning your quotas
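For reference, the “Set up a resource quota” step boils down to creating a ResourceQuota object in a namespace. The sketch below is my own illustration rather than the article's example: it builds such a manifest as a Python dict and prints it as JSON. The object structure (apiVersion v1, kind ResourceQuota, spec.hard) is the standard Kubernetes one, while the helper name, namespace, and limit values are hypothetical.

```python
import json


def build_resource_quota(name, namespace, hard):
    """Build a Kubernetes ResourceQuota manifest as a Python dict.

    `hard` maps resource names (for example "pods" or "requests.cpu")
    to the maximum total allowed in the namespace.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"hard": dict(hard)},
    }


if __name__ == "__main__":
    # Hypothetical namespace and limits, for illustration only.
    quota = build_resource_quota(
        name="team-a-quota",
        namespace="team-a",
        hard={
            "pods": "10",
            "requests.cpu": "4",
            "requests.memory": "8Gi",
            "limits.cpu": "8",
            "limits.memory": "16Gi",
        },
    )
    print(json.dumps(quota, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.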
K8Spin Operator: Kubernetes multi-tenant operator. Enables multi-tenant capabilities in your Kubernetes Cluster.
- The GitHub page of the Kubernetes multi-tenant operator “K8Spin Operator”. Its features are as follows.
○ Enable Multi-Tenant: Adds three new hierarchy concepts (Organizations, Tenants, and Spaces).
○ Secure and scalable cluster management delegation: Cluster Admins creates Organizations then delegating its access to users and groups.
○ Cluster budget management: Assigning resources in the organization definition makes it possible to understand how many resources are allocated to a user, team, or the whole company.
Saiyam Pathak, Civo
- The YouTube video covers the following four points.
○ Motivation and vision behind Portainer
○ We explore Portainer Features and how can we use it
○ We discuss both CE and BE edition
○ We also talk about the community involvement
- It aims to help readers understand how easy it is to connect services running in multiple isolated Kubernetes clusters that are distributed across cloud providers or running on-premises.
Articles, announcements, and more that give you a high-level overview of challenges and features.
Adam Glick and Craig Box, Kubernetes Podcast from Google
- The Kubernetes Podcast by Google employees. The current Co-hosts are Craig Box and Adam Glick.
- The guest is Kate Goldenring, a software engineer on Microsoft’s Edge OS team and a maintainer of “Akri”. Akri is an open source project that manages edge devices.
- The topics that interested me in the news of the week are as follows.
○ Anthos for Telecom puts Google partners apps on the edge
○ New Microsoft AKS features
○ Cross-region replication in AWS ECR
Langdon White, Chris Short, and Matt Micene, Red Hat
- A YouTube video in which the three people above discuss the topic in the title.
Kendall Miller, Fairwinds
- An interview-style question and answer session with guest Tabitha Sable.
Catherine Paganini and Jason Morgan, Buoyant
- A series of articles focusing on explaining each category of Cloud Native Landscape to non-technical readers and engineers who are just starting out with cloud native.
- This installment covers the following major items.
○ Orchestration and Scheduling
○ Coordination and Service Discovery
○ Remote Procedure Call
○ Service Proxy
○ API Gateway
○ Service Mesh
Upcoming CNCF webinars
You can check some Recorded Webinars and Upcoming Webinars here. The following are posted as Upcoming CNCF webinars at that moment.
Thanks again for participating in CNCF webinars in 2020! Stay tuned for our expanded Online Programs in 2021.
What did you think of these articles? Did any of them catch your interest?
There is some content I have not fully digested yet, so I will also use this aide-memoire and these links to catch up myself.