- In this blog post series, I collect items from the following three weekly mailing lists I subscribe to, and add some comments and useful links as an aide-memoire.
- I have already published the same content on my Japanese blog and am catching up in English in this series.
- I hope it serves as a reference for people browsing this kind of information.
DEVOPS WEEKLY ISSUE #480 March 8th, 2020
- The title is “Bazel Performance in a CI Environment”.
- The story of improving Bazel performance in a GitLab CI environment running on AWS instances.
- Build time was reduced from 20 minutes to 1–2 minutes, and the cache is now used properly.
- The title is “Kubernetes Is Not Your Platform, It’s Just the Foundation @ QCon London, March 2020”.
- Presentation material from QCon London, held from March 2 to March 6.
- It discusses the topic with “Getting Started with team-centric Kubernetes adoption” as its key theme.
- A collection of blog posts that show how to write Kubernetes configs using the above programming languages and tools instead of YAML.
- A link is provided for each language, so pick the one you are interested in. C# and F# share the same page, grouped under .NET.
- The title is “Build Operability In — Measures”.
- This article is Part 2 of a “Build Operability In”-themed series.
- Architecture/Telemetry/Operational Preparation/Building, Execution/Learning and Part 7 follow.
- Quoting Douglas Hubbard’s “How to Measure Anything”, which says that organizations suffer from measurement reversals and spend time measuring low-information variables, the article says this is certainly true of IT. It explains that you need to measure with effective indicators, and applies that idea to the theme of “operability”.
A neat visualisation of some of the things that someone wanting to move into system administration would likely want to learn. It does however focus squarely on the tools rather than wider devops issues.
- The title is “Dev Ops Roadmap”.
- A step-by-step roadmap diagram for people taking on DevOps or other operations roles.
- It shows, for example, multiple languages, products, and protocols to learn at each step.
- The last point is “Keep Learning”. And the road continues.
A two part migration story, moving from EC2 on AWS, to Kubernetes on Google Cloud. Details of data, databases, moving from AWS ALB to Istio and more.
- The story of the migration from AWS to GCP, which was covered in last week’s article (Kube Weekly #206: March 6th, 2020).
- The title is “5 ways to drive your automation engineers away.”
- Talented engineers with both development and testing skills who can do “test automation,” an essential element of the CI/CD pipeline, are in such short supply that a position can stay unfilled for several months.
- Since you want to keep as many good automation engineers as possible, the article warns against repeating the same mistakes by listing the top five factors that make automation engineers leave their companies.
- The page of iter8, a tool for automated canary releases and A/B testing on Kubernetes and Istio for cloud native development. The demo video of about one minute is easy to understand.
- Click here for the GitHub page.
- The GitHub page of “cdk8s”, a tool for writing Kubernetes configs using programming languages and tools other than YAML.
- It builds on the AWS CDK tooling to support Kubernetes, and configs can currently be written in TypeScript or Python.
- It is still an experimental project. Trying it out and giving feedback seem to be welcomed.
- The GitHub page of “Status Bay”, a tool that gives you visibility into the Kubernetes deployment process.
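As a note to myself on the “configs in a programming language instead of YAML” idea behind cdk8s mentioned above, here is a minimal sketch in plain Python. It does not use the cdk8s API at all; `make_deployment` and `to_json` are hypothetical helper names, and the image name in the usage note is just an example. It only shows the general idea of building a manifest as data and reusing it with a function.

```python
import json


def make_deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Build a Kubernetes Deployment manifest as a plain dictionary.

    Hypothetical helper for illustration; this is not the cdk8s API.
    """
    # The same labels are reused for the selector and the pod template,
    # which is easy to get wrong when copying YAML by hand.
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize a manifest as JSON, which kubectl accepts as well as YAML."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

For example, `print(to_json(make_deployment("web", "nginx:1.17", replicas=3)))` prints a manifest that could be saved to a file and applied. A real tool such as cdk8s adds typed constructs on top of this kind of idea.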
SRE Weekly Issue #210 March 8th, 2020
Netflix open sourced their incident management system.
Put simply, Dispatch is:
All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!
Kevin Glisson, Marc Vilanova, Forest Monsen — Netflix
- An introduction to the open-sourcing and contents of Netflix’s in-house tool “Dispatch”, which was covered in last week’s article (DEVOPS WEEKLY ISSUE #479 March 1st, 2020).
I wasn’t aware of this little pitfall of memory cgroups.
- The response to “fork() can fail: this is important”, covered in last week’s article (SRE Weekly Issue #209 March 2nd, 2020), was bigger than the author expected, so she wrote about other unexpected failures.
- It covers how, and by what mechanism, a read of /proc/pid/cmdline can keep hanging.
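As a memo on the point above, a monitoring script could guard such a read with a timeout so that the caller does not hang with it. The following is a minimal sketch under my own assumptions, not something taken from the article: `parse_cmdline`, `call_with_timeout` and `read_cmdline` are hypothetical names, and the blocked read is only abandoned in a daemon thread, not cancelled.

```python
import threading
from typing import Callable, List, Optional


def parse_cmdline(raw: bytes) -> List[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the single trailing terminator
    if not raw:
        return []
    return [part.decode("utf-8", errors="replace") for part in raw.split(b"\0")]


def call_with_timeout(func: Callable[[], bytes], timeout: float) -> Optional[bytes]:
    """Run func in a daemon thread; return None if it has not finished in time.

    None is also returned when func raises, since no result is recorded.
    """
    result: List[bytes] = []

    def worker() -> None:
        result.append(func())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return None  # still blocked; the thread is left behind, not killed
    return result[0] if result else None


def read_cmdline(pid: int, timeout: float = 1.0) -> Optional[List[str]]:
    """Read /proc/<pid>/cmdline (Linux only), giving up after `timeout` seconds."""

    def read() -> bytes:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return f.read()

    raw = call_with_timeout(read, timeout)
    return None if raw is None else parse_cmdline(raw)
```

The timeout only protects the caller. If the kernel never completes the read, the worker thread stays blocked, so this is a way to keep a tool responsive rather than a fix for the underlying problem.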
Your failover DB instance is cute. Try 4x+ redundancy. That’s the kind of engineering required when designing systems to operate in space.
Glenn Fleishman — Increment
- The past and future of redundancy design in NASA’s space and planet exploration missions.
This post enumerates some of the risks introduced when a single person carries 100% of the on-call duties of a team, and shows why those risks are not simply eliminated by increasing the number of people in the rotation.
Daniel Condomitti — FireHydrant
- It discusses how fragile a single-person on-call rotation is, and its serious implications for both near-term incident response and sustainable long-term growth.
- It covers the risks of having a single person on call or a one-person rotation, the “bystander effect” when alerts are sent to a whole group, and other points to consider when operating a system that needs 24/365 care. It is summarized simply and clearly, including an example of the company’s own three-person rotation.
This is a pretty nifty experiment showing the importance of letting folks use their judgement to handle unexpected situations rather than relying on adherence to procedures.
Thai Wood — Resilience Roundup (summary)
Makoto Takahashi, Daisuke Karikawa, Genta Sawasato and Yoshitaka Hoshii — Tohoku University (original paper)
- Issue 69 of “Resilience Roundup”, which publishes an article on the theme of resilience every week, covers the research by Makoto Takahashi, Daisuke Karikawa, Genta Sawasato and Yoshitaka Hoshii that was presented at the REA symposium last year.
- If you would like to participate in the discussion of this group, you can register here.
FYI: SRECon Americas West has been rescheduled to June 2–4.
- Because of COVID-19 (the novel coronavirus disease), five events hosted by USENIX had been rescheduled as of that moment. The event names, new dates, and venues are as follows.
- SREcon20 Americas West: June 2–4 at the Hyatt Regency Santa Clara and the Santa Clara Convention Center in Santa Clara, CA, USA
- 3rd USENIX Workshop on Hot Topics in Edge Computing (HotEdge ’20): July 14 at the Sheraton Boston in Boston, MA, USA, now co-located with USENIX ATC ’20
- 2020 USENIX Conference on Operational Machine Learning (OpML ’20): July 30 at the Hyatt Regency Santa Clara in Santa Clara, CA, USA
- SREcon20 Asia/Pacific: September 7–9 at the Sheraton Grand Sydney Hyde Park in Sydney, Australia
- 2020 USENIX Conference on Privacy Engineering Practice and Respect (PEPR ’20): October 15–16 at the Hyatt Regency Santa Clara in Santa Clara, CA, USA
This week, we have another summary of the Physalia paper. I especially like the bit about poison pills.
Adrian Colyer — The Morning Paper (summary)
Brooker et al. — NSDI’20 (original paper)
- Part of Adrian Colyer’s series of write-ups on selected CS papers.
- It covers the paper by Marc Brooker, Tao Chen and Fan Ping of AWS, presented at NSDI ’20 (Santa Clara, CA, February 25 to February 27), hosted by USENIX.
- Slides, PDFs, and other materials are available, so you can choose diagrams or text, whichever you prefer, or check both.
- He is deeply fond of the processes and engineering practices behind the design of Physalia, which stores configuration information as part of the EBS control plane for AWS.
- To reduce the blast radius (the area affected by a database failure), the database is divided and managed as a myriad of small cells, designed so that the clients stored in one cell are not affected by the failure of another cell.
- When the time comes for me to deal with databases, on AWS or elsewhere, I definitely want to reread this.
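To make the cell and blast-radius idea above concrete for myself, here is a toy sketch. The placement by hashing is purely my assumption for illustration; it is not how Physalia places its cells, and `cell_for`, `assign` and `affected_clients` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def cell_for(client_id: str, num_cells: int) -> int:
    """Map a client to a cell with a stable hash.

    Hash placement is an assumption for this sketch, not Physalia's method.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


def assign(clients: Iterable[str], num_cells: int) -> Dict[int, List[str]]:
    """Group clients by the cell that stores their data."""
    cells: Dict[int, List[str]] = {i: [] for i in range(num_cells)}
    for client in clients:
        cells[cell_for(client, num_cells)].append(client)
    return cells


def affected_clients(cells: Dict[int, List[str]], failed_cells: Set[int]) -> Set[str]:
    """Return only the clients whose cell has failed (the blast radius)."""
    return {client for i in failed_cells for client in cells.get(i, [])}
```

With many small cells, the failure of one cell touches only the clients assigned to it, while a single shared database would take every client down with it.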
In this case, “proof” means “formal proof”.
It’s not that software got so reliable without proof: it’s that systems that include software got so reliable without proof.
- Quoting “How did software get so reliable without proof?”, written in 1996 by Turing Award-winning scientist C. A. R. Hoare, the author sympathizes with many of its explanations and points, but argues that Hoare’s question, which stays within software, is the wrong question when viewed from the larger perspective of the overall system that includes software.
- Let’s Encrypt Status
→ Let’s Encrypt purposefully suspended certificate issuance to investigate a bug around validating CAA DNS records. See their initial report and subsequent full report for details.
Subsequently, they decided to revoke 3 million certificates with a pretty short warning. Both actions (the revocations and taking down issuance initially) were likely warranted and mandated under the compliance guidelines that CAs are subjected to.
I’ve found two third-party incidents so far that seem to stem from the revocations:
Got any more? Please do send them my way.
- Robinhood (stock trading platform)
Thanks to Daniel Lucas for this and a couple other recent ones.
- G Suite Status Dashboard
- Interactive Brokers (Stock Broker)
- Binion’s and Four Queens (Las Vegas casinos)
→ Slot machines stopped working, and an eerie quiet descended.
- crates.io incident report for 2020–02–20
On 2020–02–20 at 21:28 UTC we received a report from a user of crates.io that their crate was not available on the index even after 10 minutes since the upload. This was a bug in the crates.io webapp exposed by a GitHub outage.
crates.io is the Rust language package registry.
Pietro Albini — crates.io
KubeWeekly #207: March 13th, 2020
Editor’s pick of the highlights from the past week.
Craig Box, Mandar Jog, John Plevyak, Louis Ryan, Piotr Sikora (Google), Yuval Kohavi, Scott Weiss (Solo.io)
Google has added dynamic extensibility to Envoy using WebAssembly, and developed an ABI called Proxy-Wasm to ensure that extensions compiled for one version can be used in another. This ABI can be adopted by other proxies, allowing Wasm extensions, initially written for Envoy, to work anywhere. The first use of this extensibility is in the new, lower-latency Istio telemetry system. An SDK (in three languages, with more to come) and an extension hub, built by Solo.io, rounds out the release.
- As announced on istio.io, WebAssembly (Wasm) support was introduced as an alpha feature for Envoy and Istio 1.5.
- For those who want to catch up in Japanese, I recommend the YouTube video of GCPUG Istio 1.5 Day, streamed on Thursday, March 12.
- Check out the WebAssembly Hub, which provides tools and repositories to build, deploy, share, and discover Envoy Proxy Wasm extensions for Envoy and Istio.
ICYMI: CNCF Webinars
Weekly recap of CNCF member and project webinars that you might have missed.
Timothy Gerla, CEO @Talos Systems
- A webinar video by Talos Systems CEO Timothy Gerla entitled “Immutable Infrastructure in the Kubernetes era”.
- Previously, he was a co-founder of Ansible and was the CTO when it was acquired by Red Hat.
- It caught my eye that the Talos Systems logo looks like a differently colored version of the Anthos logo.
Connor Gorman, Principal Engineer @StackRox
- A webinar video on Kubernetes Security Best Practices in DevOps by StackRox Principal Engineer Connor Gorman.
- He went through Q&A with the moderator, and also explained, with demonstrations, how to use RBAC, Namespaces, Network Policies, etc.
Cody Hill, Field CTO @Packet
- A webinar video by Packet’s Field CTO, Cody Hill, about using OSS, bare metal, and 5G to achieve autonomous drone delivery.
- Tools such as Kubernetes, Emitter, OpenFaaS, Prometheus, Grafana, PostgreSQL, Mapbox and Metabase are introduced and demonstrated.
- It was the first time I had seen a block diagram that includes a drone. It is interesting how technology changes our view of the world.
Oliver Gould, Lead Creator of Linkerd and CTO @Buoyant
- A webinar video in which Oliver Gould, CTO of Buoyant and Lead Creator of Linkerd, introduces Linkerd and explains the updates in version 2.7 and the future roadmap.
Tutorials, tools, and more that take you on a deep dive into the code.
Mohamed Ahmed, Magalix
- It explains why a service mesh is needed in the context of microservices, then covers typical uses of Envoy and a hands-on exercise with it.
- It is interesting to compare the hand-drawn diagrams with the editor’s colorful diagram.
Peter Jausovec, Learn cloud-native
- An article about Meshery, the management plane for multiple service meshes. It provides lifecycle, configuration and performance management for service meshes and the apps running on them.
- It is also expected to serve as a vendor- and project-neutral tool for benchmarking the performance of different service meshes.
- The subtitle is “Learn what Kubernetes metrics to monitor, how to do it & what are the best open-source and commercial tools to help ensure peak performance of your cluster.”
- This article introduces the importance of monitoring Kubernetes, the critical metrics to follow, and monitoring tools that make your job easier.
Kubernetes deployment visibility like a pro. https://statusbay.io
- Skipped here because it was covered in DEVOPS WEEKLY ISSUE #480 March 8th, 2020 above.
Daniel Messer & Chris Short, Red Hat
- It touches on how the term “Operator”, coined by the CoreOS team in 2016, has rapidly become popular over the last two years, and introduces the best practices published by the Operator Framework community, pointing out the key points.
Adrian Cockcroft, AWS
- A hands-on introduction to the OSS Crossplane and to creating a pipeline for AWS managed services with Argo CD. Crossplane supports AWS, GCP and Azure.
- It explains the background to the adoption of GitOps and tools such as Argo CD and Flux CD as the features and complexity of cloud-native infrastructure increase.
Balkrishna Pandey, goGlides
- The author saw a blog post called “RUNNING NTP IN A CONTAINER” that sets up NTP (Network Time Protocol) on Docker, so he tried running OpenNTPD on a Kubernetes cluster.
Articles, announcements, and more that give you a high-level overview of challenges and features.
Helping You and Your Development Team Build and Ship Faster
Justin Graham, Docker
- A blog post by Justin Graham, Vice President of Products at Docker, on the company’s site.
- Touching on the complexity of development environments, it says that Docker will run projects with the community centered on Docker Hub, Docker Desktop, and OSS, contribute to other projects, and it explains their first roadmap ever.
- Release information for Istio 1.5. Istio’s control plane becomes simpler by being consolidated into a single binary, Istiod.
- Since it is also covered in The Headlines, other details are omitted.
- To understand the background and key points, I highly recommend the YouTube video of GCPUG Istio 1.5 Day, streamed on Thursday, March 12.
Ray O’Farrell, VMware
- VMware Tanzu, announced at VMworld US six months ago, has reached GA.
- Rather than “using Kubernetes in a VMware environment”, I see it as one of the moves to bring declarative management as a platform to the entire infrastructure.
Adam Glick and Craig Box, Kubernetes Podcast from Google
- The guest is Richard Belleville, a software engineer at Google.
- The YouTube video “gRPC Basics meetup video”, in which Richard Belleville introduces gRPC, looks good.
- The news of the week includes Kube Week Managed Kubernetes pricing comparison, HPE Container Platform is Generally Available, and Sidecar containers not in 1.19 after all.
Tim Anderson, The Register
- A commentary on what Monzo announced at QCon London, which was held from March 2 to March 6.
Jevon MacDonald, manifold via The New Stack
- A New Stack article introducing the OSS Open Policy Agent (OPA), which handles policy control.
- It includes presentation videos and commentary from OPA Summit 2019 by software engineer William Fu of Pinterest and Luke Massa of TripAdvisor.
- It notes that security vendors are interested in OPA, which is expected to unify disparate policy control across different systems.
Lucian Constantin, CSO
- A story about Visa developing in-house container security using OSS while moving from a legacy monolithic application to microservices.
- Vendor products could not solve their problems, so they decided to build it in-house.
- Once they decided to build it in-house, the operations, development, and security teams grew closer and more supportive of each other, and headed toward DevSecOps.
Bob Violino, CIO
- Three case studies (Expedia Group, Primerica, Clemson University) of organizations that have successfully migrated to the cloud with containers and Kubernetes.
Ran Ribenzaft, DevOps.com
- It introduces six OSS tools (Prometheus, Grafana, Elastic Stack, Sensu Go, Sysdig Inspect, Jaeger) for monitoring and analyzing Kubernetes and containers. Personally, Sensu Go had not been on my radar.
- The author has published a comparison table, so you can weigh the tools against your own use case. It is a mystery that Epsagon, which the text does not touch on, appears at the right end as the seventh entry, marked “All good!”
John Edwards, NetworkWorld
- Networking is also evolving rapidly in the context of containers and taking on new forms.
- It touches on and explains keywords such as Bridge networks, Overlay networks, host network, Macvlan network, NFV (network function virtualization), BPF (Berkeley Packet Filters), and Istio.
Upcoming CNCF webinars
You can check recorded and upcoming webinars here. The following were listed as upcoming CNCF webinars at that moment.
Calico networking with eBPF
Chris Hoge, Developer Advocate @Tigera
Shaun Crampton, Principal Engineer @Tigera
March 17, 2020 10:00 AM Pacific Time
Democratizing analytics with cloud native data warehouses on Kubernetes
Robert Hodges, CEO @Altinity
Vladislav Klimenko, Senior Software Engineer @Altinity
March 18, 2020 10:00 AM Pacific Time
Small Is Not Always Beautiful — Moving Enterprise Applications to the Cloud
Paul Jenkins, Product Manager @Oracle Cloud Infrastructure (OCI) Cloud Native Services
Tony Vertenten, co-founder and CTO @Intris
March 19, 2020 9:00 AM Pacific Time
How to migrate a MySQL Database to Vitess
Liz van Dijk, @PlanetScale
March 20, 2020 10:00 AM Pacific Time
Argo CD, Flux CD and the GitOps Revolution
Jay Pipes, Principal Open Source Engineer @Amazon Web Services
March 24, 2020 10:00 AM Pacific Time
Lowering the Barrier to Kubernetes Proficiency — Navigating the Stormy Seas of Information
Chris Black, Sr. Solutions Engineer @CircleCI
March 25, 2020 10:00 AM Pacific Time
Continuous profiling Go application running in Kubernetes
Gianluca Arbezzano, Site reliability engineer @InfluxData
March 27, 2020 10:00 AM Pacific Time
Welcome to CloudLand! An Illustrated Intro to the Cloud Native Landscape
Kaslin Fields, Developer Advocate @Google
April 3, 2020 10:00 AM Pacific Time
Best Practices for Deploying a Service Mesh in Production: From Technology to Teams
April 8, 2020 10:00 AM Pacific Time
April 23, 2020 9:00 AM Pacific Time
Pivoting Your Pipeline from Legacy to Cloud Native
Tracy Ragan, CEO of DeployHub and CDF Board Member
June 30, 2020 10:00 AM Pacific Time
How did you find these articles? Did any of them catch your interest?
To be honest, there is some content I cannot fully digest at this stage, so I will use this aide-memoire and these links to catch up myself, too.