From 268ad0876bf96dd17887dfdc14bb83af0ad22254 Mon Sep 17 00:00:00 2001 From: wenzdey <56051809+wenzdey@users.noreply.github.com> Date: Sat, 1 Oct 2022 20:53:09 -0500 Subject: [PATCH 1/7] Removed dead links --- README.md | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 093986f..d9e13b7 100644 --- a/README.md +++ b/README.md @@ -219,12 +219,6 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools ### Blog Posts -* [A Deep Dive into the Recent BCH Hard Fork Incident](https://blog.coinbase.com/a-deep-dive-into-the-recent-bch-hard-fork-incident-2ee14132f435) -* [Blockchain Infrastructure at Coinbase](https://blog.coinbase.com/blockchain-infrastructure-at-coinbase-366c09dbcef4) -* [Logs, metrics, and the evolution of observability at Coinbase](https://blog.coinbase.com/logs-metrics-and-the-evolution-of-observability-at-coinbase-13196b15edb7) -* [Reliability Engineering at Coinbase](https://blog.coinbase.com/reliability-engineering-at-coinbase-8b6956ba802f) -* [Introducing Salus: How Coinbase scales security automation](https://blog.coinbase.com/introducing-salus-how-coinbase-scales-security-automation-1ba5e8074937) -* [How Coinbase Builds Secure Infrastructure To Store Bitcoin In The Cloud](https://blog.coinbase.com/how-coinbase-builds-secure-infrastructure-to-store-bitcoin-in-the-cloud-30a6504e40ba) * [Open Sourcing Coinbase’s Secure Deployment Pipeline](https://blog.coinbase.com/open-sourcing-coinbases-secure-deployment-pipeline-ae6c78e25517) @@ -1219,9 +1213,6 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools * [Ɔhaos Ǝnginǝǝring @ Target - Part 2](https://tech.target.com/2019/05/09/chaos-engineering-at-Target.html) * [Ɔhaos Ǝnginǝǝring @ Target - Part 1](https://tech.target.com/2019/02/05/chaos-engineering-at-Target.html) * [GoAlert - Your Future Open Source, On-Call Notification Product](https://tech.target.com/2019/02/25/introducing-goalert.html) -* [On Infrastructure at Scale: A Cascading Failure of Distributed Systems](https://tech.target.com/2019/01/14/cascading-failure-of-distributed-systems.html) -* [Distributed Troubleshooting](https://tech.target.com/2017/04/05/distributed-troubleshooting.html) -* [Outage Resolution Through Automation](https://tech.target.com/2014/12/29/outage-resolution-through-automation.html) @@ -1447,9 +1438,9 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools * [Building Secure & Reliable Systems](https://www.oreilly.com/library/view/building-secure-and/9781492083115/) | [Read free online version hosted by Google](https://static.googleusercontent.com/media/sre.google/en//static/pdf/building_secure_and_reliable_systems.pdf) * [Site Reliability Engineering](https://www.oreilly.com/library/view/site-reliability-engineering/9781491929117/) | [Read free online version hosted by Google](https://sre.google/sre-book/table-of-contents/) * [The Site Reliability Workbook from Google](https://www.oreilly.com/library/view/the-site-reliability/9781492029496/) | [Read free online version hosted by Google](https://sre.google/workbook/table-of-contents/) -* [Training Site Reliability Engineers](https://www.oreilly.com/library/view/training-site-reliability/9781492076018/) | [Read free online version hosted by Google](https://static.googleusercontent.com/media/sre.google/en//static/pdf/training-sre.pdf) +* [Training Site Reliability Engineers](https://www.oreilly.com/library/view/training-site-reliability/9781492076018/) * [97 Things Every SRE Should Know](https://www.oreilly.com/library/view/97-things-every/9781492081487/) | [Complimentary Copy from Nginx](https://www.nginx.com/resources/library/97-things-every-sre-should-know/) -* [SLO Adoption and Usage in Site Reliability Engineering](https://www.oreilly.com/library/view/slo-adoption-and/9781492075370/) | [Read free online version hosted by Google](https://sre.google/static/pdf/slo-adoption-and-usage-in-sre.pdf) +* [SLO Adoption and Usage in Site Reliability Engineering](https://www.oreilly.com/library/view/slo-adoption-and/9781492075370/) * [Practical Site Reliability Engineering](https://www.oreilly.com/library/view/practical-site-reliability/9781788839563/) * [Implementing Service Level Objectives](https://www.oreilly.com/library/view/implementing-service-level/9781492076803/) * [Chaos Engineering](https://www.oreilly.com/library/view/chaos-engineering/9781492043850/) @@ -1488,7 +1479,6 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools * [Awesome Chaos Engineering](https://github.com/dastergon/awesome-chaos-engineering) * [Awesome Monitoring](https://github.com/crazy-canux/awesome-monitoring) * [Awesome Observability](https://github.com/adriannovegil/awesome-observability) -* [Awesome Sysadmin](https://project-awesome.org/n1trux/awesome-sysadmin) * [Awesome MLOps](https://github.com/visenger/awesome-mlops) * [ML-Ops.org](https://ml-ops.org/) @@ -1505,7 +1495,6 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools #### Incidents & postmortems * [The Verica Open Incident Database](https://www.thevoid.community/) -* [Postmortem.io](https://postmortem.io/) * [Postmortem Templates](https://github.com/dastergon/postmortem-templates) * [Incident Review and Postmortem Best Practices](https://blog.pragmaticengineer.com/postmortem-best-practices/) From 8fe1cdc426e5ba17550bba8d7a4a2f0fc4e80360 Mon Sep 17 00:00:00 2001 From: wenzdey <56051809+wenzdey@users.noreply.github.com> Date: Sat, 1 Oct 2022 22:56:47 -0500 Subject: [PATCH 2/7] Replaced dead links with working ones --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index d9e13b7..8d8735a 100644 --- a/README.md +++ b/README.md @@ -1438,7 +1438,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools * [Building Secure & Reliable Systems](https://www.oreilly.com/library/view/building-secure-and/9781492083115/) | [Read free online version hosted by Google](https://static.googleusercontent.com/media/sre.google/en//static/pdf/building_secure_and_reliable_systems.pdf) * [Site Reliability Engineering](https://www.oreilly.com/library/view/site-reliability-engineering/9781491929117/) | [Read free online version hosted by Google](https://sre.google/sre-book/table-of-contents/) * [The Site Reliability Workbook from Google](https://www.oreilly.com/library/view/the-site-reliability/9781492029496/) | [Read free online version hosted by Google](https://sre.google/workbook/table-of-contents/) -* [Training Site Reliability Engineers](https://www.oreilly.com/library/view/training-site-reliability/9781492076018/) +* [Training Site Reliability Engineers](https://www.oreilly.com/library/view/training-site-reliability/9781492076018/) | [Read free online version hosted by Google](https://github.com/google/googlesre/blob/main/publications/Training_Site_Reliability_Engineers.pdf) * [97 Things Every SRE Should Know](https://www.oreilly.com/library/view/97-things-every/9781492081487/) | [Complimentary Copy from Nginx](https://www.nginx.com/resources/library/97-things-every-sre-should-know/) * [SLO Adoption and Usage in Site Reliability Engineering](https://www.oreilly.com/library/view/slo-adoption-and/9781492075370/) * [Practical Site Reliability Engineering](https://www.oreilly.com/library/view/practical-site-reliability/9781788839563/) @@ -1491,6 +1491,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools * [School of SRE from LinkedIn](https://linkedin.github.io/school-of-sre/) * [Stripe Increment Magazine Issue 16 on Reliability](https://increment.com/reliability/) * [AWS Observability Recipes](https://aws-observability.github.io/aws-o11y-recipes/) +* [Awesome Sysadmin](https://github.com/awesome-foss/awesome-sysadmin) #### Incidents & postmortems From 5549459c792f79ed2338440cab22c6c2a7fffcbc Mon Sep 17 00:00:00 2001 From: Miss Stuck A Lot <114825131+miss-stuck-a-lot@users.noreply.github.com> Date: Sun, 2 Oct 2022 09:27:24 +0530 Subject: [PATCH 3/7] added IBM SRE blogs --- README.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/README.md b/README.md index 093986f..8b6cd4c 100644 --- a/README.md +++ b/README.md @@ -612,6 +612,15 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools +
+ IBM + +### Blog Posts + +* [What is Site Reliability Engineering (SRE)?](https://www.ibm.com/cloud/learn/site-reliability-engineering) +*[AIOps tools and solutions](https://www.ibm.com/cloud/aiops) +
+
Indeed From 16af19471d73a942338bf7e11555124a96e5fb0e Mon Sep 17 00:00:00 2001 From: Miss Stuck A Lot <114825131+miss-stuck-a-lot@users.noreply.github.com> Date: Sun, 2 Oct 2022 09:47:05 +0530 Subject: [PATCH 4/7] fixed error --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 245cb96..7fa3c06 100644 --- a/README.md +++ b/README.md @@ -612,7 +612,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools ### Blog Posts * [What is Site Reliability Engineering (SRE)?](https://www.ibm.com/cloud/learn/site-reliability-engineering) -*[AIOps tools and solutions](https://www.ibm.com/cloud/aiops) +* [AIOps tools and solutions](https://www.ibm.com/cloud/aiops)
From 928c48cb38ba2b8539b41c8979c9292ae8d78d00 Mon Sep 17 00:00:00 2001 From: Miss Stuck A Lot <114825131+miss-stuck-a-lot@users.noreply.github.com> Date: Sun, 2 Oct 2022 09:50:59 +0530 Subject: [PATCH 5/7] updated readme, fixed another error --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 7fa3c06..787e218 100644 --- a/README.md +++ b/README.md @@ -613,6 +613,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools * [What is Site Reliability Engineering (SRE)?](https://www.ibm.com/cloud/learn/site-reliability-engineering) * [AIOps tools and solutions](https://www.ibm.com/cloud/aiops) +
From 392a583e420b7a47487e601d7428e1636a7356f8 Mon Sep 17 00:00:00 2001 From: Dani Satria Date: Sun, 2 Oct 2022 15:15:49 +0700 Subject: [PATCH 6/7] feat: Added 2 more 2022's articles from Gojek Tech blog --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 787e218..56a4e8b 100644 --- a/README.md +++ b/README.md @@ -485,6 +485,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools ### Blog Posts +* [Introducing Skynet: Infrastructure as Code for Gojek](https://www.gojek.io/blog/introducing-skynet/) +* [Scaling Our Geo-Search Service For 10x Load](https://www.gojek.io/blog/scaling-our-geo-search-service-for-10x-load/) * [Why We Swear by the RCA](https://www.gojek.io/blog/why-we-swear-by-the-rca) * [How We Upgrade Kubernetes on GKE](https://blog.gojek.io/how-we-upgrade-kubernetes-on-gke/) * [How We Monitor Apache Airflow in Production](https://blog.gojek.io/how-we-monitor-apache-airflow-in-production/) From 825e0ec6561d5b5d2b371214a22e4496bdc7acab Mon Sep 17 00:00:00 2001 From: him2016 <39089904+him2016@users.noreply.github.com> Date: Wed, 5 Oct 2022 01:45:26 +0530 Subject: [PATCH 7/7] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 56a4e8b..db0867c 100644 --- a/README.md +++ b/README.md @@ -856,6 +856,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools * [FIT: Failure Injection Testing](https://netflixtechblog.com/fit-failure-injection-testing-35d8e2a9bb2) * [Announcing Security Monkey — AWS Security Configuration Monitoring and Analysis](https://netflixtechblog.com/announcing-security-monkey-aws-security-configuration-monitoring-and-analysis-1f2bfb001708) * [Lessons Netflix Learned from the AWS Outage](https://netflixtechblog.com/lessons-netflix-learned-from-the-aws-outage-deefe5fd0c04) +* [Scryer: Netflix’s Predictive Auto Scaling Engine](https://netflixtechblog.com/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270) ### Major incidents & analysis reports