Fix markdownlint errors

Unmesh Gundecha
2021-02-21 00:05:37 +08:00
parent e1be246ff5
commit c7b301eb40
4 changed files with 178 additions and 95 deletions

2
.github/FUNDING.yml vendored

@@ -1,2 +0,0 @@
# These are supported funding model platforms
custom: https://www.buymeacoffee.com/upgundecha


@@ -1,5 +1,6 @@
{ {
"default": true, "default": true,
"line-length": false, "line-length": false,
"no-duplicate-header": false "no-duplicate-header": false,
"no-inline-html": false
} }

267
README.md

@@ -1,12 +1,14 @@
# How they SRE # How they SRE
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square) ![Check Markdown links](https://github.com/upgundecha/howtheysre/workflows/Check%20Markdown%20links/badge.svg)
![Alt](banner.png "banner") ![Alt](banner.png "banner")
> A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE) > A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
## Introduction ## Introduction
__How They SRE__ is a curated knowledge repository of best practices, tools, techniques, and culture of SRE adopted by the leading technology or tech-savvy organizations. __How They SRE__ is a curated knowledge repository of best practices, tools, techniques, and culture of SRE adopted by the leading technology or tech-savvy organizations.
Many organizations regularly come forward and share their best practices, tools, techniques and offer an insight into engineering culture on various public platforms like engineering blogs, conferences & meetups. The content is curated from these avenues and shared in this repository. Many organizations regularly come forward and share their best practices, tools, techniques and offer an insight into engineering culture on various public platforms like engineering blogs, conferences & meetups. The content is curated from these avenues and shared in this repository.
@@ -21,7 +23,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* Monitoring & Observability * Monitoring & Observability
* Alerting * Alerting
* Incident Response & Post-Mortem * Incident Response & Post-Mortem
* On-Call * On-Call
* Testing in Production * Testing in Production
* Chaos Engineering * Chaos Engineering
* Automation * Automation
@@ -32,7 +34,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Airbnb</summary> <summary>Airbnb</summary>
#### Blog Posts ### Blog Posts
* [Detecting Vulnerabilities With Vulnture](https://medium.com/airbnb-engineering/detecting-vulnerabilities-with-vulnture-f5f23387f6ec) * [Detecting Vulnerabilities With Vulnture](https://medium.com/airbnb-engineering/detecting-vulnerabilities-with-vulnture-f5f23387f6ec)
* [Alerting Framework at Airbnb](https://medium.com/airbnb-engineering/alerting-framework-at-airbnb-35ba48df894f) * [Alerting Framework at Airbnb](https://medium.com/airbnb-engineering/alerting-framework-at-airbnb-35ba48df894f)
* [When The Cloud Gets Dark — How Amazon's Outage Affected Airbnb](https://medium.com/airbnb-engineering/when-the-cloud-gets-dark-how-amazons-outage-affected-airbnb-66eaf8c0f162) * [When The Cloud Gets Dark — How Amazon's Outage Affected Airbnb](https://medium.com/airbnb-engineering/when-the-cloud-gets-dark-how-amazons-outage-affected-airbnb-66eaf8c0f162)
@@ -42,7 +45,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Algolia</summary> <summary>Algolia</summary>
#### Blog Posts ### Blog Posts
* [May 30 SSL incident](https://www.algolia.com/blog/may-30-ssl-incident/) * [May 30 SSL incident](https://www.algolia.com/blog/may-30-ssl-incident/)
* [A Journey Into SRE](https://www.algolia.com/blog/a-journey-into-sre/) * [A Journey Into SRE](https://www.algolia.com/blog/a-journey-into-sre/)
@@ -51,16 +55,19 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Asana</summary> <summary>Asana</summary>
#### Blog Posts ### Blog Posts
* [How Asana ships stable web application releases](https://blog.asana.com/2021/01/asana-engineering-ships-web-application-releases/) * [How Asana ships stable web application releases](https://blog.asana.com/2021/01/asana-engineering-ships-web-application-releases/)
* [Analysis of recent downtime & what we're doing to prevent future incidents](https://blog.asana.com/2019/09/downtime-what-were-doing-to-prevent-future-downtime/) * [Analysis of recent downtime & what we're doing to prevent future incidents](https://blog.asana.com/2019/09/downtime-what-were-doing-to-prevent-future-downtime/)
* [Developer environment: Achieving reliability by making it fast to reset](https://blog.asana.com/2017/07/developer-environment-making-it-reliable-by-making-it-fast-to-reset/) * [Developer environment: Achieving reliability by making it fast to reset](https://blog.asana.com/2017/07/developer-environment-making-it-reliable-by-making-it-fast-to-reset/)
</details> </details>
<details> <details>
<summary>ASOS</summary> <summary>ASOS</summary>
#### Blog Posts ### Blog Posts
* [Cyber Security @ ASOS.com](https://medium.com/asos-techblog/cyber-security-asos-com-7d1d1f346e57) * [Cyber Security @ ASOS.com](https://medium.com/asos-techblog/cyber-security-asos-com-7d1d1f346e57)
* [Security Operations 24x7](https://medium.com/asos-techblog/security-operations-24-x-7-2e90c8e5e7e) * [Security Operations 24x7](https://medium.com/asos-techblog/security-operations-24-x-7-2e90c8e5e7e)
* [The skills we look for in Cyber Security Incident Response](https://medium.com/asos-techblog/the-skills-we-look-for-in-cyber-security-incident-response-12b327927e38) * [The skills we look for in Cyber Security Incident Response](https://medium.com/asos-techblog/the-skills-we-look-for-in-cyber-security-incident-response-12b327927e38)
@@ -70,7 +77,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Atlassian</summary> <summary>Atlassian</summary>
#### Blog Posts ### Blog Posts
* [Best practices for change management in the age of DevOps](https://www.atlassian.com/engineering/best-practices-for-change-management-in-the-age-of-devops) * [Best practices for change management in the age of DevOps](https://www.atlassian.com/engineering/best-practices-for-change-management-in-the-age-of-devops)
* [Automated testing: 5 lessons from Atlassian's Kubernetes team on testing infrastructure as code](https://www.atlassian.com/engineering/automated-testing-5-lessons-from-atlassians-kubernetes-team-on-testing-infrastructure-as-code) * [Automated testing: 5 lessons from Atlassian's Kubernetes team on testing infrastructure as code](https://www.atlassian.com/engineering/automated-testing-5-lessons-from-atlassians-kubernetes-team-on-testing-infrastructure-as-code)
* [How to export Kubernetes events for observability and alerting](https://www.atlassian.com/engineering/how-to-export-kubernetes-events-for-observability-and-alerting) * [How to export Kubernetes events for observability and alerting](https://www.atlassian.com/engineering/how-to-export-kubernetes-events-for-observability-and-alerting)
@@ -81,7 +89,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>BackMarket</summary> <summary>BackMarket</summary>
#### Blog Posts ### Blog Posts
* [How Back Market SREs prepared for Black Friday](https://medium.com/back-market-engineering/how-back-market-sres-prepared-for-black-friday-5f017f343408) * [How Back Market SREs prepared for Black Friday](https://medium.com/back-market-engineering/how-back-market-sres-prepared-for-black-friday-5f017f343408)
</details> </details>
@@ -89,7 +98,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Baidu</summary> <summary>Baidu</summary>
#### Videos ### Videos
* [Anomaly Detection on Golden Signals](https://www.usenix.org/conference/srecon19asia/presentation/chen-yu) * [Anomaly Detection on Golden Signals](https://www.usenix.org/conference/srecon19asia/presentation/chen-yu)
* [NetRadar: Monitoring the Datacenter Network](https://www.usenix.org/conference/srecon19asia/presentation/chen-yun) * [NetRadar: Monitoring the Datacenter Network](https://www.usenix.org/conference/srecon19asia/presentation/chen-yun)
@@ -98,13 +108,15 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Basecamp</summary> <summary>Basecamp</summary>
#### Blog Posts ### Blog Posts
* [Inside a CODE RED: Network Edition](https://m.signalvnoise.com/inside-a-code-red-network-edition/) * [Inside a CODE RED: Network Edition](https://m.signalvnoise.com/inside-a-code-red-network-edition/)
* [Three Basecamp outages. One week. What happened?](https://m.signalvnoise.com/three-basecamp-outages-one-week-what-happened/) * [Three Basecamp outages. One week. What happened?](https://m.signalvnoise.com/three-basecamp-outages-one-week-what-happened/)
* [Basecamp 2 and Basecamp 3 search outage report](https://m.signalvnoise.com/basecamp-2-and-basecamp-3-search-outage-report/) * [Basecamp 2 and Basecamp 3 search outage report](https://m.signalvnoise.com/basecamp-2-and-basecamp-3-search-outage-report/)
* [Reducing Incident Escalations at Basecamp](https://m.signalvnoise.com/reducing-incident-escalations-at-basecamp/) * [Reducing Incident Escalations at Basecamp](https://m.signalvnoise.com/reducing-incident-escalations-at-basecamp/)
#### Books ### Books
* [Shape Up](https://basecamp.com/shapeup/webbook) * [Shape Up](https://basecamp.com/shapeup/webbook)
</details> </details>
@@ -112,31 +124,37 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Bloomberg</summary> <summary>Bloomberg</summary>
#### Videos ### Videos
* [Capacity Planning and Performance Enhancement with Page Reference Sampling](https://www.usenix.org/conference/srecon20americas/presentation/chen) * [Capacity Planning and Performance Enhancement with Page Reference Sampling](https://www.usenix.org/conference/srecon20americas/presentation/chen)
* [Why SREs can't afford to NOT do Chaos Engineering](https://www.usenix.org/conference/srecon20americas/presentation/pawlikowski) * [Why SREs can't afford to NOT do Chaos Engineering](https://www.usenix.org/conference/srecon20americas/presentation/pawlikowski)
* [Tracing Real-Time Distributed Systems](https://www.usenix.org/conference/srecon19emea/presentation/yakimov) * [Tracing Real-Time Distributed Systems](https://www.usenix.org/conference/srecon19emea/presentation/yakimov)
* [The Bloomberg Story: Building SRE Teams in an "Immeasurable" Organisation](https://www.usenix.org/conference/srecon19asia/presentation/sorensen) * [The Bloomberg Story: Building SRE Teams in an "Immeasurable" Organisation](https://www.usenix.org/conference/srecon19asia/presentation/sorensen)
* [Visibility into Loggers (and Other Low Level Services)—Seeing the Trees from the Forest](https://www.usenix.org/conference/srecon19americas/presentation/chen) * [Visibility into Loggers (and Other Low Level Services)—Seeing the Trees from the Forest](https://www.usenix.org/conference/srecon19americas/presentation/chen)
</details> </details>
<details> <details>
<summary>Booking.com</summary> <summary>Booking.com</summary>
#### Blog Posts ### Blog Posts
* [How Reliability and Product Teams Collaborate at Booking.com](https://medium.com/booking-com-infrastructure/how-reliability-and-product-teams-collaborate-at-booking-com-f6c317cc0aeb) * [How Reliability and Product Teams Collaborate at Booking.com](https://medium.com/booking-com-infrastructure/how-reliability-and-product-teams-collaborate-at-booking-com-f6c317cc0aeb)
* [Incidents, fixes, and the day after](https://medium.com/booking-com-infrastructure/incidents-fixes-and-the-day-after-c5d9aeae28c3) * [Incidents, fixes, and the day after](https://medium.com/booking-com-infrastructure/incidents-fixes-and-the-day-after-c5d9aeae28c3)
* [Troubleshooting: A journey into the unknown](https://medium.com/booking-com-infrastructure/troubleshooting-a-journey-into-the-unknown-e31b524fa86) * [Troubleshooting: A journey into the unknown](https://medium.com/booking-com-infrastructure/troubleshooting-a-journey-into-the-unknown-e31b524fa86)
#### Videos ### Videos
* [SLOs for Data-Intensive Services](https://www.usenix.org/conference/srecon19emea/presentation/fouquet) * [SLOs for Data-Intensive Services](https://www.usenix.org/conference/srecon19emea/presentation/fouquet)
* [Benefits of Taking the Less Traveled Road with Containers Infrastructure](https://www.usenix.org/conference/srecon19americas/presentation/iacoboaia) * [Benefits of Taking the Less Traveled Road with Containers Infrastructure](https://www.usenix.org/conference/srecon19americas/presentation/iacoboaia)
</details> </details>
<details> <details>
<summary>Capital One</summary> <summary>Capital One</summary>
#### Blog Posts ### Blog Posts
* [Automate AWS Infrastructure with Boto 3: AWS Health Check](https://medium.com/capital-one-tech/automate-aws-infrastructure-with-boto-3-aws-health-checks-e51338ba075) * [Automate AWS Infrastructure with Boto 3: AWS Health Check](https://medium.com/capital-one-tech/automate-aws-infrastructure-with-boto-3-aws-health-checks-e51338ba075)
* [Active-Active Shared-Nothing Database Architecture](https://medium.com/capital-one-tech/active-active-shared-nothing-database-architecture-304957ffb89) * [Active-Active Shared-Nothing Database Architecture](https://medium.com/capital-one-tech/active-active-shared-nothing-database-architecture-304957ffb89)
* [The 3 Rs of SREs: Resiliency, Recovery & Reliability](https://medium.com/capital-one-tech/the-3-rs-of-sres-resiliency-recovery-reliability-5f2f5360a91b) * [The 3 Rs of SREs: Resiliency, Recovery & Reliability](https://medium.com/capital-one-tech/the-3-rs-of-sres-resiliency-recovery-reliability-5f2f5360a91b)
@@ -153,11 +171,13 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [Continuous Chaos — Introducing Chaos Engineering into DevOps Practices](https://medium.com/capital-one-tech/continuous-chaos-introducing-chaos-engineering-into-devops-practices-75757e1cca6d) * [Continuous Chaos — Introducing Chaos Engineering into DevOps Practices](https://medium.com/capital-one-tech/continuous-chaos-introducing-chaos-engineering-into-devops-practices-75757e1cca6d)
* [The Mon-ifesto Part 1: Metrics](https://medium.com/capital-one-tech/the-mon-ifesto-part-1-metrics-808f6c944765) * [The Mon-ifesto Part 1: Metrics](https://medium.com/capital-one-tech/the-mon-ifesto-part-1-metrics-808f6c944765)
#### Major incidents & analysis reports ### Major incidents & analysis reports
* [Information on the Capital One Cyber Incident](https://www.capitalone.com/facts2019/) * [Information on the Capital One Cyber Incident](https://www.capitalone.com/facts2019/)
* [A Case Study of the Capital One Data Breach](http://web.mit.edu/smadnick/www/wp/2020-16.pdf) * [A Case Study of the Capital One Data Breach](http://web.mit.edu/smadnick/www/wp/2020-16.pdf)
#### Videos ### Videos
* [Banking on Continuous Delivery - Capital One](https://www.youtube.com/watch?v=_DnYSQEUTfo) * [Banking on Continuous Delivery - Capital One](https://www.youtube.com/watch?v=_DnYSQEUTfo)
* [Continuous Chaos in DevOps - Capital One](https://www.youtube.com/watch?v=U_Uh5RMCwPI) * [Continuous Chaos in DevOps - Capital One](https://www.youtube.com/watch?v=U_Uh5RMCwPI)
* [DevOps at Capital One: Focusing on Pipeline and Measurement](https://www.youtube.com/watch?v=6Q0mtVnnthQ) * [DevOps at Capital One: Focusing on Pipeline and Measurement](https://www.youtube.com/watch?v=6Q0mtVnnthQ)
@@ -168,11 +188,13 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>DBS</summary> <summary>DBS</summary>
#### Blog Posts ### Blog Posts
* [Site Reliability Engineering at DBS Bank](https://medium.com/dbs-tech-blog/site-reliability-engineering-at-dbs-bank-32c02228ccf4) * [Site Reliability Engineering at DBS Bank](https://medium.com/dbs-tech-blog/site-reliability-engineering-at-dbs-bank-32c02228ccf4)
* [Automating Configuration Management at Scale](https://medium.com/dbs-tech-blog/automating-configuration-management-at-scale-5c7927f83df3) * [Automating Configuration Management at Scale](https://medium.com/dbs-tech-blog/automating-configuration-management-at-scale-5c7927f83df3)
#### Videos ### Videos
* [SREcon Conversations Asia/Pacific with Koon Seng Lim, DBS](https://www.youtube.com/watch?v=URwkaRbOLxI&feature=emb_title) * [SREcon Conversations Asia/Pacific with Koon Seng Lim, DBS](https://www.youtube.com/watch?v=URwkaRbOLxI&feature=emb_title)
</details> </details>
@@ -180,7 +202,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>DeepSource</summary> <summary>DeepSource</summary>
#### Blog Posts ### Blog Posts
* [Redis diskless replication: What, how, why and the caveats](https://deepsource.io/blog/redis-diskless-replication/) * [Redis diskless replication: What, how, why and the caveats](https://deepsource.io/blog/redis-diskless-replication/)
* [How to setup Vault with Kubernetes](https://deepsource.io/blog/setup-vault-kubernetes/) * [How to setup Vault with Kubernetes](https://deepsource.io/blog/setup-vault-kubernetes/)
* [Breaking down zero downtime deployments in Kubernetes](https://deepsource.io/blog/zero-downtime-deployment/) * [Breaking down zero downtime deployments in Kubernetes](https://deepsource.io/blog/zero-downtime-deployment/)
@@ -190,11 +213,13 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Dropbox</summary> <summary>Dropbox</summary>
#### Blog Posts ### Blog Posts
* [Monitoring server applications with Vortex](https://dropbox.tech/infrastructure/monitoring-server-applications-with-vortex) * [Monitoring server applications with Vortex](https://dropbox.tech/infrastructure/monitoring-server-applications-with-vortex)
* [Athena: Our automated build health management system](https://dropbox.tech/infrastructure/athena-our-automated-build-health-management-system) * [Athena: Our automated build health management system](https://dropbox.tech/infrastructure/athena-our-automated-build-health-management-system)
#### Videos ### Videos
* [Service Discovery Challenges at Scale](https://www.usenix.org/conference/srecon19americas/presentation/nigmatullin) * [Service Discovery Challenges at Scale](https://www.usenix.org/conference/srecon19americas/presentation/nigmatullin)
</details> </details>
@@ -202,7 +227,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Facebook</summary> <summary>Facebook</summary>
#### Videos ### Videos
* [A Customer Service Approach to SRE](https://www.usenix.org/conference/srecon19emea/presentation/looney) * [A Customer Service Approach to SRE](https://www.usenix.org/conference/srecon19emea/presentation/looney)
* [How (Not) to Scale a Project: A Post-Mortem](https://www.usenix.org/conference/srecon19asia/presentation/bagnoli) * [How (Not) to Scale a Project: A Post-Mortem](https://www.usenix.org/conference/srecon19asia/presentation/bagnoli)
* [Releasing the World's Largest Python Site Every 7 Minutes](https://www.usenix.org/conference/srecon19asia/presentation/wong-shuhong) * [Releasing the World's Largest Python Site Every 7 Minutes](https://www.usenix.org/conference/srecon19asia/presentation/wong-shuhong)
@@ -213,7 +239,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Fastly</summary> <summary>Fastly</summary>
#### Videos ### Videos
* [SRE & Product Management: How to Level up Your Team (and Career!) by Thinking like a Product Manager](https://www.usenix.org/conference/srecon19americas/presentation/wohlner) * [SRE & Product Management: How to Level up Your Team (and Career!) by Thinking like a Product Manager](https://www.usenix.org/conference/srecon19americas/presentation/wohlner)
* [Resilience Engineering Mythbusting](https://www.usenix.org/conference/srecon19americas/presentation/gallego) * [Resilience Engineering Mythbusting](https://www.usenix.org/conference/srecon19americas/presentation/gallego)
@@ -222,13 +249,15 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>eBay</summary> <summary>eBay</summary>
#### Blog Posts ### Blog Posts
* [Resiliency and Disaster Recovery with Kafka](https://tech.ebayinc.com/engineering/resiliency-and-disaster-recovery-with-kafka/) * [Resiliency and Disaster Recovery with Kafka](https://tech.ebayinc.com/engineering/resiliency-and-disaster-recovery-with-kafka/)
* [SRE Case Study: Triaging a Non-Heap JVM Out of Memory Issue](https://tech.ebayinc.com/engineering/sre-case-study-triage-a-non-heap-jvm-out-of-memory-issue/) * [SRE Case Study: Triaging a Non-Heap JVM Out of Memory Issue](https://tech.ebayinc.com/engineering/sre-case-study-triage-a-non-heap-jvm-out-of-memory-issue/)
* [SRE Case Study: Mysterious Traffic Imbalance](https://tech.ebayinc.com/engineering/sre-case-study-mysterious-traffic-imbalance/) * [SRE Case Study: Mysterious Traffic Imbalance](https://tech.ebayinc.com/engineering/sre-case-study-mysterious-traffic-imbalance/)
* [Zero Downtime, Instant Deployment and Rollback](https://tech.ebayinc.com/engineering/zero-downtime-instant-deployment-and-rollback/) * [Zero Downtime, Instant Deployment and Rollback](https://tech.ebayinc.com/engineering/zero-downtime-instant-deployment-and-rollback/)
### Video ### Video
* [Madaari: Ordering for the Monkeys](https://www.usenix.org/conference/srecon19americas/presentation/raina) * [Madaari: Ordering for the Monkeys](https://www.usenix.org/conference/srecon19americas/presentation/raina)
</details> </details>
@@ -236,14 +265,16 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Etsy</summary> <summary>Etsy</summary>
#### Blog Posts ### Blog Posts
* [Etsy's Debriefing Facilitation Guide for Blameless Postmortems](https://codeascraft.com/2016/11/17/debriefing-facilitation-guide/) * [Etsy's Debriefing Facilitation Guide for Blameless Postmortems](https://codeascraft.com/2016/11/17/debriefing-facilitation-guide/)
* [Opsweekly: Measuring on-call experience with alert classification](https://codeascraft.com/2014/06/19/opsweekly-measuring-on-call-experience-with-alert-classification/) * [Opsweekly: Measuring on-call experience with alert classification](https://codeascraft.com/2014/06/19/opsweekly-measuring-on-call-experience-with-alert-classification/)
* [Demystifying Site Outages](https://blog.etsy.com/news/2012/demystifying-site-outages/) * [Demystifying Site Outages](https://blog.etsy.com/news/2012/demystifying-site-outages/)
* [Blameless PostMortems and a Just Culture](https://codeascraft.com/2012/05/22/blameless-postmortems/) * [Blameless PostMortems and a Just Culture](https://codeascraft.com/2012/05/22/blameless-postmortems/)
* [Measure Anything, Measure Everything](https://codeascraft.com/2011/02/15/measure-anything-measure-everything/) * [Measure Anything, Measure Everything](https://codeascraft.com/2011/02/15/measure-anything-measure-everything/)
#### Videos ### Videos
* [Velocity 09: John Allspaw and Paul Hammond, "10+ Deploys Pe](https://www.youtube.com/watch?v=LdOe18KhtT4) * [Velocity 09: John Allspaw and Paul Hammond, "10+ Deploys Pe](https://www.youtube.com/watch?v=LdOe18KhtT4)
* [Migrating a Monolith to the Cloud](https://www.usenix.org/conference/srecon19americas/presentation/govande) * [Migrating a Monolith to the Cloud](https://www.usenix.org/conference/srecon19americas/presentation/govande)
@@ -252,7 +283,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Expedia</summary> <summary>Expedia</summary>
#### Blog Posts ### Blog Posts
* [The Cost of 100% Reliability](https://medium.com/expedia-group-tech/the-cost-of-100-reliability-ecb2901f23a4) * [The Cost of 100% Reliability](https://medium.com/expedia-group-tech/the-cost-of-100-reliability-ecb2901f23a4)
* [Creating Monitoring Dashboards](https://medium.com/expedia-group-tech/creating-monitoring-dashboards-1f3fbe0ae1ac) * [Creating Monitoring Dashboards](https://medium.com/expedia-group-tech/creating-monitoring-dashboards-1f3fbe0ae1ac)
* [Using Bash for DevOps](https://medium.com/expedia-group-tech/using-bash-for-devops-7046eed1aa63) * [Using Bash for DevOps](https://medium.com/expedia-group-tech/using-bash-for-devops-7046eed1aa63)
@@ -262,7 +294,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>GitHub</summary> <summary>GitHub</summary>
#### Blog Posts ### Blog Posts
* [Deployment reliability at GitHub](https://github.blog/2021-02-03-deployment-reliability-at-github/) * [Deployment reliability at GitHub](https://github.blog/2021-02-03-deployment-reliability-at-github/)
* [Improving how we deploy GitHub](https://github.blog/2021-01-25-improving-how-we-deploy-github/) * [Improving how we deploy GitHub](https://github.blog/2021-01-25-improving-how-we-deploy-github/)
* [Building On-Call Culture at GitHub](https://github.blog/2021-01-06-building-on-call-culture-at-github/) * [Building On-Call Culture at GitHub](https://github.blog/2021-01-06-building-on-call-culture-at-github/)
@@ -271,7 +304,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [Getting started with DevOps automation](https://github.blog/2020-10-29-getting-started-with-devops-automation/) * [Getting started with DevOps automation](https://github.blog/2020-10-29-getting-started-with-devops-automation/)
* [MySQL High Availability at GitHub](https://github.blog/2018-06-20-mysql-high-availability-at-github/) * [MySQL High Availability at GitHub](https://github.blog/2018-06-20-mysql-high-availability-at-github/)
#### Major incidents & analysis reports ### Major incidents & analysis reports
* [GitHub Availability Report: January 2021](https://github.blog/2021-02-02-github-availability-report-january-2021/) * [GitHub Availability Report: January 2021](https://github.blog/2021-02-02-github-availability-report-january-2021/)
* [GitHub Availability Report: December 2020](https://github.blog/2021-01-06-github-availability-report-december-2020/) * [GitHub Availability Report: December 2020](https://github.blog/2021-01-06-github-availability-report-december-2020/)
* [GitHub Availability Report: November 2020](https://github.blog/2020-12-02-availability-report-november-2020/) * [GitHub Availability Report: November 2020](https://github.blog/2020-12-02-availability-report-november-2020/)
@@ -283,7 +317,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [February 28th DDoS Incident Report](https://github.blog/2018-03-01-ddos-incident-report/) * [February 28th DDoS Incident Report](https://github.blog/2018-03-01-ddos-incident-report/)
* [Incident Report: Inadvertent Private Repository Disclosure](https://github.blog/2016-10-28-incident-report-inadvertent-private-repository-disclosure/) * [Incident Report: Inadvertent Private Repository Disclosure](https://github.blog/2016-10-28-incident-report-inadvertent-private-repository-disclosure/)
#### Videos ### Videos
* [One on One SRE](https://www.usenix.org/conference/srecon19americas/presentation/tobey) * [One on One SRE](https://www.usenix.org/conference/srecon19americas/presentation/tobey)
</details> </details>
@@ -291,7 +326,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>GitLab</summary> <summary>GitLab</summary>
#### Blog Posts ### Blog Posts
* [This SRE attempted to roll out an HAProxy config change. You won't believe what happened next...](https://about.gitlab.com/blog/2021/01/14/this-sre-attempted-to-roll-out-an-haproxy-change/) * [This SRE attempted to roll out an HAProxy config change. You won't believe what happened next...](https://about.gitlab.com/blog/2021/01/14/this-sre-attempted-to-roll-out-an-haproxy-change/)
* [My week shadowing a GitLab Site Reliability Engineer](https://about.gitlab.com/blog/2019/12/16/sre-shadow/) * [My week shadowing a GitLab Site Reliability Engineer](https://about.gitlab.com/blog/2019/12/16/sre-shadow/)
* [Update: Elasticsearch lessons learnt for Advanced Global Search](https://about.gitlab.com/blog/2020/04/28/elasticsearch-update/) * [Update: Elasticsearch lessons learnt for Advanced Global Search](https://about.gitlab.com/blog/2020/04/28/elasticsearch-update/)
@@ -307,7 +343,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>GoCardless</summary> <summary>GoCardless</summary>
#### Blog Posts ### Blog Posts
* [Deploying Software at GoCardless: Open-Sourcing our “Getting Started” Tutorial](https://medium.com/gocardless-tech/deploying-software-at-gocardless-open-sourcing-our-getting-started-tutorial-ab857aa91c9e) * [Deploying Software at GoCardless: Open-Sourcing our “Getting Started” Tutorial](https://medium.com/gocardless-tech/deploying-software-at-gocardless-open-sourcing-our-getting-started-tutorial-ab857aa91c9e)
* [How we compress Pub/Sub messages and more, saving a load of money](https://medium.com/gocardless-tech/how-we-compress-pub-sub-messages-and-more-saving-a-load-of-money-694b64c3458a) * [How we compress Pub/Sub messages and more, saving a load of money](https://medium.com/gocardless-tech/how-we-compress-pub-sub-messages-and-more-saving-a-load-of-money-694b64c3458a)
* [Fear-free PostgreSQL migrations for Rails](https://gocardless.com/blog/fear-free-postgresql-migrations-for-rails/) * [Fear-free PostgreSQL migrations for Rails](https://gocardless.com/blog/fear-free-postgresql-migrations-for-rails/)
@@ -316,7 +353,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [Zero-downtime Postgres migrations - the hard parts](https://gocardless.com/blog/zero-downtime-postgres-migrations-the-hard-parts/) * [Zero-downtime Postgres migrations - the hard parts](https://gocardless.com/blog/zero-downtime-postgres-migrations-the-hard-parts/)
* [In search of performance - how we shaved 200ms off every POST request](https://gocardless.com/blog/in-search-of-performance-how-we-shaved-200ms-off-every-post-request/) * [In search of performance - how we shaved 200ms off every POST request](https://gocardless.com/blog/in-search-of-performance-how-we-shaved-200ms-off-every-post-request/)
#### Major incidents & analysis reports ### Major incidents & analysis reports
* [Incident review: Service outage on 25 October 2020, Vault TLS expiry](https://gocardless.com/blog/incident-review-service-outage-on-25-october-2020/) * [Incident review: Service outage on 25 October 2020, Vault TLS expiry](https://gocardless.com/blog/incident-review-service-outage-on-25-october-2020/)
* [Incident review: API and Dashboard outage on 10 October 2017](https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/) * [Incident review: API and Dashboard outage on 10 October 2017](https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/)
@@ -325,18 +363,21 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Google</summary> <summary>Google</summary>
#### Blog Posts ### Blog Posts
* [SRE Practices & Processes](https://sre.google/resources/#practicesandprocesses) * [SRE Practices & Processes](https://sre.google/resources/#practicesandprocesses)
* [Three months, 30x demand: How we scaled Google Meet during COVID-19](https://cloud.google.com/blog/products/g-suite/keeping-google-meet-ahead-of-usage-demand-during-covid-19) * [Three months, 30x demand: How we scaled Google Meet during COVID-19](https://cloud.google.com/blog/products/g-suite/keeping-google-meet-ahead-of-usage-demand-during-covid-19)
* [SRE Classroom: Distributed PubSub](https://sre.google/resources/practices-and-processes/distributed-pubsub/) * [SRE Classroom: Distributed PubSub](https://sre.google/resources/practices-and-processes/distributed-pubsub/)
#### Books ### Books
* [Building Secure & Reliable Systems](https://static.googleusercontent.com/media/sre.google/en//static/pdf/building_secure_and_reliable_systems.pdf) * [Building Secure & Reliable Systems](https://static.googleusercontent.com/media/sre.google/en//static/pdf/building_secure_and_reliable_systems.pdf)
* [Site Reliability Engineering](https://sre.google/sre-book/table-of-contents/) * [Site Reliability Engineering](https://sre.google/sre-book/table-of-contents/)
* [The Site Reliability Workbook](https://sre.google/workbook/table-of-contents/) * [The Site Reliability Workbook](https://sre.google/workbook/table-of-contents/)
* [Training Site Reliability Engineers](https://static.googleusercontent.com/media/sre.google/en//static/pdf/training-sre.pdf) * [Training Site Reliability Engineers](https://static.googleusercontent.com/media/sre.google/en//static/pdf/training-sre.pdf)
#### Videos ### Videos
* [What's the Difference Between DevOps and SRE? with Seth Vargo and Liz Fong-Jones of Google](https://youtu.be/uTEL8Ff1Zvk) * [What's the Difference Between DevOps and SRE? with Seth Vargo and Liz Fong-Jones of Google](https://youtu.be/uTEL8Ff1Zvk)
* [Risk and Error Budgets with Seth Vargo and Liz Fong-Jones of Google](https://youtu.be/y2ILKr8kCJU) * [Risk and Error Budgets with Seth Vargo and Liz Fong-Jones of Google](https://youtu.be/y2ILKr8kCJU)
* [Pragmatic Automation with Max Luebbe of GCP](https://www.youtube.com/watch?v=oDcjAcFTFC0&t=0m56s) * [Pragmatic Automation with Max Luebbe of GCP](https://www.youtube.com/watch?v=oDcjAcFTFC0&t=0m56s)
@@ -370,7 +411,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Gojek</summary> <summary>Gojek</summary>
#### Blog Posts ### Blog Posts
* [Why We Swear by the RCA](https://blog.gojekengineering.com/why-we-swear-by-the-rca-f535fd5abbcb) * [Why We Swear by the RCA](https://blog.gojekengineering.com/why-we-swear-by-the-rca-f535fd5abbcb)
</details> </details>
@@ -378,7 +420,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Grab</summary> <summary>Grab</summary>
#### Blog Posts ### Blog Posts
* [Our Journey to Continuous Delivery at Grab (Part 1)](https://engineering.grab.com/our-journey-to-continuous-delivery-at-grab) * [Our Journey to Continuous Delivery at Grab (Part 1)](https://engineering.grab.com/our-journey-to-continuous-delivery-at-grab)
* [Designing Resilient Systems: Circuit Breakers or Retries? (Part 1)](https://engineering.grab.com/designing-resilient-systems-part-1) * [Designing Resilient Systems: Circuit Breakers or Retries? (Part 1)](https://engineering.grab.com/designing-resilient-systems-part-1)
* [Designing Resilient Systems: Circuit Breakers or Retries? (Part 2)](https://engineering.grab.com/designing-resilient-systems-part-2) * [Designing Resilient Systems: Circuit Breakers or Retries? (Part 2)](https://engineering.grab.com/designing-resilient-systems-part-2)
@@ -392,7 +435,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Grammarly</summary> <summary>Grammarly</summary>
#### Blog Posts ### Blog Posts
* [Security Operations in an AWS Environment](https://www.grammarly.com/blog/engineering/security-infrastructure-aws/) * [Security Operations in an AWS Environment](https://www.grammarly.com/blog/engineering/security-infrastructure-aws/)
</details> </details>
@@ -400,7 +444,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Heroku</summary> <summary>Heroku</summary>
#### Blog Posts ### Blog Posts
* [Incident Response at Heroku](https://blog.heroku.com/incident-response-at-heroku-2020) * [Incident Response at Heroku](https://blog.heroku.com/incident-response-at-heroku-2020)
</details> </details>
@@ -408,12 +453,14 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Indeed</summary> <summary>Indeed</summary>
#### Blog Posts ### Blog Posts
* [Being Just Reliable Enough](https://engineering.indeedblog.com/blog/2019/10/being-just-reliable-enough/) * [Being Just Reliable Enough](https://engineering.indeedblog.com/blog/2019/10/being-just-reliable-enough/)
* [Automating Indeed's Release Process](https://engineering.indeedblog.com/blog/2017/03/automating-release-process/) * [Automating Indeed's Release Process](https://engineering.indeedblog.com/blog/2017/03/automating-release-process/)
* [Sloth, a Tool for Inducing Network Failures with Preetha Appan of Indeed.com](https://www.usenix.org/conference/srecon17americas/program/presentation/appan) * [Sloth, a Tool for Inducing Network Failures with Preetha Appan of Indeed.com](https://www.usenix.org/conference/srecon17americas/program/presentation/appan)
#### Videos ### Videos
* [Are We Getting Better Yet? Progress Toward Safer Operations](https://www.usenix.org/conference/srecon20americas/presentation/elman) * [Are We Getting Better Yet? Progress Toward Safer Operations](https://www.usenix.org/conference/srecon20americas/presentation/elman)
</details> </details>
@@ -421,7 +468,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Khan Academy</summary> <summary>Khan Academy</summary>
#### Blog Posts ### Blog Posts
* [How Khan Academy Successfully Handled 2.5x Traffic in a Week](https://blog.khanacademy.org/how-khan-academy-successfully-handled-2-5x-traffic-in-a-week/) * [How Khan Academy Successfully Handled 2.5x Traffic in a Week](https://blog.khanacademy.org/how-khan-academy-successfully-handled-2-5x-traffic-in-a-week/)
* [Evolving our content infrastructure](https://blog.khanacademy.org/evolving-our-content-infrastructure/) * [Evolving our content infrastructure](https://blog.khanacademy.org/evolving-our-content-infrastructure/)
@@ -430,7 +478,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>LinkedIn</summary> <summary>LinkedIn</summary>
#### Blog Posts ### Blog Posts
* [Insights into a Product SRE team at LinkedIn](https://www.linkedin.com/pulse/insights-product-sre-team-linkedin-zaina-afoulki/?trackingId=mxKJgZ3kp8l2WI9D4UZv7Q%3D%3D) * [Insights into a Product SRE team at LinkedIn](https://www.linkedin.com/pulse/insights-product-sre-team-linkedin-zaina-afoulki/?trackingId=mxKJgZ3kp8l2WI9D4UZv7Q%3D%3D)
* [Open source update: School of SRE](https://engineering.linkedin.com/blog/2021/open-source-update--school-of-sre) * [Open source update: School of SRE](https://engineering.linkedin.com/blog/2021/open-source-update--school-of-sre)
* [Fixing Linux filesystem performance regressions](https://engineering.linkedin.com/blog/2020/fixing-linux-filesystem-performance-regressions) * [Fixing Linux filesystem performance regressions](https://engineering.linkedin.com/blog/2020/fixing-linux-filesystem-performance-regressions)
@@ -452,7 +501,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [What Gets Measured Gets Fixed](https://engineering.linkedin.com/blog/2016/12/what-gets-measured-gets-fixed) * [What Gets Measured Gets Fixed](https://engineering.linkedin.com/blog/2016/12/what-gets-measured-gets-fixed)
* [Hiring SREs at LinkedIn](https://engineering.linkedin.com/engineering-culture/hiring-sres-linkedin) * [Hiring SREs at LinkedIn](https://engineering.linkedin.com/engineering-culture/hiring-sres-linkedin)
#### Videos ### Videos
* [Growing the Site Reliability Team at LinkedIn: Hiring is Hard -- Greg Leffler](https://www.youtube.com/watch?v=ZemNg9GYvOA) * [Growing the Site Reliability Team at LinkedIn: Hiring is Hard -- Greg Leffler](https://www.youtube.com/watch?v=ZemNg9GYvOA)
* [9 Years of Failure: How Racing Crappy Cars Made Me a Better SRE](https://www.usenix.org/conference/srecon20americas/presentation/doherty) * [9 Years of Failure: How Racing Crappy Cars Made Me a Better SRE](https://www.usenix.org/conference/srecon20americas/presentation/doherty)
* [Weathering the Storm: How Early Warnings Save the Farm](https://www.usenix.org/conference/srecon19emea/presentation/sherwin) * [Weathering the Storm: How Early Warnings Save the Farm](https://www.usenix.org/conference/srecon19emea/presentation/sherwin)
@@ -472,17 +522,19 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Mercari</summary> <summary>Mercari</summary>
## Mercari ### Blog Posts
#### Blog Posts
* [DevSecOps: What Is It and Why Is It Gaining Momentum in the Industry?](https://engineering.mercari.com/en/blog/entry/20201214-devsecops-what-is-it-and-why-is-it-gaining-momentum-in-the-industry/) * [DevSecOps: What Is It and Why Is It Gaining Momentum in the Industry?](https://engineering.mercari.com/en/blog/entry/20201214-devsecops-what-is-it-and-why-is-it-gaining-momentum-in-the-industry/)
* [How do we share troubleshooting skills](https://engineering.mercari.com/en/blog/entry/2020-01-28-143339/) * [How do we share troubleshooting skills](https://engineering.mercari.com/en/blog/entry/2020-01-28-143339/)
* [Datadog Dashboard at Scale w / Terraform](https://engineering.mercari.com/en/blog/entry/2019-12-09-122134/) * [Datadog Dashboard at Scale w / Terraform](https://engineering.mercari.com/en/blog/entry/2019-12-09-122134/)
</details> </details>
<details> <details>
<summary>Microsoft</summary> <summary>Microsoft</summary>
#### Videos ### Videos
* [SLI & Reliability Deep-Dive with David N. Blank-Edelman of Microsoft](https://www.youtube.com/watch?v=1iMo3SkdQqQ) * [SLI & Reliability Deep-Dive with David N. Blank-Edelman of Microsoft](https://www.youtube.com/watch?v=1iMo3SkdQqQ)
* [Ironies of Automation: A Comedy in Three Parts with Tanner Lund of Microsoft](https://www.youtube.com/watch?v=U3ubcoNzx9k) * [Ironies of Automation: A Comedy in Three Parts with Tanner Lund of Microsoft](https://www.youtube.com/watch?v=U3ubcoNzx9k)
* [Sustainable Software Engineering & SREs](https://www.usenix.org/conference/srecon20americas/presentation/johnson) * [Sustainable Software Engineering & SREs](https://www.usenix.org/conference/srecon20americas/presentation/johnson)
@@ -493,13 +545,14 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [Availability—Thinking beyond 9s](https://www.usenix.org/conference/srecon19asia/presentation/srinivasamurthy) * [Availability—Thinking beyond 9s](https://www.usenix.org/conference/srecon19asia/presentation/srinivasamurthy)
* [Ironies of Automation: A Comedy in Three Parts](https://www.usenix.org/conference/srecon19asia/presentation/lund-comedy) * [Ironies of Automation: A Comedy in Three Parts](https://www.usenix.org/conference/srecon19asia/presentation/lund-comedy)
* [The Ops in Serverless](https://www.usenix.org/conference/srecon19americas/presentation/davis) * [The Ops in Serverless](https://www.usenix.org/conference/srecon19americas/presentation/davis)
</details> </details>
<details> <details>
<summary>MIRO</summary> <summary>MIRO</summary>
## MIRO ### Blog Posts
#### Blog Posts
* [Prometheus High Availability and Fault Tolerance strategy, long term storage with VictoriaMetrics](https://medium.com/miro-engineering/prometheus-high-availability-and-fault-tolerance-strategy-long-term-storage-with-victoriametrics-82f6f3f0409e) * [Prometheus High Availability and Fault Tolerance strategy, long term storage with VictoriaMetrics](https://medium.com/miro-engineering/prometheus-high-availability-and-fault-tolerance-strategy-long-term-storage-with-victoriametrics-82f6f3f0409e)
* [Managing hundreds of servers for load testing: Autoscaling, custom monitoring, DevOps culture](https://medium.com/miro-engineering/managing-hundreds-of-servers-for-load-testing-autoscaling-custom-monitoring-devops-culture-390fd1c7e699) * [Managing hundreds of servers for load testing: Autoscaling, custom monitoring, DevOps culture](https://medium.com/miro-engineering/managing-hundreds-of-servers-for-load-testing-autoscaling-custom-monitoring-devops-culture-390fd1c7e699)
* [Reliable load testing with regards to unexpected nuances](https://medium.com/miro-engineering/reliable-load-testing-with-regards-to-unexpected-nuances-6f38c82196a5) * [Reliable load testing with regards to unexpected nuances](https://medium.com/miro-engineering/reliable-load-testing-with-regards-to-unexpected-nuances-6f38c82196a5)
@@ -509,13 +562,15 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Monzo</summary> <summary>Monzo</summary>
#### Blog Posts ### Blog Posts
* [Autoscaling Monzo: How we optimise our platform to be just the right size](https://monzo.com/blog/2020/10/19/autoscaling-monzo) * [Autoscaling Monzo: How we optimise our platform to be just the right size](https://monzo.com/blog/2020/10/19/autoscaling-monzo)
* [How we've evolved on-call at Monzo](https://monzo.com/blog/how-weve-evolved-on-call-at-monzo) * [How we've evolved on-call at Monzo](https://monzo.com/blog/how-weve-evolved-on-call-at-monzo)
* [How we respond to incidents](https://monzo.com/blog/2019/07/08/how-we-respond-to-incidents) * [How we respond to incidents](https://monzo.com/blog/2019/07/08/how-we-respond-to-incidents)
* [How we monitor Monzo](https://monzo.com/blog/2018/07/27/how-we-monitor-monzo) * [How we monitor Monzo](https://monzo.com/blog/2018/07/27/how-we-monitor-monzo)
#### Videos ### Videos
* [Eventually Consistent Service Discovery](https://www.usenix.org/conference/srecon19emea/presentation/patel) * [Eventually Consistent Service Discovery](https://www.usenix.org/conference/srecon19emea/presentation/patel)
</details> </details>
@@ -523,7 +578,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Netflix</summary> <summary>Netflix</summary>
#### Blog Posts ### Blog Posts
* [Building Netflix's Distributed Tracing Infrastructure](https://netflixtechblog.com/building-netflixs-distributed-tracing-infrastructure-bb856c319304) * [Building Netflix's Distributed Tracing Infrastructure](https://netflixtechblog.com/building-netflixs-distributed-tracing-infrastructure-bb856c319304)
* [Lessons from Building Observability Tools at Netflix](https://netflixtechblog.com/lessons-from-building-observability-tools-at-netflix-7cfafed6ab17) * [Lessons from Building Observability Tools at Netflix](https://netflixtechblog.com/lessons-from-building-observability-tools-at-netflix-7cfafed6ab17)
* [Edgar: Solving Mysteries Faster with Observability](https://netflixtechblog.com/edgar-solving-mysteries-faster-with-observability-e1a76302c71f) * [Edgar: Solving Mysteries Faster with Observability](https://netflixtechblog.com/edgar-solving-mysteries-faster-with-observability-e1a76302c71f)
@@ -542,10 +598,12 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [Announcing Security Monkey — AWS Security Configuration Monitoring and Analysis](https://netflixtechblog.com/announcing-security-monkey-aws-security-configuration-monitoring-and-analysis-1f2bfb001708) * [Announcing Security Monkey — AWS Security Configuration Monitoring and Analysis](https://netflixtechblog.com/announcing-security-monkey-aws-security-configuration-monitoring-and-analysis-1f2bfb001708)
* [Lessons Netflix Learned from the AWS Outage](https://netflixtechblog.com/lessons-netflix-learned-from-the-aws-outage-deefe5fd0c04) * [Lessons Netflix Learned from the AWS Outage](https://netflixtechblog.com/lessons-netflix-learned-from-the-aws-outage-deefe5fd0c04)
#### Major incidents & analysis reports ### Major incidents & analysis reports
* [Post-mortem of October 22, 2012 AWS degradation](https://netflixtechblog.com/post-mortem-of-october-22-2012-aws-degradation-efcee3ab40d5) * [Post-mortem of October 22, 2012 AWS degradation](https://netflixtechblog.com/post-mortem-of-october-22-2012-aws-degradation-efcee3ab40d5)
#### Videos ### Videos
* [AWS re:Invent 2019: A day in the life of a Netflix engineer (NFX202)](https://www.youtube.com/watch?v=0QS1TWLooo0) * [AWS re:Invent 2019: A day in the life of a Netflix engineer (NFX202)](https://www.youtube.com/watch?v=0QS1TWLooo0)
* [When /bin/sh Attacks: Revisiting "Automate All the Things"](https://www.usenix.org/conference/srecon20americas/presentation/reed) * [When /bin/sh Attacks: Revisiting "Automate All the Things"](https://www.usenix.org/conference/srecon20americas/presentation/reed)
* [How Did Things Go Right? Learning More from Incidents](https://www.usenix.org/conference/srecon19americas/presentation/kitchens) * [How Did Things Go Right? Learning More from Incidents](https://www.usenix.org/conference/srecon19americas/presentation/kitchens)
@@ -571,7 +629,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>PayPal</summary> <summary>PayPal</summary>
#### Videos ### Videos
* [SREcon Conversations Asia/Pacific with Karthikeyan Selvaraj and Rajesh Ramachandran, PayPal](https://www.youtube.com/watch?v=XAIj567wBsU&feature=emb_title) * [SREcon Conversations Asia/Pacific with Karthikeyan Selvaraj and Rajesh Ramachandran, PayPal](https://www.youtube.com/watch?v=XAIj567wBsU&feature=emb_title)
* [SRE Then vs SRE Now: A Balancing Act between Reflexes and Intuitive Instincts at PayPal](https://www.usenix.org/conference/srecon19asia/presentation/sunder-vr) * [SRE Then vs SRE Now: A Balancing Act between Reflexes and Intuitive Instincts at PayPal](https://www.usenix.org/conference/srecon19asia/presentation/sunder-vr)
* [Detecting Service Degradation and Failures at Scale through Distributed Log Processing](https://www.usenix.org/conference/srecon19asia/presentation/narayanan) * [Detecting Service Degradation and Failures at Scale through Distributed Log Processing](https://www.usenix.org/conference/srecon19asia/presentation/narayanan)
@@ -583,13 +642,15 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Pinterest</summary> <summary>Pinterest</summary>
#### Blog Posts ### Blog Posts
* [Simplifying web deploys](https://medium.com/pinterest-engineering/simplifying-web-deploys-19244fe13737) * [Simplifying web deploys](https://medium.com/pinterest-engineering/simplifying-web-deploys-19244fe13737)
* [Upgrading Pinterest operational metrics](https://medium.com/pinterest-engineering/upgrading-pinterest-operational-metrics-8718d058079a) * [Upgrading Pinterest operational metrics](https://medium.com/pinterest-engineering/upgrading-pinterest-operational-metrics-8718d058079a)
* [Distributed tracing at Pinterest with new open source tools](https://medium.com/pinterest-engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b) * [Distributed tracing at Pinterest with new open source tools](https://medium.com/pinterest-engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b)
* [Auto scaling Pinterest](https://medium.com/pinterest-engineering/auto-scaling-pinterest-df1d2beb4d64) * [Auto scaling Pinterest](https://medium.com/pinterest-engineering/auto-scaling-pinterest-df1d2beb4d64)
#### Videos ### Videos
* [Building Actionable Code Ownership](https://www.usenix.org/conference/srecon20americas/presentation/mukherji) * [Building Actionable Code Ownership](https://www.usenix.org/conference/srecon20americas/presentation/mukherji)
* [Evolution of Observability Tools at Pinterest](https://www.usenix.org/conference/srecon19emea/presentation/abbas) * [Evolution of Observability Tools at Pinterest](https://www.usenix.org/conference/srecon19emea/presentation/abbas)
* [Automating OS/Platform Upgrades for Service Owners](https://www.usenix.org/conference/srecon19asia/presentation/menezes) * [Automating OS/Platform Upgrades for Service Owners](https://www.usenix.org/conference/srecon19asia/presentation/menezes)
@@ -599,7 +660,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Postman</summary> <summary>Postman</summary>
#### Blog Posts ### Blog Posts
* [Learn how your Kubernetes clusters respond to failure using Gremlin and Grafana](https://medium.com/better-practices/chaos-d3ef238ec328) * [Learn how your Kubernetes clusters respond to failure using Gremlin and Grafana](https://medium.com/better-practices/chaos-d3ef238ec328)
</details> </details>
@@ -607,7 +669,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Slalom Build</summary> <summary>Slalom Build</summary>
#### Blog Posts ### Blog Posts
* [Beginner's Guide to DevOps: How to Make It into the Industry](https://medium.com/slalom-build/beginners-guid-to-devops-how-to-make-it-into-the-industry-c1652d59807) * [Beginner's Guide to DevOps: How to Make It into the Industry](https://medium.com/slalom-build/beginners-guid-to-devops-how-to-make-it-into-the-industry-c1652d59807)
* [GitHub Actions: Beyond CI/CD](https://medium.com/slalom-build/github-actions-beyond-ci-cd-cb3ddc6abaa) * [GitHub Actions: Beyond CI/CD](https://medium.com/slalom-build/github-actions-beyond-ci-cd-cb3ddc6abaa)
@@ -626,7 +688,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Scribd</summary> <summary>Scribd</summary>
#### Blog Posts ### Blog Posts
* [Learning from incidents: getting Sidekiq ready to serve a billion jobs](https://tech.scribd.com/blog/2020/sidekiq-incident-learnings.html) * [Learning from incidents: getting Sidekiq ready to serve a billion jobs](https://tech.scribd.com/blog/2020/sidekiq-incident-learnings.html)
* [A testimonial for using PagerDuty at Scribd](https://tech.scribd.com/blog/2020/pagerduty-at-scribd.html) * [A testimonial for using PagerDuty at Scribd](https://tech.scribd.com/blog/2020/pagerduty-at-scribd.html)
* [Assigning pager duty to developers](https://tech.scribd.com/blog/2019/managing-pagerduty-rotations.html) * [Assigning pager duty to developers](https://tech.scribd.com/blog/2019/managing-pagerduty-rotations.html)
@@ -636,7 +699,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Shopify</summary> <summary>Shopify</summary>
#### Blog Posts ### Blog Posts
* [Resiliency Planning for High-Traffic Events](https://shopify.engineering/resiliency-planning-for-high-traffic-events) * [Resiliency Planning for High-Traffic Events](https://shopify.engineering/resiliency-planning-for-high-traffic-events)
* [Capacity Planning at Scale](https://shopify.engineering/capacity-planning-shopify) * [Capacity Planning at Scale](https://shopify.engineering/capacity-planning-shopify)
* [Using DNS Traffic Management to Add Resiliency to Shopify's Services](https://shopify.engineering/using-dns-traffic-management-add-resiliency-shopify-services) * [Using DNS Traffic Management to Add Resiliency to Shopify's Services](https://shopify.engineering/using-dns-traffic-management-add-resiliency-shopify-services)
@@ -644,7 +708,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [Implementing ChatOps into our Incident Management Procedure](https://shopify.engineering/implementing-chatops-into-our-incident-management-procedure) * [Implementing ChatOps into our Incident Management Procedure](https://shopify.engineering/implementing-chatops-into-our-incident-management-procedure)
* [StatsD at Shopify](https://shopify.engineering/17488320-statsd-at-shopify) * [StatsD at Shopify](https://shopify.engineering/17488320-statsd-at-shopify)
#### Videos ### Videos
* [Network Monitor: A Tale of ACKnowledging an Observability Gap](https://www.usenix.org/conference/srecon19emea/presentation/gedge) * [Network Monitor: A Tale of ACKnowledging an Observability Gap](https://www.usenix.org/conference/srecon19emea/presentation/gedge)
* [Expect the Unexpected: Preparing SRE Teams for Responding to Novel Failures](https://www.usenix.org/conference/srecon19emea/presentation/arthorne) * [Expect the Unexpected: Preparing SRE Teams for Responding to Novel Failures](https://www.usenix.org/conference/srecon19emea/presentation/arthorne)
* [Advanced Napkin Math: Estimating System Performance from First Principles](https://www.usenix.org/conference/srecon19emea/presentation/eskildsen) * [Advanced Napkin Math: Estimating System Performance from First Principles](https://www.usenix.org/conference/srecon19emea/presentation/eskildsen)
@@ -654,12 +719,15 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Slack</summary> <summary>Slack</summary>
#### Blog Posts ### Blog Posts
* [Slack's Outage on January 4th 2021](https://slack.engineering/slacks-outage-on-january-4th-2021/) * [Slack's Outage on January 4th 2021](https://slack.engineering/slacks-outage-on-january-4th-2021/)
* [A Terrible, Horrible, No-Good, Very Bad Day at Slack](https://slack.engineering/a-terrible-horrible-no-good-very-bad-day-at-slack/) * [A Terrible, Horrible, No-Good, Very Bad Day at Slack](https://slack.engineering/a-terrible-horrible-no-good-very-bad-day-at-slack/)
* [Deploys at Slack](https://slack.engineering/deploys-at-slack/) * [Deploys at Slack](https://slack.engineering/deploys-at-slack/)
* [Disasterpiece Theater: Slack's process for approachable Chaos Engineering](https://slack.engineering/disasterpiece-theater-slacks-process-for-approachable-chaos-engineering/) * [Disasterpiece Theater: Slack's process for approachable Chaos Engineering](https://slack.engineering/disasterpiece-theater-slacks-process-for-approachable-chaos-engineering/)
#### Videos
### Videos
* [Slack at the Edge](https://www.usenix.org/conference/srecon19asia/presentation/pemberton) * [Slack at the Edge](https://www.usenix.org/conference/srecon19asia/presentation/pemberton)
* [What Breaks Our Systems: A Taxonomy of Black Swans](https://www.usenix.org/conference/srecon19americas/presentation/nolan-taxonomy) * [What Breaks Our Systems: A Taxonomy of Black Swans](https://www.usenix.org/conference/srecon19americas/presentation/nolan-taxonomy)
@@ -668,22 +736,25 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Soundcloud</summary> <summary>Soundcloud</summary>
## Soundcloud ### Blog Posts
#### Blog Posts
* [Alerting on SLOs like Pros](https://developers.soundcloud.com/blog/alerting-on-slos) * [Alerting on SLOs like Pros](https://developers.soundcloud.com/blog/alerting-on-slos)
* [Hands-Off Deployment with Canary](https://developers.soundcloud.com/blog/hands-off-deployment-with-canary) * [Hands-Off Deployment with Canary](https://developers.soundcloud.com/blog/hands-off-deployment-with-canary)
* [Prometheus has come of age - a reflection on the development of an open-source project](https://developers.soundcloud.com/blog/prometheus-has-come-of-age-a-reflection-on-the-development-of-an-open-source-project) * [Prometheus has come of age - a reflection on the development of an open-source project](https://developers.soundcloud.com/blog/prometheus-has-come-of-age-a-reflection-on-the-development-of-an-open-source-project)
* [Prometheus: Monitoring at SoundCloud](https://developers.soundcloud.com/blog/prometheus-monitoring-at-soundcloud) * [Prometheus: Monitoring at SoundCloud](https://developers.soundcloud.com/blog/prometheus-monitoring-at-soundcloud)
</details> </details>
<details> <details>
<summary>Spotify</summary> <summary>Spotify</summary>
#### Blog Posts ### Blog Posts
* [Techbytes: What The Industry Misses About Incidents and What You Can Do](https://engineering.atspotify.com/2020/02/26/techbytes-what-the-industry-misses-about-incidents-and-what-you-can-do/) * [Techbytes: What The Industry Misses About Incidents and What You Can Do](https://engineering.atspotify.com/2020/02/26/techbytes-what-the-industry-misses-about-incidents-and-what-you-can-do/)
* [Automated Incident Response Infrastructure in GCP](https://engineering.atspotify.com/2019/04/04/whacking-a-million-moles-automated-incident-response-infrastructure-in-gcp/) * [Automated Incident Response Infrastructure in GCP](https://engineering.atspotify.com/2019/04/04/whacking-a-million-moles-automated-incident-response-infrastructure-in-gcp/)
#### Videos ### Videos
* [Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance](https://www.usenix.org/conference/srecon19americas/presentation/root) * [Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance](https://www.usenix.org/conference/srecon19americas/presentation/root)
</details> </details>
@@ -691,10 +762,12 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Squarespace</summary> <summary>Squarespace</summary>
#### Blog Posts ### Blog Posts
* [Under the Hood: Ensuring Site Reliability](https://engineering.squarespace.com/blog/2017/under-the-hood-ensuring-site-reliability) * [Under the Hood: Ensuring Site Reliability](https://engineering.squarespace.com/blog/2017/under-the-hood-ensuring-site-reliability)
#### Videos ### Videos
* [Pushing through Friction](https://www.usenix.org/conference/srecon19emea/presentation/na) * [Pushing through Friction](https://www.usenix.org/conference/srecon19emea/presentation/na)
* [How to SRE When Everything's Already on Fire](https://www.usenix.org/conference/srecon19emea/presentation/hidalgo) * [How to SRE When Everything's Already on Fire](https://www.usenix.org/conference/srecon19emea/presentation/hidalgo)
* [Case Study: Implementing SLOs for a New Service](https://www.usenix.org/conference/srecon19americas/presentation/lawson) * [Case Study: Implementing SLOs for a New Service](https://www.usenix.org/conference/srecon19americas/presentation/lawson)
@@ -705,11 +778,13 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Stack Overflow</summary> <summary>Stack Overflow</summary>
#### Blog Posts ### Blog Posts
* [A deeper dive into our May 2019 security incident](https://stackoverflow.blog/2021/01/25/a-deeper-dive-into-our-may-2019-security-incident/) * [A deeper dive into our May 2019 security incident](https://stackoverflow.blog/2021/01/25/a-deeper-dive-into-our-may-2019-security-incident/)
* [Guest Post - Failing over without falling over](https://stackoverflow.blog/2020/10/23/adrian-cockcroft-aws-failover-chaos-engineering-fault-tolerance-distaster-recovery/) * [Guest Post - Failing over without falling over](https://stackoverflow.blog/2020/10/23/adrian-cockcroft-aws-failover-chaos-engineering-fault-tolerance-distaster-recovery/)
#### Videos ### Videos
* [Low Context DevOps: Improving SRE Team Culture through Defaults, Documentation, and Discipline](https://www.usenix.org/conference/srecon20americas/presentation/limoncelli) * [Low Context DevOps: Improving SRE Team Culture through Defaults, Documentation, and Discipline](https://www.usenix.org/conference/srecon20americas/presentation/limoncelli)
</details> </details>
@@ -717,11 +792,13 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Stripe</summary> <summary>Stripe</summary>
#### Blog Posts ### Blog Posts
* [Fast and flexible observability with canonical log lines](https://stripe.com/blog/canonical-log-lines) * [Fast and flexible observability with canonical log lines](https://stripe.com/blog/canonical-log-lines)
* [Introducing Veneur: high performance and global aggregation for Datadog](https://stripe.com/blog/engineering/page/3) * [Introducing Veneur: high performance and global aggregation for Datadog](https://stripe.com/blog/engineering/page/3)
#### Videos ### Videos
* [How Stripe Invests in Technical Infrastructure](https://www.usenix.org/conference/srecon19emea/presentation/larson) * [How Stripe Invests in Technical Infrastructure](https://www.usenix.org/conference/srecon19emea/presentation/larson)
* [The AWS Billing Machine and Optimizing Cloud Costs](https://www.usenix.org/conference/srecon19asia/presentation/lopopolo) * [The AWS Billing Machine and Optimizing Cloud Costs](https://www.usenix.org/conference/srecon19asia/presentation/lopopolo)
@@ -730,7 +807,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Target</summary> <summary>Target</summary>
#### Blog Posts ### Blog Posts
* [Ɔhaos Ǝnginǝǝring @ Target - Part 2](https://tech.target.com/2019/05/09/chaos-engineering-at-Target.html) * [Ɔhaos Ǝnginǝǝring @ Target - Part 2](https://tech.target.com/2019/05/09/chaos-engineering-at-Target.html)
* [Ɔhaos Ǝnginǝǝring @ Target - Part 1](https://tech.target.com/2019/02/05/chaos-engineering-at-Target.html) * [Ɔhaos Ǝnginǝǝring @ Target - Part 1](https://tech.target.com/2019/02/05/chaos-engineering-at-Target.html)
* [GoAlert - Your Future Open Source, On-Call Notification Product](https://tech.target.com/2019/02/25/introducing-goalert.html) * [GoAlert - Your Future Open Source, On-Call Notification Product](https://tech.target.com/2019/02/25/introducing-goalert.html)
@@ -743,7 +821,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Trivago</summary> <summary>Trivago</summary>
#### Blog Posts ### Blog Posts
* [How To Get Fooled By Metrics](https://tech.trivago.com/2020/12/04/how-to-get-fooled-by-metrics/) * [How To Get Fooled By Metrics](https://tech.trivago.com/2020/12/04/how-to-get-fooled-by-metrics/)
</details> </details>
@@ -751,34 +830,38 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Uber</summary> <summary>Uber</summary>
#### Blog Posts ### Blog Posts
* [Disaster Recovery for Multi-Region Kafka at Uber](https://eng.uber.com/kafka/) * [Disaster Recovery for Multi-Region Kafka at Uber](https://eng.uber.com/kafka/)
* [Engineering Failover Handling in Uber's Mobile Networking Infrastructure](https://eng.uber.com/eng-failover-handling/) * [Engineering Failover Handling in Uber's Mobile Networking Infrastructure](https://eng.uber.com/eng-failover-handling/)
* [Optimizing Observability with Jaeger, M3, and XYS at Uber](https://eng.uber.com/optimizing-observability/) * [Optimizing Observability with Jaeger, M3, and XYS at Uber](https://eng.uber.com/optimizing-observability/)
#### Videos ### Videos
* [A Tale of Two Rotations: Building a Humane & Effective On-Call](https://www.usenix.org/conference/srecon19emea/presentation/lee) * [A Tale of Two Rotations: Building a Humane & Effective On-Call](https://www.usenix.org/conference/srecon19emea/presentation/lee)
* [Testing in Production at Scale](https://www.usenix.org/conference/srecon19americas/presentation/gud) * [Testing in Production at Scale](https://www.usenix.org/conference/srecon19americas/presentation/gud)
* [A History of SRE at Uber with Rick Boone of Uber](https://www.youtube.com/watch?v=qJnS-EfIIIE) * [A History of SRE at Uber with Rick Boone of Uber](https://www.youtube.com/watch?v=qJnS-EfIIIE)
</details> </details>
<details> <details>
<summary>VGW</summary> <summary>VGW</summary>
#### Blog Posts ### Blog Posts
* [The SRE Incident Response game](https://medium.com/@bruce_25864/the-sre-incident-response-game-db242fff391c) * [The SRE Incident Response game](https://medium.com/@bruce_25864/the-sre-incident-response-game-db242fff391c)
#### Videos ### Videos
* [Level Up Your Incident Response With Gameplay](https://youtu.be/c2-52EP8_7c) * [Level Up Your Incident Response With Gameplay](https://youtu.be/c2-52EP8_7c)
</details> </details>
<details> <details>
<summary>Wikimedia Foundation</summary> <summary>Wikimedia Foundation</summary>
#### Videos ### Videos
* [Testing Encyclopedias in Production](https://www.usenix.org/conference/srecon20americas/presentation/mouzeli) * [Testing Encyclopedias in Production](https://www.usenix.org/conference/srecon20americas/presentation/mouzeli)
* [What Happens When You Type en.wikipedia.org?](https://www.usenix.org/conference/srecon19emea/presentation/mouzeli) * [What Happens When You Type en.wikipedia.org?](https://www.usenix.org/conference/srecon19emea/presentation/mouzeli)
@@ -787,7 +870,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>Zerodha</summary> <summary>Zerodha</summary>
#### Blog Posts ### Blog Posts
* [Infrastructure monitoring with Prometheus at Zerodha](https://zerodha.tech/blog/infra-monitoring-at-zerodha/) * [Infrastructure monitoring with Prometheus at Zerodha](https://zerodha.tech/blog/infra-monitoring-at-zerodha/)
</details> </details>
@@ -795,7 +879,8 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
<details> <details>
<summary>SRECon Mix Playlist</summary> <summary>SRECon Mix Playlist</summary>
#### Videos ### Videos
* [Adobe - The Good, the Bad and the Ugly: The 3 Learnings of an SRE](https://www.usenix.org/conference/srecon20americas/presentation/charagondla) * [Adobe - The Good, the Bad and the Ugly: The 3 Learnings of an SRE](https://www.usenix.org/conference/srecon20americas/presentation/charagondla)
* [Amdocs - SREs at Telecom and Media Industry: Bridging between Legacy and Cloud Native Apps](https://www.usenix.org/conference/srecon20americas/presentation/yitzhaki) * [Amdocs - SREs at Telecom and Media Industry: Bridging between Legacy and Cloud Native Apps](https://www.usenix.org/conference/srecon20americas/presentation/yitzhaki)
* [Amazon - Confessions of a Systems Engineer: Learning from My 20+ Years of Failure](https://www.usenix.org/conference/srecon20americas/presentation/argent) * [Amazon - Confessions of a Systems Engineer: Learning from My 20+ Years of Failure](https://www.usenix.org/conference/srecon20americas/presentation/argent)
@@ -824,6 +909,7 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
* [WeWork - Learning from Learnings: Anatomy of Three Incidents](https://www.usenix.org/conference/srecon19americas/presentation/shoup) * [WeWork - Learning from Learnings: Anatomy of Three Incidents](https://www.usenix.org/conference/srecon19americas/presentation/shoup)
* [Yelp - What I Wish I Knew before Going On-Call](https://www.usenix.org/conference/srecon19emea/presentation/shu) * [Yelp - What I Wish I Knew before Going On-Call](https://www.usenix.org/conference/srecon19emea/presentation/shu)
* [Zendesk - Latency and Availability Error Budgets Done Right at Scale](https://www.usenix.org/conference/srecon20americas/presentation/moyer) * [Zendesk - Latency and Availability Error Budgets Done Right at Scale](https://www.usenix.org/conference/srecon20americas/presentation/moyer)
</details> </details>
--- ---
@@ -875,10 +961,9 @@ _Note to readers: This list refers to some of the articles, posts, videos, tools
## Other How They... repos ## Other How They... repos
* [HowTheyTest](https://github.com/abhivaikar/howtheytest) * [Howtheytest](https://github.com/abhivaikar/howtheytest)
* [HowTheyDevOps](https://github.com/bregman-arie/howtheydevops) * [Howtheydevops](https://github.com/bregman-arie/howtheydevops)
* [HowTheyAWS](https://github.com/upgundecha/howtheyaws) * [Howtheyaws](https://github.com/upgundecha/howtheyaws)
## Contribute ## Contribute
@@ -893,4 +978,4 @@ related or neighboring rights to this work.
--- ---
If you decide to use this anywhere, please give credit to [@upgundecha](https://www.twitter.com/upgundecha) on Twitter. Also, if you like my work, check out other projects on my GitHub. If you decide to use this anywhere, please give credit to [@upgundecha](https://www.twitter.com/upgundecha) on Twitter. Also, if you like my work, check out other projects on my GitHub.


@@ -15,7 +15,6 @@ Ensure your pull request adheres to the following guidelines:
Thank you for your suggestions! Thank you for your suggestions!
## Updating your PR ## Updating your PR
A lot of times, making a PR adhere to the standards above can be difficult. A lot of times, making a PR adhere to the standards above can be difficult.