Initial commit to metrics and monitoring course

This commit is contained in:
Sumit Sulakhe
2021-02-08 05:25:23 -08:00
committed by Sumit Sulakhe
parent bbd0cd38b5
commit a3ffe9c1d0
21 changed files with 693 additions and 0 deletions

# Conclusion
A robust monitoring and alerting system is essential for maintaining and
troubleshooting a service. A dashboard of key metrics gives you an
overview of service performance in one place. Well-defined alerts, with
realistic thresholds and appropriate notification channels, let you
quickly spot anomalies in the service infrastructure and signs of
resource saturation. By acting on these alerts promptly, you can prevent
many service degradations and reduce the mean time to detect (MTTD)
service outages.
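
To make "realistic thresholds" and MTTD more concrete, here is a minimal Python sketch, assuming metric samples arrive as `(timestamp, value)` pairs at a regular interval. The function names `first_alert_time` and `mttd`, and the sample data, are hypothetical and not part of any monitoring tool's API. The alert fires only after the value has stayed above the threshold for a minimum duration, similar in spirit to the `for` clause in Prometheus alerting rules, so that a single short spike does not page anyone. MTTD is computed as the average gap between when each incident started and when it was detected.

```python
from datetime import datetime, timedelta
from statistics import mean


def first_alert_time(samples, threshold, for_duration):
    """Return the timestamp at which a sustained-threshold alert fires, or None.

    `samples` is a list of (timestamp, value) pairs sorted by timestamp.
    The alert fires at the first sample where the value has been above
    `threshold` on every sample since the breach began, and at least
    `for_duration` has elapsed since that first breaching sample.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start >= for_duration:
                return ts
        else:
            breach_start = None  # value recovered; a short spike is ignored
    return None


def mttd(incidents):
    """Mean time to detect: average of (detected_at - started_at)."""
    return timedelta(seconds=mean((d - s).total_seconds() for s, d in incidents))


# Hypothetical CPU utilization sampled once per minute.
t0 = datetime(2021, 2, 8, 5, 0)
cpu = [0.50, 0.95, 0.60, 0.92, 0.93, 0.95, 0.97]
samples = [(t0 + timedelta(minutes=i), v) for i, v in enumerate(cpu)]

fired = first_alert_time(samples, threshold=0.9, for_duration=timedelta(minutes=3))
print(fired)  # 2021-02-08 05:06:00

# Hypothetical incidents as (started_at, detected_at) pairs.
incidents = [
    (t0 + timedelta(minutes=3), fired),
    (datetime(2021, 2, 9, 12, 0), datetime(2021, 2, 9, 12, 10)),
]
print(mttd(incidents))  # 0:06:30
```

Requiring a sustained breach trades a few minutes of detection delay for fewer false alarms: the isolated spike at 05:01 is ignored, while the sustained breach that starts at 05:03 is detected at 05:06. Shortening `for_duration` lowers MTTD but makes the alert noisier.
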

In addition to in-house monitoring, monitoring the real user experience
can help you understand service performance as perceived by your users.
Many components are involved in serving a user request (for example, DNS,
CDNs, network providers, and the user's own device and browser), and most
of them are outside your control. Therefore, you should also have
real-user monitoring in place.

Metrics provide a high-level, aggregated view of service performance. To
understand the system better and to recover faster during incidents, you
might want to implement the other two pillars of observability: logs and
tracing. Logs and trace data can help you understand what led to a
service failure or degradation.

The following resources can help you learn more about monitoring and
observability:

- [Google SRE book: Monitoring Distributed Systems](https://sre.google/sre-book/monitoring-distributed-systems/)
- [Mastering Distributed Tracing, by Yuri Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/)
- Engineering blogs on [LinkedIn](https://engineering.linkedin.com/blog/topic/monitoring), [Grafana](https://grafana.com/blog/), [Elastic.co](https://www.elastic.co/blog/), and [OpenTelemetry](https://medium.com/opentelemetry)

## References

- [Google SRE book: Monitoring Distributed Systems](https://sre.google/sre-book/monitoring-distributed-systems/)
- [Mastering Distributed Tracing, by Yuri Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/)
- [Monitoring and Observability](https://copyconstruct.medium.com/monitoring-and-observability-8417d1952e1c)
- [Three Pillars with Zero Answers](https://medium.com/lightstephq/three-pillars-with-zero-answers-2a98b36358b8)