# Conclusion

A robust monitoring and alerting system is necessary for maintaining and troubleshooting a service. A dashboard with key metrics gives you an overview of service performance, all in one place. Well-defined alerts, with realistic thresholds and notifications, let you quickly spot anomalies in the service infrastructure and signs of resource saturation. By acting on those alerts, you can prevent service degradation and reduce the mean time to detect (MTTD) service breakdowns.
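MTTD is the average time between the moment a problem starts and the moment monitoring detects it. The minimal Python sketch below computes it from a list of incidents. The function name `mean_time_to_detect`, the `INCIDENTS` list, and all timestamps are hypothetical and exist only for illustration. In practice, the start time of an incident often has to be estimated after the fact, for example during a postmortem.

```python
from datetime import datetime, timedelta

# Hypothetical incident records, for illustration only:
# (time the problem actually started, time monitoring detected it).
INCIDENTS = [
    (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 12)),   # 12 min to detect
    (datetime(2024, 2, 9, 22, 30), datetime(2024, 2, 9, 22, 34)),  # 4 min to detect
    (datetime(2024, 3, 15, 4, 5), datetime(2024, 3, 15, 4, 25)),   # 20 min to detect
]


def mean_time_to_detect(incidents):
    """Return MTTD: the average of (detected - started) across all incidents."""
    if not incidents:
        raise ValueError("MTTD is undefined without at least one incident")
    detection_delays = [detected - started for started, detected in incidents]
    # sum() needs a timedelta start value because its default start (0) is an int.
    return sum(detection_delays, timedelta()) / len(detection_delays)


if __name__ == "__main__":
    print(mean_time_to_detect(INCIDENTS))  # 0:12:00
```

Because MTTD is a mean, a single incident that went unnoticed for hours can dominate it. Tracking the median detection delay alongside it may give a more representative picture.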

In addition to in-house monitoring, monitoring the real user experience helps you understand service performance as users perceive it. Many components are involved in serving a user request, such as the user's network provider, DNS resolvers, CDNs, and the browser, and most of them are outside your control. Therefore, you need real-user monitoring (RUM) in place.

Metrics give only an aggregated, high-level view of service performance. To understand the system better and to recover faster during incidents, consider implementing the other two pillars of observability: logs and tracing. Log and trace data can help you understand what led to a service failure or degradation.

The following resources cover monitoring and observability in more depth:

- [Google SRE book: Monitoring Distributed Systems](https://sre.google/sre-book/monitoring-distributed-systems/)

- [Mastering Distributed Tracing, by Yuri Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/)
## References

- [Google SRE book: Monitoring Distributed Systems](https://sre.google/sre-book/monitoring-distributed-systems/)

- [Mastering Distributed Tracing, by Yuri Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/)

- [Monitoring and Observability](https://copyconstruct.medium.com/monitoring-and-observability-8417d1952e1c)

- [Three Pillars with Zero Answers](https://medium.com/lightstephq/three-pillars-with-zero-answers-2a98b36358b8)

- Engineering blogs from [LinkedIn](https://engineering.linkedin.com/blog/topic/monitoring), [Grafana](https://grafana.com/blog/), [Elastic.co](https://www.elastic.co/blog/), and [OpenTelemetry](https://medium.com/opentelemetry)