
# Conclusion

A robust monitoring and alerting system is necessary for maintaining and
troubleshooting a service. A dashboard with key metrics gives you an
overview of service performance in one place. Well-defined alerts
(with realistic thresholds and notifications) help you quickly identify
anomalies in the service infrastructure and in resource saturation. By
acting on them promptly, you can prevent many service degradations and
reduce the mean time to detect (MTTD) service breakdowns.
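To make "realistic thresholds" concrete, here is a minimal sketch of an alert rule that fires only when a metric stays above its threshold for several consecutive samples, so a single spike does not page anyone. The rule name, the 90% threshold, the three-sample window, and the sample data are all illustrative assumptions, not values recommended by this course or taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AlertRule:
    """Hypothetical alert rule: names and values are for illustration only."""
    name: str
    threshold: float   # fire when the metric exceeds this value...
    for_samples: int   # ...for this many consecutive samples


def first_firing_index(rule: AlertRule, samples: Sequence[float]) -> Optional[int]:
    """Return the index of the sample at which the alert fires, or None."""
    streak = 0
    for i, value in enumerate(samples):
        # Any sample at or below the threshold resets the streak.
        streak = streak + 1 if value > rule.threshold else 0
        if streak >= rule.for_samples:
            return i
    return None


# Assumed example data: CPU utilization (percent), one sample per scrape.
cpu_rule = AlertRule(name="HighCpuUtilization", threshold=90.0, for_samples=3)
cpu_percent = [72.0, 95.0, 70.0, 91.0, 93.0, 96.0, 97.0]

fired_at = first_firing_index(cpu_rule, cpu_percent)
print(f"{cpu_rule.name} fired at sample index: {fired_at}")
```

The isolated spike at index 1 is ignored, and the alert fires only once the breach is sustained. A longer window reduces noisy alerts but delays detection, so the window length trades alert quality against MTTD.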

In addition to in-house monitoring, monitoring real-user experience
helps you understand service performance as users perceive it. Many
components are involved in serving a user, and most of them are outside
your control, so in-house monitoring alone cannot show what users
actually experience. You therefore need real-user monitoring in place.

Metrics give only an aggregated, high-level view of service
performance. To understand the system better and to recover faster
during incidents, you might want to implement the other two pillars of
observability: logs and traces. Log and trace data can help you
understand what led to a service failure or degradation.
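One common way to make logs and traces work together is to tag every log line with the ID of the request's trace, so that all events for one failing request can be pulled out across services. The following is a minimal sketch using only the Python standard library; the field names (`trace_id`, `service`, `message`), the service names, and the helper functions are hypothetical and do not correspond to any specific logging or tracing library.

```python
import json
import uuid
from typing import Dict, List


def log_event(trace_id: str, service: str, message: str, **fields) -> str:
    """Build one structured (JSON) log line tagged with the request's trace ID."""
    record = {"trace_id": trace_id, "service": service, "message": message, **fields}
    return json.dumps(record, sort_keys=True)


def events_for_trace(log_lines: List[str], trace_id: str) -> List[Dict]:
    """Return the parsed log records that belong to a single trace."""
    records = (json.loads(line) for line in log_lines)
    return [r for r in records if r["trace_id"] == trace_id]


# Two assumed requests flowing through hypothetical services.
ok_trace = uuid.uuid4().hex
bad_trace = uuid.uuid4().hex

log_lines = [
    log_event(ok_trace, "frontend", "request received", path="/cart"),
    log_event(bad_trace, "frontend", "request received", path="/checkout"),
    log_event(ok_trace, "cart", "cart loaded", duration_ms=12),
    log_event(bad_trace, "payments", "upstream timeout", duration_ms=5000),
]

# During an incident, filter everything that happened to the failing request.
for event in events_for_trace(log_lines, bad_trace):
    print(event["service"], "-", event["message"])
```

A metric such as an error-rate counter would show only that checkout requests are failing. The correlated log events show which service timed out for a specific request.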

The references below are a good starting point for learning more about
monitoring and observability.

## References

-   [Google SRE book: Monitoring Distributed
     Systems](https://sre.google/sre-book/monitoring-distributed-systems/)

-   [Mastering Distributed Tracing, by Yuri
     Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/)

-   [Monitoring and
     Observability](https://copyconstruct.medium.com/monitoring-and-observability-8417d1952e1c)

-   [Three Pillars with Zero
     Answers](https://medium.com/lightstephq/three-pillars-with-zero-answers-2a98b36358b8)

-   Engineering blogs on
         [LinkedIn](https://engineering.linkedin.com/blog/topic/monitoring),
         [Grafana](https://grafana.com/blog/),
         [Elastic.co](https://www.elastic.co/blog/),
         [OpenTelemetry](https://medium.com/opentelemetry)