# Conclusion

A robust monitoring and alerting system is necessary for maintaining and
troubleshooting a system. A dashboard with key metrics gives you an
overview of service performance, all in one place. Well-defined alerts
(with realistic thresholds and notifications) further enable you to
quickly identify anomalies in the service infrastructure and resource
saturation. By acting on these alerts promptly, you can avoid service
degradations and decrease the mean time to detect (MTTD) service
breakdowns.

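
As a rough illustration of what a "realistic threshold" and MTTD mean in practice, here is a minimal Python sketch. The names `first_breach`, `Incident`, and `mean_time_to_detect` are hypothetical helpers written for this example, not part of any monitoring product. It assumes an alert should fire only after a metric stays above its threshold for several consecutive samples, which is one common way to avoid alerting on short spikes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Incident:
    """Hypothetical incident record used only for this sketch."""
    started_at: datetime   # when the fault actually began
    detected_at: datetime  # when an alert fired or someone noticed


def first_breach(samples: Sequence[float], threshold: float,
                 for_samples: int) -> Optional[int]:
    """Return the sample index at which the alert would fire, or None.

    The alert fires once `for_samples` consecutive samples exceed
    `threshold`; a single spike resets nothing but also triggers nothing.
    """
    streak = 0
    for i, value in enumerate(samples):
        streak = streak + 1 if value > threshold else 0
        if streak >= for_samples:
            return i
    return None


def mean_time_to_detect(incidents: Sequence[Incident]) -> timedelta:
    """Average gap between the start of an incident and its detection."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.detected_at - i.started_at for i in incidents), timedelta())
    return total / len(incidents)


# Made-up CPU utilisation samples (percent), one per scrape interval.
cpu = [55, 92, 60, 91, 93, 95, 70]
print(first_breach(cpu, threshold=90, for_samples=3))  # 5

t0 = datetime(2024, 1, 1, 12, 0)
history = [
    Incident(t0, t0 + timedelta(minutes=4)),
    Incident(t0, t0 + timedelta(minutes=10)),
]
print(mean_time_to_detect(history))  # 0:07:00
```

The isolated spike at index 1 does not fire the alert; only the sustained breach ending at index 5 does. Lowering `for_samples` would detect problems sooner (lower MTTD) at the cost of noisier alerts, which is the trade-off behind choosing realistic thresholds.
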
In addition to in-house monitoring, monitoring the real-user experience
helps you understand service performance as perceived by users. Many
components are involved in serving a user, and most of them are outside
your control. Therefore, you need to have real-user monitoring in
place.

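
To make the gap between in-house and real-user measurements concrete, the following Python sketch compares a high percentile of server-side latency with the same percentile of client-reported latency. The function `percentile` and all sample values are assumptions made up for illustration; a real setup would collect client timings through a real-user monitoring agent.

```python
from typing import Sequence


def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = -(-p * len(ordered) // 100)  # integer ceiling of p * n / 100
    return ordered[rank - 1]


# Made-up latencies in milliseconds for the same five requests.
server_side_ms = [120, 130, 125, 140, 135]    # measured inside the service
client_side_ms = [300, 320, 1500, 310, 330]   # measured in the user's browser

server_p95 = percentile(server_side_ms, 95)
client_p95 = percentile(client_side_ms, 95)
print(server_p95, client_p95, client_p95 - server_p95)  # 140 1500 1360
```

In this made-up data the service looks healthy from the inside, while one user waited 1.5 seconds. The difference comes from everything between the service and the user (networks, DNS, CDNs, the browser), which server-side metrics alone cannot show.
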
Metrics give only an aggregated, high-level view of service performance.
To understand the system better and recover faster during incidents,
you might want to implement the other two pillars of observability:
logs and tracing. Log and trace data can help you understand what led
to a service failure or degradation.

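
One way logs and traces help explain a failure is by sharing a request identifier, so that every log line for one request can be pulled together. The sketch below is hand-rolled with the Python standard library only; the field names (`trace_id`, `service`, `duration_ms`) and the helper functions are assumptions for illustration, not the format of any particular logging or tracing tool.

```python
import json
import uuid
from typing import Iterable, List


def make_log_line(trace_id: str, service: str, message: str,
                  duration_ms: float) -> str:
    """Serialise one structured log event as a JSON line."""
    return json.dumps({
        "trace_id": trace_id,
        "service": service,
        "message": message,
        "duration_ms": duration_ms,
    })


def events_for_trace(log_lines: Iterable[str], trace_id: str) -> List[dict]:
    """Collect all events, across services, that belong to one request."""
    events = (json.loads(line) for line in log_lines)
    return [e for e in events if e.get("trace_id") == trace_id]


def slowest_hop(events: List[dict]) -> dict:
    """Return the event with the largest duration within a trace."""
    return max(events, key=lambda e: e["duration_ms"])


# Two made-up requests passing through hypothetical services.
slow_request = uuid.uuid4().hex
fast_request = uuid.uuid4().hex
lines = [
    make_log_line(slow_request, "gateway", "request received", 2.0),
    make_log_line(fast_request, "gateway", "request received", 1.5),
    make_log_line(slow_request, "db", "query timed out, retrying", 950.0),
    make_log_line(fast_request, "db", "query ok", 12.0),
    make_log_line(slow_request, "api", "response sent", 30.0),
]

culprit = slowest_hop(events_for_trace(lines, slow_request))
print(culprit["service"], culprit["message"])  # db query timed out, retrying
```

A latency metric would only show that some requests were slow. Filtering the logs by the shared identifier shows which hop was slow for a specific request and why.
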
The following resources cover monitoring and observability in more
depth:

- [Google SRE book: Monitoring Distributed
Systems](https://sre.google/sre-book/monitoring-distributed-systems/)
- [Mastering Distributed Tracing by Yuri
Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/)
## References
- [Google SRE book: Monitoring Distributed
Systems](https://sre.google/sre-book/monitoring-distributed-systems/)
- [Mastering Distributed Tracing, by Yuri
Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/)
- [Monitoring and
Observability](https://copyconstruct.medium.com/monitoring-and-observability-8417d1952e1c)
- [Three Pillars with Zero
Answers](https://medium.com/lightstephq/three-pillars-with-zero-answers-2a98b36358b8)
- Engineering blogs on
[LinkedIn](https://engineering.linkedin.com/blog/topic/monitoring),
[Grafana](https://grafana.com/blog/),
[Elastic.co](https://www.elastic.co/blog/), and
[OpenTelemetry](https://medium.com/opentelemetry)