mirror of
https://github.com/linkedin/school-of-sre
synced 2026-01-21 07:58:03 +00:00
Deployed 4239ecf with MkDocs version: 1.2.3
This commit is contained in:
@@ -2109,11 +2109,11 @@
|
||||
<p>Earlier we discussed different ways to collect key metric data points
|
||||
from a service and its underlying infrastructure. This data gives us a
|
||||
better understanding of how the service is performing. One of the main
|
||||
objectives of monitoring is to detect any service degradations early
|
||||
objectives of monitoring is to detect any service degradations early
|
||||
(reduce Mean Time To Detect) and notify stakeholders so that the issues
|
||||
are either avoided or can be fixed early, thus reducing Mean Time To
|
||||
Recover (MTTR). For example, if you are notified when resource usage by
|
||||
a service exceeds 90 percent, you can take preventive measures to avoid
|
||||
a service exceeds 90%, you can take preventive measures to avoid
|
||||
any service breakdown due to a shortage of resources. On the other hand,
|
||||
when a service goes down due to an issue, early detection and
|
||||
notification of such incidents can help you quickly fix the issue.</p>
|
||||
@@ -2124,12 +2124,12 @@ notification of such incidents can help you quickly fix the issue.</p>
|
||||
set up alerts on one or a combination of metrics to actively monitor the
|
||||
service health. These alerts have a set of defined rules or conditions,
|
||||
and when the rule is broken, you are notified. These rules can be as
|
||||
simple as notifying when the metric value exceeds n to as complex as a
|
||||
week over week (WoW) comparison of standard deviation over a period of
|
||||
simple as notifying when the metric value exceeds <em>n</em> to as complex as a
|
||||
week-over-week (WoW) comparison of standard deviation over a period of
|
||||
time. Monitoring tools notify you about an active alert, and most of
|
||||
these tools support instant messaging (IM) platforms, SMS, email, or
|
||||
phone calls. Figure 8 shows a sample alert notification received on
|
||||
Slack for memory usage exceeding 90 percent of total RAM space on the
|
||||
Slack for memory usage exceeding 90% of total RAM space on the
|
||||
host.</p>
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user