mirror of
https://github.com/linkedin/school-of-sre
synced 2026-01-21 07:58:03 +00:00
Merged further readings and references. Added absolute path for links
committed by
Sumit Sulakhe
parent
561f8cd547
commit
b9a9d3cc12
@@ -16,12 +16,12 @@ In this course, we are focusing on building strong foundational skills. The cour
|
|||||||
- [Linux Networking](https://sumit419.github.io/school-of-sre/linux_networking/intro/)
|
- [Linux Networking](https://sumit419.github.io/school-of-sre/linux_networking/intro/)
|
||||||
- [Python and Web](https://sumit419.github.io/school-of-sre/python_web/intro/)
|
- [Python and Web](https://sumit419.github.io/school-of-sre/python_web/intro/)
|
||||||
- Data
|
- Data
|
||||||
- [Relational databases(MySQL)](https://sumit419.github.io/school-of-sre/databases_sql/intro/)
|
- [Relational databases(MySQL)](https://linkedin.github.io/school-of-sre/databases_sql/intro/)
|
||||||
- [NoSQL concepts](https://sumit419.github.io/school-of-sre/databases_nosql/intro/)
|
- [NoSQL concepts](https://linkedin.github.io/school-of-sre/databases_nosql/intro/)
|
||||||
- [Big Data](https://sumit419.github.io/school-of-sre/big_data/intro/)
|
- [Big Data](https://linkedin.github.io/school-of-sre/big_data/intro/)
|
||||||
- [Systems Design](https://sumit419.github.io/school-of-sre/systems_design/intro/)
|
- [Systems Design](https://linkedin.github.io/school-of-sre/systems_design/intro/)
|
||||||
- [Metrics and Monitoring](https://sumit419.github.io/school-of-sre/metrics_and_monitoring/introduction/)
|
- [Metrics and Monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/)
|
||||||
- [Security](https://sumit419.github.io/school-of-sre/security/intro/)
|
- [Security](https://linkedin.github.io/school-of-sre/security/intro/)
|
||||||
|
|
||||||
We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, every module has added references which could be a guide for further learning. Our hope is that by going through these modules we should be able to build the essential skills required for a Site Reliability Engineer.
|
We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, every module has added references which could be a guide for further learning. Our hope is that by going through these modules we should be able to build the essential skills required for a Site Reliability Engineer.
|
||||||
|
|
||||||
|
|||||||
@@ -13,7 +13,8 @@ any service breakdown due to a shortage of resources. On the other hand,
|
|||||||
when a service goes down due to an issue, early detection and
|
when a service goes down due to an issue, early detection and
|
||||||
notification of such incidents can help you quickly fix the issue.
|
notification of such incidents can help you quickly fix the issue.
|
||||||
|
|
||||||
 <p align="center"> Figure 8: An alert notification received on Slack </p>
|

|
||||||
|
<p align="center"> Figure 8: An alert notification received on Slack </p>
|
||||||
|
|
||||||
Today most of the monitoring services available provide a mechanism to
|
Today most of the monitoring services available provide a mechanism to
|
||||||
set up alerts on one or a combination of metrics to actively monitor the
|
set up alerts on one or a combination of metrics to actively monitor the
|
||||||
|
|||||||
@@ -27,7 +27,8 @@ on). Let's look at some of the tools that are predominantly used.
|
|||||||
- `-x` -- When displaying processes matched by other options,
|
- `-x` -- When displaying processes matched by other options,
|
||||||
includes processes that do not have a controlling terminal.
|
includes processes that do not have a controlling terminal.
|
||||||
|
|
||||||
 <p align="center"> Figure 2: Results of top command </p>
|

|
||||||
|
<p align="center"> Figure 2: Results of top command </p>
|
||||||
|
|
||||||
- `ss` -- The socket statistics command (ss) displays information
|
- `ss` -- The socket statistics command (ss) displays information
|
||||||
about network sockets on the system. This tool is the successor of
|
about network sockets on the system. This tool is the successor of
|
||||||
@@ -52,8 +53,8 @@ on). Let's look at some of the tools that are predominantly used.
|
|||||||
displays the statistics in a human-readable format.
|
displays the statistics in a human-readable format.
|
||||||
|
|
||||||
 <p align="center"> Figure 4: Memory
|
statistics on a host in human-readable form](images/image6.png)
|
||||||
statistics on a host in human-readable form </p>
|
<p align="center"> Figure 4: Memory statistics on a host in human-readable form </p>
|
||||||
|
|
||||||
- `df --` The df command displays disk space usage statistics. The
|
- `df --` The df command displays disk space usage statistics. The
|
||||||
`-i` command-line option is also often used to display
|
`-i` command-line option is also often used to display
|
||||||
@@ -61,7 +62,8 @@ on). Let's look at some of the tools that are predominantly used.
|
|||||||
statistics. The `-h` command-line option is used for displaying
|
statistics. The `-h` command-line option is used for displaying
|
||||||
statistics in a human-readable format.
|
statistics in a human-readable format.
|
||||||
|
|
||||||
 <p align="center"> Figure 5:
|

|
||||||
|
<p align="center"> Figure 5:
|
||||||
Disk usage statistics on a system in human-readable form </p>
|
Disk usage statistics on a system in human-readable form </p>
|
||||||
|
|
||||||
- `sar` -- The sar utility monitors various subsystems, such as CPU
|
- `sar` -- The sar utility monitors various subsystems, such as CPU
|
||||||
@@ -75,7 +77,8 @@ on). Let's look at some of the tools that are predominantly used.
|
|||||||
specifies which network interface to watch.
|
specifies which network interface to watch.
|
||||||
|
|
||||||
 <p align="center"> Figure 6: Network bandwidth usage by
|
active connection on the host](images/image2.png)
|
||||||
|
<p align="center"> Figure 6: Network bandwidth usage by
|
||||||
active connection on the host </p>
|
active connection on the host </p>
|
||||||
|
|
||||||
- `tcpdump` -- The tcpdump command is a network monitoring tool that
|
- `tcpdump` -- The tcpdump command is a network monitoring tool that
|
||||||
@@ -94,5 +97,6 @@ active connection on the host </p>
|
|||||||
- `port <port number>` -- Filters traffic to or from a particular
|
- `port <port number>` -- Filters traffic to or from a particular
|
||||||
port
|
port
|
||||||
|
|
||||||
 <p align="center"> Figure 7: *tcpdump* of packets on *docker0*
|

|
||||||
|
<p align="center"> Figure 7: *tcpdump* of packets on *docker0*
|
||||||
interface on a host </p>
|
interface on a host </p>
|
||||||
@@ -45,26 +45,26 @@ following topics:
|
|||||||
|
|
||||||
## Course content
|
## Course content
|
||||||
|
|
||||||
- [Introduction](#introduction)
|
- [Introduction](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/#introduction)
|
||||||
|
|
||||||
- [Four golden signals of monitoring](#four-golden-signals-of-monitoring)
|
- [Four golden signals of monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/#four-golden-signals-of-monitoring)
|
||||||
|
|
||||||
- [Why is monitoring important?](#why-is-monitoring-important)
|
- [Why is monitoring important?](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/#why-is-monitoring-important)
|
||||||
|
|
||||||
- [Command-line tools](command-line_tools.md)
|
- [Command-line tools](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/command-line_tools/)
|
||||||
|
|
||||||
- [Third-party monitoring](third-party_monitoring.md)
|
- [Third-party monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/third-party_monitoring/)
|
||||||
|
|
||||||
- [Proactive monitoring using alerts](alerts.md)
|
- [Proactive monitoring using alerts](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/alerts/)
|
||||||
|
|
||||||
- [Best practices for monitoring](best_practices.md)
|
- [Best practices for monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/best_practices/)
|
||||||
|
|
||||||
- [Observability](observability.md)
|
- [Observability](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/observability/)
|
||||||
|
|
||||||
- [Logs](observability.md#logs)
|
- [Logs](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/observability/#logs)
|
||||||
- [Tracing](observability.md#tracing)
|
- [Tracing](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/observability/#tracing)
|
||||||
|
|
||||||
[Conclusion](conclusion.md)
|
[Conclusion](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/conclusion/)
|
||||||
|
|
||||||
|
|
||||||
##
|
##
|
||||||
@@ -221,7 +221,8 @@ Before we discuss monitoring an application, let us look at the
|
|||||||
monitoring infrastructure. Following is an illustration of a basic
|
monitoring infrastructure. Following is an illustration of a basic
|
||||||
monitoring system.
|
monitoring system.
|
||||||
|
|
||||||
 <p align="center"> Figure 1: Illustration of a monitoring infrastructure </p>
|

|
||||||
|
<p align="center"> Figure 1: Illustration of a monitoring infrastructure </p>
|
||||||
|
|
||||||
Figure 1 shows a monitoring infrastructure mechanism for aggregating
|
Figure 1 shows a monitoring infrastructure mechanism for aggregating
|
||||||
metrics on the system, and collecting and storing the data for display.
|
metrics on the system, and collecting and storing the data for display.
|
||||||
|
|||||||