diff --git a/courses/index.md b/courses/index.md index 0ca6a9a..92b882b 100644 --- a/courses/index.md +++ b/courses/index.md @@ -16,12 +16,12 @@ In this course, we are focusing on building strong foundational skills. The cour - [Linux Networking](https://sumit419.github.io/school-of-sre/linux_networking/intro/) - [Python and Web](https://sumit419.github.io/school-of-sre/python_web/intro/) - Data - - [Relational databases(MySQL)](https://sumit419.github.io/school-of-sre/databases_sql/intro/) - - [NoSQL concepts](https://sumit419.github.io/school-of-sre/databases_nosql/intro/) - - [Big Data](https://sumit419.github.io/school-of-sre/big_data/intro/) -- [Systems Design](https://sumit419.github.io/school-of-sre/systems_design/intro/) -- [Metrics and Monitoring](https://sumit419.github.io/school-of-sre/metrics_and_monitoring/introduction/) -- [Security](https://sumit419.github.io/school-of-sre/security/intro/) + - [Relational databases(MySQL)](https://linkedin.github.io/school-of-sre/databases_sql/intro/) + - [NoSQL concepts](https://linkedin.github.io/school-of-sre/databases_nosql/intro/) + - [Big Data](https://linkedin.github.io/school-of-sre/big_data/intro/) +- [Systems Design](https://linkedin.github.io/school-of-sre/systems_design/intro/) +- [Metrics and Monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/) +- [Security](https://linkedin.github.io/school-of-sre/security/intro/) We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, every module has added references which could be a guide for further learning. Our hope is that by going through these modules we should be able to build the essential skills required for a Site Reliability Engineer. 
diff --git a/courses/metrics_and_monitoring/alerts.md b/courses/metrics_and_monitoring/alerts.md index 85f9ce1..d235e62 100644 --- a/courses/metrics_and_monitoring/alerts.md +++ b/courses/metrics_and_monitoring/alerts.md @@ -13,7 +13,8 @@ any service breakdown due to a shortage of resources. On the other hand, when a service goes down due to an issue, early detection and notification of such incidents can help you quickly fix the issue. -![An alert notification received on Slack](images/image11.png)

Figure 8: An alert notification received on Slack

+![An alert notification received on Slack](images/image11.png) +

Figure 8: An alert notification received on Slack

Today most of the monitoring services available provide a mechanism to set up alerts on one or a combination of metrics to actively monitor the diff --git a/courses/metrics_and_monitoring/command-line_tools.md b/courses/metrics_and_monitoring/command-line_tools.md index 8371c40..01c4224 100644 --- a/courses/metrics_and_monitoring/command-line_tools.md +++ b/courses/metrics_and_monitoring/command-line_tools.md @@ -27,7 +27,8 @@ on). Let's look at some of the tools that are predominantly used. - `-x` -- When displaying processes matched by other options, includes processes that do not have a controlling terminal. - ![Results of top command](images/image12.png)

Figure 2: Results of top command

+ ![Results of top command](images/image12.png) +

Figure 2: Results of top command

- `ss` -- The socket statistics command (ss) displays information about network sockets on the system. This tool is the successor of @@ -52,8 +53,8 @@ on). Let's look at some of the tools that are predominantly used. displays the statistics in a human-readable format. ![Memory - statistics on a host in human-readable form](images/image6.png)

Figure 4: Memory - statistics on a host in human-readable form

+ statistics on a host in human-readable form](images/image6.png) +

Figure 4: Memory statistics on a host in human-readable form

- `df --` The df command displays disk space usage statistics. The `-i` command-line option is also often used to display @@ -61,7 +62,8 @@ on). Let's look at some of the tools that are predominantly used. statistics. The `-h` command-line option is used for displaying statistics in a human-readable format. -![Disk usage statistics on a system in human-readable form](images/image9.png)

Figure 5: +![Disk usage statistics on a system in human-readable form](images/image9.png) +

Figure 5: Disk usage statistics on a system in human-readable form

- `sar` -- The sar utility monitors various subsystems, such as CPU @@ -75,7 +77,8 @@ on). Let's look at some of the tools that are predominantly used. specifies which network interface to watch. ![Network bandwidth usage by - active connection on the host](images/image2.png)

Figure 6: Network bandwidth usage by + active connection on the host](images/image2.png) +

Figure 6: Network bandwidth usage by active connection on the host

- `tcpdump` -- The tcpdump command is a network monitoring tool that @@ -94,5 +97,6 @@ active connection on the host

- `port ` -- Filters traffic to or from a particular port -![tcpdump of packets on an interface](images/image10.png)

Figure 7: *tcpdump* of packets on *docker0* +![tcpdump of packets on an interface](images/image10.png) +

Figure 7: *tcpdump* of packets on *docker0* interface on a host

\ No newline at end of file diff --git a/courses/metrics_and_monitoring/introduction.md b/courses/metrics_and_monitoring/introduction.md index 9f01831..01d3491 100644 --- a/courses/metrics_and_monitoring/introduction.md +++ b/courses/metrics_and_monitoring/introduction.md @@ -45,26 +45,26 @@ following topics: ## Course content -- [Introduction](#introduction) +- [Introduction](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/#introduction) - - [Four golden signals of monitoring](#four-golden-signals-of-monitoring) + - [Four golden signals of monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/#four-golden-signals-of-monitoring) - - [Why is monitoring important?](#why-is-monitoring-important) + - [Why is monitoring important?](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/introduction/#why-is-monitoring-important) -- [Command-line tools](command-line_tools.md) +- [Command-line tools](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/command-line_tools/) -- [Third-party monitoring](third-party_monitoring.md) +- [Third-party monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/third-party_monitoring/) -- [Proactive monitoring using alerts](alerts.md) +- [Proactive monitoring using alerts](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/alerts/) -- [Best practices for monitoring](best_practices.md) +- [Best practices for monitoring](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/best_practices/) -- [Observability](observability.md) +- [Observability](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/observability/) - - [Logs](observability.md#logs) - - [Tracing](observability.md#tracing) + - [Logs](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/observability/#logs) + - [Tracing](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/observability/#tracing) 
-[Conclusion](conclusion.md) +[Conclusion](https://linkedin.github.io/school-of-sre/metrics_and_monitoring/conclusion/) ## @@ -221,7 +221,8 @@ Before we discuss monitoring an application, let us look at the monitoring infrastructure. Following is an illustration of a basic monitoring system. -![Illustration of a monitoring infrastructure](images/image1.jpg)

Figure 1: Illustration of a monitoring infrastructure

+![Illustration of a monitoring infrastructure](images/image1.jpg) +

Figure 1: Illustration of a monitoring infrastructure

Figure 1 shows a monitoring infrastructure mechanism for aggregating metrics on the system, and collecting and storing the data for display.