docs (level 101): fix typos, punctuation, formatting (#160)

* docs: formatted for readability

* docs: rephrased and added punctuation

* docs: fix typos, punctuation, formatting

* docs: fix typo and format

* docs: fix caps and formatting

* docs: fix punctuation and formatting

* docs: capitalized SQL commands, fixed punctuation, formatting

* docs: fix punctuation

* docs: fix punctuation and formatting

* docs: fix caps, punctuation and formatting

* docs: fix links, punctuation, formatting

* docs: fix code block formatting

* docs: fix punctuation, indentation and formatting
This commit is contained in:
Jana R
2024-07-28 17:38:19 +05:30
committed by GitHub
parent bdcc6856ed
commit 4239ecf473
58 changed files with 1522 additions and 1367 deletions


# Conclusion
With this, we have traversed through the TCP/IP stack completely. We hope you will have a different perspective when you open any website in the browser after this course.
During the course, we have also dissected the common tasks in this pipeline that fall under the ambit of SRE.
# Post Training Exercises
1. Set up your own DNS resolver in the `dev` environment which acts as an authoritative DNS server for `example.com` and a forwarder for other domains. Update `resolv.conf` to use the new DNS resolver running on `localhost`.
2. Set up a site `dummy.example.com` on `localhost` and run a webserver with a self-signed certificate. Update the trusted CAs, or pass the self-signed CA's public key as a parameter, so that `curl https://dummy.example.com -v` works properly without a self-signed certificate warning.
3. Update the routing table to use another host (container/VM) in the same network as a gateway for `8.8.8.8/32` and run `ping 8.8.8.8`. Do a packet capture on the new gateway to see that the L3 hop is working as expected (you might need to disable `icmp_redirect`).


# DNS
Domain Names are the simple human-readable names for websites. The Internet understands only IP addresses, but since memorizing incoherent numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. When somebody tries to open [www.linkedin.com](https://www.linkedin.com) in the browser, the browser tries to convert [www.linkedin.com](https://www.linkedin.com) to an IP Address. This process is called DNS resolution. A simple pseudocode depicting this process looks like this:
```python
ip, err = getIPAddress(domainName)
if err:
    print("unknown host exception while trying to resolve: %s" % domainName)
```
Now, let's try to understand what happens inside the `getIPAddress` function. The browser has a DNS cache of its own, where it checks if a mapping for the `domainName` to an IP address is already available, in which case the browser uses that IP address. If no such mapping exists, the browser calls the `gethostbyname` function to ask the operating system to find the IP address for the given `domainName`.
```python
def getIPAddress(domainName):
    # ... (cache lookup and gethostbyname call elided in this excerpt) ...
return resp
```
Now, let's understand what the operating system does when the [gethostbyname](https://man7.org/linux/man-pages/man3/gethostbyname.3.html) function is called. The Linux operating system looks at the file [/etc/nsswitch.conf](https://man7.org/linux/man-pages/man5/nsswitch.conf.5.html), which usually has a line:
```bash
hosts: files dns
```
This line means the OS has to look up first in the file `/etc/hosts`, and then use the DNS protocol to do the resolution if there is no match in `/etc/hosts`.
The file `/etc/hosts` is of the format:
```bash
IPAddress FQDN [FQDN]*
::1 localhost.localdomain localhost
```
If a match exists for a domain in this file, then that IP address is returned by the OS. Let's add a line to this file:
```bash
127.0.0.1 test.linkedin.com
```
And then ping `test.linkedin.com`:
```bash
ping test.linkedin.com -n
PING test.linkedin.com (127.0.0.1) 56(84) bytes of data.
```
As mentioned earlier, if no match exists in `/etc/hosts`, the OS tries to do a DNS resolution using the DNS protocol. The Linux system makes a DNS request to the first IP in `/etc/resolv.conf`. If there is no response, requests are sent to subsequent servers in `resolv.conf`. These servers in `resolv.conf` are called DNS resolvers. The DNS resolvers are populated by [DHCP](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol) or statically configured by an administrator.
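The resolver list the OS walks through can be sketched with a small parser (a rough sketch; the sample file content below is an assumption for illustration):

```python
# Minimal sketch: extract resolver IPs the way the OS reads /etc/resolv.conf.
def nameservers(resolv_conf_text):
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        # only "nameserver <ip>" lines name a resolver
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

sample = """\
search example.com
nameserver 172.23.195.101
nameserver 8.8.8.8
"""
print(nameservers(sample))  # ['172.23.195.101', '8.8.8.8']
```

Requests go to the first server in this list; the rest are fallbacks when it does not respond.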
[Dig](https://linux.die.net/man/1/dig) is a userspace DNS tool which creates and sends requests to DNS resolvers and prints the responses it receives to the console.
```bash
# run this command in one shell to capture all DNS requests
sudo tcpdump -s 0 -A -i any port 53
# make a dig request from another shell
dig linkedin.com
```
```bash
# truncated tcpdump output showing the DNS query and response
..)........
```
The packet capture shows a request is made to `172.23.195.101:53` (this is the resolver in `/etc/resolv.conf`) for [linkedin.com](https://www.linkedin.com/) and a response is received from `172.23.195.101` with the IP address of [linkedin.com](https://www.linkedin.com/) `108.174.10.10`.
Now, let's try to understand how the DNS resolver tries to find the IP address of [linkedin.com](https://www.linkedin.com/). The DNS resolver first looks at its cache. Since many devices in the network can query for the domain name [linkedin.com](https://www.linkedin.com/), the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The DNS server breaks “linkedin.com” into “.”, “com.” and “linkedin.com.” and starts DNS resolution from “.”. The “.” is called the root domain, and its IPs are known to the DNS resolver software. The DNS resolver queries the root domain nameservers to find the right top-level domain (TLD) nameservers which could respond regarding details for “com.”. The address of the TLD nameserver of “com.” is returned. Now the DNS resolution service contacts the TLD nameserver for “com.” to fetch the authoritative nameserver for “linkedin.com”. Once an authoritative nameserver of “linkedin.com” is known, the resolver contacts LinkedIn's nameserver to provide the IP address of “linkedin.com”. This whole process can be visualized by running the following:
```bash
dig +trace linkedin.com
linkedin.com. 3600 IN A 108.174.10.10
```
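The label-splitting step the resolver performs — starting at the root and walking down — can be sketched as (a toy illustration of the ordering, not real resolver code):

```python
def resolution_chain(domain):
    # "linkedin.com" -> [".", "com.", "linkedin.com."]
    # the resolver queries nameservers in exactly this order
    labels = domain.rstrip(".").split(".")
    chain = ["."]
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(resolution_chain("linkedin.com"))  # ['.', 'com.', 'linkedin.com.']
```

`dig +trace` prints the answers received at each of these steps.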
This DNS response has 5 fields where the first field is the request and the last field is the response. The second field is the Time-to-Live (TTL) which says how long the DNS response is valid in seconds. In this case, this mapping of [linkedin.com](https://www.linkedin.com/) is valid for 1 hour. This is how the resolvers and application (browser) maintain their cache. Any request for [linkedin.com](https://www.linkedin.com/) beyond 1 hour will be treated as a cache miss as the mapping has expired its TTL and the whole process has to be redone.
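How resolvers and browsers honour the TTL can be sketched with a toy cache (a sketch only; real caches also handle negative answers, limits and eviction):

```python
import time

class DnsCache:
    # Toy resolver cache keyed by name; entries expire after their TTL,
    # mirroring how browsers and resolvers honour the TTL field.
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, name, ip, ttl_seconds):
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # cache miss
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None          # expired: treated exactly like a miss
        return ip
```

An expired entry is treated exactly like a miss, forcing the whole resolution process to be redone.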
The fourth field gives the type of the DNS request/response. Some of the common DNS query types are A, AAAA, NS, TXT, PTR, MX and CNAME:
- A record returns the IPv4 address of the domain name.
- AAAA record returns the IPv6 address of the domain name.
- NS record returns the authoritative nameserver for the domain name.
- CNAME records are aliases to the domain names. Some domains point to other domain names, and resolving the latter domain name gives an IP which is used as an IP for the former domain name as well. For example, [www.linkedin.com](https://www.linkedin.com)'s IP address is the same as `2-01-2c3e-005a.cdx.cedexis.net`.
- For brevity, we are not discussing other DNS record types; the RFCs for each of these records are available [here](https://en.wikipedia.org/wiki/List_of_DNS_record_types).
```bash
dig A linkedin.com +short
dig www.linkedin.com CNAME +short
2-01-2c3e-005a.cdx.cedexis.net.
```
Armed with these fundamentals of DNS, let's see some use cases where DNS is used by SREs.
## Applications in SRE role
This section covers some of the common solutions an SRE can derive from DNS.
1. Every company has to have its internal DNS infrastructure for intranet sites and internal services like databases and other internal applications like a wiki. So, there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn't become a single point of failure. Failure of the internal DNS infrastructure can cause API calls of microservices to fail and other cascading effects.
2. DNS can also be used for discovering services. For example, the hostname `serviceb.internal.example.com` could list the instances which run service `b` internally in the `example.com` company. Cloud providers provide options to enable DNS discovery ([example](https://docs.aws.amazon.com/whitepapers/latest/microservices-on-aws/service-discovery.html#dns-based-service-discovery)).
3. DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, load balancers are given a CNAME instead of an IP address. They update the IP address of the load balancers as they scale by changing the IP address of the alias domain names. This is one of the reasons why A records of such alias domains are short-lived, like 1 minute.
4. DNS can also be used to make clients get IP addresses closer to their location, so that their HTTP calls can be answered faster if the company has a geographically distributed presence.
5. An SRE also has to understand that since there is no verification in DNS infrastructure, these responses can be spoofed. This is safeguarded by other protocols like HTTPS (dealt with later). DNSSEC protects from forged or manipulated DNS responses.
6. Stale DNS cache can be a problem. Some [apps](https://stackoverflow.com/questions/1256556/how-to-make-java-honor-the-dns-caching-timeout) might still be using expired DNS records for their API calls. This is something an SRE has to be wary of when doing maintenance.
7. DNS load balancing and service discovery also have to account for the TTL: servers can be removed from the pool only after waiting for the TTL to expire after the changes are made to DNS records. If this is not done, a certain portion of the traffic will fail, as the server is removed before the TTL expires.
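Point 7 amounts to a simple timing rule; a toy sketch (the timestamps are illustrative, not from the course):

```python
# Safe-drain rule: after removing a server from DNS, wait out the old
# record's TTL before taking the server down, since clients may keep
# using their cached record until then.
def safe_to_stop(change_time, ttl_seconds, now):
    """True once every cache entry created before the DNS change
    must have expired."""
    return now >= change_time + ttl_seconds

print(safe_to_stop(change_time=100, ttl_seconds=60, now=140))  # False
print(safe_to_stop(change_time=100, ttl_seconds=60, now=161))  # True
```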


# HTTP
Till this point, we have only got the IP address of [linkedin.com](https://www.linkedin.com/). The HTML page of [linkedin.com](https://www.linkedin.com/) is served over the HTTP protocol, which the browser renders. The browser sends an HTTP request to the IP of the server determined above.
A request has a verb (GET, PUT, POST) followed by a path and query parameters, and lines of key-value pairs which give information about the client and its capabilities, like the content it can accept, and a body (usually in POST or PUT).
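The request structure described above can be sketched by building one by hand (a simplified sketch; real clients add many more headers):

```python
# Build the on-the-wire form of an HTTP/1.1 request: a verb, path and
# version line, key-value header lines, a blank line, then an optional body.
def build_request(verb, path, headers, body=""):
    lines = ["%s %s HTTP/1.1" % (verb, path)]
    lines += ["%s: %s" % (k, v) for k, v in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body

req = build_request("GET", "/", {"Host": "linkedin.com", "Accept": "*/*"})
print(req.splitlines()[0])  # GET / HTTP/1.1
```

This is exactly the text that curl sends in the example below.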
```bash
# E.g. run the following in your container and have a look at the headers
curl linkedin.com -v
```
```bash
* ... (verbose output truncated) ...
* Closing connection 0
```
Here, in the first line, `GET` is the verb, `/` is the path and `1.1` is the HTTP protocol version. Then there are key-value pairs which give client capabilities and some details to the server. The server responds back with the HTTP version, [status code and status message](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes). Status codes `2xx` mean success, `3xx` denote redirection, `4xx` denote client-side errors and `5xx` server-side errors.
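The status-code classes map directly off the first digit; as a tiny sketch:

```python
# Classify an HTTP status code by its leading digit.
def status_class(code):
    return {2: "success", 3: "redirection",
            4: "client error", 5: "server error"}.get(code // 100, "other")

print(status_class(200), status_class(301), status_class(404), status_class(503))
# success redirection client error server error
```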
We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1.
```bash
# a hypothetical interactive session (the original example is truncated here)
telnet www.linkedin.com 80
GET / HTTP/1.1
HOST: www.linkedin.com
USER-AGENT: curl
```
This would get the server response and wait for the next input, as the underlying connection to [www.linkedin.com](https://www.linkedin.com/) can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0, this connection will be immediately closed after the response, meaning a new connection has to be opened for each query. HTTP/1.1 can have only one inflight request on an open connection, but the connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is that we can have multiple inflight requests on the same connection. We are restricting our scope to generic HTTP and not jumping into the intricacies of each protocol version, but they should be straightforward to understand post the course.
HTTP is called a **stateless protocol**. In this section, we will try to understand what stateless means. Say we logged in to [linkedin.com](https://www.linkedin.com/); each request to [linkedin.com](https://www.linkedin.com/) from the client will have no context of the user, and it makes no sense to prompt the user to log in for each page/resource. This problem of HTTP is solved by the *COOKIE*. A session is created for a user when the user logs in. This session identifier is sent to the browser via the *SET-COOKIE* header. The browser stores the cookie till the expiry set by the server and sends the cookie with each request from here on for [linkedin.com](https://www.linkedin.com/). More details on cookies are available [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies). Cookies are a critical piece of information, like a password, and since HTTP is a plain-text protocol, any man-in-the-middle can capture either passwords or cookies and breach the privacy of the user. Similarly, as discussed during DNS, a spoofed IP of [linkedin.com](https://www.linkedin.com/) can cause a phishing attack on users, where a user could give their LinkedIn password to log in on the malicious site. To solve both problems, HTTPS came in place and HTTPS has to be mandated.
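What the browser does with a *SET-COOKIE* header can be sketched as (the cookie name, value and attributes here are made up for illustration):

```python
# Split a Set-Cookie header value into the name=value pair and its
# attributes, roughly as a browser does before storing the cookie.
def parse_set_cookie(header_value):
    parts = [p.strip() for p in header_value.split(";")]
    name, _, value = parts[0].partition("=")
    attrs = {}
    for p in parts[1:]:
        k, _, v = p.partition("=")
        attrs[k.lower()] = v     # flag attributes get an empty value
    return name, value, attrs

name, value, attrs = parse_set_cookie("sessionid=abc123; Max-Age=3600; Secure; HttpOnly")
print(name, value, attrs["max-age"])  # sessionid abc123 3600
```

The browser then replays `name=value` in the `Cookie` header of every subsequent request to the same site, until `Max-Age` (or `Expires`) passes.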
HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private-public key pair and a certificate request. This certificate request has to be signed by a certificate authority, which converts the certificate request into a certificate. The server administrator has to install the certificate and private key on the webserver. The certificate has details about the server (like the domain name for which it serves and the expiry date) and the public key of the server. The private key is a secret to the server, and losing the private key loses the trust the server provides. When clients connect, the client sends a HELLO. The server sends its certificate to the client. The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority, and if the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetric key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody other than the server, which has the private key, can understand this data. Once negotiation is complete, that symmetric key and algorithm are used for further encryption, which can be decrypted only by the client and server from thereon, as only they know the symmetric key and algorithm. The switch from an asymmetric to a symmetric encryption algorithm is to not strain the resources of client devices, as symmetric encryption is generally less resource-intensive than asymmetric.
```bash
# Try the following on your terminal to see the cert details like Subject Name (domain name), Issuer details, Expiry date
curl https://www.linkedin.com -v
```
```bash
# ... (TLS handshake details truncated) ...
date: Mon, 09 Nov 2020 10:50:10 GMT
* Closing connection 0
```
Here, my system has a list of certificate authorities it trusts in the file `/etc/ssl/cert.pem`. cURL validates that the certificate is for [www.linkedin.com](https://www.linkedin.com/) by seeing the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by checking the expiry date. It also validates the signature on the certificate by using the public key of the issuer DigiCert in `/etc/ssl/cert.pem`. Once this is done, using the public key of [www.linkedin.com](https://www.linkedin.com/), it negotiates the cipher `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384` with a symmetric key. Subsequent data transfers, including the first HTTP request, use the same cipher and symmetric key.
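The checks cURL performs map to what TLS libraries enforce by default; a minimal sketch with Python's standard `ssl` module:

```python
import ssl

# The default context mirrors curl's behaviour: verify the certificate
# chain against the system CA store and check that the hostname matches
# the certificate's CN/SAN.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # chain must validate
print(ctx.check_hostname)                    # hostname must match the cert
```

Disabling either check (as `curl -k` does) removes the phishing protection the paragraph above describes.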


## Prerequisites
- High-level knowledge of commonly used jargon in TCP/IP stack like DNS, TCP, UDP and HTTP
- [Linux Commandline Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/command_line_basics/)
## What to expect from this course
Throughout the course, we cover how an SRE can optimize the system to improve the…
## What is not covered under this course
This course spends time on the fundamentals. We are not covering concepts like [HTTP/2.0](https://en.wikipedia.org/wiki/HTTP/2), [QUIC](https://en.wikipedia.org/wiki/QUIC), [TCP congestion control protocols](https://en.wikipedia.org/wiki/TCP_congestion_control), [Anycast](https://en.wikipedia.org/wiki/Anycast), [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol), [CDN](https://en.wikipedia.org/wiki/Content_delivery_network), [Tunnels](https://en.wikipedia.org/wiki/Virtual_private_network) and [Multicast](https://en.wikipedia.org/wiki/Multicast). We expect that this course will provide the relevant basics to understand such concepts.
## Bird's eye view of the course
The course covers the question “What happens when you open [linkedin.com](https://www.linkedin.com) in your browser?” The course follows the flow of TCP/IP stack. More specifically, the course covers topics of Application layer protocols (DNS and HTTP), transport layer protocols (UDP and TCP), networking layer protocol (IP) and data link layer protocol.
## Course Contents
1. [DNS](https://linkedin.github.io/school-of-sre/level101/linux_networking/dns/)


# IP Routing and Data Link Layer
We will dig into how packets that leave the client reach the server and vice versa. When the packet reaches the IP layer, the transport layer has populated the source and destination ports. The IP/network layer populates the destination IP (discovered from DNS) and then looks up the route to the destination IP in the routing table.
```bash
# Linux `route -n` command gives the default routing table
route -n
```
```
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
```
Here, the destination IP is bitwise ANDed with the Genmask and, if the result matches the Destination column of a row, then that row's gateway and interface are picked for routing. Here, [linkedin.com](https://www.linkedin.com)'s IP `108.174.10.10` is ANDed with `255.255.0.0` and the answer we get is `108.174.0.0`, which doesn't match any destination in the routing table. Then, Linux does an AND of the destination IP with `0.0.0.0` and we get `0.0.0.0`. This answer matches the default row.
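The AND-and-match lookup described above can be sketched in a few lines of Python. This is a toy model of the routing table shown earlier, not how the kernel implements it; the `lookup` helper and the route tuples are hypothetical:

```python
import ipaddress

# Hypothetical routing table mirroring the `route -n` output above:
# (destination, genmask, gateway, interface)
ROUTES = [
    ("172.17.0.0", "255.255.0.0", "0.0.0.0", "eth0"),
    ("0.0.0.0",    "0.0.0.0",     "172.17.0.1", "eth0"),
]

def lookup(dst: str):
    """Return (gateway, iface) for dst, preferring the longest Genmask."""
    dst_int = int(ipaddress.IPv4Address(dst))
    best = None
    for dest, mask, gw, iface in ROUTES:
        mask_int = int(ipaddress.IPv4Address(mask))
        # Bitwise AND of the destination IP with the Genmask must
        # equal the route's Destination column for the row to match.
        if dst_int & mask_int == int(ipaddress.IPv4Address(dest)):
            if best is None or mask_int > best[0]:
                best = (mask_int, gw, iface)
    return (best[1], best[2]) if best else None

print(lookup("108.174.10.10"))  # no specific route -> default gateway
print(lookup("172.17.0.5"))     # same subnet -> gateway 0.0.0.0 (direct delivery)
```

A destination like `108.174.10.10` only matches the all-zero default row, so it is sent to the gateway `172.17.0.1`; a destination in `172.17.0.0/16` matches the more specific row and is delivered directly.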
The routing table is processed in the order of most bits set in the Genmask (longest prefix first), and Genmask `0.0.0.0` is the default route if nothing matches.
At the end of this operation, Linux figured out that the packet has to be sent to the next hop `172.17.0.1` via `eth0`. The source IP of the packet will be set as the IP of interface `eth0`.
Now, to send the packet to `172.17.0.1`, Linux has to figure out the MAC address of `172.17.0.1`. The MAC address is found by looking at the internal ARP cache, which stores translations between IP addresses and MAC addresses. If there is a cache miss, Linux broadcasts an ARP request within the internal network asking who has `172.17.0.1`. The owner of the IP sends an ARP response, which is cached by the kernel, and the kernel sends the packet to the gateway by setting the source MAC address to that of `eth0` and the destination MAC address to that of `172.17.0.1`, which we got just now. A similar routing lookup process is followed at each hop till the packet reaches the actual server. The transport layer and layers above it come into play only at the end servers. At intermediate hops, only layers up to the IP/Network layer are involved.
![Screengrab for above explanation](images/arp.gif)
One weird gateway we saw in the routing table is `0.0.0.0`. This gateway means no Layer 3 (Network layer) hop is needed to send the packet; both source and destination are in the same network. The kernel has to figure out the MAC of the destination, populate the source and destination MACs appropriately, and send the packet out so that it reaches the destination without any Layer 3 hop in the middle.
As we followed in other modules, let's complete this session with SRE use cases.
## Applications in SRE role
1. Generally, the routing table is populated by DHCP and playing around with it is not a good practice. There can be reasons where one has to play around with the routing table, but take that path only when it's absolutely necessary.
2. Understanding error messages better. For example, a “No route to host” error can mean the MAC address of the destination host is not found, or it can mean the destination host is down.
3. In rare cases, looking at the ARP table can help us understand if there is an IP conflict, where the same IP is assigned to two hosts by mistake and is causing unexpected behavior.


# TCP
TCP is a transport layer protocol like UDP but it guarantees reliability, flow control and congestion control.
TCP guarantees reliable delivery by using sequence numbers. A TCP connection is established by a three-way handshake. In our case, the client sends a `SYN` packet along with the starting sequence number it plans to use, the server acknowledges the `SYN` packet and sends a `SYN` with its own sequence number. Once the client acknowledges the `SYN` packet, the connection is established. Each piece of data transferred from here on is considered reliably delivered once the acknowledgement for that sequence is received by the concerned party.
![3-way handshake](images/established.png)
```bash
# To understand handshake run packet capture on one bash session
tcpdump -S -i any port 80
# Run curl on one bash session
curl www.linkedin.com
```
![tcpdump-3way](images/pcap.png)
Here, the client sends a `SYN` flag, shown by the [S] flag, with sequence number `1522264672`. The server acknowledges receipt of the `SYN` with an `ACK` [.] flag and a `SYN` flag for its own sequence number [S]. The server uses sequence number `1063230400` and tells the client it's expecting sequence number `1522264673` (client sequence + 1). The client sends a zero-length acknowledgement packet to the server (server sequence + 1) and the connection stands established. This is called the three-way handshake. The client sends a 76-byte packet after this and increments its sequence number by 76. The server sends a 170-byte response and closes the connection. This was the difference we were talking about between HTTP/1.1 and HTTP/1.0: in HTTP/1.1, this same connection can be reused, which reduces the overhead of the three-way handshake for each HTTP request. If a packet is missed between client and server, the server won't send an `ACK` to the client and the client would retry sending the packet till the `ACK` is received. This guarantees reliability.
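The sequence/acknowledgement arithmetic in the capture can be sanity-checked with a tiny sketch; the variable names here are ours, and the numbers are the ones from the pcap above:

```python
# Sequence numbers taken from the tcpdump capture above.
client_isn = 1522264672          # client's initial sequence number, sent with [S]
server_isn = 1063230400          # server's initial sequence number, sent with [S.]

syn_ack = client_isn + 1         # the ack number in the server's SYN-ACK
handshake_ack = server_isn + 1   # the client's zero-length ACK completing the handshake

payload_len = 76                 # the request the client sends next
# The SYN consumes one unit of sequence space, then data consumes its length.
next_client_seq = client_isn + 1 + payload_len

print(syn_ack, handshake_ack, next_client_seq)
```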
The flow control is established by the `WIN` size field in each segment. The `WIN` size indicates the available TCP buffer length in the kernel, which can be used to buffer received segments. A size of 0 means the receiver has a lot of lag to clear from its socket buffer, and the sender has to pause sending packets so that the receiver can catch up. This flow control protects against the slow receiver and fast sender problem.
TCP also does congestion control which determines how many segments can be in transit without an `ACK`. Linux provides us the ability to configure algorithms for congestion control which we are not covering here.
While closing a connection, the client/server calls a close syscall. Let's assume the client does that. The client's kernel will send a `FIN` packet to the server. The server's kernel can't close the connection till the close syscall is called by the server application. Once the server app calls close, the server also sends a `FIN` packet, and the client enters the `TIME_WAIT` state for 2*MSL (120s) so that this socket can't be reused for that time period, to prevent any TCP state corruptions due to stray stale packets.
![Connection tearing](images/closed.png)
Armed with our TCP and HTTP knowledge, let's see how this is used by SREs in their role.
## Applications in SRE role
1. Scaling HTTP performance using load balancers needs consistent knowledge about both TCP and HTTP. There are [different kinds of load balancing](https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236?gi=428394dbdcc3) like L4 load balancing, L7 load balancing, Direct Server Return, etc. HTTPS offloading can be done on the load balancer or directly on the servers, based on the performance and compliance needs.
2. Tweaking `sysctl` variables for `rmem` and `wmem` like we did for UDP can improve throughput of sender and receiver.
3. The `sysctl` variable `tcp_max_syn_backlog` and the socket variable `somaxconn` determine how many connections the kernel can complete the 3-way handshake for before the app calls the accept syscall. This is very useful in single-threaded applications. Once the backlog is full, new connections stay in the `SYN_RCVD` state (when you run `netstat`) till the application calls the accept syscall.
4. Apps can run out of file descriptors if there are too many short-lived connections. Digging through [tcp_tw_reuse and tcp_tw_recycle](http://lxr.linux.no/linux+v3.2.8/Documentation/networking/ip-sysctl.txt#L464) can help reduce time spent in the `TIME_WAIT` state (it has its own risks). Making apps reuse a pool of connections instead of creating ad hoc connections can also help.
5. Understanding performance bottlenecks by looking at metrics and classifying whether it's a problem on the app side or the network side. For example, too many sockets in the `CLOSE_WAIT` state is a problem in the application, whereas retransmissions are more likely a problem in the network or the OS stack than in the application itself. Understanding the fundamentals can help us narrow down where the bottleneck is.
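As a small illustration of item 3 above, here is a minimal Python sketch (loopback only; the backlog value is arbitrary) showing that the kernel completes the three-way handshake before the application ever calls accept — the `connect()` below succeeds while the connection sits in the accept queue:

```python
import socket

# Listening socket: port 0 lets the kernel pick a free port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)                    # accept queue of 1 completed handshake
                                 # (the effective value is capped by somaxconn)

# The client's connect() returns once the kernel finishes the handshake,
# even though the server application has not called accept() yet.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())

# Only now does the application drain the accept queue.
conn, addr = srv.accept()
print(addr[0])
cli.close(); conn.close(); srv.close()
```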


# UDP
UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP (most of the time). Before jumping into UDP, let's try to understand what an application and transport layer is. The DNS protocol is used by a DNS client (e.g., `dig`) and a DNS server (e.g., `named`). The transport layer makes sure the DNS request reaches the DNS server process and, similarly, that the response reaches the DNS client process. Multiple processes can run on a system and they can listen on any [ports](https://en.wikipedia.org/wiki/Port_(computer_networking)). DNS servers usually listen on port number `53`. When a client makes a DNS request, after filling the necessary application payload, it passes the payload to the kernel via the **sendto** system call. The kernel picks a random port number ([>1024](https://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html)) as the source port number and puts 53 as the destination port number and sends the packet to the lower layers. When the kernel on the server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process, which makes a **recvfrom** system call and reads the packet. This process by the kernel is called multiplexing (combining packets from multiple applications to the same lower layers) and demultiplexing (segregating packets from a single lower layer to multiple applications). Multiplexing and demultiplexing are done by the transport layer.
UDP is one of the simplest transport layer protocols and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things like reliable communication, flow control and congestion control. UDP is designed to be lightweight and handle communications with little overhead. So, it doesn't do anything beyond multiplexing and demultiplexing. If applications running on top of UDP need any of the features of TCP, they have to implement that in their application.
This [example from python wiki](https://wiki.python.org/moin/UdpCommunication) covers a sample UDP client and server where “Hello World” is an application payload sent to server listening on port number 5005. The server receives the packet and prints the “Hello World” string from the client
This [example from python wiki](https://wiki.python.org/moin/UdpCommunication) covers a sample UDP client and server where “Hello World” is an application payload sent to server listening on port number `5005`. The server receives the packet and prints the “Hello World” string from the client.
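A self-contained sketch along the lines of that wiki example is below; instead of the wiki's port `5005` we bind to port 0 and let the kernel pick a free port, so the snippet runs anywhere:

```python
import socket

# Receiver: a UDP socket bound to a port (DNS servers do the same on port 53).
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))
port = recv.getsockname()[1]     # the port the kernel assigned us

# Sender: the kernel picks a random source port (>1024) for this socket.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"Hello World", ("127.0.0.1", port))

# The kernel demultiplexes the datagram to this socket by destination port.
data, addr = recv.recvfrom(1024)
print(data.decode())
send.close(); recv.close()
```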
## Applications in SRE role
1. If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, the `sendto` syscall from the application will hang till the kernel finds that some of its buffer is freed. This can affect the throughput of the system. Increasing write memory buffer values using the [sysctl variables](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/tuning_and_optimizing_red_hat_enterprise_linux_for_oracle_9i_and_10g_databases/sect-oracle_9i_and_10g_tuning_guide-adjusting_network_settings-changing_network_kernel_settings) *net.core.wmem_max* and *net.core.wmem_default* provides some cushion to the application from the slow network.
2. Similarly, if the receiver process is slow in consuming from its buffer, the kernel has to drop packets which it can't queue due to the buffer being full. Since UDP doesn't guarantee reliability, these dropped packets can cause data loss unless tracked by the application layer. Increasing the sysctl variables *rmem_default* and *rmem_max* can provide some cushion to slow applications from fast senders.
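The per-socket side of this buffer tuning can be sketched as follows; the 1 MiB request is an arbitrary value for illustration, and the kernel caps (and on Linux adjusts) the granted size based on *rmem_max*, so the result may differ from the request:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# The buffer a socket gets by default (governed by net.core.rmem_default).
default_rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

# Ask for a bigger receive buffer; the grant is capped by net.core.rmem_max.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print(default_rcvbuf > 0, granted > 0)
s.close()
```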