mirror of
https://github.com/linkedin/school-of-sre
synced 2026-01-08 01:28:03 +00:00
docs (level 101): fix typos, punctuation, formatting (#160)
* docs: formatted for readability * docs: rephrased and added punctuation * docs: fix typos, punctuation, formatting * docs: fix typo and format * docs: fix caps and formatting * docs: fix punctuation and formatting * docs: capitalized SQL commands, fixed puntuation, formatting * docs: fix punctuation * docs: fix punctuation and formatting * docs: fix caps,punctuation and formatting * docs: fix links, punctuation, formatting * docs: fix code block formatting * docs: fix punctuation, indentation and formatting
This commit is contained in:
@@ -1,10 +1,10 @@
|
||||
# HTTP
|
||||
|
||||
Till this point we have only got the IP address of linkedin.com. The HTML page of linkedin.com is served by HTTP protocol which the browser renders. Browser sends a HTTP request to the IP of the server determined above.
|
||||
Request has a verb GET, PUT, POST followed by a path and query parameters and lines of key value pair which gives information about the client and capabilities of the client like contents it can accept and a body (usually in POST or PUT)
|
||||
Till this point we have only got the IP address of [linkedin.com](https://www.linkedin.com/). The HTML page of [linkedin.com](https://www.linkedin.com/) is served by HTTP protocol which the browser renders. Browser sends a HTTP request to the IP of the server determined above.
|
||||
Request has a verb GET, PUT, POST followed by a path and query parameters and lines of key-value pair which gives information about the client and capabilities of the client like contents it can accept and a body (usually in POST or PUT).
|
||||
|
||||
```bash
|
||||
# Eg run the following in your container and have a look at the headers
|
||||
# Eg. run the following in your container and have a look at the headers
|
||||
curl linkedin.com -v
|
||||
```
|
||||
```bash
|
||||
@@ -25,7 +25,7 @@ curl linkedin.com -v
|
||||
* Closing connection 0
|
||||
```
|
||||
|
||||
Here, in the first line GET is the verb, / is the path and 1.1 is the HTTP protocol version. Then there are key value pairs which give client capabilities and some details to the server. The server responds back with HTTP version, [Status Code and Status message](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes). Status codes 2xx means success, 3xx denotes redirection, 4xx denotes client side errors and 5xx server side errors.
|
||||
Here, in the first line `GET` is the verb, `/` is the path and `1.1` is the HTTP protocol version. Then there are key-value pairs which give client capabilities and some details to the server. The server responds back with HTTP version, [Status Code and Status message](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes). Status codes `2xx` means success, `3xx` denotes redirection, `4xx` denotes client-side errors and `5xx` server-side errors.
|
||||
|
||||
We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1.
|
||||
|
||||
@@ -39,15 +39,14 @@ USER-AGENT: curl
|
||||
|
||||
```
|
||||
|
||||
This would get server response and waits for next input as the underlying connection to [www.linkedin.com](https://www.linkedin.com/) can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0, this connection will be immediately closed after the response meaning new connection has to be opened for each query. HTTP/1.1 can have only one inflight request in an open connection but connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is we can have multiple inflight requests on the same connection. We are restricting our scope to generic HTTP and not jumping to the intricacies of each protocol version but they should be straight forward to understand post the course.
|
||||
|
||||
This would get server response and waits for next input as the underlying connection to www.linkedin.com can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0 this connection will be immediately closed after the response meaning new connection has to be opened for each query. HTTP/1.1 can have only one inflight request in an open connection but connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is we can have multiple inflight requests on the same connection. We are restricting our scope to generic HTTP and not jumping to the intricacies of each protocol version but they should be straight forward to understand post the course.
|
||||
HTTP is called **stateless protocol**. This section we will try to understand what stateless means. Say we logged in to [linkedin.com](https://www.linkedin.com/), each request to [linkedin.com](https://www.linkedin.com/) from the client will have no context of the user and it makes no sense to prompt user to login for each page/resource. This problem of HTTP is solved by *COOKIE*. A user is created a session when a user logs in. This session identifier is sent to the browser via *SET-COOKIE* header. The browser stores the COOKIE till the expiry set by the server and sends the cookie for each request from hereon for [linkedin.com](https://www.linkedin.com/). More details on cookies are available [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies). Cookies are a critical piece of information like password and since HTTP is a plain text protocol, any man-in-the-middle can capture either password or cookies and can breach the privacy of the user. Similarly as discussed during DNS, a spoofed IP of [linkedin.com](https://www.linkedin.com/) can cause a phishing attack on users where an user can give LinkedIn’s password to login on the malicious site. To solve both problems, HTTPS came in place and HTTPS has to be mandated.
|
||||
|
||||
HTTP is called **stateless protocol**. This section we will try to understand what stateless means. Say we logged in to linkedin.com, each request to linkedin.com from the client will have no context of the user and it makes no sense to prompt user to login for each page/resource. This problem of HTTP is solved by *COOKIE*. A user is created a session when a user logs in. This session identifier is sent to the browser via *SET-COOKIE* header. The browser stores the COOKIE till the expiry set by the server and sends the cookie for each request from hereon for linkedin.com. More details on cookies are available [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies). Cookies are a critical piece of information like password and since HTTP is a plain text protocol, any man in the middle can capture either password or cookies and can breach the privacy of the user. Similarly as discussed during DNS a spoofed IP of linkedin.com can cause a phishing attack on users where an user can give linkedin’s password to login on the malicious site. To solve both problems HTTPs came in place and HTTPs has to be mandated.
|
||||
|
||||
HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private public key pair and certificate request. This certificate request has to be signed by a certificate authority which converts the certificate request to a certificate. The server administrator has to update the certificate and private key to the webserver. The certificate has details about the server (like domain name for which it serves, expiry date), public key of the server. The private key is a secret to the server and losing the private key loses the trust the server provides. When clients connect, the client sends a HELLO. The server sends its certificate to the client. The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority and the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetrical key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody else other than the server who has the private key can understand this data. Once negotiation is complete, that symmetric key and algorithm is used for further encryption which can be decrypted only by client and server from thereon as they only know the symmetric key and algorithm. The switch to symmetric algorithm from asymmetric encryption algorithm is to not strain the resources of client devices as symmetric encryption is generally less resource intensive than asymmetric.
|
||||
HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private-public key pair and certificate request. This certificate request has to be signed by a certificate authority which converts the certificate request to a certificate. The server administrator has to update the certificate and private key to the webserver. The certificate has details about the server (like domain name for which it serves, expiry date), public key of the server. The private key is a secret to the server and losing the private key loses the trust the server provides. When clients connect, the client sends a HELLO. The server sends its certificate to the client. The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority and the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetrical key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody else other than the server who has the private key can understand this data. Once negotiation is complete, that symmetric key and algorithm is used for further encryption which can be decrypted only by client and server from thereon as they only know the symmetric key and algorithm. The switch to symmetric algorithm from asymmetric encryption algorithm is to not strain the resources of client devices as symmetric encryption is generally less resource intensive than asymmetric.
|
||||
|
||||
```bash
|
||||
#Try the following on your terminal to see the cert details like Subject Name(domain name), Issuer details, Expiry date
|
||||
# Try the following on your terminal to see the cert details like Subject Name (domain name), Issuer details, Expiry date
|
||||
curl https://www.linkedin.com -v
|
||||
```
|
||||
```bash
|
||||
@@ -124,6 +123,6 @@ date: Mon, 09 Nov 2020 10:50:10 GMT
|
||||
* Closing connection 0
|
||||
```
|
||||
|
||||
Here my system has a list of certificate authorities it trusts in this file /etc/ssl/cert.pem. Curl validates the certificate is for www.linkedin.com by seeing the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by seeing the expire date. It also validates the signature on the certificate by using the public key of issuer Digicert in /etc/ssl/cert.pem. Once this is done, using the public key of www.linkedin.com it negotiates cipher TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 with a symmetric key. Subsequent data transfer including first HTTP request uses the same cipher and symmetric key.
|
||||
Here, my system has a list of certificate authorities it trusts in this file `/etc/ssl/cert.pem`. cURL validates the certificate is for [www.linkedin.com](https://www.linkedin.com/) by seeing the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by seeing the expire date. It also validates the signature on the certificate by using the public key of issuer Digicert in `/etc/ssl/cert.pem`. Once this is done, using the public key of [www.linkedin.com](https://www.linkedin.com/) it negotiates cipher `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384` with a symmetric key. Subsequent data transfer including first HTTP request uses the same cipher and symmetric key.
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user