diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 120000
index 0000000..439ad26
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1 @@
+courses/CONTRIBUTING.md
\ No newline at end of file
diff --git a/NOTICE b/NOTICE
new file mode 100644
index 0000000..d00a531
--- /dev/null
+++ b/NOTICE
@@ -0,0 +1,7 @@
+Copyright 2020 LinkedIn Corporation
+All Rights Reserved.
+Licensed under the BSD 2-Clause License (the "License").
+See LICENSE in the project root for license information.
+
+This product includes:
+* N/A
diff --git a/README.md b/README.md
new file mode 120000
index 0000000..2406e9f
--- /dev/null
+++ b/README.md
@@ -0,0 +1 @@
+courses/index.md
\ No newline at end of file
diff --git a/courses/CONTRIBUTING.md b/courses/CONTRIBUTING.md
new file mode 100644
index 0000000..52323d2
--- /dev/null
+++ b/courses/CONTRIBUTING.md
@@ -0,0 +1,5 @@
+We realise that the initial content we created is just a starting point, and our hope is that the community can help in the journey of refining and extending the content.
+
+As a contributor, you represent that the content you submit is not plagiarised. By submitting the content, you (and, if applicable, your employer) are licensing the submitted content to LinkedIn and the open source community subject to the BSD 2-Clause license.
+
+We suggest opening an issue first to seek advice on your changes before submitting a pull request.
diff --git a/courses/big_data/intro.md b/courses/big_data/intro.md
index ee2f1dc..c072233 100644
--- a/courses/big_data/intro.md
+++ b/courses/big_data/intro.md
@@ -1,38 +1,35 @@
-# School of SRE: Big Data
+# Big Data

-## Pre - Reads
+## Prerequisites

- Basics of Linux File systems.
- Basic understanding of System Design.

-## Target Audience
-
-The concept of Big Data has been around for years; most organizations now understand that if they capture all the data that streams into their businesses, they can apply analytics and get significant value from it.
-This training material covers the basics of Big Data(using Hadoop) for beginners, who would like to quickly get started and get their hands dirty in this domain. - -## What to expect from this training +## What to expect from this course This course covers the basics of Big Data and how it has evolved to become what it is today. We will take a look at a few realistic scenarios where Big Data would be a perfect fit. An interesting assignment on designing a Big Data system is followed by understanding the architecture of Hadoop and the tooling around it. -## What is not covered under this training +## What is not covered under this course Writing programs to draw analytics from data. -## TOC: +## Course Content -1. Overview of Big Data -2. Usage of Big Data techniques -3. Evolution of Hadoop -4. Architecture of hadoop +### Table of Contents + +1. [Overview of Big Data](https://linkedin.github.io/school-of-sre/big_data/intro/#overview-of-big-data) +2. [Usage of Big Data techniques](https://linkedin.github.io/school-of-sre/big_data/intro/#usage-of-big-data-techniques) +3. [Evolution of Hadoop](https://linkedin.github.io/school-of-sre/big_data/evolution/) +4. [Architecture of hadoop](https://linkedin.github.io/school-of-sre/big_data/evolution/#architecture-of-hadoop) 1. HDFS 2. Yarn -5. MapReduce framework -6. Other tooling around hadoop +5. [MapReduce framework](https://linkedin.github.io/school-of-sre/big_data/evolution/#mapreduce-framework) +6. [Other tooling around hadoop](https://linkedin.github.io/school-of-sre/big_data/evolution/#other-tooling-around-hadoop) 1. Hive 2. Pig 3. Spark 4. Presto -7. Data Serialisation and storage +7. [Data Serialisation and storage](https://linkedin.github.io/school-of-sre/big_data/evolution/#data-serialisation-and-storage) # Overview of Big Data @@ -50,7 +47,7 @@ Writing programs to draw analytics from data. 4. Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc. 
-# Usage of Big Data techniques
+# Usage of Big Data Techniques

1. Take the example of the traffic lights problem.
    1. There are more than 300,000 traffic lights in the US as of 2018.
@@ -59,4 +56,5 @@ Writing programs to draw analytics from data.
    4. How would you go about processing that and telling me how many of the signals were “green” at 10:45 am on a particular day?
2. Consider the next example on Unified Payments Interface (UPI) transactions:
    1. We had about 1.15 billion UPI transactions in the month of October, 2019 in India.
-    12. If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?
\ No newline at end of file
+    2. If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?
+
\ No newline at end of file
diff --git a/courses/databases_nosql/further_reading.md b/courses/databases_nosql/further_reading.md
new file mode 100644
index 0000000..fa3255c
--- /dev/null
+++ b/courses/databases_nosql/further_reading.md
@@ -0,0 +1,25 @@
+# Further reading
+
+NoSQL:
+
+https://hostingdata.co.uk/nosql-database/
+
+https://www.mongodb.com/nosql-explained
+
+https://www.mongodb.com/nosql-explained/nosql-vs-sql
+
+CAP Theorem
+
+http://www.julianbrowne.com/article/brewers-cap-theorem
+
+Scalability
+
+http://www.slideshare.net/jboner/scalability-availability-stability-patterns
+
+Eventual Consistency
+
+https://www.allthingsdistributed.com/2008/12/eventually_consistent.html
+
+https://www.toptal.com/big-data/consistent-hashing
+
+https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf
diff --git a/courses/databases_nosql/images/Quorum.png b/courses/databases_nosql/images/Quorum.png
new file mode 100644
index 0000000..8e79ec5
Binary files /dev/null and b/courses/databases_nosql/images/Quorum.png differ
diff --git
a/courses/databases_nosql/images/cluster_quorum.png b/courses/databases_nosql/images/cluster_quorum.png
new file mode 100644
index 0000000..1091fb4
Binary files /dev/null and b/courses/databases_nosql/images/cluster_quorum.png differ
diff --git a/courses/databases_nosql/images/consistent_hashing.png b/courses/databases_nosql/images/consistent_hashing.png
new file mode 100644
index 0000000..31564bc
Binary files /dev/null and b/courses/databases_nosql/images/consistent_hashing.png differ
diff --git a/courses/databases_nosql/images/database_sharding.png b/courses/databases_nosql/images/database_sharding.png
new file mode 100644
index 0000000..b3f83db
Binary files /dev/null and b/courses/databases_nosql/images/database_sharding.png differ
diff --git a/courses/databases_nosql/images/vector_clocks.png b/courses/databases_nosql/images/vector_clocks.png
new file mode 100644
index 0000000..c4e9361
Binary files /dev/null and b/courses/databases_nosql/images/vector_clocks.png differ
diff --git a/courses/databases_nosql/intro.md b/courses/databases_nosql/intro.md
new file mode 100644
index 0000000..9989365
--- /dev/null
+++ b/courses/databases_nosql/intro.md
@@ -0,0 +1,222 @@
+# DATABASES - NoSQL
+
+
+## Target Audience
+
+This module is meant to be an introduction to NoSQL databases. We will be touching upon the key concepts and trade-offs in a distributed data system.
+
+
+## What to expect from this training
+
+At the end of this training, you will have an understanding of what a NoSQL database is, what advantages and disadvantages it has over a traditional RDBMS, the different types of NoSQL databases, and some of the underlying concepts and trade-offs in NoSQL systems.
+
+
+## What is not covered under this training
+
+We will not be deep diving into any specific NoSQL database.
+
+
+## Agenda
+
+* Introduction to NoSQL
+* CAP Theorem
+* Data versioning
+* Partitioning
+* Hashing
+* Quorum
+
+
+## Introduction
+
+When people use the term “NoSQL database”, they typically use it to refer to any non-relational database. Some say the term “NoSQL” stands for “non SQL” while others say it stands for “not only SQL.” Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables.
+
+A common misconception is that NoSQL databases or non-relational databases don’t store relationship data well. NoSQL databases can store relationship data—they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be _easier_, because related data doesn’t have to be split between tables.
+
+Such databases have existed since the late 1960s, but the name "NoSQL" was only coined in the early 21st century. NASA used a NoSQL database to track inventory for the Apollo mission. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity. With the rise of the Agile development methodology, NoSQL databases were developed with a focus on scaling and fast performance, while at the same time allowing for frequent application changes and making programming easier.
+
+
+### Types of NoSQL databases:
+
+Over time, because these databases were developed to suit requirements at different companies, we ended up with quite a few types of NoSQL databases. They can be broadly classified into four types, although some databases overlap between types:
+
+1. 
**Document databases:** They store data in documents similar to [JSON](https://www.json.org/json-en.html) (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including things like strings, numbers, booleans, arrays, or objects, and their structures typically align with the objects developers are working with in code. The advantages include an intuitive data model and flexible schemas. Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general purpose database. They can scale out horizontally to accommodate large data volumes. Ex: MongoDB, Couchbase
+2. **Key-Value databases:** These are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don’t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Ex: [Redis](https://redis.io/), [DynamoDB](https://aws.amazon.com/dynamodb/), [Voldemort](https://www.project-voldemort.com/voldemort/)/[Venice](https://engineering.linkedin.com/blog/2017/04/building-venice--a-production-software-case-study) (LinkedIn)
+3. **Wide-Column stores:** They store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. They are commonly used for storing Internet of Things data and user profile data.
[Cassandra](https://cassandra.apache.org/) and [HBase](https://hbase.apache.org/) are two of the most popular wide-column stores.
+4. **Graph databases:** These databases store data in nodes and edges. Nodes typically store information about people, places, and things, while edges store information about the relationships between the nodes. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and “store” the graph data in a table (although a table is a logical element, so this approach imposes another level of abstraction between the graph database, the graph database management system, and the physical devices where the data is actually stored). Others use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. Graph databases excel in use cases where you need to traverse relationships to look for patterns, such as social networks, fraud detection, and recommendation engines. Ex: [Neo4j](https://neo4j.com/)
+
+
+### Comparison
+
+| | Performance | Scalability | Flexibility | Complexity | Functionality |
+|---|---|---|---|---|---|
+| Key-value | high | high | high | none | Variable |
+| Document stores | high | Variable (high) | high | low | Variable (low) |
+| Column DB | high | high | moderate | low | minimal |
+| Graph | Variable | Variable | high | high | Graph theory |
+
+
+### Differences between SQL and NoSQL
+
+The table below summarizes the main differences between SQL and NoSQL databases.
+
+| | SQL Databases | NoSQL Databases |
+|---|---|---|
+| Data Storage Model | Tables with fixed rows and columns | Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges |
+| Primary Purpose | General purpose | Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data |
+| Schemas | Rigid | Flexible |
+| Scaling | Vertical (scale-up with a larger server) | Horizontal (scale-out across commodity servers) |
+| Multi-Record ACID Transactions | Supported | Most do not support multi-record ACID transactions. However, some—like MongoDB—do. |
+| Joins | Typically required | Typically not required |
+| Data to Object Mapping | Requires ORM (object-relational mapping) | Many do not require ORMs. Document DB documents map directly to data structures in most popular programming languages. |
+
+
+### Advantages
+
+* Flexible Data Models
+
+    Most NoSQL systems feature flexible schemas. A flexible schema means you can easily modify your database schema to add or remove fields to support evolving application requirements. This facilitates continuous development of new application features without database operation overhead.
+
+* Horizontal Scaling
+
+    Most NoSQL systems allow you to scale horizontally, which means you can add cheaper, commodity hardware whenever you need to scale a system. On the other hand, SQL systems generally scale vertically (a more powerful server). NoSQL systems can also host huge data sets when compared to traditional SQL systems.
+
+* Fast Queries
+
+    NoSQL can generally be a lot faster than traditional SQL systems due to data denormalization and horizontal scaling. Most NoSQL systems also tend to store similar data together, facilitating faster query responses.
+
+* Developer productivity
+
+    NoSQL systems tend to map data based on the programming data structures. As a result, developers need to perform fewer data transformations, leading to increased productivity and fewer bugs.
\ No newline at end of file
diff --git a/courses/databases_nosql/key_concepts.md b/courses/databases_nosql/key_concepts.md
new file mode 100644
index 0000000..46723d4
--- /dev/null
+++ b/courses/databases_nosql/key_concepts.md
@@ -0,0 +1,274 @@
+## Key Concepts
+
+Let's look at some of the key concepts when we talk about NoSQL or distributed systems.
+
+
+### CAP Theorem
+
+In a keynote titled “[Towards Robust Distributed Systems](https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/Brewer_podc_keynote_2000.pdf)” at ACM’s PODC symposium in 2000, Eric Brewer came up with the so-called CAP theorem, which is widely adopted today by large web companies as well as in the NoSQL community. The CAP acronym stands for **C**onsistency, **A**vailability & **P**artition Tolerance.
+
+* **Consistency**
+
+    It refers to how consistent a system is after an execution. A distributed system is called consistent when a write made by one source is available to all readers of that shared data. Different NoSQL systems support different levels of consistency.
+
+* **Availability**
+
+    It refers to how a system responds to loss of functionality of different systems due to hardware and software failures. High availability implies that a system is still available to handle operations (reads and writes) when a certain part of the system is down due to a failure or upgrade.
+
+* **Partition Tolerance**
+
+    It is the ability of the system to continue operations in the event of a network partition. A network partition occurs when a failure causes two or more islands of networks where the systems can’t talk to each other across the islands, temporarily or permanently.
+
+Brewer posits that one can choose at most two of these three characteristics in a shared-data system. A growing number of use cases in large scale applications tend to value reliability, implying that availability and redundancy are more valuable than consistency. As a result, these systems struggle to meet ACID properties. They attain reliability by loosening the consistency requirement, i.e. by settling for eventual consistency.
+
+**Eventual Consistency** means that all readers will see writes as time goes on: “In a steady state, the system will eventually return the last written value”. Clients therefore may face an inconsistent state of data while updates are in progress. For instance, in a replicated database, updates may go to one node, which replicates the latest version to all other nodes that contain a replica of the modified dataset, so that the replica nodes eventually will have the latest version.
+
+NoSQL systems support different levels of eventual consistency models.
For example:
+
+* Read Your Own Writes Consistency
+
+    Clients will see their updates immediately after they are written. The reads can hit nodes other than the one where the write happened. However, clients might not see updates by other clients immediately.
+
+* Session Consistency
+
+    A client will see the updates to its data within a session scope. This generally indicates that reads and writes occur on the same server. Other clients using the same nodes will receive the same updates.
+
+* Causal Consistency
+
+    A system provides causal consistency if the following condition holds: write operations that are related by potential causality are seen by each process of the system in order. Different processes may observe concurrent writes in different orders.
+
+Eventual consistency is useful if concurrent updates of the same partitions of data are unlikely and if clients do not immediately depend on reading updates issued by themselves or by other clients.
+
+The consistency model chosen for the system (or parts of it) determines where requests are routed, e.g. to which replicas.
+
+CAP alternatives illustration:
+
+| Choice | Traits | Examples |
+|---|---|---|
+| Consistency + Availability (Forfeit Partitions) | 2-phase commits, Cache invalidation protocols | Single-site databases, Cluster databases, LDAP, xFS file system |
+| Consistency + Partition tolerance (Forfeit Availability) | Pessimistic locking, Make minority partitions unavailable | Distributed databases, Distributed locking, Majority protocols |
+| Availability + Partition tolerance (Forfeit Consistency) | Expirations/leases, Conflict resolution, Optimistic | DNS, Web caching |
+
+
+### Versioning of Data in distributed systems
+
+When data is distributed across nodes, it can be modified on different nodes at the same time (unless strict consistency is enforced). This raises questions about conflict resolution for concurrent updates. Some of the popular conflict resolution mechanisms are:
+
+* **Timestamps**
+
+    This is the most obvious solution. You sort updates based on chronological order and choose the latest update. However, this relies on clock synchronization across different parts of the infrastructure. This gets even more complicated when parts of systems are spread across different geographic locations.
+
+* **Optimistic Locking**
+
+    You associate a unique value like a clock or counter with every data update. When a client wants to update data, it has to specify which version of the data needs to be updated. This would mean you need to keep track of the history of data versions.
+
+* **Vector Clocks**
+
+    A vector clock is defined as a tuple of clock values from each node. In a distributed environment, each node maintains a tuple of such clock values which represents the state of the node itself and its peers/replicas. A clock value may be a real timestamp derived from the local clock, or a version number.
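To make the vector-clock mechanism concrete, here is a small illustrative sketch (the node names and the dict-based representation are invented for illustration, not taken from the course material). It shows how clocks are incremented on local writes, merged on synchronization, and compared to detect concurrent (conflicting) updates:

```python
# Illustrative vector clock sketch: each replica keeps a dict of node -> logical counter.

def increment(clock, node):
    """Bump this node's counter on a local write; returns a new clock."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def merge(a, b):
    """Element-wise max: the combined state after two replicas synchronize."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def happened_before(a, b):
    """True if a is causally before b: every counter in a is <= b's, and a != b."""
    nodes = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes) and a != b

def concurrent(a, b):
    """Neither update is causally ordered before the other: a real conflict."""
    return not happened_before(a, b) and not happened_before(b, a)

# Two replicas accept writes independently:
v1 = increment({}, "node1")         # {'node1': 1}
v2 = increment({}, "node2")         # {'node2': 1}
print(concurrent(v1, v2))           # True: a conflict that must be resolved
merged = merge(v1, v2)              # {'node1': 1, 'node2': 1}
print(happened_before(v1, merged))  # True: the merged state supersedes v1
```

Note how no synchronized physical clocks are involved; ordering is derived purely from the counters.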

+![Vector clocks](images/vector_clocks.png "Vector Clocks")
+
+*Vector clocks illustration*
+
+Vector clocks have the following advantages over other conflict resolution mechanisms:
+
+1. No dependency on synchronized clocks
+2. No total ordering of revision numbers required for causal reasoning
+3. No need to store and maintain multiple versions of the data on different nodes
+
+
+### Partitioning
+
+When the amount of data crosses the capacity of a single node, we need to think of splitting data and creating replicas for load balancing & disaster recovery. Depending on how dynamic the infrastructure is, we have a few approaches that we can take.
+
+1. **Memory cached**
+
+    These are partitioned in-memory databases that are primarily used for transient data. These databases are generally used as a front for traditional RDBMS. The most frequently used data is replicated from an RDBMS into a memory database to facilitate fast queries and to take the load off backend DBs. Very common examples are Memcached and Couchbase.
+
+2. **Clustering**
+
+    Traditional cluster mechanisms abstract away the cluster topology from clients. A client need not know where the actual data resides and which node it is talking to. Clustering is very commonly used in traditional RDBMS, where it can help scale the persistent layer to a certain extent.
+
+3. **Separating reads from writes**
+
+    In this method, you will have multiple replicas hosting the same data. The incoming writes are typically sent to a single node (leader) or multiple nodes (multi-leader), while the rest of the replicas (followers) handle read requests. The leader replicates writes asynchronously to all followers. However, the write lag can’t be completely avoided. Sometimes a leader can crash before it replicates all the data to a follower. When this happens, the follower with the most consistent data can be turned into a leader. As you can realize now, it is hard to enforce full consistency in this model. You also need to consider the ratio of read vs write traffic.
This model won’t make sense when writes are higher than reads. The replication methods can also vary widely. Some systems do a complete transfer of state periodically, while others use a delta state transfer approach. You could also transfer the state by transferring the operations in order; the followers can then apply the same operations as the leader to catch up.
+
+4. **Sharding**
+
+    Sharding refers to dividing data in such a way that it is distributed evenly (both in terms of storage & processing power) across a cluster of nodes. It can also imply data locality, which means similar & related data is stored together to facilitate faster access. A shard in turn can be further replicated to meet load balancing or disaster recovery requirements. A single shard replica might take in all writes (single leader) or multiple replicas can take writes (multi-leader). Reads can be distributed across multiple replicas. Since data is now distributed across multiple nodes, clients should be able to consistently figure out where data is hosted. We will look at some of the common techniques below. The downside of sharding is that joins between shards are not possible. So an upstream/downstream application has to aggregate the results from multiple shards.

+![Sharding](images/database_sharding.png "Sharding")
+
+*Sharding example*
+
+
+### Hashing
+
+A hash function is a function that maps one piece of data (typically describing some kind of object, often of arbitrary size) to another piece of data, typically an integer, known as the _hash code_, or simply _hash_. In a partitioned database, it is important to consistently map a key to a server/replica.
+
+For example, you could use a simple modulo function as the hash:
+
+_p = k mod n_
+
+where _p_ is the partition, _k_ is the primary key, and _n_ is the number of nodes.
+
+The downside of this simple hash is that whenever the cluster topology changes, the data distribution also changes. When you are dealing with memory caches, moving partitions around is easy: whenever a node joins/leaves a topology, partitions can reorder themselves, and a cache miss can be re-populated from the backend DB. However, with persistent data that is not possible, as the new node doesn’t have the data needed to serve requests. This brings us to consistent hashing.
+
+#### Consistent Hashing
+
+Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed _hash table_ by assigning them a position on an abstract circle, or _hash ring_. This allows servers and objects to scale without affecting the overall system.
+
+Say that our hash function h() generates a 32-bit integer. Then, to determine to which server we will send a key k, we find the server s whose hash h(s) is the smallest integer that is larger than h(k). To make the process simpler, we assume the table is circular, which means that if we cannot find a server with a hash larger than h(k), we wrap around and start looking from the beginning of the array.
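The ring lookup just described can be sketched in a few lines. This is an illustrative sketch only; the server names, the MD5-based hash, and the virtual-node count are assumptions for the example, not part of the course material:

```python
import bisect
import hashlib

def h(key):
    """Map any string to a 32-bit integer position on the ring (MD5-based, stable across runs)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    def __init__(self, servers, vnodes=10):
        # Place `vnodes` virtual nodes per server on the ring to spread the load evenly.
        self.ring = sorted((h(f"{s}-{i}"), s) for s in servers for i in range(vnodes))
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key):
        # Move clockwise from h(key) to the first server position, wrapping around the circle.
        idx = bisect.bisect_right(self.positions, h(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["S1", "S2", "S4"])
server = ring.lookup("user:42")  # one of S1, S2, S4; stable until the topology changes
```

Removing a server from the `servers` list only relocates keys that were on that server's virtual nodes; everything else keeps its old owner, which is the whole point of the scheme.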

+![Consistent hashing](images/consistent_hashing.png "Consistent Hashing")
+
+*Consistent hashing illustration*
+ +In consistent hashing when a server is removed or added then only the keys from that server are relocated. For example, if server S3 is removed then, all keys from server S3 will be moved to server S4 but keys stored on server S4 and S2 are not relocated. But there is one problem, when server S3 is removed then keys from S3 are not equally distributed among remaining servers S4 and S2. They are only assigned to server S4 which increases the load on server S4. + +To evenly distribute the load among servers when a server is added or removed, it creates a fixed number of replicas ( known as virtual nodes) of each server and distributes it along the circle. So instead of server labels S1, S2 and S3, we will have S10 S11…S19, S20 S21…S29 and S30 S31…S39. The factor for a number of replicas is also known as _weight_, depending on the situation. + + + + +All keys which are mapped to replicas Sij are stored on server Si. To find a key we do the same thing, find the position of the key on the circle and then move forward until you find a server replica. If the server replica is Sij then the key is stored in server Si. + +Suppose server S3 is removed, then all S3 replicas with labels S30 S31 … S39 must be removed. Now the objects keys adjacent to S3X labels will be automatically re-assigned to S1X, S2X and S4X. All keys originally assigned to S1, S2 & S4 will not be moved. + +Similar things happen if we add a server. Suppose we want to add a server S5 as a replacement of S3 then we need to add labels S50 S51 … S59. In the ideal case, one-fourth of keys from S1, S2 and S4 will be reassigned to S5. 
+
+When applied to persistent storage, further issues arise: if a node has left the scene, data stored on that node becomes unavailable, unless it was replicated to other nodes before; in the opposite case of a new node joining, adjacent nodes are no longer responsible for some pieces of data, which they still store but no longer get asked for, as the corresponding objects are no longer hashed to them by requesting clients. In order to address this issue, a replication factor (r) can be introduced.
+
+Introducing replicas in a partitioning scheme (besides its reliability benefits) also makes it possible to spread the workload for read requests, which can go to any physical node responsible for a requested piece of data. Scalability doesn’t work if the clients have to decide between multiple versions of the dataset, because they then need to read from a quorum of servers, which in turn reduces the efficiency of load balancing.
+
+
+### Quorum
+
+A quorum is the minimum number of nodes in a cluster that must be online and able to communicate with each other. If any additional node failure occurs beyond this threshold, the cluster will stop running.
+
+To attain a quorum, you need a majority of the nodes. Commonly, that is (N/2 + 1), where N is the total number of nodes in the system. For example:
+
+In a 3 node cluster, you need 2 nodes for a majority.
+
+In a 5 node cluster, you need 3 nodes for a majority.
+
+In a 6 node cluster, you need 4 nodes for a majority.
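The majority rule above is easy to express in code; here is a small illustrative helper (the function name is ours, not from the course material):

```python
def quorum_size(n):
    """Minimum number of nodes that must remain up and connected: floor(N/2) + 1."""
    return n // 2 + 1

for n in (3, 5, 6):
    print(f"{n} node cluster needs {quorum_size(n)} nodes for a majority")
```

Note that for an even N, more than half the nodes are required (4 out of 6), which is why clusters are usually deployed with an odd number of nodes: a 6-node cluster tolerates no more failures than a 5-node one.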

+![Quorum](images/Quorum.png "Quorum")
+
+*Quorum example*
+ + + +Network problems can cause communication failures among cluster nodes. One set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This is known as split brain in cluster or cluster partitioning. + +Now the partition which has quorum is allowed to continue running the application. The other partitions are removed from the cluster. + +Eg: In a 5 node cluster, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 loses communication with other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run. + +Below diagram demonstrates Quorum selection on a cluster partitioned into two sets. + +

+![Cluster quorum](images/cluster_quorum.png "Cluster Quorum")
+
+*Cluster Quorum example*
** + diff --git a/courses/git/branches.md b/courses/git/branches.md index 92e07e8..856db58 100644 --- a/courses/git/branches.md +++ b/courses/git/branches.md @@ -83,7 +83,7 @@ Above tree structure should make things clear. Notice a clear branch/fork on com ## Merges -Now say the feature you were working on branch `b1` is complete. And you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you will pull the latest code from upstream (eg: GitHub). Then you need to merge your code from `b1` into master. And there could be two ways this can be done. +Now say the feature you were working on branch `b1` is complete and you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you pull the latest code from upstream (eg: GitHub). Then you need to merge your code from `b1` into master. There could be two ways this can be done. Here is the current history: @@ -96,7 +96,7 @@ spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all * df2fb7a adding file 1 ``` -**Option 1: Directly merge the branch.** Merging the branch b1 into master will result in a new merge commit which will merge changes from two different lines of history and create a new commit of the result. +**Option 1: Directly merge the branch.** Merging the branch b1 into master will result in a new merge commit. This will merge changes from two different lines of history and create a new commit of the result. ```bash spatel1-mn1:school-of-sre spatel1$ git merge b1 diff --git a/courses/git/git-basics.md b/courses/git/git-basics.md index f89dc33..be8df86 100644 --- a/courses/git/git-basics.md +++ b/courses/git/git-basics.md @@ -1,6 +1,6 @@ -# School Of SRE: Git +# Git -## Pre - Reads +## Prerequisites 1. Have Git installed [https://git-scm.com/downloads](https://git-scm.com/downloads) 2. 
Have taken a high-level git tutorial or one of the following LinkedIn Learning courses @@ -8,26 +8,26 @@ - [https://www.linkedin.com/learning/git-branches-merges-and-remotes/](https://www.linkedin.com/learning/git-branches-merges-and-remotes/) - [The Official Git Docs](https://git-scm.com/doc) -## What to expect from this training +## What to expect from this course -As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today, Git perhaps is the most used one and this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently! +As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today, such as SVN, Mercurial, etc., Git is perhaps the most used one, and in this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts you already know, with details covering what is happening under the hood as you execute various git commands, so that the next time you run a git command you will be able to press enter more confidently! -## What is not covered under this training +## What is not covered under this course Advanced usage and specifics of internal implementation details of Git. -## Training Content +## Course Content ### Table of Contents - 1. Git Basics - 2. Working with Branches - 3. Git with Github - 4. Hooks + 1. [Git Basics](https://linkedin.github.io/school-of-sre/git/git-basics/#git-basics) + 2.
[Working with Branches](https://linkedin.github.io/school-of-sre/git/branches/) + 3. [Git with Github](https://linkedin.github.io/school-of-sre/git/github-hooks/#git-with-github) + 4. [Hooks](https://linkedin.github.io/school-of-sre/git/github-hooks/#hooks) ## Git Basics -Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains history of the changes happened with the codebase. +Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase. ### Creating a Git Repo @@ -92,7 +92,7 @@ spatel1-mn1:school-of-sre spatel1$ git commit -m "adding file 1" create mode 100644 file1.txt ``` -Notice how after adding the file, git status says `Changes to be commited:`. What it means is whatever is listed there, will be included in the next commit. Then we go ahead and create a commit, with an attached messaged via `-m`. +Notice how after adding the file, git status says `Changes to be committed:`. What it means is that whatever is listed there will be included in the next commit. Then we go ahead and create a commit, with an attached message via `-m`. ### More About a Commit diff --git a/courses/git/github-hooks.md b/courses/git/github-hooks.md index a16a9e2..db78e76 100644 --- a/courses/git/github-hooks.md +++ b/courses/git/github-hooks.md @@ -1,4 +1,4 @@ -## Git with Github +# Git with Github Till now, all the operations we did were in our local repo, while git also helps us in a collaborative environment. GitHub is one place on the internet where you can centrally host your git repos and collaborate with other developers.
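The repo-creation and commit flow described in Git Basics can be sketched end-to-end. This is a hedged sketch, assuming git is installed; the repo path, file name, and commit message are illustrative, and a throwaway identity is set inline so the commit succeeds on a machine without global git config:

```shell
# Illustrative local git workflow; run in a throwaway directory.
cd "$(mktemp -d)"
git init -q my-repo && cd my-repo          # create a new, empty repo
echo "hello" > file1.txt
git add file1.txt                          # stage: now listed under "Changes to be committed"
git -c user.name="demo" -c user.email="demo@example.com" \
    commit -q -m "adding file 1"           # -m attaches the commit message
git log --oneline                          # shows the single commit we just made
```

Running `git status` after the commit shows a clean working tree, since the staged change has been recorded in history.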
diff --git a/courses/img/favicon.ico b/courses/img/favicon.ico new file mode 100644 index 0000000..0090cbb Binary files /dev/null and b/courses/img/favicon.ico differ diff --git a/courses/img/sos.png b/courses/img/sos.png new file mode 100644 index 0000000..584c1b4 Binary files /dev/null and b/courses/img/sos.png differ diff --git a/courses/index.md b/courses/index.md index 5ac001b..28d24aa 100644 --- a/courses/index.md +++ b/courses/index.md @@ -1 +1,25 @@ -Hello, World!!! +# School of SRE +![School of SRE](img/sos.png) +In early 2019, we started visiting campuses to recruit the brightest minds to ensure LinkedIn, and all the services that it is composed of, is always available for everyone. This function at LinkedIn falls in the purview of the Site Reliability Engineering team and Site Reliability Engineers (SREs), who are Software Engineers specializing in reliability. SREs apply the principles of computer science and engineering to the design and development of computer systems: generally, large distributed ones. + +As we continued on this journey, we started getting a lot of questions from these campuses on what exactly the site engineering role entails, and how someone could learn the skills and the disciplines involved to become a successful site engineer. Fast forward a few months, and a few of these campus students had joined LinkedIn either as interns or as full-time engineers to become a part of the Site Engineering team. We also had a few lateral hires join our organization who were not from a traditional SRE background. That's when a few of us got together and started to think about how we can onboard new graduate engineers to the site engineering team. + +There is a vast amount of resources scattered throughout the web on the roles and responsibilities of an SRE, how to monitor site health, handling incidents, maintaining SLO/SLI, etc.
But there are very few resources out there guiding someone on which basic skill sets one has to acquire as a beginner. Because of the lack of these resources, we felt that individuals were having a tough time getting into open positions in the industry. We created School Of SRE as a starting point for anyone wanting to build their career in the role of SRE. + +In this course we are focusing on building strong foundational skills. The course is structured in a way to provide more real-life examples and show how learning each of the topics can play a bigger role in your day-to-day SRE life. Currently we are covering the following topics under the School Of SRE: + +- Fundamentals Series + - [Linux Basics](https://linkedin.github.io/school-of-sre/linux_basics/intro/) + - [Git](https://linkedin.github.io/school-of-sre/git/git-basics/) + - [Linux Networking](https://linkedin.github.io/school-of-sre/linux_networking/intro/) +- [Python and Web](https://linkedin.github.io/school-of-sre/python_web/intro/) +- Data + - Relational databases (MySQL) + - NoSQL concepts + - [Big Data](https://linkedin.github.io/school-of-sre/big_data/intro/) +- [Systems Design](https://linkedin.github.io/school-of-sre/systems_design/intro/) +- [Security](https://linkedin.github.io/school-of-sre/security/intro/) + +We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets. Every module has added references which can serve as a guide for further learning. Our hope is that by going through these modules you will be able to build the essential skills required for a Site Reliability Engineer. + +At LinkedIn we are using this curriculum for onboarding our non-traditional hires and new college grads to the SRE role. We have had multiple rounds of successful onboarding experiences with the new members and helped them to be productive in a very short period of time.
This motivated us to open-source this content to help other organisations onboard new engineers to the role, and to help individuals get into the role. We realise that the initial content we created is just a starting point and our hope is that the community can help in the journey of refining and extending the content. diff --git a/courses/linux_basics/command_line_basics.md b/courses/linux_basics/command_line_basics.md new file mode 100644 index 0000000..33294b0 --- /dev/null +++ b/courses/linux_basics/command_line_basics.md @@ -0,0 +1,445 @@ +# Command Line Basics + +## Lab Environment Setup + +One can use an online bash interpreter to run all the commands that are provided as examples in this course. This will also help you in getting hands-on experience with various linux commands. + +[REPL](https://repl.it/languages/bash) is one of the popular online bash interpreters for running linux commands. We will be using it for running all the commands mentioned in this course. + +## What is a Command + +A command is a program that tells the operating system to perform +specific work. Programs are stored as files in linux. Therefore, a +command is also a file which is stored somewhere on the disk. + +Commands may also take additional arguments as input from the user. +These arguments are called command line arguments. Knowing how to use +the commands is important and there are many ways to get help in Linux, +especially for commands. Almost every command will have some form of +documentation; most commands will have a command-line argument -h or +\--help that will display a reasonable amount of documentation. But the +most popular documentation system in Linux is called man pages - short +for manual pages. + +Using \--help to show the documentation for the ls command: + +![](images/linux/commands/image19.png) + +## File System Organization + +The linux file system has a hierarchical (or tree-like) structure with +its highest level directory called root (denoted by /).
Directories +present inside the root directory store files related to the system. +These directories in turn can either store system files or application +files or user-related files. + +![](images/linux/commands/image17.png) + + bin | The executable programs of the most commonly used commands reside in the bin directory + sbin | This directory contains programs used for system administration. + home | This directory contains user related files and directories. + lib | This directory contains all the library files + etc | This directory contains all the system configuration files + proc | This directory contains files related to the running processes on the system + dev | This directory contains files related to devices on the system + mnt | This directory contains files related to mounted devices on the system + tmp | This directory is used to store temporary files on the system + usr | This directory is used to store application programs on the system + +## Commands for Navigating the File System + +There are three basic commands which are used frequently to navigate the +file system: + +- ls + +- pwd + +- cd + +We will now try to understand what each command does and how to use +these commands. You should also practice the given examples on the +online bash shell. + +### pwd (print working directory) + +At any given moment, we are standing in a certain directory. +To get the name of the directory in which we are standing, we can use +the pwd command in linux. + +![](images/linux/commands/image2.png) + +We will now use the cd command to move to a different directory and then +print the working directory. + +![](images/linux/commands/image20.png) + +### cd (change directory) + +The cd command can be used to change the working directory. Using the +command, you can move from one directory to another. + +In the below example, we are initially in the root directory. We have +then used the cd command to change the directory.
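The navigation commands just described can be tried together in the online shell. A minimal sketch, in which the directory name `projects` is made up for illustration:

```shell
# Illustrative navigation session in a scratch directory.
cd "$(mktemp -d)"        # start from a fresh temporary directory
mkdir projects           # create something to navigate into
pwd                      # prints the directory we are currently standing in
cd projects              # change the working directory
pwd                      # the printed path now ends in .../projects
ls ..                    # list the parent directory; shows: projects
```

Note how `pwd` reflects every `cd`: the shell always has exactly one working directory, and relative paths like `..` are resolved against it.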
+ +![](images/linux/commands/image3.png) + +### ls (list files and directories) + +The ls command is used to list the contents of a directory. It will list +all the files and folders present in the given directory. + +If we just type ls in the shell, it will list all the files and +directories present in the current directory. + +![](images/linux/commands/image7.png) + +We can also provide the directory name as an argument to the ls command. It +will then list all the files and directories inside the given directory. + +![](images/linux/commands/image4.png) + +## Commands for Manipulating Files + +There are five basic commands which are used frequently to manipulate +files: + +- touch + +- mkdir + +- cp + +- mv + +- rm + +We will now try to understand what each command does and how to use +these commands. You should also practice the given examples on the +online bash shell. + +### touch (create new file) + +The touch command can be used to create a new empty file. +This command is very useful for many other purposes but we will discuss +the simplest use case of creating a new file. + +General syntax of using touch command + +``` +touch <file_name> +``` + +![](images/linux/commands/image9.png) + +### mkdir (create new directories) + +The mkdir command is used to create directories. You can use the ls command +to verify that the new directory is created. + +General syntax of using mkdir command + +``` +mkdir <directory_name> +``` + +![](images/linux/commands/image11.png) + +### rm (delete files and directories) + +The rm command can be used to delete files and directories. It is very +important to note that this command permanently deletes the files and +directories. It's almost impossible to recover these files and +directories once you have executed rm command on them successfully. Do +run this command with care. + +General syntax of using rm command: + +``` +rm <file_name> +``` + +Let's try to understand the rm command with an example.
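The file-manipulation commands above (touch, mkdir, cp, mv, rm) can be run end-to-end as a quick sketch; all of the file and directory names below are made up for illustration:

```shell
# Illustrative run of the file-manipulation commands in a scratch directory.
cd "$(mktemp -d)"
touch file1.txt                # create a new empty file
mkdir test_directory           # create a directory
cp file1.txt test_directory/   # copy: the original stays in place
mv file1.txt renamed.txt       # mv doubles as rename: the original name is gone
rm renamed.txt                 # permanent delete -- run with care
rm -r test_directory           # -r is required to delete a directory
ls                             # nothing is left
```

The `-r` flag on the last `rm` mirrors the `-r` used with `cp` for directories: both commands need to be told explicitly to recurse into a directory's contents.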
We will try to +delete the file and directory we created using the touch and mkdir commands +respectively. + +![](images/linux/commands/image18.png) + +### cp (copy files and directories) + +The cp command is used to copy files and directories from one location +to another. Do note that the cp command doesn't make any changes to the +original files or directories. The original files or directories and +their copy both co-exist after running the cp command successfully. + +General syntax of using cp command: + +``` +cp <source> <destination> +``` + +We are currently in the '/home/runner' directory. We will use the mkdir +command to create a new directory named "test_directory". We will now +try to copy the "\_test_runner.py" file to the directory we created just +now. + +![](images/linux/commands/image23.png) + +Do note that nothing happened to the original "\_test_runner.py" file. +It's still there in the current directory. A new copy of it got created +inside the "test_directory". + +![](images/linux/commands/image14.png) + +We can also use the cp command to copy a whole directory from one +location to another. Let's try to understand this with an example. + +![](images/linux/commands/image12.png) + +We again used the mkdir command to create a new directory called +"another_directory". We then used the cp command along with an +additional argument '-r' to copy the "test_directory". + +### mv (move files and directories) + +The mv command can either be used to move files or directories from one +location to another or it can be used to rename files or directories. Do +note that moving files and copying them are very different. When you +move the files or directories, the original copy is lost. + +General syntax of using mv command: + +``` +mv <source> <destination> +``` + +In this example, we will use the mv command to move the +"\_test_runner.py" file to "test_directory". In this case, this file +already exists in "test_directory". The mv command will just replace it.
+**Do note that the original file doesn't exist in the current directory +after the mv command runs successfully.** + +![](images/linux/commands/image26.png) + +We can also use the mv command to move a directory from one location to +another. In this case, we do not need to use the '-r' flag that we did +while using the cp command. Do note that the original directory will not +exist if we use the mv command. + +One of the important uses of the mv command is to rename files and +directories. Let's see how we can use this command for renaming. + +We have first changed our location to "test_directory". We then use the +mv command to rename the "\_test_runner.py" file to "test.py". + +![](images/linux/commands/image29.png) + +## Commands for Viewing Files + +There are three basic commands which are used frequently to view +files: + +- cat + +- head + +- tail + +We will now try to understand what each command does and how to use +these commands. You should also practice the given examples on the +online bash shell. + +We will create a new file called "numbers.txt" and insert numbers from 1 +to 100 in this file. Each number will be on a separate line. + +![](images/linux/commands/image21.png) + +Do not worry about the above command now. It's an advanced command which +is used to generate numbers. We have then used a redirection operator to +push these numbers to the file. We will be discussing I/O redirection in the +later sections. + + +### cat + +The simplest use of the cat command is to print the contents of a file on +your output screen. This command is very useful and can be used for many +other purposes. We will study other use cases later. + +![](images/linux/commands/image1.png) + +You can try to run the above command and you will see numbers being +printed from 1 to 100 on your screen. You will need to scroll up to view +all the numbers. + +### head + +The head command displays the first 10 lines of the file by default.
We +can include additional arguments to display as many lines as we want +from the top. + +In this example, we are only able to see the first 10 lines from the +file when we use the head command. + +![](images/linux/commands/image15.png) + +By default, the head command will only display the first 10 lines. If we +want to specify the number of lines we want to see from the start, use the +'-n' argument to provide the input. + +![](images/linux/commands/image16.png) + +### tail + +The tail command displays the last 10 lines of the file by default. We +can include additional arguments to display as many lines as we want +from the end of the file. + +![](images/linux/commands/image22.png) + +By default, the tail command will only display the last 10 lines. If we +want to specify the number of lines we want to see from the end, use the '-n' +argument to provide the input. + +![](images/linux/commands/image10.png) + +In this example, we are only able to see the last 5 lines from the file +when we use the tail command with the explicit -n option. + + +## Echo Command in Linux + +The echo command is one of the simplest commands that is used in the +shell. This command is equivalent to a print statement in other +programming languages. + +The echo command prints the given input string on the screen. + +![](images/linux/commands/image24.png) + +## Text Processing Commands + +In the previous section, we learned how to view the content of a file. +In many cases, we will be interested in performing the below operations: + +- Print only the lines which contain a particular word(s) + +- Replace a particular word with another word in a file + +- Sort the lines in a particular order + +There are three basic commands which are used frequently to process +text: + +- grep + +- sed + +- sort + +We will now try to understand what each command does and how to use +these commands. You should also practice the given examples on the +online bash shell.
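The viewing commands covered earlier (cat, head, tail) can be reproduced without the screenshots. A minimal sketch, using `seq` to stand in for the number-generation command shown in the screenshots:

```shell
# Recreate the numbers.txt example and view slices of it.
cd "$(mktemp -d)"
seq 1 100 > numbers.txt     # numbers 1..100, one per line
head -n 3 numbers.txt       # first three lines: 1, 2, 3
tail -n 2 numbers.txt       # last two lines: 99, 100
cat numbers.txt | wc -l     # cat prints everything; wc -l counts the 100 lines
```

Without `-n`, both `head` and `tail` default to 10 lines, as described above.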
+ +We will create a new file called "numbers.txt" and insert numbers from 1 +to 10 in this file. Each number will be on a separate line. + +![](images/linux/commands/image8.png) + +### grep + +The grep command in its simplest form can be used to search for particular +words in a text file. It will display all the lines in a file that +contain a particular input. The word we want to search for is provided as +an input to the grep command. + +General syntax of using grep command: + +``` +grep <word> <file_name> +``` + +In this example, we are trying to search for the string "1" in this file. +The grep command outputs the lines where it found this string. + +![](images/linux/commands/image5.png) + +### sed + +The sed command in its simplest form can be used to replace text in a +file. + +General syntax of using the sed command for replacement: + +``` +sed 's/<old_text>/<new_text>/' <file_name> +``` + +Let's try to replace each occurrence of "1" in the file with "3" using the +sed command. + +![](images/linux/commands/image31.png) + +The content of the file will not change in the above +example. To do so, we have to use an extra argument '-i' so that the +changes are reflected back in the file. + +### sort + +The sort command can be used to sort the input provided to it as an +argument. By default, it will sort in increasing order. + +Let's first see the content of the file before trying to sort it. + +![](images/linux/commands/image27.png) + +Now, we will try to sort the file using the sort command. The sort +command sorts the content in lexicographical order. + +![](images/linux/commands/image32.png) + +The content of the file will not change in the above +example. + +## I/O Redirection + +Each open file gets assigned a file descriptor. A file descriptor is a +unique identifier for open files in the system. There are always three +default files open: stdin (the keyboard), stdout (the screen), and +stderr (error messages output to the screen). These files can be +redirected.
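The text-processing commands just covered can be tried together; this sketch recreates numbers.txt so it is self-contained, and the search string "1" matches the screenshots above:

```shell
# grep/sed/sort on the numbers.txt example.
cd "$(mktemp -d)"
seq 1 10 > numbers.txt                  # numbers 1..10, one per line
grep "1" numbers.txt                    # lines containing "1": 1 and 10
sed 's/1/3/' numbers.txt | head -n 2    # first line becomes 3; numbers.txt itself is unchanged
sort -r numbers.txt | head -n 1         # reverse lexicographical order puts 9 first, not 10
```

The last line shows why "lexicographical" matters: as strings, "9" sorts after "10". Without `-i`, `sed` only writes to standard output, so the file on disk keeps its original contents.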
+ +Everything is a file in linux - +[https://unix.stackexchange.com/questions/225537/everything-is-a-file](https://unix.stackexchange.com/questions/225537/everything-is-a-file) + +Till now, we have displayed all the output on the screen, which is the +standard output. We can use some special operators to redirect the +output of a command to files or even to the input of other commands. +I/O redirection is a very powerful feature. + +In the below example, we have used the '>' operator to redirect the +output of the ls command to the output.txt file. + +![](images/linux/commands/image30.png) + +In the below example, we have redirected the output from the echo command to +a file. + +![](images/linux/commands/image13.png) + +We can also redirect the output of a command as an input to another +command. This is possible with the help of pipes. + +In the below example, we have passed the output of the cat command as an +input to the grep command using the pipe (\|) operator. + +![](images/linux/commands/image6.png) + +In the below example, we have passed the output of the sort command as an +input to the uniq command using the pipe (\|) operator. The uniq command only +prints the unique numbers from the input. + +![](images/linux/commands/image28.png) + +I/O redirection - +[https://tldp.org/LDP/abs/html/io-redirection.html](https://tldp.org/LDP/abs/html/io-redirection.html) diff --git a/courses/linux_basics/conclusion.md b/courses/linux_basics/conclusion.md new file mode 100644 index 0000000..783340f --- /dev/null +++ b/courses/linux_basics/conclusion.md @@ -0,0 +1,25 @@ +# Conclusion + +With this we have covered the basics of the linux operating system along with the basic commands +which are used in linux. We have also covered the linux server administration commands. + +We hope that this course will make it easier for you to operate on the command line. + +## Applications in SRE Role + +1. As an SRE, you will be required to perform some general tasks on these linux servers.
You will also be using the command line when you are troubleshooting issues. +2. Moving from one location to another in the filesystem will require the help of the ls, pwd and cd commands. +3. You may need to search for specific information in the log files. The grep command would be very useful here. I/O redirection will come in handy if you want to store the output in a file or pass it as an input to another command. +4. The tail command is very useful for viewing the latest data in the log file. +5. Different users will have different permissions depending on their roles. We will also not want everyone in the company to access our servers for security reasons. User permissions can be restricted with the chown, chmod and chgrp commands. +6. SSH is one of the most frequently used commands for an SRE. Logging into servers and troubleshooting along with performing basic administration tasks will only be possible if we are able to log in to the server. +7. What if we want to run an apache server or nginx on a server? We will first install it using the package manager. Package management commands become important here. +8. Managing services on servers is another critical responsibility of an SRE. Systemd-related commands can help in troubleshooting issues. If a service goes down, we can start it using the systemctl start command. We can also stop a service in case it is not needed. +9. Monitoring is another core responsibility of an SRE. Memory and CPU are two important system-level metrics which should be monitored. Commands like top and free are quite helpful here. +10. If a service is throwing an error, how do we find out the root cause of the error? We will certainly need to check logs to find out the whole stack trace of the error. The log file will also tell us the number of times the error has occurred along with the time when it started.
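Several of the applications above boil down to log triage with tail, grep and redirection. A hedged sketch: the log file name, path, and contents below are fabricated for illustration, so the commands have something to run against (a real service would log somewhere like /var/log/):

```shell
# Fabricated service log for demonstration purposes.
cd "$(mktemp -d)"
printf '%s\n' \
  '2020-10-01 10:00:01 INFO  service started' \
  '2020-10-01 10:05:12 ERROR db connection timeout' \
  '2020-10-01 10:06:40 ERROR db connection timeout' > app.log

tail -n 2 app.log                    # latest entries in the log
grep -c "ERROR" app.log              # how many times the error occurred: 2
grep "ERROR" app.log | head -n 1     # first occurrence, with its timestamp
```

The same pattern scales to real logs: `tail` for the most recent state, `grep` to isolate the error, and a pipe into `head` or a `>` redirect to capture the result.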
+ +## Useful Courses and tutorials + +* [Edx basic linux commands course](https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS101x+1T2020/course/) +* [Edx Red Hat Enterprise Linux Course](https://courses.edx.org/courses/course-v1:RedHat+RH066x+2T2017/course/) +* [https://linuxcommand.org/lc3_learning_the_shell.php](https://linuxcommand.org/lc3_learning_the_shell.php) diff --git a/courses/linux_basics/images/linux/admin/image1.png b/courses/linux_basics/images/linux/admin/image1.png new file mode 100644 index 0000000..365ad09 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image1.png differ diff --git a/courses/linux_basics/images/linux/admin/image10.png b/courses/linux_basics/images/linux/admin/image10.png new file mode 100644 index 0000000..73d1a2a Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image10.png differ diff --git a/courses/linux_basics/images/linux/admin/image11.png b/courses/linux_basics/images/linux/admin/image11.png new file mode 100644 index 0000000..7710bdc Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image11.png differ diff --git a/courses/linux_basics/images/linux/admin/image12.png b/courses/linux_basics/images/linux/admin/image12.png new file mode 100644 index 0000000..74199df Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image12.png differ diff --git a/courses/linux_basics/images/linux/admin/image13.png b/courses/linux_basics/images/linux/admin/image13.png new file mode 100644 index 0000000..5044f3d Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image13.png differ diff --git a/courses/linux_basics/images/linux/admin/image14.png b/courses/linux_basics/images/linux/admin/image14.png new file mode 100644 index 0000000..5a0f468 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image14.png differ diff --git a/courses/linux_basics/images/linux/admin/image15.png 
b/courses/linux_basics/images/linux/admin/image15.png new file mode 100644 index 0000000..e0aa749 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image15.png differ diff --git a/courses/linux_basics/images/linux/admin/image16.png b/courses/linux_basics/images/linux/admin/image16.png new file mode 100644 index 0000000..947658d Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image16.png differ diff --git a/courses/linux_basics/images/linux/admin/image17.png b/courses/linux_basics/images/linux/admin/image17.png new file mode 100644 index 0000000..26d777a Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image17.png differ diff --git a/courses/linux_basics/images/linux/admin/image18.png b/courses/linux_basics/images/linux/admin/image18.png new file mode 100644 index 0000000..ac72d6d Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image18.png differ diff --git a/courses/linux_basics/images/linux/admin/image19.png b/courses/linux_basics/images/linux/admin/image19.png new file mode 100644 index 0000000..f782f43 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image19.png differ diff --git a/courses/linux_basics/images/linux/admin/image2.png b/courses/linux_basics/images/linux/admin/image2.png new file mode 100644 index 0000000..bf056a0 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image2.png differ diff --git a/courses/linux_basics/images/linux/admin/image20.png b/courses/linux_basics/images/linux/admin/image20.png new file mode 100644 index 0000000..eaf3dd4 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image20.png differ diff --git a/courses/linux_basics/images/linux/admin/image21.png b/courses/linux_basics/images/linux/admin/image21.png new file mode 100644 index 0000000..1eb0234 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image21.png differ diff --git 
a/courses/linux_basics/images/linux/admin/image22.png b/courses/linux_basics/images/linux/admin/image22.png new file mode 100644 index 0000000..bc77e51 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image22.png differ diff --git a/courses/linux_basics/images/linux/admin/image23.png b/courses/linux_basics/images/linux/admin/image23.png new file mode 100644 index 0000000..56345f0 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image23.png differ diff --git a/courses/linux_basics/images/linux/admin/image24.png b/courses/linux_basics/images/linux/admin/image24.png new file mode 100644 index 0000000..9b5d955 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image24.png differ diff --git a/courses/linux_basics/images/linux/admin/image25.png b/courses/linux_basics/images/linux/admin/image25.png new file mode 100644 index 0000000..50b894f Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image25.png differ diff --git a/courses/linux_basics/images/linux/admin/image26.png b/courses/linux_basics/images/linux/admin/image26.png new file mode 100644 index 0000000..6b561a8 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image26.png differ diff --git a/courses/linux_basics/images/linux/admin/image27.png b/courses/linux_basics/images/linux/admin/image27.png new file mode 100644 index 0000000..7eee5e2 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image27.png differ diff --git a/courses/linux_basics/images/linux/admin/image28.png b/courses/linux_basics/images/linux/admin/image28.png new file mode 100644 index 0000000..1e7a557 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image28.png differ diff --git a/courses/linux_basics/images/linux/admin/image29.png b/courses/linux_basics/images/linux/admin/image29.png new file mode 100644 index 0000000..2f8cdfe Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image29.png differ 
diff --git a/courses/linux_basics/images/linux/admin/image3.png b/courses/linux_basics/images/linux/admin/image3.png new file mode 100644 index 0000000..a4d5c3c Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image3.png differ diff --git a/courses/linux_basics/images/linux/admin/image30.png b/courses/linux_basics/images/linux/admin/image30.png new file mode 100644 index 0000000..9e9caa6 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image30.png differ diff --git a/courses/linux_basics/images/linux/admin/image31.jpg b/courses/linux_basics/images/linux/admin/image31.jpg new file mode 100644 index 0000000..b7d99c1 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image31.jpg differ diff --git a/courses/linux_basics/images/linux/admin/image32.png b/courses/linux_basics/images/linux/admin/image32.png new file mode 100644 index 0000000..99be765 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image32.png differ diff --git a/courses/linux_basics/images/linux/admin/image33.png b/courses/linux_basics/images/linux/admin/image33.png new file mode 100644 index 0000000..f56c4ed Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image33.png differ diff --git a/courses/linux_basics/images/linux/admin/image34.png b/courses/linux_basics/images/linux/admin/image34.png new file mode 100644 index 0000000..0357536 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image34.png differ diff --git a/courses/linux_basics/images/linux/admin/image35.png b/courses/linux_basics/images/linux/admin/image35.png new file mode 100644 index 0000000..f1000ea Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image35.png differ diff --git a/courses/linux_basics/images/linux/admin/image36.png b/courses/linux_basics/images/linux/admin/image36.png new file mode 100644 index 0000000..85b9a9d Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image36.png 
differ diff --git a/courses/linux_basics/images/linux/admin/image37.png b/courses/linux_basics/images/linux/admin/image37.png new file mode 100644 index 0000000..f153b21 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image37.png differ diff --git a/courses/linux_basics/images/linux/admin/image38.png b/courses/linux_basics/images/linux/admin/image38.png new file mode 100644 index 0000000..e59c478 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image38.png differ diff --git a/courses/linux_basics/images/linux/admin/image39.png b/courses/linux_basics/images/linux/admin/image39.png new file mode 100644 index 0000000..de54428 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image39.png differ diff --git a/courses/linux_basics/images/linux/admin/image4.png b/courses/linux_basics/images/linux/admin/image4.png new file mode 100644 index 0000000..6b58219 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image4.png differ diff --git a/courses/linux_basics/images/linux/admin/image40.png b/courses/linux_basics/images/linux/admin/image40.png new file mode 100644 index 0000000..de24aa5 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image40.png differ diff --git a/courses/linux_basics/images/linux/admin/image41.png b/courses/linux_basics/images/linux/admin/image41.png new file mode 100644 index 0000000..e94f7d3 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image41.png differ diff --git a/courses/linux_basics/images/linux/admin/image42.png b/courses/linux_basics/images/linux/admin/image42.png new file mode 100644 index 0000000..df8889d Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image42.png differ diff --git a/courses/linux_basics/images/linux/admin/image43.png b/courses/linux_basics/images/linux/admin/image43.png new file mode 100644 index 0000000..ac08e10 Binary files /dev/null and 
b/courses/linux_basics/images/linux/admin/image43.png differ diff --git a/courses/linux_basics/images/linux/admin/image44.png b/courses/linux_basics/images/linux/admin/image44.png new file mode 100644 index 0000000..aa9cd1f Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image44.png differ diff --git a/courses/linux_basics/images/linux/admin/image45.png b/courses/linux_basics/images/linux/admin/image45.png new file mode 100644 index 0000000..2ca25a2 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image45.png differ diff --git a/courses/linux_basics/images/linux/admin/image46.png b/courses/linux_basics/images/linux/admin/image46.png new file mode 100644 index 0000000..ec95a7b Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image46.png differ diff --git a/courses/linux_basics/images/linux/admin/image47.png b/courses/linux_basics/images/linux/admin/image47.png new file mode 100644 index 0000000..b032aa7 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image47.png differ diff --git a/courses/linux_basics/images/linux/admin/image48.png b/courses/linux_basics/images/linux/admin/image48.png new file mode 100644 index 0000000..b3b8e40 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image48.png differ diff --git a/courses/linux_basics/images/linux/admin/image49.png b/courses/linux_basics/images/linux/admin/image49.png new file mode 100644 index 0000000..526187f Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image49.png differ diff --git a/courses/linux_basics/images/linux/admin/image5.png b/courses/linux_basics/images/linux/admin/image5.png new file mode 100644 index 0000000..0bfcda2 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image5.png differ diff --git a/courses/linux_basics/images/linux/admin/image50.png b/courses/linux_basics/images/linux/admin/image50.png new file mode 100644 index 0000000..fc42ad9 Binary files 
/dev/null and b/courses/linux_basics/images/linux/admin/image50.png differ diff --git a/courses/linux_basics/images/linux/admin/image51.png b/courses/linux_basics/images/linux/admin/image51.png new file mode 100644 index 0000000..e30c4d2 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image51.png differ diff --git a/courses/linux_basics/images/linux/admin/image52.png b/courses/linux_basics/images/linux/admin/image52.png new file mode 100644 index 0000000..2a0d0b6 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image52.png differ diff --git a/courses/linux_basics/images/linux/admin/image53.png b/courses/linux_basics/images/linux/admin/image53.png new file mode 100644 index 0000000..ac3d18e Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image53.png differ diff --git a/courses/linux_basics/images/linux/admin/image54.png b/courses/linux_basics/images/linux/admin/image54.png new file mode 100644 index 0000000..c4c64b1 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image54.png differ diff --git a/courses/linux_basics/images/linux/admin/image55.png b/courses/linux_basics/images/linux/admin/image55.png new file mode 100644 index 0000000..ef88e35 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image55.png differ diff --git a/courses/linux_basics/images/linux/admin/image56.png b/courses/linux_basics/images/linux/admin/image56.png new file mode 100644 index 0000000..6413b24 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image56.png differ diff --git a/courses/linux_basics/images/linux/admin/image57.png b/courses/linux_basics/images/linux/admin/image57.png new file mode 100644 index 0000000..c8325d7 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image57.png differ diff --git a/courses/linux_basics/images/linux/admin/image58.png b/courses/linux_basics/images/linux/admin/image58.png new file mode 100644 index 0000000..d068a10 
Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image58.png differ diff --git a/courses/linux_basics/images/linux/admin/image6.png b/courses/linux_basics/images/linux/admin/image6.png new file mode 100644 index 0000000..a6b2851 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image6.png differ diff --git a/courses/linux_basics/images/linux/admin/image7.png b/courses/linux_basics/images/linux/admin/image7.png new file mode 100644 index 0000000..e43a493 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image7.png differ diff --git a/courses/linux_basics/images/linux/admin/image8.png b/courses/linux_basics/images/linux/admin/image8.png new file mode 100644 index 0000000..5aa9637 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image8.png differ diff --git a/courses/linux_basics/images/linux/admin/image9.png b/courses/linux_basics/images/linux/admin/image9.png new file mode 100644 index 0000000..ce8b0c2 Binary files /dev/null and b/courses/linux_basics/images/linux/admin/image9.png differ diff --git a/courses/linux_basics/images/linux/commands/image1.png b/courses/linux_basics/images/linux/commands/image1.png new file mode 100644 index 0000000..a88d782 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image1.png differ diff --git a/courses/linux_basics/images/linux/commands/image10.png b/courses/linux_basics/images/linux/commands/image10.png new file mode 100644 index 0000000..a62ea9c Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image10.png differ diff --git a/courses/linux_basics/images/linux/commands/image11.png b/courses/linux_basics/images/linux/commands/image11.png new file mode 100644 index 0000000..109ff4e Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image11.png differ diff --git a/courses/linux_basics/images/linux/commands/image12.png b/courses/linux_basics/images/linux/commands/image12.png new file mode 
100644 index 0000000..59e2981 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image12.png differ diff --git a/courses/linux_basics/images/linux/commands/image13.png b/courses/linux_basics/images/linux/commands/image13.png new file mode 100644 index 0000000..1a73b0b Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image13.png differ diff --git a/courses/linux_basics/images/linux/commands/image14.png b/courses/linux_basics/images/linux/commands/image14.png new file mode 100644 index 0000000..120f049 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image14.png differ diff --git a/courses/linux_basics/images/linux/commands/image15.png b/courses/linux_basics/images/linux/commands/image15.png new file mode 100644 index 0000000..95a1ddd Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image15.png differ diff --git a/courses/linux_basics/images/linux/commands/image16.png b/courses/linux_basics/images/linux/commands/image16.png new file mode 100644 index 0000000..9ab4190 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image16.png differ diff --git a/courses/linux_basics/images/linux/commands/image17.png b/courses/linux_basics/images/linux/commands/image17.png new file mode 100644 index 0000000..987a586 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image17.png differ diff --git a/courses/linux_basics/images/linux/commands/image18.png b/courses/linux_basics/images/linux/commands/image18.png new file mode 100644 index 0000000..f06e5be Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image18.png differ diff --git a/courses/linux_basics/images/linux/commands/image19.png b/courses/linux_basics/images/linux/commands/image19.png new file mode 100644 index 0000000..e0761e3 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image19.png differ diff --git 
a/courses/linux_basics/images/linux/commands/image2.png b/courses/linux_basics/images/linux/commands/image2.png new file mode 100644 index 0000000..b199ffb Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image2.png differ diff --git a/courses/linux_basics/images/linux/commands/image20.png b/courses/linux_basics/images/linux/commands/image20.png new file mode 100644 index 0000000..9c04837 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image20.png differ diff --git a/courses/linux_basics/images/linux/commands/image21.png b/courses/linux_basics/images/linux/commands/image21.png new file mode 100644 index 0000000..56db7cd Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image21.png differ diff --git a/courses/linux_basics/images/linux/commands/image22.png b/courses/linux_basics/images/linux/commands/image22.png new file mode 100644 index 0000000..e2e96c9 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image22.png differ diff --git a/courses/linux_basics/images/linux/commands/image23.png b/courses/linux_basics/images/linux/commands/image23.png new file mode 100644 index 0000000..2fbbaad Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image23.png differ diff --git a/courses/linux_basics/images/linux/commands/image24.png b/courses/linux_basics/images/linux/commands/image24.png new file mode 100644 index 0000000..1b15077 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image24.png differ diff --git a/courses/linux_basics/images/linux/commands/image25.png b/courses/linux_basics/images/linux/commands/image25.png new file mode 100644 index 0000000..39ce92c Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image25.png differ diff --git a/courses/linux_basics/images/linux/commands/image26.png b/courses/linux_basics/images/linux/commands/image26.png new file mode 100644 index 0000000..559c13b Binary files /dev/null 
and b/courses/linux_basics/images/linux/commands/image26.png differ diff --git a/courses/linux_basics/images/linux/commands/image27.png b/courses/linux_basics/images/linux/commands/image27.png new file mode 100644 index 0000000..9004989 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image27.png differ diff --git a/courses/linux_basics/images/linux/commands/image28.png b/courses/linux_basics/images/linux/commands/image28.png new file mode 100644 index 0000000..e11d3fd Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image28.png differ diff --git a/courses/linux_basics/images/linux/commands/image29.png b/courses/linux_basics/images/linux/commands/image29.png new file mode 100644 index 0000000..a628cad Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image29.png differ diff --git a/courses/linux_basics/images/linux/commands/image3.png b/courses/linux_basics/images/linux/commands/image3.png new file mode 100644 index 0000000..7a36c72 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image3.png differ diff --git a/courses/linux_basics/images/linux/commands/image30.png b/courses/linux_basics/images/linux/commands/image30.png new file mode 100644 index 0000000..b3ebf55 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image30.png differ diff --git a/courses/linux_basics/images/linux/commands/image31.png b/courses/linux_basics/images/linux/commands/image31.png new file mode 100644 index 0000000..daf7f54 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image31.png differ diff --git a/courses/linux_basics/images/linux/commands/image32.png b/courses/linux_basics/images/linux/commands/image32.png new file mode 100644 index 0000000..b6d6c30 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image32.png differ diff --git a/courses/linux_basics/images/linux/commands/image4.png 
b/courses/linux_basics/images/linux/commands/image4.png new file mode 100644 index 0000000..1cd900c Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image4.png differ diff --git a/courses/linux_basics/images/linux/commands/image5.png b/courses/linux_basics/images/linux/commands/image5.png new file mode 100644 index 0000000..1060bce Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image5.png differ diff --git a/courses/linux_basics/images/linux/commands/image6.png b/courses/linux_basics/images/linux/commands/image6.png new file mode 100644 index 0000000..8f19b15 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image6.png differ diff --git a/courses/linux_basics/images/linux/commands/image7.png b/courses/linux_basics/images/linux/commands/image7.png new file mode 100644 index 0000000..bcbac6a Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image7.png differ diff --git a/courses/linux_basics/images/linux/commands/image8.png b/courses/linux_basics/images/linux/commands/image8.png new file mode 100644 index 0000000..c8b8813 Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image8.png differ diff --git a/courses/linux_basics/images/linux/commands/image9.png b/courses/linux_basics/images/linux/commands/image9.png new file mode 100644 index 0000000..501229a Binary files /dev/null and b/courses/linux_basics/images/linux/commands/image9.png differ diff --git a/courses/linux_basics/intro.md b/courses/linux_basics/intro.md new file mode 100644 index 0000000..d34368c --- /dev/null +++ b/courses/linux_basics/intro.md @@ -0,0 +1,174 @@ +# Introduction + +## Prerequisites + +- Experience working with an operating system such as Windows, Linux or Mac +- Basics of operating systems + +## What to expect from this course + +This course is divided into three parts. In the first part, we will cover the +fundamentals of linux operating systems.
We will talk about linux architecture, +linux distributions and uses of linux operating systems. We will also talk about +the difference between GUI and CLI. + +In the second part, we will study some of the basic commands used +in linux. We will focus on commands for navigating the file system, +manipulating files, viewing files, I/O redirection etc. + +In the third part, we will study linux system administration. In this part, we +will focus on day-to-day tasks performed by linux admins, like managing users/groups, +managing file permissions, monitoring system performance, log files etc. + +In the second and third parts, we will use examples to understand the concepts. + +## What is not covered under this course + +We are not covering advanced linux commands and bash scripting in this +course. We will also not be covering linux internals. + +## Course Content + +The following topics are covered in this course: + +- [Introduction to Linux](https://linkedin.github.io/school-of-sre/linux_basics/intro/) + - [What are Linux Operating Systems](https://linkedin.github.io/school-of-sre/linux_basics/intro/#what-are-linux-operating-systems) + - [What are popular Linux distributions](https://linkedin.github.io/school-of-sre/linux_basics/intro/#what-are-popular-linux-distributions) + - [Uses of Linux Operating Systems](https://linkedin.github.io/school-of-sre/linux_basics/intro/#uses-of-linux-operating-systems) + - [Linux Architecture](https://linkedin.github.io/school-of-sre/linux_basics/intro/#linux-architecture) + - [Graphical user interface (GUI) vs Command line interface (CLI)](https://linkedin.github.io/school-of-sre/linux_basics/intro/#graphical-user-interface-gui-vs-command-line-interface-cli) +- [Command Line Basics](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/) + - [Lab Environment
Setup](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/lab-environment-setup) + - [What is a Command](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#what-is-a-command) + - [File System Organization](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#file-system-organization) + - [Navigating File System](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#commands-for-navigating-the-file-system) + - [Manipulating Files](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#commands-for-manipulating-files) + - [Viewing Files](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#commands-for-viewing-files) + - [Echo Command](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#echo-command) + - [Text Processing Commands](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#text-processing-commands) + - [I/O Redirection](https://linkedin.github.io/school-of-sre/linux_basics/command_line_basics/#io-redirection) +- [Linux system administration](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/) + - [Lab Environment Setup](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/lab-environment-setup) + - [User/Groups management](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#usergroup-management) + - [Becoming a Superuser](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#becoming-a-superuser) + - [File Permissions](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#file-permissions) + - [SSH Command](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#ssh-command) + - [Package Management](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#package-management) + - 
[Process Management](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#process-management) + - [Memory Management](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#memory-management) + - [Daemons and Systemd](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#daemons) + - [Logs](https://linkedin.github.io/school-of-sre/linux_basics/linux_server_administration/#logs) +- [Conclusion](https://linkedin.github.io/school-of-sre/linux_basics/conclusion) + - [Applications in SRE Role](https://linkedin.github.io/school-of-sre/linux_basics/conclusion/#applications-in-sre-role) + - [Useful Courses and tutorials](https://linkedin.github.io/school-of-sre/linux_basics/conclusion/#useful-courses-and-tutorials) + +## What are Linux operating systems + +Most of us will be familiar with the windows operating system, which is +used on more than 75% of personal computers. The windows operating systems +are based on the windows NT kernel. A kernel is the most important part of +an operating system; it performs important functions like process +management, memory management, filesystem management etc. + +Linux operating systems are based on the Linux kernel. A linux based +operating system consists of the linux kernel, a GUI/CLI, system libraries +and system utilities. The Linux kernel was independently developed and +released by Linus Torvalds. The linux kernel is free and open-source - +[https://github.com/torvalds/linux](https://github.com/torvalds/linux) + +History of Linux - +[https://en.wikipedia.org/wiki/History_of_Linux](https://en.wikipedia.org/wiki/History_of_Linux) + +## What are popular Linux distributions + +A linux distribution (distro) is an operating system that is based on +the linux kernel and a package management system. A package management +system consists of tools that help in installing, upgrading, +configuring and removing software on the operating system.
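Which packaging family applies can usually be read straight from the machine itself; below is a minimal sketch, assuming a Linux system that provides the standard /etc/os-release file (true of all the distributions listed in this course):

```shell
#!/bin/sh
# Identify the distribution and map it to its packaging family.
# Assumes /etc/os-release exists (true on most modern distros).
. /etc/os-release
echo "Distribution: $ID"
case "$ID" in
    debian|ubuntu)      echo "Packages: .deb, managed with APT" ;;
    fedora|centos|rhel) echo "Packages: .rpm, managed with YUM" ;;
    *)                  echo "Packages: see your distribution's documentation" ;;
esac
```

On a Red Hat Enterprise Linux system, for example, `$ID` would be `rhel` and the .rpm/YUM line would apply.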
+ +Software is usually adapted to a distribution and is packaged in a +distro-specific format. These packages are available through a +distro-specific repository. Packages are installed and managed in the operating +system by a package manager. + +**List of popular Linux distributions:** + +- Fedora + +- Ubuntu + +- Debian + +- CentOS + +- Red Hat Enterprise Linux + +- SUSE + +- Arch Linux + + +| Packaging systems | Distributions | Package manager | +| ---------------------- | ------------------------------------------ | ----------------- | +| Debian style (.deb) | Debian, Ubuntu | APT | +| Red Hat style (.rpm) | Fedora, CentOS, Red Hat Enterprise Linux | YUM | + +## Linux Architecture + +![](images/linux/commands/image25.png) + +- The Linux kernel is monolithic in nature. + +- System calls are used to interact with the linux kernel space. + +- Kernel code can only be executed in kernel mode. Non-kernel code is executed in user mode. + +- Device drivers are used to communicate with the hardware devices. + +## Uses of Linux Operating Systems + +Operating systems based on the linux kernel are widely used in: + +- Personal computers + +- Servers + +- Mobile phones - Android is based on the linux kernel + +- Embedded devices - watches, televisions, traffic lights etc. + +- Satellites + +- Network devices - routers, switches etc. + +## Graphical user interface (GUI) vs Command line interface (CLI) + +A user interacts with a computer with the help of user interfaces. The +user interface can be either a GUI or a CLI. + +A graphical user interface allows a user to interact with the computer +using graphics such as icons and images. When a user clicks on an icon +to open an application on a computer, he or she is actually using the +GUI. It's easy to perform tasks using a GUI. + +A command line interface allows a user to interact with the computer using +commands. A user types a command in a terminal and the system helps in +executing these commands.
A new user with experience of a GUI may find it +difficult to interact with the CLI, as they need to know the commands +to perform a particular operation. + +## Shell vs Terminal + +A shell is a program that takes a command or a group of commands from the +user and gives them to the operating system for processing. The shell is an +example of a command line interface. Bash is one of the most popular shell +programs available on linux servers. Other popular shell programs are +zsh, ksh and tcsh. + +A terminal is a program that opens a window and lets you interact with the +shell. Some popular examples of terminals are gnome-terminal, xterm and +konsole. + +Linux users often use the terms shell, terminal, prompt, console etc. +interchangeably. In simple terms, these all refer to a way of taking +commands from the user. diff --git a/courses/linux_basics/linux_server_administration.md b/courses/linux_basics/linux_server_administration.md new file mode 100644 index 0000000..f02e8be --- /dev/null +++ b/courses/linux_basics/linux_server_administration.md @@ -0,0 +1,587 @@ +# Linux Server Administration + +In this course, we will try to cover some of the common tasks that a linux +server administrator performs. We will first try to understand what a +particular command does and then understand the command using +examples. Do keep in mind that it's very important to practice the linux +commands on your own. + +## Lab Environment Setup + +- Install docker on your system - [https://docs.docker.com/engine/install/](https://docs.docker.com/engine/install/) + +- We will be running all the commands on a Red Hat Enterprise Linux (RHEL) 8 system. + + ![](images/linux/admin/image19.png) + +- We will run most of the commands used in this module in the above docker container. + +## Multi-User Operating Systems + +An operating system is considered multi-user if it allows multiple people/users to use a computer without affecting each other's files and preferences.
Linux based operating systems are multi-user in nature as they allow multiple users to access the system at the same time. A typical computer will only have one keyboard and monitor, but multiple users can log in via ssh if the computer is connected to the network. We will cover more about ssh later. + +As server administrators, we are mostly concerned with linux servers which are physically located very far away from us. We can connect to these servers with the help of remote login methods like ssh. + +Since linux supports multiple users, we need a method to protect the users from each other. One user should not be able to access or modify the files of other users. + + +## User/Group Management + +- Each user in linux has an associated user ID, called the UID + +- Each user also has a home directory and a login shell associated with them + +- A group is a collection of one or more users. A group makes it easier to share permissions among a group of users. + +- Each group has a group ID, called the GID, associated with it. + +### id command + +The id command can be used to find the uid and gid associated with a user. +It also lists the groups to which the user belongs. + +The uid and gid associated with the root user are 0. +![](images/linux/admin/image30.png) + +A good way to find out the current user in linux is to use the whoami +command. + +![](images/linux/admin/image35.png) + +**The "root" user or superuser is the most privileged user, with** +**unrestricted access to all the resources on the system.
It has UID 0** + +### Important files associated with users/groups + +| /etc/passwd | Stores the user name, the uid, the gid, the home directory, the login shell etc | +| -------------| --------------------------------------------------------------------------------- | +| /etc/shadow | Stores the password associated with the users | +| /etc/group | Stores information about different groups on the system | + +![](images/linux/admin/image23.png) + +![](images/linux/admin/image21.png) + +![](images/linux/admin/image9.png) + +If you want to understand each field discussed in the above outputs, you can go +through the links below: + +- [https://tldp.org/LDP/lame/LAME/linux-admin-made-easy/shadow-file-formats.html](https://tldp.org/LDP/lame/LAME/linux-admin-made-easy/shadow-file-formats.html) + +- [https://tldp.org/HOWTO/User-Authentication-HOWTO/x71.html](https://tldp.org/HOWTO/User-Authentication-HOWTO/x71.html) + +## Important commands for managing users + +Some of the commands used frequently to manage users/groups +on linux are the following: + +- useradd - Creates a new user + +- passwd - Adds or modifies the password of a user + +- usermod - Modifies attributes of a user + +- userdel - Deletes a user + +### useradd + +The useradd command adds a new user in linux. + +We will create a new user 'shivam'. We will also verify that the user +has been created by tailing the /etc/passwd file. The uid and gid are +1000 for the newly created user. The home directory assigned to the user +is /home/shivam and the login shell assigned is /bin/bash. Do note that +the user home directory and login shell can be modified later on. + +![](images/linux/admin/image41.png) + +If we do not specify any value for attributes like the home directory or +login shell, default values will be assigned to the user. We can also +override these default values when creating a new user.
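The fields that useradd fills in can also be verified directly from /etc/passwd. A minimal sketch — we look up root here because it exists on every system; after running useradd you could look up "shivam" the same way:

```shell
#!/bin/sh
# Print the uid, gid, home directory and login shell of a user
# by parsing its colon-separated entry in /etc/passwd.
user=root      # replace with e.g. shivam once that user exists
awk -F: -v u="$user" '$1 == u {
    printf "uid=%s gid=%s home=%s shell=%s\n", $3, $4, $6, $7
}' /etc/passwd
```

For root this prints uid=0 and gid=0, matching the id command output shown earlier.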
+ +![](images/linux/admin/image54.png) + +### passwd + +The passwd command is used to create or modify passwords for a user. + +In the above examples, we have not assigned any password for users +'shivam' or 'amit' while creating them. + +\"!!\" in an account entry in shadow means the account of a user has +been created, but not yet given a password. + +![](images/linux/admin/image13.png) + +Let's now try to create a password for user "shivam". + +![](images/linux/admin/image55.png) + +Do remember the password, as we will use it in later examples. + +Also, let's change the password for the root user now. When we switch +from a normal user to the root user, we will be prompted for this password. +It will also be asked for when logging in as the root user. + +![](images/linux/admin/image39.png) + +### usermod + +The usermod command is used to modify the attributes of a user, like the +home directory or the shell. + +Let's try to modify the login shell of user "amit" to "/bin/bash". + +![](images/linux/admin/image17.png) + +In a similar way, you can also modify many other attributes for a user. +Try 'usermod -h' for a list of attributes you can modify. + +### userdel + +The userdel command is used to remove a user on linux. Once we remove a +user, all the information related to that user will be removed. + +Let's try to delete the user "amit". After deleting the user, you will +not find the entry for that user in the "/etc/passwd" or "/etc/shadow" files. + +![](images/linux/admin/image34.png) + +## Important commands for managing groups + +Commands for managing groups are quite similar to the commands used for +managing users. Each command is not explained in detail here, as they are +quite similar. You can try running these commands on your system.
+ + +| groupadd \<group_name\> | Creates a new group | +| ------------------------ | -------------------------------- | +| groupmod \<group_name\> | Modifies attributes of a group | +| groupdel \<group_name\> | Deletes a group | +| gpasswd \<group_name\> | Modifies the password of a group | + +![](images/linux/admin/image52.png) + +We will now try to add user "shivam" to the group we have created above. + +![](images/linux/admin/image33.png) + +## Becoming a Superuser + +**Before running the below commands, do make sure that you have set up a +password for user "shivam" and user "root" using the passwd command +described in the above section.** + +The su command can be used to switch users in linux. Let's now try to +switch to user "shivam". + +![](images/linux/admin/image37.png) + +Let's now try to open the "/etc/shadow" file. + +![](images/linux/admin/image29.png) + +The operating system didn't allow the user "shivam" to read the content +of the "/etc/shadow" file. This is an important file in linux which +stores the passwords of users. This file can only be accessed by root or +users who have superuser privileges. + + +**The sudo command allows a user to run commands with the security +privileges of the root user.** Do remember that the root user has all +the privileges on a system. We could also use the su command to switch to the +root user and open the above file, but doing that would require the +password of the root user. An alternative way, which is preferred on most +modern operating systems, is to use the sudo command to become a +superuser. This way, a user has to enter only his/her own password and +needs to be a part of the sudo group. + +**How to provide superuser privileges to other users?** + +Let's first switch to the root user using the su command. Do note that using +the below command will need you to enter the password for the root user. + +![](images/linux/admin/image44.png) + +In case you forgot to set a password for the root user, type "exit" and +you will be back as the root user.
Now, set up a password using the +passwd command. + +**The file /etc/sudoers holds the names of users permitted to invoke +sudo**. In redhat operating systems, this file is not present by +default. We will need to install sudo. + +![](images/linux/admin/image3.png) + +We will discuss the yum command in detail in later sections. + +Try to open the "/etc/sudoers" file on the system. The file has a lot of +information. This file stores the rules that users must follow when +running the sudo command. For example, root is allowed to run any +commands from anywhere. + +![](images/linux/admin/image8.png) + +One easy way of providing root access to users is to add them to a group +which has permissions to run all the commands. "wheel" is a group in +redhat linux with such privileges. + +![](images/linux/admin/image25.png) + +Let's add the user "shivam" to this group so that it also has sudo +privileges. + +![](images/linux/admin/image48.png) + +Let's now switch back to user "shivam" and try to access the +"/etc/shadow" file. + +![](images/linux/admin/image56.png) + +We need to use sudo before running the command since it can only be +accessed with the sudo privileges. We have already given sudo privileges +to user “shivam” by adding him to the group “wheel”. + + +## File Permissions + +On a linux operating system, each file and directory is assigned access +permissions for the owner of the file, the members of a group of related +users and everybody else. This is to make sure that one user is not +allowed to access the files and resources of another user. + +To see the permissions of a file, we can use the ls command. Let's look +at the permissions of /etc/passwd file. + +![](images/linux/admin/image40.png) + +Let's go over some of the important fields in the output that are +related to file permissions. 
+ +![](images/linux/admin/image31.jpg) + + +![](images/linux/admin/image57.png) + +### Chmod command + +The chmod command is used to modify files and directories permissions in +linux. + +The chmod command accepts permissions in as a numerical argument. We can +think of permission as a series of bits with 1 representing True or +allowed and 0 representing False or not allowed. + +| Permission | rwx | Binary | Decimal | +| -------------------------| ------- | ------- | --------- | +| Read, write and execute | rwx | 111 | 7 | +| Read and write | rw- | 110 | 6 | +| Read and execute | r-x | 101 | 5 | +| Read only | r-- | 100 | 4 | +| Write and execute | -wx | 011 | 3 | +| Write only | -w- | 010 | 2 | +| Execute only | --x | 001 | 1 | +| None | --- | 000 | 0 | + +We will now create a new file and check the permission of the file. + +![](images/linux/admin/image15.png) + +The group owner doesn't have the permission to write to this file. Let's +give the group owner or root the permission to write to it using chmod +command. + +![](images/linux/admin/image26.png) + +Chmod command can be also used to change the permissions of a directory +in the similar way. + +### Chown command + +The chown command is used to change the owner of files or +directories in linux. + +Command syntax: chown \ \ + +![](images/linux/admin/image6.png) + +**In case, we do not have sudo privileges, we need to use sudo +command**. Let's switch to user 'shivam' and try changing the owner. We +have also changed the owner of the file to root before running the below +command. + +![](images/linux/admin/image12.png) + +Chown command can also be used to change the owner of a directory in the +similar way. + +### Chgrp command + +The chgrp command can be used to change the group ownership of files or +directories in linux. The syntax is very similar to that of chown +command. + +![](images/linux/admin/image27.png) + +Chgrp command can also be used to change the owner of a directory in the +similar way. 
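The numeric modes from the table above can be tried out directly. A small sketch (the file name demo.txt is arbitrary):

```shell
# Try a numeric mode: 640 = rw- for owner, r-- for group, --- for others
cd "$(mktemp -d)"            # scratch directory so nothing real is touched
touch demo.txt
chmod 640 demo.txt
stat -c '%a %A' demo.txt     # prints: 640 -rw-r-----
```

`stat -c '%a'` prints the octal mode back, so you can verify the bits that chmod set.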
+
+## SSH Command
+
+The ssh command is used for logging into remote systems, transferring files between systems and executing commands on a remote machine. SSH stands for secure shell and is used to provide an encrypted, secure connection between two hosts over an insecure network like the internet.
+
+Reference:
+[https://www.ssh.com/ssh/command/](https://www.ssh.com/ssh/command/)
+
+We will now discuss passwordless authentication, which is secure and most
+commonly used for ssh authentication.
+
+### Passwordless Authentication Using SSH
+
+Using this method, we can ssh into hosts without entering the password.
+This method is also useful when we want some scripts to perform
+ssh-related tasks.
+
+Passwordless authentication requires the use of a public and private key pair. As the name implies, the public key can be shared with anyone but the private key should be kept private.
+Let's not get into the details of how this authentication works. You can read more about it
+[here](https://www.digitalocean.com/community/tutorials/understanding-the-ssh-encryption-and-connection-process)
+
+Steps for setting up passwordless authentication with a remote host:
+
+1. Generating the public-private key pair
+
+    **If we already have a key pair stored in the ~/.ssh directory, we will not need to generate keys again.**
+
+    Install the openssh package, which contains all the commands related to ssh.
+
+    ![](images/linux/admin/image49.png)
+
+    Generate a key pair using the ssh-keygen command. One can choose the
+    default values for all prompts.
+
+    ![](images/linux/admin/image47.png)
+
+    After running the ssh-keygen command successfully, we should see two
+    keys present in the ~/.ssh directory. id_rsa is the private key and
+    id_rsa.pub is the public key. Do note that the private key can only be
+    read and modified by you.
+
+    ![](images/linux/admin/image7.png)
+
+2. Transferring the public key to the remote host
+
+    There are multiple ways to transfer the public key to the remote server.
+    We will look at one of the most common ways of doing it, using the
+    ssh-copy-id command.
+
+    ![](images/linux/admin/image11.png)
+
+    Install the openssh-clients package to use the ssh-copy-id command.
+
+    ![](images/linux/admin/image46.png)
+
+    Use the ssh-copy-id command to copy your public key to the remote host.
+
+    ![](images/linux/admin/image50.png)
+
+    Now, ssh into the remote host using password authentication.
+
+    ![](images/linux/admin/image51.png)
+
+    Our public key should be in ~/.ssh/authorized_keys now.
+
+    ![](images/linux/admin/image4.png)
+
+    ~/.ssh/authorized_keys contains a list of public keys. The users
+    associated with these public keys have ssh access to the remote
+    host.
+
+
+### How to run commands on a remote host?
+
+General syntax: `ssh <user>@<hostname> <command>`
+
+![](images/linux/admin/image14.png)
+
+### How to transfer files from one host to another host?
+
+General syntax: `scp <source> <destination>`
+
+![](images/linux/admin/image32.png)
+
+## Package Management
+
+Package management is the process of installing and managing software on
+the system. We can install the packages which we require from the linux
+package distributor. Different distributors use different packaging
+systems.
+
+| Packaging systems | Distributions |
+| ---------------------- | ------------------------------------------ |
+| Debian style (.deb) | Debian, Ubuntu |
+| Red Hat style (.rpm) | Fedora, CentOS, Red Hat Enterprise Linux |
+
+**Popular Packaging Systems in Linux**
+
+| Command | Description |
+| ----------------------------- | --------------------------------------------------- |
+| yum install `<package name>` | Installs a package on your system |
+| yum update `<package name>` | Updates a package to its latest available version |
+| yum remove `<package name>` | Removes a package from your system |
+| yum search `<keyword>` | Searches for a particular keyword |
+
+[DNF](https://docs.fedoraproject.org/en-US/quick-docs/dnf/) is
+the successor to YUM, which is now used in Fedora for installing and
+managing packages. DNF may replace YUM in the future on all RPM-based
+linux distributions.
+
+![](images/linux/admin/image20.png)
+
+We did find an exact match for the keyword httpd when we searched using
+the yum search command. Let's now install the httpd package.
+
+![](images/linux/admin/image28.png)
+
+After httpd is installed, we will use the yum remove command to remove the
+httpd package.
+
+![](images/linux/admin/image43.png)
+
+## Process Management
+
+In this section, we will study some useful commands that can be
+used to monitor processes on linux systems.
+
+### ps (process status)
+
+The ps command is used to view information about a process or a list of
+processes.
+
+![](images/linux/admin/image24.png)
+
+If you get an error "ps command not found" while running the ps command, do
+install the **procps** package.
+
+ps without any arguments is not very useful. Let's try to list all the
+processes on the system by using the below command.
+
+Reference:
+[https://unix.stackexchange.com/questions/106847/what-does-aux-mean-in-ps-aux](https://unix.stackexchange.com/questions/106847/what-does-aux-mean-in-ps-aux)
+
+![](images/linux/admin/image42.png)
+
+We can use an additional argument with the ps command to list the
+information about a process with a specific process ID.
+
+![](images/linux/admin/image2.png)
+
+We can use grep in combination with the ps command to list only specific
+processes.
+
+![](images/linux/admin/image1.png)
+
+### top
+
+The top command is used to show information about linux processes
+running on the system in real time. It also shows a summary of the
+system information.
+
+![](images/linux/admin/image53.png)
+
+For each process, top lists the process ID, owner, priority, state,
+cpu utilization, memory utilization and much more information. It also
+lists the memory utilization and cpu utilization of the system as a
+whole, along with system uptime and cpu load average.
+
+## Memory Management
+
+In this section, we will study some useful commands that can be
+used to view information about the system memory.
+
+### free
+
+The free command is used to display the memory usage of the system. The
+command displays the total free and used space available in the RAM,
+along with space occupied by the caches/buffers.
+
+![](images/linux/admin/image22.png)
+
+The free command by default shows the memory usage in kilobytes. We can use
+an additional argument to get the data in a human-readable format.
+
+![](images/linux/admin/image5.png)
+
+### vmstat
+
+The vmstat command can be used to display the memory usage along with
+additional information about io and cpu usage.
+
+![](images/linux/admin/image38.png)
+
+## Checking Disk Space
+
+In this section, we will study some useful commands that can be
+used to view disk space on linux.
+
+### df (disk free)
+
+The df command is used to display the free and available space for each
+mounted file system.
+
+![](images/linux/admin/image36.png)
+
+### du (disk usage)
+
+The du command is used to display the disk usage of files and directories on
+the system.
+
+![](images/linux/admin/image10.png)
+
+The below command can be used to display the top 5 largest directories
+in the root directory.
+
+![](images/linux/admin/image18.png)
+
+## Daemons
+
+A computer program that runs as a background process is called a daemon.
+Traditionally, the names of daemon processes end with d - sshd, httpd,
+etc. We cannot interact with daemon processes directly as they run in the
+background.
+
+The terms services and daemons are used interchangeably most of the time.
+
+## Systemd
+
+Systemd is a system and service manager for Linux operating systems.
+Systemd units are the building blocks of systemd. These units are
+represented by unit configuration files.
+
+The below example shows the unit configuration files available at
+/usr/lib/systemd/system, which are distributed by installed RPM packages.
+We are more interested in the configuration files that end with .service,
+as these are service units.
+
+![](images/linux/admin/image16.png)
+
+### Managing System Services
+
+Service units end with the .service file extension. The systemctl command can be
+used to start/stop/restart the services managed by systemd.
+
+| Command | Description |
+| ------------------------------- | -------------------------------------- |
+| systemctl start name.service | Starts a service |
+| systemctl stop name.service | Stops a service |
+| systemctl restart name.service | Restarts a service |
+| systemctl status name.service | Checks the status of a service |
+| systemctl reload name.service | Reloads the configuration of a service |
+
+## Logs
+
+In this section, we will talk about some important files and directories
+which can be very useful for viewing system logs and application logs
+in linux. These logs can be very useful when you are troubleshooting
+the system.
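As a sketch of how such logs are typically inspected, the snippet below counts failed SSH logins. The log line is sample data in a common syslog format, not output read from a real log file:

```shell
# Sample sshd log line in syslog format (illustrative data only)
logline='Nov  9 10:50:10 host sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 22 ssh2'
printf '%s\n' "$logline" | grep -c 'Failed password'   # prints: 1
```

On a real system, you would grep files like /var/log/secure (Red Hat style) or /var/log/auth.log (Debian style) instead of an echoed sample line.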
+
+![](images/linux/admin/image58.png)
diff --git a/courses/linux_networking/conclusion.md b/courses/linux_networking/conclusion.md
new file mode 100644
index 0000000..0b15e6d
--- /dev/null
+++ b/courses/linux_networking/conclusion.md
@@ -0,0 +1,11 @@
+# Conclusion
+
+With this, we have traversed the TCP/IP stack completely. We hope you will have a different perspective when opening any website in the browser after the course.
+
+During the course, we have also dissected the common tasks in this pipeline which fall under the ambit of SRE.
+
+# Post Training Exercises
+1. Set up your own DNS resolver in the dev environment which acts as an authoritative DNS server for example.com and a forwarder for other domains. Update resolv.conf to use the new DNS resolver running on localhost
+2. Set up a site dummy.example.com on localhost and run a webserver with a self-signed certificate. Update the trusted CAs or pass the self-signed CA's public key as a parameter so that curl https://dummy.example.com -v works properly without a self-signed cert warning
+3. Update the routing table to use another host (container/VM) in the same network as a gateway for 8.8.8.8/32 and run ping 8.8.8.8. Do a packet capture on the new gateway to see that the L3 hop is working as expected (might need to disable icmp_redirect)
+
diff --git a/courses/linux_networking/dns.md b/courses/linux_networking/dns.md
new file mode 100644
index 0000000..2ff14a8
--- /dev/null
+++ b/courses/linux_networking/dns.md
@@ -0,0 +1,142 @@
+# DNS
+Domain names are the simple human-readable names for websites. The Internet understands only IP addresses, but since memorizing incoherent numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. When somebody tries to open www.linkedin.com in the browser, the browser tries to convert www.linkedin.com to an IP address. This process is called DNS resolution. A simple pseudocode depicting this process looks like this
+
+```python
+ip, err = getIPAddress(domainName)
+if err:
+    print("unknown Host Exception while trying to resolve: {}".format(domainName))
+```
+
+Now let's try to understand what happens inside the getIPAddress function. The browser would have a DNS cache of its own, where it checks if there is a mapping for the domainName to an IP address already available, in which case the browser uses that IP address. If no such mapping exists, the browser calls the gethostbyname library function to ask the operating system to find the IP address for the given domainName
+
+```python
+def getIPAddress(domainName):
+    resp, fail = lookupCache(domainName)
+    if not fail:
+        return resp, None
+    resp, err = gethostbyname(domainName)
+    if err:
+        return None, err
+    return resp, None
+```
+
+Now let's understand what the operating system does when the gethostbyname function is called. The Linux operating system looks at the file [/etc/nsswitch.conf](https://man7.org/linux/man-pages/man5/nsswitch.conf.5.html), which usually has a line
+
+```bash
+hosts: files dns
+```
+
+This line means the OS has to look up first in the file (/etc/hosts) and then use the DNS protocol to do the resolution if there is no match in /etc/hosts.
+
+The file /etc/hosts is of the format
+
+
+IPAddress FQDN [FQDN].*
+
+```bash
+127.0.0.1 localhost.localdomain localhost
+::1 localhost.localdomain localhost
+```
+
+If a match exists for a domain in this file, then that IP address is returned by the OS. Let's add a line to this file
+
+```bash
+127.0.0.1 test.linkedin.com
+```
+
+And then do ping test.linkedin.com
+
+```bash
+ping test.linkedin.com -n
+```
+
+```bash
+PING test.linkedin.com (127.0.0.1) 56(84) bytes of data.
+64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.047 ms
+64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.036 ms
+64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.037 ms
+
+```
+
+As mentioned earlier, if no match exists in /etc/hosts, the OS tries to do a DNS resolution using the DNS protocol. The linux system makes a DNS request to the first IP in /etc/resolv.conf. If there is no response, requests are sent to subsequent servers in resolv.conf. These servers in resolv.conf are called DNS resolvers. The DNS resolvers are populated by [DHCP](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol) or statically configured by an administrator.
+[Dig](https://linux.die.net/man/1/dig) is a userspace DNS client which creates and sends DNS requests to DNS resolvers and prints the response it receives to the console.
+
+```bash
+#run this command in one shell to capture all DNS requests
+sudo tcpdump -s 0 -A -i any port 53
+#make a dig request from another shell
+dig linkedin.com
+```
+
+```bash
+13:19:54.432507 IP 172.19.209.122.56497 > 172.23.195.101.53: 527+ [1au] A? linkedin.com. (41)
+....E..E....@.n....z...e...5.1.:... .........linkedin.com.......)........
+13:19:54.485131 IP 172.23.195.101.53 > 172.19.209.122.56497: 527 1/0/1 A 108.174.10.10 (57)
+....E..U..@.|. ....e...z.5...A...............linkedin.com..............3..l.
+
+..)........
+```
+
+
+The packet capture shows a request made to 172.23.195.101:53 (this is the resolver in /etc/resolv.conf) for linkedin.com, and a response received from 172.23.195.101 with the IP address of linkedin.com, 108.174.10.10
+
+Now let's try to understand how the DNS resolver finds the IP address of linkedin.com. The DNS resolver first looks at its cache. Since many devices in the network can query for the domain name linkedin.com, the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The DNS resolver breaks "linkedin.com" into ".", "com." and "linkedin.com." and starts DNS resolution from ".". The "." is called the root domain, and its IPs are known to the DNS resolver software. The DNS resolver queries the root domain nameservers to find the right nameservers which could respond with details for "com.". The address of the authoritative nameserver of "com." is returned. Now the DNS resolution service contacts the authoritative nameserver for "com." to fetch the authoritative nameserver for "linkedin.com". Once an authoritative nameserver of "linkedin.com" is known, the resolver contacts LinkedIn's nameserver to get the IP address of "linkedin.com". This whole process can be visualized by running
+
+```bash
+dig +trace linkedin.com
+```
+
+
+```bash
+linkedin.com. 3600 IN A 108.174.10.10
+```
+This DNS response has 5 fields, where the first field is the request and the last field is the response. The second field is the Time To Live, which says how long the DNS response is valid, in seconds. In this case, this mapping of linkedin.com is valid for 1 hour. This is how the resolvers and applications (browsers) maintain their cache. Any request for linkedin.com beyond 1 hour will be treated as a cache miss, as the mapping has expired its TTL, and the whole process has to be redone.
+The 4th field says the type of DNS response/request. Some of the various DNS query types are
+A, AAAA, NS, TXT, PTR, MX and CNAME.
+- A record returns the IPV4 address of the domain name
+- AAAA record returns the IPV6 address of the domain name
+- NS record returns the authoritative nameserver for the domain name
+- CNAME records are aliases to the domain names. Some domains point to other domain names, and resolving the latter domain name gives an IP which is used as an IP for the former domain name as well. Example: www.linkedin.com's IP address is the same as 2-01-2c3e-005a.cdx.cedexis.net.
+- For brevity, we are not discussing other DNS record types; the RFCs for each of these records are available [here](https://en.wikipedia.org/wiki/List_of_DNS_record_types).
+
+```bash
+dig A linkedin.com +short
+108.174.10.10
+
+
+dig AAAA linkedin.com +short
+2620:109:c002::6cae:a0a
+
+
+dig NS linkedin.com +short
+dns3.p09.nsone.net.
+dns4.p09.nsone.net.
+dns2.p09.nsone.net.
+ns4.p43.dynect.net.
+ns1.p43.dynect.net.
+ns2.p43.dynect.net.
+ns3.p43.dynect.net.
+dns1.p09.nsone.net.
+
+dig www.linkedin.com CNAME +short
+2-01-2c3e-005a.cdx.cedexis.net.
+```
+Armed with these fundamentals of DNS, let's see use cases where DNS is used by SREs.
+
+## Applications in SRE role
+
+This section covers some of the common solutions SREs can derive from DNS
+1. Every company has to have its internal DNS infrastructure for intranet sites and internal services like databases and other internal applications like wiki. So there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn't become a single point of failure. Failure of the internal DNS infrastructure can cause API calls of microservices to fail and have other cascading effects.
+2. DNS can also be used for discovering services. For example, the hostname serviceb.internal.example.com could list the instances which run service b internally in the example.com company. Cloud providers provide options to enable DNS discovery ([example](https://docs.aws.amazon.com/whitepapers/latest/microservices-on-aws/service-discovery.html#dns-based-service-discovery))
+3. DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, load balancers are given a CNAME instead of an IP address. They update the IP address of the load balancers as they scale by changing the IP address of the alias domain names. This is one of the reasons why A records of such alias domains are short-lived, like 1 minute.
+4. DNS can also be used to make clients get IP addresses closer to their location, so that their HTTP calls can be responded to faster if the company has a geographically distributed presence.
+5. SREs also have to understand that, since there is no verification in the DNS infrastructure, these responses can be spoofed. This is safeguarded by other protocols like HTTPS (dealt with later). DNSSEC protects from forged or manipulated DNS responses.
+6. Stale DNS caches can be a problem. Some [apps](https://stackoverflow.com/questions/1256556/how-to-make-java-honor-the-dns-caching-timeout) might still be using expired DNS records for their API calls. This is something SREs have to be wary of when doing maintenance.
+7. DNS load balancing and service discovery also have to take the TTL into account; servers can be removed from the pool only after waiting for the TTL to expire after the changes are made to the DNS records. If this is not done, a certain portion of the traffic will fail as the server is removed before the TTL expires.
+
+
+
+
+
diff --git a/courses/linux_networking/http.md b/courses/linux_networking/http.md
new file mode 100644
index 0000000..a02adff
--- /dev/null
+++ b/courses/linux_networking/http.md
@@ -0,0 +1,129 @@
+# HTTP
+
+Till this point, we have only got the IP address of linkedin.com. The HTML page of linkedin.com is served over the HTTP protocol, which the browser renders. The browser sends an HTTP request to the IP of the server determined above.
+The request has a verb (GET, PUT, POST) followed by a path and query parameters, lines of key-value pairs which give information about the client and the capabilities of the client, like the content it can accept, and a body (usually in POST or PUT)
+
+```bash
+# Eg run the following in your container and have a look at the headers
+curl linkedin.com -v
+```
+```bash
+* Connected to linkedin.com (108.174.10.10) port 80 (#0)
+> GET / HTTP/1.1
+> Host: linkedin.com
+> User-Agent: curl/7.64.1
+> Accept: */*
+>
+< HTTP/1.1 301 Moved Permanently
+< Date: Mon, 09 Nov 2020 10:39:43 GMT
+< X-Li-Pop: prod-esv5
+< X-LI-Proto: http/1.1
+< Location: https://www.linkedin.com/
+< Content-Length: 0
+<
+* Connection #0 to host linkedin.com left intact
+* Closing connection 0
+```
+
+Here, in the first line, GET is the verb, / is the path and 1.1 is the HTTP protocol version. Then there are key-value pairs which give the client capabilities and some details to the server. The server responds back with the HTTP version, status code and status message. Status codes 2xx mean success, 3xx denote redirection, 4xx denote client-side errors and 5xx denote server-side errors.
+
+We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1.
+
+```bash
+#On the terminal type
+telnet www.linkedin.com 80
+#Copy and paste the following with an empty new line at last in the telnet STDIN
+GET / HTTP/1.1
+HOST:linkedin.com
+USER-AGENT: curl
+
+```
+
+
+This gets the server response and waits for the next input, as the underlying connection to www.linkedin.com can be reused for further queries. While going through TCP, we can understand the benefits of this. But in HTTP/1.0, this connection will be immediately closed after the response, meaning a new connection has to be opened for each query. HTTP/1.1 can have only one inflight request in an open connection, but the connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is that we can have multiple inflight requests on the same connection. We are restricting our scope to generic HTTP and not jumping into the intricacies of each protocol version, but they should be straightforward to understand after the course.
+
+HTTP is called a **stateless protocol**. In this section, we will try to understand what stateless means. Say we logged in to linkedin.com; each request to linkedin.com from the client will have no context of the user, and it makes no sense to prompt the user to log in for each page/resource. This problem of HTTP is solved by the *COOKIE*. A session is created for a user when the user logs in. This session identifier is sent to the browser via the *SET-COOKIE* header. The browser stores the cookie till the expiry set by the server, and sends the cookie with each request to linkedin.com from here on. More details on cookies are available [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies). Cookies are a critical piece of information, like a password, and since HTTP is a plain-text protocol, any man in the middle can capture either passwords or cookies and breach the privacy of the user. Similarly, as discussed during DNS, a spoofed IP of linkedin.com can cause a phishing attack on users, where a user can give LinkedIn's password to log in on the malicious site. To solve both problems, HTTPS came into place and HTTPS has to be mandated.
+
+HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private-public key pair and a certificate request. This certificate request has to be signed by a certificate authority, which converts the certificate request to a certificate. The server administrator has to update the certificate and private key in the webserver. The certificate has details about the server (like the domain name for which it serves, the expiry date) and the public key of the server. The private key is a secret kept by the server, and losing the private key loses the trust the server provides. When a client connects, the client sends a HELLO. The server sends its certificate to the client. The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority, and if the hostname in the cert is the same as the server. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetric key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody other than the server, which has the private key, can understand this data. Once negotiation is complete, that symmetric key and algorithm are used for further encryption, which can be decrypted only by the client and server from thereon, as only they know the symmetric key and algorithm. The switch from an asymmetric encryption algorithm to a symmetric algorithm is to not strain the resources of client devices, as symmetric encryption is generally less resource-intensive than asymmetric.
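The key negotiation idea can be illustrated with a toy Diffie-Hellman exchange. This is only a sketch with tiny made-up numbers (p=23, g=5); real TLS key exchanges use large, vetted parameters and library implementations:

```shell
# Toy Diffie-Hellman sketch: tiny public parameters, NOT a real TLS key exchange
p=23; g=5                        # public parameters both sides agree on
a=6; b=15                        # private values chosen by client and server
# modular exponentiation by repeated multiplication (fine for tiny exponents)
powmod() { local r=1 i=0; while [ "$i" -lt "$2" ]; do r=$(( r * $1 % $3 )); i=$(( i + 1 )); done; echo "$r"; }
A=$(powmod "$g" "$a" "$p")       # client sends A over the wire
B=$(powmod "$g" "$b" "$p")       # server sends B over the wire
client_key=$(powmod "$B" "$a" "$p")
server_key=$(powmod "$A" "$b" "$p")
echo "client=$client_key server=$server_key"   # both sides derive the same secret
```

Both sides end up with the same number without ever sending that number over the wire, which is what makes the switch to symmetric encryption possible.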
+ +```bash +#Try the following on your terminal to see the cert details like Subject Name(domain name), Issuer details, Expiry date +curl https://www.linkedin.com -v +``` +```bash +* Connected to www.linkedin.com (13.107.42.14) port 443 (#0) +* ALPN, offering h2 +* ALPN, offering http/1.1 +* successfully set certificate verify locations: +* CAfile: /etc/ssl/cert.pem + CApath: none +* TLSv1.2 (OUT), TLS handshake, Client hello (1): +} [230 bytes data] +* TLSv1.2 (IN), TLS handshake, Server hello (2): +{ [90 bytes data] +* TLSv1.2 (IN), TLS handshake, Certificate (11): +{ [3171 bytes data] +* TLSv1.2 (IN), TLS handshake, Server key exchange (12): +{ [365 bytes data] +* TLSv1.2 (IN), TLS handshake, Server finished (14): +{ [4 bytes data] +* TLSv1.2 (OUT), TLS handshake, Client key exchange (16): +} [102 bytes data] +* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1): +} [1 bytes data] +* TLSv1.2 (OUT), TLS handshake, Finished (20): +} [16 bytes data] +* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1): +{ [1 bytes data] +* TLSv1.2 (IN), TLS handshake, Finished (20): +{ [16 bytes data] +* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 +* ALPN, server accepted to use h2 +* Server certificate: +* subject: C=US; ST=California; L=Sunnyvale; O=LinkedIn Corporation; CN=www.linkedin.com +* start date: Oct 2 00:00:00 2020 GMT +* expire date: Apr 2 12:00:00 2021 GMT +* subjectAltName: host "www.linkedin.com" matched cert's "www.linkedin.com" +* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA +* SSL certificate verify ok. +* Using HTTP2, server supports multi-use +* Connection state changed (HTTP/2 confirmed) +* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0 +* Using Stream ID: 1 (easy handle 0x7fb055808200) +* Connection state changed (MAX_CONCURRENT_STREAMS == 100)! 
+ 0 82117 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 +* Connection #0 to host www.linkedin.com left intact +HTTP/2 200 +cache-control: no-cache, no-store +pragma: no-cache +content-length: 82117 +content-type: text/html; charset=utf-8 +expires: Thu, 01 Jan 1970 00:00:00 GMT +set-cookie: JSESSIONID=ajax:2747059799136291014; SameSite=None; Path=/; Domain=.www.linkedin.com; Secure +set-cookie: lang=v=2&lang=en-us; SameSite=None; Path=/; Domain=linkedin.com; Secure +set-cookie: bcookie="v=2&70bd59e3-5a51-406c-8e0d-dd70befa8890"; domain=.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; SameSite=None +set-cookie: bscookie="v=1&202011091050107ae9b7ac-fe97-40fc-830d-d7a9ccf80659AQGib5iXwarbY8CCBP94Q39THkgUlx6J"; domain=.www.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; HttpOnly; SameSite=None +set-cookie: lissc=1; domain=.linkedin.com; Path=/; Secure; Expires=Tue, 09-Nov-2021 10:50:10 GMT; SameSite=None +set-cookie: lidc="b=VGST04:s=V:r=V:g=2201:u=1:i=1604919010:t=1605005410:v=1:sig=AQHe-KzU8i_5Iy6MwnFEsgRct3c9Lh5R"; Expires=Tue, 10 Nov 2020 10:50:10 GMT; domain=.linkedin.com; Path=/; SameSite=None; Secure +x-fs-txn-id: 2b8d5409ba70 +x-fs-uuid: 61bbf94956d14516302567fc882b0000 +expect-ct: max-age=86400, report-uri="https://www.linkedin.com/platform-telemetry/ct" +x-xss-protection: 1; mode=block +content-security-policy-report-only: default-src 'none'; connect-src 'self' www.linkedin.com www.google-analytics.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://linkedin.sc.omtrdc.net/b/ss/ static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; script-src 'sha256-THuVhwbXPeTR0HszASqMOnIyxqEgvGyBwSPBKBF/iMc=' 'sha256-PyCXNcEkzRWqbiNr087fizmiBBrq9O6GGD8eV3P09Ik=' 'sha256-2SQ55Erm3CPCb+k03EpNxU9bdV3XL9TnVTriDs7INZ4=' 'sha256-S/KSPe186K/1B0JEjbIXcCdpB97krdzX05S+dHnQjUs=' platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com static.licdn.com 
static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'self' 'unsafe-inline' static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; media-src dms.licdn.com; child-src blob: *; frame-src 'self' lnkd.demdex.net linkedin.cdn.qualaroo.com; manifest-src 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=g +content-security-policy: default-src *; connect-src 'self' https://media-src.linkedin.com/media/ www.linkedin.com s.c.lnkd.licdn.com m.c.lnkd.licdn.com s.c.exp1.licdn.com s.c.exp2.licdn.com m.c.exp1.licdn.com m.c.exp2.licdn.com wss://*.linkedin.com dms.licdn.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://accounts.google.com/gsi/status https://linkedin.sc.omtrdc.net/b/ss/ www.google-analytics.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com media.licdn.com media-exp1.licdn.com media-exp2.licdn.com media-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'unsafe-inline' 'self' static-src.linkedin.com *.licdn.com; script-src 'report-sample' 'unsafe-inline' 'unsafe-eval' 'self' spdy.linkedin.com static-src.linkedin.com *.ads.linkedin.com *.licdn.com static.chartbeat.com www.google-analytics.com ssl.google-analytics.com bcvipva02.rightnowtech.com www.bizographics.com sjs.bizographics.com js.bizographics.com d.la4-c1-was.salesforceliveagent.com slideshare.www.linkedin.com https://snap.licdn.com/li.lms-analytics/ platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com; object-src 'none'; media-src blob: *; child-src blob: lnkd-communities: voyager: *; frame-ancestors 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=l +x-frame-options: sameorigin +x-content-type-options: nosniff +strict-transport-security: max-age=2592000 +x-li-fabric: prod-lva1 +x-li-pop: afd-prod-lva1 +x-li-proto: http/2 +x-li-uuid: Ybv5SVbRRRYwJWf8iCsAAA== 
+x-msedge-ref: Ref A: CFB9AC1D2B0645DDB161CEE4A4909AEF Ref B: BOM02EDGE0712 Ref C: 2020-11-09T10:50:10Z
+date: Mon, 09 Nov 2020 10:50:10 GMT
+
+* Closing connection 0
+```
+
+Here my system has a list of certificate authorities it trusts, in the file /etc/ssl/cert.pem. Curl validates that the certificate is for www.linkedin.com by checking the CN field of the certificate's subject. It also makes sure the certificate has not expired by checking the expiry date, and it validates the signature on the certificate using the public key of the issuer, DigiCert, found in /etc/ssl/cert.pem. Once this is done, using the public key of www.linkedin.com, it negotiates the cipher TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 with a symmetric key. Subsequent data transfer, including the first HTTP request, uses the same cipher and symmetric key.
+
+
diff --git a/courses/linux_networking/images/arp.gif b/courses/linux_networking/images/arp.gif
new file mode 100644
index 0000000..3395030
Binary files /dev/null and b/courses/linux_networking/images/arp.gif differ
diff --git a/courses/linux_networking/images/closed.png b/courses/linux_networking/images/closed.png
new file mode 100644
index 0000000..f1d98ed
Binary files /dev/null and b/courses/linux_networking/images/closed.png differ
diff --git a/courses/linux_networking/images/established.png b/courses/linux_networking/images/established.png
new file mode 100644
index 0000000..5879e9e
Binary files /dev/null and b/courses/linux_networking/images/established.png differ
diff --git a/courses/linux_networking/images/pcap.png b/courses/linux_networking/images/pcap.png
new file mode 100644
index 0000000..2209864
Binary files /dev/null and b/courses/linux_networking/images/pcap.png differ
diff --git a/courses/linux_networking/intro.md b/courses/linux_networking/intro.md
new file mode 100644
index 0000000..34f7484
--- /dev/null
+++ b/courses/linux_networking/intro.md
@@ -0,0 +1,26 @@
+# Linux Networking Fundamentals
+
+## Prerequisites
+
+This course requires
high-level knowledge of commonly used jargon in the TCP/IP stack, such as DNS, TCP, UDP and HTTP. Basic familiarity with Linux jargon is sufficient to start this course. This course also expects basic exposure to Linux command-line tools. The course will require you to install certain utilities and run them as a part of the course exercises.
+
+## What to expect from this course
+
+Throughout the course, we cover how an SRE can optimize the system to improve their web stack performance and troubleshoot if there is an issue in any of the layers of the networking stack. This course tries to dig through each layer of the traditional TCP/IP stack and expects an SRE to have a picture beyond the bird's eye view of the functioning of the Internet.
+
+## What is not covered under this course
+
+This course spends time on the fundamentals. We are not covering concepts like HTTP/2.0, QUIC, TCP congestion control protocols, Anycast, BGP, CDN, Tunnels and Multicast. We expect that this course will provide the relevant basics to understand such concepts.
+
+## Course Content
+
+### Bird's eye view of the course
+
+The course covers the question "What happens when you open linkedin.com in your browser?" The course follows the flow of the TCP/IP stack. More specifically, it covers the application layer protocols DNS and HTTP, the transport layer protocols UDP and TCP, the network layer protocol IP, and the data link layer.
+
+## Table of Contents
+1. [DNS](https://linkedin.github.io/school-of-sre/linux_networking/dns/)
+2. [UDP](https://linkedin.github.io/school-of-sre/linux_networking/udp/)
+3. [HTTP](https://linkedin.github.io/school-of-sre/linux_networking/http/)
+4. [TCP](https://linkedin.github.io/school-of-sre/linux_networking/tcp/)
+5. 
[IP Routing](https://linkedin.github.io/school-of-sre/linux_networking/ipr/)
diff --git a/courses/linux_networking/ipr.md b/courses/linux_networking/ipr.md
new file mode 100644
index 0000000..2e404fc
--- /dev/null
+++ b/courses/linux_networking/ipr.md
@@ -0,0 +1,32 @@
+# IP Routing and Data Link Layer
+We will dig into how packets that leave the client reach the server and vice versa. The transport layer populates the source and destination ports; the IP/network layer then populates the destination IP (discovered from DNS) and looks up the route to that destination IP in the routing table.
+
+```bash
+#The route -n command prints the kernel routing table
+route -n
+```
+
+```bash
+Kernel IP routing table
+Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
+0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
+172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
+```
+
+Here the destination IP is bitwise AND'd with each Genmask; if the result matches the Destination column of that row, the corresponding gateway and interface are picked for routing. Here linkedin.com's IP 108.174.10.10 is AND'd with 255.255.0.0 and the answer we get is 108.174.0.0, which doesn't match any destination in the routing table. Then Linux ANDs the destination IP with 0.0.0.0 and we get 0.0.0.0. This answer matches the default row.
+
+The routing table is processed in decreasing order of the number of 1 bits set in the genmask, and genmask 0.0.0.0 (the default route) is used if nothing else matches.
+At the end of this operation, Linux has figured out that the packet has to be sent to the next hop 172.17.0.1 via eth0. The source IP of the packet will be set to the IP of interface eth0.
+Now to send the packet to 172.17.0.1, Linux has to figure out the MAC address of 172.17.0.1. The MAC address is found by looking at the internal ARP cache, which stores the translation between IP addresses and MAC addresses. If there is a cache miss, Linux broadcasts an ARP request within the internal network asking who has 172.17.0.1. 
The owner of the IP sends an ARP response, which is cached by the kernel, and the kernel sends the packet to the gateway by setting the source MAC address to the MAC address of eth0 and the destination MAC address to the one we just resolved for 172.17.0.1. A similar routing lookup is performed at each hop till the packet reaches the actual server. The transport layer and the layers above it come into play only at the end servers; during intermediate hops, only the layers up to the IP/network layer are involved.
+
+![Screengrab for above explanation](images/arp.gif)
+
+One unusual gateway we saw in the routing table is 0.0.0.0. This gateway means no Layer 3 (network layer) hop is needed to send the packet: both source and destination are in the same network. The kernel has to figure out the MAC of the destination, populate the source and destination MAC addresses appropriately, and send the packet out so that it reaches the destination without any Layer 3 hop in the middle.
+
+As we did in other modules, let's complete this session with SRE use cases.
+
+## Applications in SRE role
+1. Generally the routing table is populated by DHCP, and tampering with it is not a good practice. There can be reasons to alter the routing table, but take that path only when it's absolutely necessary.
+2. Understanding error messages better: for example, a "No route to host" error can mean that the MAC address of the destination host was not found, which in turn can mean the destination host is down.
+3. In rare cases, looking at the ARP table can help us detect an IP conflict, where the same IP has mistakenly been assigned to two hosts, causing unexpected behavior.
+
diff --git a/courses/linux_networking/tcp.md b/courses/linux_networking/tcp.md
new file mode 100644
index 0000000..4a194eb
--- /dev/null
+++ b/courses/linux_networking/tcp.md
@@ -0,0 +1,35 @@
+# TCP
+
+TCP is a transport layer protocol like UDP, but it guarantees reliability, flow control and congestion control.
+TCP guarantees reliable delivery by using sequence numbers. 
A TCP connection is established by a three-way handshake. In our case, the client sends a SYN packet along with the starting sequence number it plans to use, and the server acknowledges the SYN packet and sends a SYN with its own sequence number. Once the client acknowledges the server's SYN, the connection is established. Each piece of data transferred from here on is considered delivered reliably once an acknowledgement for that sequence is received by the concerned party.
+
+![3-way handshake](images/established.png)
+
+```bash
+#To understand the handshake, run a packet capture in one bash session
+tcpdump -S -i any port 80
+#Run curl in another bash session
+curl www.linkedin.com
+```
+
+![tcpdump-3way](images/pcap.png)
+
+
+Here the client sends a SYN, shown by the [S] flag, with sequence number 1522264672. The server acknowledges receipt of the SYN with an ACK [.] flag and sends a SYN [S] with its own sequence number. The server uses sequence number 1063230400 and tells the client it is expecting sequence number 1522264673 (client sequence + 1). The client sends a zero-length acknowledgement packet to the server (server sequence + 1) and the connection stands established. This is called the three-way handshake. The client then sends a 76-byte packet and increments its sequence number by 76. The server sends a 170-byte response and closes the connection. This is the difference we talked about between HTTP/1.1 and HTTP/1.0: in HTTP/1.1 the same connection can be reused, which removes the overhead of a three-way handshake for each HTTP request. If a packet is lost between client and server, the server won't send an ACK to the client and the client will retry sending the packet till the ACK is received. This guarantees reliability.
+Flow control is established by the window size field in each segment. The window size advertises the available TCP buffer length in the receiver's kernel, which can be used to buffer received segments. 
A window size of 0 means the receiver's socket buffer is full and the sender has to pause sending packets so that the receiver can catch up. This flow control protects a slow receiver from a fast sender.
+
+TCP also does congestion control, which determines how many segments can be in transit without an ACK. Linux provides the ability to configure congestion control algorithms, which we are not covering here.
+
+While closing a connection, the client/server calls a close syscall. Let's assume the client does that. The client's kernel will send a FIN packet to the server. The server's kernel can't close the connection till the close syscall is called by the server application. Once the server app calls close, the server also sends a FIN packet, and the client enters the TIME_WAIT state for 2*MSL (120s) so that this socket can't be reused for that time period, preventing TCP state corruption due to stray stale packets.
+
+![Connection tearing](images/closed.png)
+
+Armed with our TCP and HTTP knowledge, let's see how SREs use this in their role.
+
+## Applications in SRE role
+1. Scaling HTTP performance using load balancers needs consistent knowledge of both TCP and HTTP. There are [different kinds of load balancing](https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236?gi=428394dbdcc3) like L4 and L7 load balancing, Direct Server Return, etc. HTTPS offloading can be done on the load balancer or directly on the servers, based on performance and compliance needs.
+2. Tweaking sysctl variables for rmem and wmem, like we did for UDP, can improve throughput of sender and receiver.
+3. The sysctl variable tcp_max_syn_backlog and the socket variable somaxconn determine how many connections the kernel can complete the three-way handshake for before the app calls the accept syscall. This is especially useful in single-threaded applications. 
Once the backlog is full, new connections stay in SYN_RCVD state (as seen in netstat) till the application calls the accept syscall.
+4. Apps can run out of file descriptors if there are too many short-lived connections. Digging through [tcp_tw_reuse and tcp_tw_recycle](http://lxr.linux.no/linux+v3.2.8/Documentation/networking/ip-sysctl.txt#L464) can help reduce the time spent in the TIME_WAIT state (this has its own risks). Making apps reuse a pool of connections instead of creating ad hoc connections can also help.
+5. Understanding performance bottlenecks by looking at metrics and classifying whether it's a problem on the app side or the network side. For example, too many sockets in CLOSE_WAIT state indicate a problem in the application, whereas retransmissions point more to the network or the OS stack than to the application itself. Understanding the fundamentals helps us narrow down where the bottleneck is.
+
diff --git a/courses/linux_networking/udp.md b/courses/linux_networking/udp.md
new file mode 100644
index 0000000..351e59f
--- /dev/null
+++ b/courses/linux_networking/udp.md
@@ -0,0 +1,15 @@
+# UDP
+
+
+UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP (most of the time). Before jumping into UDP, let's try to understand what the application and transport layers are. The DNS protocol is used by a DNS client (e.g. dig) and a DNS server (e.g. named). The transport layer makes sure the DNS request reaches the DNS server process and, similarly, that the response reaches the DNS client process. Multiple processes can run on a system and they can listen on any [ports](https://en.wikipedia.org/wiki/Port_(computer_networking)). DNS servers usually listen on port number 53. When a client makes a DNS request, after filling the necessary application payload, it passes the payload to the kernel via the **sendto** system call. 
The kernel picks a random port number ([>1024](https://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html)) as the source port, puts 53 as the destination port, and sends the packet down to the lower layers. When the kernel on the server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process, which makes a **recvfrom** system call and reads the packet. This process by the kernel is called multiplexing (combining packets from multiple applications into the same lower layers) and demultiplexing (segregating packets from a single lower layer to multiple applications). Multiplexing and demultiplexing are done by the transport layer.
+
+UDP is one of the simplest transport layer protocols, and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things, like reliable communication, flow control and congestion control. UDP is designed to be lightweight and handle communications with little overhead, so it doesn't do anything beyond multiplexing and demultiplexing. If applications running on top of UDP need any of the features of TCP, they have to implement them in the application.
+
+This [example from python wiki](https://wiki.python.org/moin/UdpCommunication) covers a sample UDP client and server where "Hello World" is an application payload sent to a server listening on port number 5005. The server receives the packet and prints the "Hello World" string from the client.
+
+## Applications in SRE role
+
+
+1. If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, the sendto syscall from the application will hang till the kernel finds that some of its buffer has been freed. This can affect the throughput of the system. 
Increasing write memory buffer values using [sysctl variables](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/tuning_and_optimizing_red_hat_enterprise_linux_for_oracle_9i_and_10g_databases/sect-oracle_9i_and_10g_tuning_guide-adjusting_network_settings-changing_network_kernel_settings) *net.core.wmem_max* and *net.core.wmem_default* provides some cushion to the application from a slow network.
+2. Similarly, if the receiver process is slow in consuming from its buffer, the kernel has to drop packets which it can't queue due to the buffer being full. Since UDP doesn't guarantee reliability, these dropped packets can cause data loss unless tracked by the application layer. Increasing the sysctl variables *rmem_default* and *rmem_max* can provide some cushion to slow applications from fast senders.
+
diff --git a/courses/python_web/intro.md b/courses/python_web/intro.md
index 807f0b7..52df081 100644
--- a/courses/python_web/intro.md
+++ b/courses/python_web/intro.md
@@ -1,11 +1,11 @@
-# School of SRE: Python and The Web
+# Python and The Web

-## Pre - Reads
+## Prerequisites

- Basic understanding of python language.
- Basic familiarity with flask framework.

-## What to expect from this training
+## What to expect from this course

This course is divided into two high level parts. In the first part, assuming familiarity with python language's basic operations and syntax usage, we will dive a little deeper into understanding python as a language. We will compare python with other programming languages that you might already know like Java and C. We will also explore concepts of Python objects and with help of that, explore python features like decorators.

@@ -13,25 +13,25 @@ In the second part which will revolve around the web, and also assume familiarit
And to introduce SRE flavour to the course, we will design, develop and deploy (in theory) a URL shortening application. 
We will emphasize parts of the whole process that are more important to an SRE of the said app/service.

-## What is not covered under this training
+## What is not covered under this course

Extensive knowledge of python internals and advanced python.

-## Training Content
+## Course Content

### Lab Environment Setup

Have the latest version of python installed

-### TOC
+### Table of Contents

-1. The Python Language
+1. [The Python Language](https://linkedin.github.io/school-of-sre/python_web/intro/#the-python-language)
    1. Some Python Concepts
    2. Python Gotchas
-2. Python and Web
+2. [Python and Web](https://linkedin.github.io/school-of-sre/python_web/python-web-flask/)
    1. Sockets
    2. Flask
-3. The URL Shortening App
+3. [The URL Shortening App](https://linkedin.github.io/school-of-sre/python_web/url-shorten-app/)
    1. Design
    2. Scaling The App
    3. Monitoring The App
diff --git a/courses/python_web/python-web-flask.md b/courses/python_web/python-web-flask.md
index df09e35..c797830 100644
--- a/courses/python_web/python-web-flask.md
+++ b/courses/python_web/python-web-flask.md
@@ -1,4 +1,4 @@
-# Python, Web amd Flask
+# Python, Web and Flask

Back in the old days, websites were simple: static HTML content. A webserver would listen on a defined port and, according to the HTTP request received, read files from disk and return them in the response. But since then complexity has evolved, and websites are now dynamic. Depending on the request, multiple operations need to be performed, like reading from a database or calling other APIs, before finally returning some response (HTML data, JSON content, etc.)
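The static half of that story is small enough to sketch in a few lines of Python. The snippet below is illustrative only (the function names, port, and page content are invented for this example); it builds the kind of raw HTTP response an early webserver would have written back after reading a file from disk:

```python
import socket

def build_response(body: str, status: str = "200 OK") -> bytes:
    # Assemble a minimal HTTP/1.0 response: status line, headers, blank line, body.
    payload = body.encode()
    return (
        f"HTTP/1.0 {status}\r\n"
        f"Content-Type: text/html\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode() + payload

def serve_once(page: str, port: int = 8080) -> None:
    # Accept a single connection and reply with the same static page
    # regardless of what the request asked for -- the "old days" model.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            conn.recv(4096)  # read (and ignore) the HTTP request
            conn.sendall(build_response(page))
```

A dynamic site replaces the fixed `page` with logic that inspects the request and computes a response; that gap is exactly what frameworks like Flask fill.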
diff --git a/courses/python_web/sre-conclusion.md b/courses/python_web/sre-conclusion.md
index 8ced4a1..bd76f27 100644
--- a/courses/python_web/sre-conclusion.md
+++ b/courses/python_web/sre-conclusion.md
@@ -1,4 +1,4 @@
-# SRE Parts of The App and Conclusion
+# Conclusion

## Scaling The App
diff --git a/courses/security/fundamentals.md b/courses/security/fundamentals.md
new file mode 100644
index 0000000..06b1a89
--- /dev/null
+++ b/courses/security/fundamentals.md
@@ -0,0 +1,334 @@
+# Part I: Fundamentals
+
+## Introduction to Security Overview for SRE
+
+- If you look closely, both Site Reliability Engineering and Security Engineering are concerned with keeping a system usable.
+    - Issues like broken releases, capacity shortages, and misconfigurations can make a system unusable (at least temporarily).
+    - Security or privacy incidents that break the trust of users also undermine the usefulness of a system.
+    - Consequently, system security should be top of mind for SREs.
+
+![The Wide Area of security](images/image1.png)
+
+- SREs should be involved in both significant design discussions and actual system changes.
+    - They have quite a big role in system design and hence are often the first line of defense.
+    - SREs help prevent bad designs and implementations that can affect the overall security of the infrastructure.
+- Successfully designing, implementing, and maintaining systems requires a commitment to **the full system lifecycle**. This commitment is possible only when security and reliability are central elements in the architecture of systems.
+- Core Pillars of Information Security:
+    - **Confidentiality** – only allow access to data for which the user is permitted
+    - **Integrity** – ensure data is not tampered with or altered by unauthorized users
+    - **Availability** – ensure systems and data are available to authorized users when they need them
+
+- Thinking like a Security Engineer
+    - When starting a new application or re-factoring an existing application, you should evaluate each functional feature and ask:
+        - Is the process surrounding this feature as safe as possible? In other words, is this a flawed process?
+        - If I were evil, how would I abuse this feature? Failing to consider how a feature can be abused leads to design flaws.
+        - Is the feature required to be on by default? If so, are there limits or options that could help reduce the risk from this feature?
+
+- Security Principles By OWASP (Open Web Application Security Project)
+    - Minimize attack surface area:
+        - Every feature that is added to an application adds a certain amount of risk to the overall application. The aim of secure development is to reduce the overall risk by reducing the attack surface area.
+        - For example, a web application implements online help with a search function. The search function may be vulnerable to SQL injection attacks. If the help feature was limited to authorized users, the attack likelihood is reduced. If the help feature's search function was gated through centralized data validation routines, the ability to perform SQL injection is dramatically reduced. However, if the help feature was re-written to eliminate the search function (through a better user interface, for example), this almost eliminates the attack surface area, even if the help feature was available to the Internet at large.
+    - Establish secure defaults:
+        - There are many ways to deliver an "out of the box" experience for users. 
However, by default, the experience should be secure, and it should be up to the user to reduce their security – if they are allowed.
+        - For example, by default, password aging and complexity should be enabled. Users might be allowed to turn these two features off to simplify their use of the application, at the cost of increased risk.
+        - Default passwords of routers and IoT devices should be changed.
+    - Principle of Least Privilege:
+        - The principle of least privilege recommends that accounts have the least amount of privilege required to perform their business processes. This encompasses user rights and resource permissions such as CPU limits, memory, network, and file system permissions.
+        - For example, if a middleware server only requires access to the network, read access to a database table, and the ability to write to a log, this describes all the permissions that should be granted. Under no circumstances should the middleware be granted administrative privileges.
+    - Principle of Defense in Depth:
+        - The principle of defense in depth suggests that where one control would be reasonable, more controls that approach risks in different fashions are better. Controls, when used in depth, can make severe vulnerabilities extraordinarily difficult to exploit and thus unlikely to occur.
+        - With secure coding, this may take the form of tier-based validation, centralized auditing controls, and requiring users to be logged in on all pages.
+        - For example, a flawed administrative interface is unlikely to be vulnerable to an anonymous attack if it correctly gates access to production management networks, checks for administrative user authorization, and logs all access.
+    - Fail securely:
+        - Applications regularly fail to process transactions for many reasons. How they fail can determine whether an application is secure or not. 
+
+      ```
+      // Insecure: is_admin defaults to true, so any failure leaves the user as admin
+      is_admin = true;
+      try {
+          code_which_may_fail();
+          is_admin = is_user_assigned_role("Administrator");
+      }
+      catch (Exception err) {
+          log.error(err.toString());
+      }
+      ```
+        - If either code_which_may_fail() or is_user_assigned_role() fails or throws an exception, the user is an admin by default. This is obviously a security risk; failing securely would mean initializing is_admin to false.
+
+    - Don't trust services:
+        - Many organizations utilize the processing capabilities of third-party partners, who more than likely have different security policies and posture than you. It is unlikely that you can influence or control any external third party, whether they are home users or major suppliers or partners.
+        - Therefore, the implicit trust of externally run systems is not warranted. All external systems should be treated in a similar fashion.
+        - For example, a loyalty program provider provides data that is used by Internet Banking, providing the number of reward points and a small list of potential redemption items. However, the data should be checked to ensure that it is safe to display to end-users, and that the reward points are a positive number, and not improbably large.
+    - Separation of duties:
+        - The key to fraud control is the separation of duties. For example, someone who requests a computer cannot also sign for it, nor should they directly receive the computer. This prevents the user from requesting many computers and claiming they never arrived.
+        - Certain roles have different levels of trust than normal users. In particular, administrators are different from normal users. In general, administrators should not be users of the application.
+        - For example, an administrator should be able to turn the system on or off and set the password policy, but shouldn't be able to log on to the storefront as a super-privileged user, such as being able to "buy" goods on behalf of other users. 
+    - Avoid security by obscurity:
+        - Security through obscurity is a weak security control, and nearly always fails when it is the only control. This is not to say that keeping secrets is a bad idea; it simply means that the security of systems should not be reliant upon keeping details hidden.
+        - For example, the security of an application should not rely upon knowledge of the source code being kept secret. The security should rely upon many other factors, including reasonable password policies, defense in depth, business transaction limits, solid network architecture, and fraud and audit controls.
+        - A practical example is Linux. Linux's source code is widely available, and yet when properly secured, Linux is a secure and robust operating system.
+    - Keep security simple:
+        - Attack surface area and simplicity go hand in hand. Certain software engineering practices prefer overly complex approaches to what would otherwise be a relatively straightforward and simple design.
+        - Developers should avoid the use of double negatives and complex architectures when a simpler approach would be faster and simpler.
+        - For example, although it might be fashionable to have a slew of singleton entity beans running on a separate middleware server, it is more secure and faster to simply use global variables with an appropriate mutex mechanism to protect against race conditions.
+    - Fix security issues correctly:
+        - Once a security issue has been identified, it is important to develop a test for it and to understand the root cause of the issue. When design patterns are used, it is likely that the security issue is widespread amongst all codebases, so developing the right fix without introducing regressions is essential.
+        - For example, a user has found that they can see another user's balance by adjusting their cookie. 
The fix seems to be relatively straightforward, but as the cookie handling code is shared among all applications, a change to just one application will trickle through to all other applications. The fix must, therefore, be tested on all affected applications.
+    - Reliability & Security:
+        - Reliability and security are both crucial components of a truly trustworthy system, but building systems that are both reliable and secure is difficult. While the requirements for reliability and security share many common properties, they also require different design considerations. It is easy to miss the subtle interplay between reliability and security that can cause unexpected outcomes.
+        - For example, a password management application failure was triggered by a reliability problem (poor load-balancing and load-shedding strategies), and its recovery was later complicated by multiple measures designed to increase the security of the system, such as hardware security modules (HSMs) that had to be plugged into server racks for authentication and HSM tokens that were kept locked inside a case, which prolonged the outage further.
+
+---
+
+## Authentication vs Authorization
+
+- **Authentication** is the act of validating that users are who they claim to be. Passwords are the most common authentication factor: if a user enters the correct password, the system assumes the identity is valid and grants access.
+    - Other technologies such as one-time PINs, authentication apps, and even biometrics can also be used to authenticate identity. In some instances, systems require the successful verification of more than one factor before granting access. This multi-factor authentication (MFA) requirement is often deployed to increase security beyond what passwords alone can provide.
+- **Authorization** in system security is the process of giving the user permission to access a specific resource or function. This term is often used interchangeably with access control or client privilege. 
Giving someone permission to download a particular file on a server or providing individual users with administrative access to an application are good examples. In secure environments, authorization must always follow authentication: users should first prove that their identities are genuine before an organization's administrators grant them access to the requested resources.
+
+### Common authentication flow (local authentication)
+
+- The user registers using an identifier like username/email/mobile
+- The application stores user credentials in the database
+- The application sends a verification email/message to validate the registration
+- Post successful registration, the user enters credentials for logging in
+- On successful authentication, the user is allowed access to specific resources
+
+### OpenID/OAuth
+
+***OpenID*** is an authentication protocol that allows us to authenticate users without using a local auth system. In such a scenario, a user has to be registered with an OpenID provider, and the same provider should be integrated with the authentication flow of your application. To verify the details, we have to forward the authentication requests to the provider. On successful authentication, we receive a success message and/or profile details with which we can execute the necessary flow.
+
+***OAuth*** is an authorization mechanism that allows your application to access a user's resources on a provider (Gmail/Facebook/Instagram, etc.). On a successful response, we (your application) receive a token with which the application can access certain APIs on behalf of the user. OAuth is convenient when your business use case requires certain user-facing APIs, like access to Google Drive or sending tweets on your behalf. Most OAuth 2.0 providers can be used for pseudo-authentication. Having said that, it can get pretty complicated if you are using multiple OAuth providers to authenticate users on top of a local authentication system. 
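For the local authentication flow above, the critical detail is how "the application stores user credentials" is done: store a salted, slowly derived hash, never the password itself. A minimal sketch using Python's standard library (the function names are illustrative, not from any particular framework):

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None) -> tuple:
    # Derive a salted hash with a deliberately slow KDF; store (salt, digest)
    # per user so that identical passwords do not produce identical records.
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Recompute with the stored salt and compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

On login, the application looks up the stored (salt, digest) pair for the identifier and calls verify_password; the plain-text password is never persisted.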
+ +--- + +## Cryptography + +- It is the science and study of hiding any text in such a way that only the intended recipients or authorized persons can read it; historically this has included everything from invisible ink to the mechanical cryptography machines of the past. + +- Cryptography is necessary for securing critical or proprietary information and is used to encode private data messages by converting plain text into ciphertext. At its core, there are two ways of doing this, upon which all more advanced methods are built. + +### Ciphers + +- Ciphers are the cornerstone of cryptography. A cipher is a set of algorithms that performs encryption or decryption on a message. An encryption algorithm (E) takes a secret key (k) and a message (m), and produces a ciphertext (c). Similarly, a decryption algorithm (D) takes the secret key (k) and the ciphertext (c), and recovers the message (m). They are represented as follows: + +``` + +E(k,m) = c +D(k,c) = m + +``` + +- This also means that in order for it to be a cipher, it must satisfy the consistency equation as follows, making it possible to decrypt: + +``` + +D(k,E(k,m)) = m +``` + +Stream Ciphers: + +- The message is broken into characters or bits and enciphered with a key or keystream (which should be random and generated independently of the message stream) that is as long as the plaintext bitstream. +- If the keystream is random, this scheme would be unbreakable unless the keystream was acquired, making it unconditionally secure. The keystream must be provided to both parties in a secure way to prevent its release. + +Block Ciphers: + +- Block ciphers — process messages in blocks, each of which is then encrypted or decrypted. +- A block cipher is a symmetric cipher in which blocks of plaintext are treated as a whole and used to produce ciphertext blocks. The block cipher takes blocks that are b bits long and encrypts them to blocks that are also b bits long. Block sizes are typically 64 or 128 bits long.
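The consistency equation above can be demonstrated with a toy stream cipher: XOR the message with a random keystream as long as the plaintext (a one-time pad). This is a sketch for illustration only; with XOR, the encryption and decryption algorithms are literally the same operation.

```python
import secrets

def xor_bytes(key: bytes, data: bytes) -> bytes:
    """XOR each data byte with the corresponding keystream byte."""
    return bytes(k ^ d for k, d in zip(key, data))

E = D = xor_bytes  # for an XOR stream cipher, E and D coincide

m = b"attack at dawn"
k = secrets.token_bytes(len(m))  # keystream as long as the plaintext
c = E(k, m)

# consistency equation: D(k, E(k, m)) = m
assert D(k, c) == m
assert c != m  # the ciphertext differs from the plaintext (w.h.p.)
```

Note how the security argument from the text maps onto the code: if `k` is truly random, used once, and never disclosed, `c` reveals nothing about `m`.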
+ + ![image5](images/image5.png) + ![image6](images/image6.png) + +Encryption + +- **Secret Key (Symmetric Key)**: the same key is used for encryption and decryption +- **Public Key (Asymmetric Key)**: in an asymmetric system, the encryption and decryption keys are different but related. The encryption key is known as the public key and the decryption key is known as the private key. The public and private keys are known as a key pair. + +Symmetric Key Encryption + +DES + +- The Data Encryption Standard (DES) was the worldwide encryption standard for a long time. IBM developed DES in 1975, and it has held up remarkably well against years of cryptanalysis. DES is a symmetric encryption algorithm with a fixed key length of 56 bits. The algorithm is still good, but because of the short key length, it is susceptible to brute-force attacks by attackers with sufficient resources. + +- DES usually operates in block mode, whereby it encrypts data in 64-bit blocks. The same algorithm and key are used for both encryption and decryption. + +- Because DES is based on simple mathematical functions, it can be easily implemented and accelerated in hardware. + +Triple DES + +- With advances in computer processing power, the original 56-bit DES key became too short to withstand an attacker with even a limited budget. One way of increasing the effective key length of DES without changing the well-analyzed algorithm itself is to use the same algorithm with different keys several times in a row. + +- The technique of applying DES three times in a row to a plain text block is called Triple DES (3DES). The 3DES technique is shown in the figure. Brute-force attacks on 3DES are considered unfeasible today. Because the basic algorithm has been tested in the field for more than 25 years, it is considered to be more trustworthy than its predecessor. +![image7](images/image7.png) + +AES + +- On October 2, 2000, the U.S.
National Institute of Standards and Technology (NIST) announced the selection of the Rijndael cipher as the AES algorithm. This cipher, developed by Joan Daemen and Vincent Rijmen, has a variable block length and key length. The algorithm currently specifies how to use keys with a length of 128, 192, or 256 bits to encrypt blocks with a length of 128, 192, or 256 bits (all nine combinations of key length and block length are possible). Both block and key lengths can be extended easily to multiples of 32 bits. + +- AES was chosen to replace DES and 3DES because they are either too weak (DES, in terms of key length) or too slow (3DES) to run on modern, efficient hardware. AES is more efficient and much faster, usually by a factor of 5 compared to DES on the same hardware. AES is also more suitable for high throughput, especially if pure software encryption is used. However, AES is a relatively young algorithm, and as the golden rule of cryptography states, “A more mature algorithm is always more trusted.” + +Asymmetric Key Algorithm + +![image8](images/image8.png) + +- In a symmetric key system, Alice first puts the secret message in a box and then padlocks the box using a lock to which she has a key. She then sends the box to Bob through regular mail. When Bob receives the box, he uses an identical copy of Alice's key (which he has obtained previously) to open the box and read the message. + +- In an asymmetric key system, instead of opening the box when he receives it, Bob simply adds his own personal lock to the box and returns the box through public mail to Alice. Alice uses her key to remove her lock and returns the box to Bob, with Bob's lock still in place. Finally, Bob uses his key to remove his lock and reads the message from Alice. +- The critical advantage in an asymmetric system is that Alice never needs to send a copy of her key to Bob. 
This reduces the possibility that a third party (for example, an unscrupulous postmaster) can copy the key while it is in transit to Bob, allowing that third party to spy on all future messages sent by Alice. In addition, if Bob is careless and allows someone else to copy his key, Alice's messages to Bob are compromised, but Alice's messages to other people remain secret. + +**NOTE**: In terms of TLS key exchange, this is the common approach. + +Diffie-Hellman + +- The protocol has two system parameters, p and g. They are both public and may be used by everybody. Parameter p is a prime number, and parameter g (usually called a generator) is an integer that is smaller than p, but with the following property: For every number n between 1 and p – 1 inclusive, there is a power k of g such that n = g^k mod p. +- The Diffie-Hellman algorithm is an asymmetric algorithm used to establish a shared secret for a symmetric key algorithm. Nowadays most systems use a hybrid cryptosystem, i.e., a combination of symmetric and asymmetric encryption. Asymmetric encryption is used as a key exchange mechanism to share the secret key, and after the key is shared between sender and receiver, the communication takes place using symmetric encryption. The shared secret key is used to encrypt the communication. +- Refer: + +RSA + +- The RSA algorithm is very flexible and has a variable key length where, if necessary, speed can be traded for the level of security of the algorithm. RSA keys are usually 512 to 2048 bits long. RSA has withstood years of extensive cryptanalysis. Although those years neither proved nor disproved RSA's security, they attest to a confidence level in the algorithm. RSA security is based on the difficulty of factoring very large numbers. If an easy method of factoring these large numbers were discovered, the effectiveness of RSA would be destroyed.
+- Refer : + + **NOTE**: RSA keys can be used for key exchange just like Diffie-Hellman + +Hashing Algorithms + +- Hashing is one of the mechanisms used for data integrity assurance. Hashing is based on a one-way mathematical function, which is relatively easy to compute but significantly harder to reverse. + +- A hash function is a one-way function applied to input data to produce a fixed-length digest (fingerprint) of output data. The digest is cryptographically strong; that is, it is infeasible to recover the input data from its digest. If the input data changes just a little, the digest (fingerprint) changes substantially in what is called an avalanche effect. + +- More: + - + - + +MD5 + +- MD5 is a one-way function with which it is easy to compute the hash from the given input data, but it is unfeasible to compute input data given only a hash. + +SHA-1 + +- MD5 is considered less secure than SHA-1 because MD5 has some weaknesses. +- SHA-1 also uses a stronger, 160-bit digest, which makes MD5 the second choice as far as hash methods are concerned. +- The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. This algorithm is slightly slower than MD5. + + **NOTE**: SHA-1 has also recently been demonstrated to be broken; the minimum current recommendation is SHA-256 + +Digital Certificates + +- Digital signatures provide a means to digitally authenticate devices and individual users. In public-key cryptography, such as the RSA encryption system, each user has a key pair containing both a public key and a private key. The keys act as complements, and anything encrypted with one of the keys can be decrypted with the other. In simple terms, a signature is formed when data is encrypted with a user's private key. The receiver verifies the signature by decrypting the message with the sender's public key. + +- Key management is often considered the most difficult task in designing and implementing cryptographic systems.
Businesses can simplify some of the deployment and management issues that are encountered with secured data communications by employing a Public Key Infrastructure (PKI). Because corporations often move security-sensitive communications across the Internet, an effective mechanism must be implemented to protect sensitive information from the threats presented on the Internet. + +- PKI provides a hierarchical framework for managing digital security attributes. Each PKI participant holds a digital certificate that has been issued by a CA (either public or private). The certificate contains a number of attributes that are used when parties negotiate a secure connection. These attributes must include the certificate validity period, end-host identity information, encryption keys that will be used for secure communications, and the signature of the issuing CA. Optional attributes may be included, depending on the requirements and capability of the PKI. +- A CA can be a trusted third party, such as VeriSign or Entrust, or a private (in-house) CA that you establish within your organization. +- The fact that the message could be decrypted using the sender's public key means that the holder of the private key created the message. This process relies on the receiver having a copy of the sender's public key and knowing with a high degree of certainty that it really does belong to the sender and not to someone pretending to be the sender. +- To validate the CA's signature, the receiver must know the CA's public key. Normally, this is handled out-of-band or through an operation performed during installation of the certificate. For instance, most web browsers are configured with the root certificates of several CAs by default. + +CA Enrollment process + +1. The end host generates a private-public key pair. +2. The end host generates a certificate request, which it forwards to the CA. +3. 
Manual human intervention is required to approve the enrollment request, which is received by the CA. +4. After the CA operator approves the request, the CA signs the certificate request with its private key and returns the completed certificate to the end host. +5. The end host writes the certificate into a nonvolatile storage area (PC hard disk or NVRAM on Cisco routers). + +**Refer**: + +## Login Security + +### SSH + +- SSH, the Secure Shell, is a popular, powerful, software-based approach to network security. +- Whenever data is sent by a computer to the network, SSH automatically encrypts (scrambles) it. Then, when the data reaches its intended recipient, SSH automatically decrypts (unscrambles) it. +- The result is transparent encryption: users can work normally, unaware that their communications are safely encrypted on the network. In addition, SSH can use modern, secure encryption algorithms based on how it's being configured and is effective enough to be found within mission-critical applications at major corporations. +- SSH has a client/server architecture +- An SSH server program, typically installed and run by a system administrator, accepts or rejects incoming connections to its host computer. Users then run SSH client programs, typically on other computers, to make requests of the SSH server, such as “Please log me in,” “Please send me a file,” or “Please execute this command.” All communications between clients and servers are securely encrypted and protected from modification. + + ![image9](images/image9.png) + +What SSH is not: + +- Although SSH stands for Secure Shell, it is not a true shell in the sense of the Unix Bourne shell and C shell. It is not a command interpreter, nor does it provide wildcard expansion, command history, and so forth. Rather, SSH creates a channel for running a shell on a remote computer, with end-to-end encryption between the two systems. 
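As a concrete starting point for the SSH client/server setup described above, here is how a client key pair is typically generated with OpenSSH. The file path and comment are illustrative; the empty passphrase (`-N ""`) is for demonstration only — use a real passphrase in practice.

```shell
# Generate an Ed25519 key pair for SSH public-key authentication.
# -t selects the key type, -f the output path, -C an identifying comment.
ssh-keygen -t ed25519 -N "" -f ./demo_key -C "demo@example.com"

# The private key (demo_key) stays on the client; the public key
# (demo_key.pub) is appended to ~/.ssh/authorized_keys on the server,
# commonly with:  ssh-copy-id -i ./demo_key.pub user@server
ls demo_key demo_key.pub
```

Once the public key is installed on the server, `ssh -i ./demo_key user@server` authenticates without sending any secret over the wire: the client proves possession of the private key, and all traffic is encrypted end to end.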
+ +The major features and guarantees of the SSH protocol are: + +- Privacy of your data, via strong encryption +- Integrity of communications, guaranteeing they haven’t been altered +- Authentication, i.e., proof of identity of senders and receivers +- Authorization, i.e., access control to accounts +- Forwarding or tunneling to encrypt other TCP/IP-based sessions + +### Kerberos + +- According to Greek mythology, Kerberos (Cerberus) was the gigantic, three-headed dog that guards the gates of the underworld to prevent the dead from leaving. +- In computer science, Kerberos is a network authentication protocol, and is currently the default authentication technology used by Microsoft Active Directory to authenticate users to services within a local area network. + +- Kerberos uses symmetric key cryptography and requires a trusted third-party authentication service to verify user identities. The name Kerberos was chosen because the protocol's three heads correspond to: + - a client: a user or a service + - a server: the Kerberos-protected host the client wants to access + + ![image10](images/image10.png) + - a Key Distribution Center (KDC), which acts as the trusted third-party authentication service. + +The KDC includes the following two servers: + +- Authentication Server (AS), which performs the initial authentication and issues ticket-granting tickets (TGT) for users. +- Ticket-Granting Server (TGS), which issues service tickets based on the initial ticket-granting tickets (TGT). + + ![image11](images/image11.png) + + +### Certificate Chain + +The first part of the output of the OpenSSL command shows the certificate chain: here, two certificates numbered 0 and 1. Each certificate has a subject, s, and an issuer, i. The first certificate, number 0, is called the end-entity certificate. The subject line tells us it’s the certificate for www.google.com because its subject is set to www.google.com.
+ + +``` +$ openssl s_client -connect www.google.com:443 -CApath /etc/ssl/certs +CONNECTED(00000005) +depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign +verify return:1 +depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1 +verify return:1 +depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = www.google.com +verify return:1 +--- +Certificate chain + 0 s:/C=US/ST=California/L=Mountain View/O=Google LLC/CN=www.google.com + i:/C=US/O=Google Trust Services/CN=GTS CA 1O1 + 1 s:/C=US/O=Google Trust Services/CN=GTS CA 1O1 + i:/OU=GlobalSign Root CA - R2/O=GlobalSign/CN=GlobalSign +--- +Server certificate +``` + +- The issuer line indicates it’s issued by GTS CA 1O1, which also happens to be the subject of the second certificate, number 1 +- What the OpenSSL command line doesn’t show here is the trust store that contains the list of CA certificates trusted by the system OpenSSL runs on. +- The public certificate of the GlobalSign Authority must be present in the system’s trust store to close the verification chain. This is called a chain of trust, and the figure below summarizes its behavior at a high level. + + ![image122](images/image122.png) + +- High-level view of the concept of chain of trust applied to verifying the authenticity of a website. The Root CA in the Firefox trust store provides the initial trust to verify the entire chain and trust the end-entity certificate. + +### TLS Handshake + +1. The client sends a HELLO message to the server with a list of protocols and algorithms it supports. +2. The server says HELLO back and sends its chain of certificates. Based on the capabilities of the client, the server picks a cipher suite. +3. If the cipher suite supports ephemeral key exchange, like ECDHE does (ECDHE stands for Elliptic Curve Diffie-Hellman Exchange), the server and the client negotiate a pre-master key with the Diffie-Hellman algorithm. The pre-master key is never sent over the wire.
+4. The client and server create a session key that will be used to encrypt the data transiting through the connection. + +At the end of the handshake, both parties possess a secret session key used to encrypt data for the rest of the connection. This is what OpenSSL refers to as the Master-Key. + +**NOTE** + +- There are 3 widely deployed versions of TLS: TLS 1.0, 1.1 & 1.2 (TLS 1.3 was standardized later, in 2018). +- TLS 1.0 was released in 1999, making it a nearly two-decade-old protocol. It has been known to be vulnerable to attacks—such as BEAST and POODLE—for years, in addition to supporting weak cryptography, which doesn’t keep modern-day connections sufficiently secure. +- TLS 1.1 is the forgotten “middle child.” It also has bad cryptography like its younger sibling. In most software it was leapfrogged by TLS 1.2 and it’s rare to see TLS 1.1 used. + +### “Perfect” Forward Secrecy + +- The term “ephemeral” in the key exchange indicates an important security feature, somewhat misleadingly named perfect forward secrecy (PFS), or just “forward secrecy”. +- In a non-ephemeral key exchange, the client sends the pre-master key to the server by encrypting it with the server’s public key. The server then decrypts the pre-master key with its private key. If, at a later point in time, the private key of the server is compromised, an attacker can go back to this handshake, decrypt the pre-master key, obtain the session key, and decrypt the entire traffic. Non-ephemeral key exchanges are vulnerable to attacks that may happen in the future on recorded traffic. And because people seldom change their passwords, decrypting data from the past may still be valuable for an attacker. +- An ephemeral key exchange like DHE, or its variant on elliptic curves, ECDHE, solves this problem by not transmitting the pre-master key over the wire. Instead, the pre-master key is computed by both the client and the server in isolation, using nonsensitive information exchanged publicly.
Because the pre-master key can’t be decrypted later by an attacker, the session key is safe from future attacks: hence, the term perfect forward secrecy. +- Keys are changed every X blocks along the stream. That prevents an attacker from simply sniffing the stream and applying brute force to crack the whole thing. "Forward secrecy" means that just because I can decrypt block M, does not mean that I can decrypt block Q +- Downside: + - The downside to PFS is that all those extra computational steps induce latency on the handshake and slow the user down. To avoid repeating this expensive work at every connection, both sides cache the session key for future use via a technique called session resumption. This is what the session-ID and TLS ticket are for: they allow a client and server that share a session ID to skip over the negotiation of a session key, because they already agreed on one previously, and go directly to exchanging data securely. \ No newline at end of file diff --git a/courses/security/images/.DS_Store b/courses/security/images/.DS_Store new file mode 100644 index 0000000..6fdfab4 Binary files /dev/null and b/courses/security/images/.DS_Store differ diff --git a/courses/security/images/image1.png b/courses/security/images/image1.png new file mode 100644 index 0000000..59f80f1 Binary files /dev/null and b/courses/security/images/image1.png differ diff --git a/courses/security/images/image10.png b/courses/security/images/image10.png new file mode 100644 index 0000000..b781329 Binary files /dev/null and b/courses/security/images/image10.png differ diff --git a/courses/security/images/image11.png b/courses/security/images/image11.png new file mode 100644 index 0000000..5a660c3 Binary files /dev/null and b/courses/security/images/image11.png differ diff --git a/courses/security/images/image122.png b/courses/security/images/image122.png new file mode 100644 index 0000000..bb4a191 Binary files /dev/null and b/courses/security/images/image122.png differ diff 
--git a/courses/security/images/image14.png b/courses/security/images/image14.png new file mode 100644 index 0000000..bd23a37 Binary files /dev/null and b/courses/security/images/image14.png differ diff --git a/courses/security/images/image15.png b/courses/security/images/image15.png new file mode 100644 index 0000000..1896e90 Binary files /dev/null and b/courses/security/images/image15.png differ diff --git a/courses/security/images/image17.png b/courses/security/images/image17.png new file mode 100644 index 0000000..7a32100 Binary files /dev/null and b/courses/security/images/image17.png differ diff --git a/courses/security/images/image18.png b/courses/security/images/image18.png new file mode 100644 index 0000000..dbefcf9 Binary files /dev/null and b/courses/security/images/image18.png differ diff --git a/courses/security/images/image19.png b/courses/security/images/image19.png new file mode 100644 index 0000000..8bc9783 Binary files /dev/null and b/courses/security/images/image19.png differ diff --git a/courses/security/images/image20.png b/courses/security/images/image20.png new file mode 100644 index 0000000..34ac1c0 Binary files /dev/null and b/courses/security/images/image20.png differ diff --git a/courses/security/images/image22.png b/courses/security/images/image22.png new file mode 100644 index 0000000..056e7ea Binary files /dev/null and b/courses/security/images/image22.png differ diff --git a/courses/security/images/image23.png b/courses/security/images/image23.png new file mode 100644 index 0000000..2669d36 Binary files /dev/null and b/courses/security/images/image23.png differ diff --git a/courses/security/images/image26.png b/courses/security/images/image26.png new file mode 100644 index 0000000..1171a72 Binary files /dev/null and b/courses/security/images/image26.png differ diff --git a/courses/security/images/image5.png b/courses/security/images/image5.png new file mode 100644 index 0000000..b5f5adf Binary files /dev/null and 
b/courses/security/images/image5.png differ diff --git a/courses/security/images/image6.png b/courses/security/images/image6.png new file mode 100644 index 0000000..9c16b5b Binary files /dev/null and b/courses/security/images/image6.png differ diff --git a/courses/security/images/image7.png b/courses/security/images/image7.png new file mode 100644 index 0000000..f91bf57 Binary files /dev/null and b/courses/security/images/image7.png differ diff --git a/courses/security/images/image8.png b/courses/security/images/image8.png new file mode 100644 index 0000000..782d84a Binary files /dev/null and b/courses/security/images/image8.png differ diff --git a/courses/security/images/image9.png b/courses/security/images/image9.png new file mode 100644 index 0000000..69a2f4a Binary files /dev/null and b/courses/security/images/image9.png differ diff --git a/courses/security/intro.md b/courses/security/intro.md new file mode 100644 index 0000000..e05622e --- /dev/null +++ b/courses/security/intro.md @@ -0,0 +1,37 @@ +# Security + +## Prerequisites + +1. Basics of Linux fundamentals & command line usage + +2. Networking Module + + +## What to expect from this course + +The course covers fundamentals of information security along with touching on subjects of system security, network & web security. The aim of this course is to get familiar with the basics of information security in day to day operations & then as an SRE develop the mindset of ensuring that security takes a front-seat while developing solutions. The course also serves as an introduction to common risks and best practices along with practical ways to find out vulnerable systems and loopholes which might become compromised if not secured. + + +## What is not covered under this course + +The courseware is not an ethical hacking workshop or a very deep dive into the fundamentals of the problems. 
The course does not deal with hacking or breaking into systems, but rather focuses on how to ensure you don’t get into those situations, and also makes you aware of the different ways a system can be compromised. + + +## Course Content + +### Table of Contents + +1. [Fundamentals](https://linkedin.github.io/school-of-sre/security/fundamentals/) +2. [Network Security](https://linkedin.github.io/school-of-sre/security/network_security/) +3. [Threats, Attacks & Defence](https://linkedin.github.io/school-of-sre/security/threats_attacks_defences/) +4. [Writing Secure Code & More](https://linkedin.github.io/school-of-sre/security/writing_secure_code/) + + +## Post Training asks/ Further Reading + +- CTF Events like : +- Penetration Testing : +- Threat Intelligence : +- Threat Detection & Hunting : +- Web Security: +- Building Secure and Reliable Systems : diff --git a/courses/security/network_security.md b/courses/security/network_security.md new file mode 100644 index 0000000..cb41677 --- /dev/null +++ b/courses/security/network_security.md @@ -0,0 +1,494 @@ +# Part II : Network Security + +## Introduction + +- TCP/IP is the dominant networking technology today. It is a five-layer architecture. These layers are, from top to bottom, the application layer, the transport layer (TCP), the network layer (IP), the data-link layer, and the physical layer. In addition to TCP/IP, there are also other networking technologies. For convenience, we use the OSI network model to represent non-TCP/IP network technologies. Different networks are interconnected using gateways. A gateway can be placed at any layer. +- The OSI model is a seven-layer architecture. The OSI architecture is similar to the TCP/IP architecture, except that the OSI model specifies two additional layers between the application layer and the transport layer in the TCP/IP architecture. These two layers are the presentation layer and the session layer.
Figure 5.1 shows the relation between the TCP/IP layers and the OSI layers. The application layer in TCP/IP corresponds to the application layer and the presentation layer in OSI. The transport layer in TCP/IP corresponds to the session layer and the transport layer in OSI. The remaining three layers in the TCP/IP architecture are in one-to-one correspondence with the remaining three layers in the OSI model. + + ![image14](images/image14.png) + Correspondence between layers of the TCP/IP architecture and the OSI model. Also shown are placements of cryptographic algorithms in network layers, where the dotted arrows indicate actual communications of cryptographic algorithms + +The functionalities of the OSI layers are briefly described as follows: + +1. The application layer serves as an interface between applications and network programs. It supports application programs and end-user processing. Common application-layer programs include remote logins, file transfer, email, and Web browsing. +2. The presentation layer is responsible for dealing with data that is formatted differently. This protocol layer allows application-layer programs residing on different sides of a communication channel with different platforms to understand each other's data formats regardless of how they are presented. +3. The session layer is responsible for creating, managing, and closing a communication connection. +4. The transport layer is responsible for providing reliable connections, such as packet sequencing, traffic control, and congestion control. +5. The network layer is responsible for routing device-independent data packets from the current hop to the next hop. +6. The data-link layer is responsible for encapsulating device-independent data packets into device-dependent data frames. It has two sublayers: logical link control and media access control. +7. The physical layer is responsible for transmitting device-dependent frames through some physical media.
+ +- Starting from the application layer, data generated from an application program is passed down layer-by-layer to the physical layer. Data from the previous layer is enclosed in a new envelope at the current layer, where the data from the previous layer is also just an envelope containing the data from the layer before it. This is similar to enclosing a smaller envelope in a larger one. The envelope added at each layer contains sufficient information for handling the packet. Application-layer data are divided into blocks small enough to be encapsulated in an envelope at the next layer. +- Application data blocks are “dressed up” in the TCP/IP architecture according to the following basic steps. At the sending side, an application data block is encapsulated in a TCP packet when it is passed down to the TCP layer. In other words, a TCP packet consists of a header and a payload, where the header corresponds to the TCP envelope and the payload is the application data block. Likewise, the TCP packet will be encapsulated in an IP packet when it is passed down to the IP layer. An IP packet consists of a header and a payload, which is the TCP packet passed down from the TCP layer. The IP packet will be encapsulated in a device-dependent frame (e.g., an Ethernet frame) when it is passed down to the data-link layer. A frame has a header, and it may also have a trailer. For example, in addition to having a header, an Ethernet frame also has a 32-bit cyclic redundancy check (CRC) trailer. When it is passed down to the physical layer, a frame will be transformed to a sequence of media signals for transmission + + ![image15](images/image15.png) + Flow Diagram of a Packet Generation + +- At the destination side, the medium signals are converted by the physical layer into a frame, which is passed up to the data-link layer. The data-link layer passes the frame payload (i.e., the IP packet encapsulated in the frame) up to the IP layer. 
The IP layer passes the IP payload, namely, the TCP packet encapsulated in the IP packet, up to the TCP layer. The TCP layer passes the TCP payload, namely, the application data block, up to the application layer. When a packet arrives at a router, it only goes up to the IP layer, where certain fields in the IP header are modified (e.g., the value of TTL is decreased by 1). This modified packet is then passed back down layer-by-layer to the physical layer for further transmission. + +### Public Key Infrastructure + +- To deploy cryptographic algorithms in network applications, we need a way to distribute secret keys using open networks. Public-key cryptography is the best way to distribute these secret keys. In order to use public-key cryptography, we need to build a public-key infrastructure (PKI) to support and manage public-key certificates and certificate authority (CA) networks. In particular, PKIs are set up to perform the following functions: + - Determine the legitimacy of users before issuing public-key certificates to them. + - Issue public-key certificates upon user requests. + - Extend the validity period of public-key certificates upon user requests. + - Revoke public-key certificates upon users' requests or when the corresponding private keys are compromised. + - Store and manage public-key certificates. + - Prevent digital signature signers from denying their signatures. + - Support CA networks to allow different CAs to authenticate public-key certificates issued by other CAs. + - X.509: + +### IPsec: A Security Protocol at the Network Layer + +- IPsec is a major security protocol at the network layer +- IPsec provides a potent platform for constructing virtual private networks (VPNs). VPNs are private networks overlaid on public networks. +- The purpose of deploying cryptographic algorithms at the network layer is to encrypt or authenticate IP packets (either just the payloads or the whole packets). +- IPsec also specifies how to exchange keys.
Thus, IPsec consists of authentication protocols, encryption protocols, and key exchange protocols. They are referred to, respectively, as authentication header (AH), encapsulating security payload (ESP), and Internet key exchange (IKE).
+
+### PGP & S/MIME: Email Security
+
+- There are a number of security protocols at the application layer. The most widely used of these are the email security protocols, namely PGP and S/MIME.
+- SMTP (“Simple Mail Transfer Protocol”) is used for sending and delivering mail from a client to a server via port 25: it’s the outgoing server. In contrast, POP (“Post Office Protocol”) allows the user to pick up messages and download them into his own inbox: it’s the incoming server. The latest version of the Post Office Protocol is named POP3; it has been in use since 1996 and uses port 110.
+
+PGP
+
+- PGP implements all major cryptographic algorithms, the ZIP compression algorithm, and the Base64 encoding algorithm.
+- It can be used to authenticate a message, encrypt a message, or both. PGP applies the following steps in order: authentication, ZIP compression, encryption, and Base64 encoding.
+- The Base64 encoding procedure makes the message ready for SMTP transmission.
+
+GPG (GnuPG)
+
+- GnuPG is another free encryption standard, based on OpenPGP, that companies may use.
+- GnuPG serves as a replacement for Symantec’s PGP.
+- The main difference is the supported algorithms; however, GnuPG is designed to interoperate with PGP. Even though GnuPG is free and open, some businesses prefer the technical support and the user interface that come with Symantec’s PGP.
+- There are some compatibility nuances between GnuPG and PGP, such as between certain algorithms, but in most applications, such as email, there are workarounds. One such algorithm is IDEA, which isn’t included in GnuPG out of the box due to patent issues.
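The PGP processing order described above (sign, then ZIP-compress, then encrypt, then Base64-encode) can be sketched with Python's standard library. This is a toy illustration only: the SHA-256 digest stands in for a real public-key signature and the XOR keystream stands in for a real session-key cipher, so nothing here is secure; only the ordering of the stages mirrors PGP.

```python
import base64
import hashlib
import zlib

def pgp_style_encode(message: bytes, key: bytes) -> bytes:
    # 1. "Authentication": prepend a SHA-256 digest as a stand-in for a real
    #    signature (PGP actually signs the hash with the sender's private key).
    signed = hashlib.sha256(message).digest() + message
    # 2. ZIP compression (PGP compresses after signing, before encrypting).
    compressed = zlib.compress(signed)
    # 3. "Encryption": a toy XOR keystream stands in for the session-key
    #    cipher -- NOT secure, illustration only.
    encrypted = bytes(b ^ key[i % len(key)] for i, b in enumerate(compressed))
    # 4. Base64 (radix-64) encoding makes the result 7-bit safe for SMTP.
    return base64.b64encode(encrypted)

def pgp_style_decode(blob: bytes, key: bytes) -> bytes:
    # Undo each stage in reverse order, then verify the "signature".
    encrypted = base64.b64decode(blob)
    compressed = bytes(b ^ key[i % len(key)] for i, b in enumerate(encrypted))
    signed = zlib.decompress(compressed)
    digest, message = signed[:32], signed[32:]
    if hashlib.sha256(message).digest() != digest:
        raise ValueError("integrity check failed")
    return message
```

A round trip recovers the original message, and every byte of the encoded blob is printable ASCII, which is what makes the result safe for 7-bit mail transports.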
+
+S/MIME
+
+- SMTP can only handle 7-bit ASCII text messages (extensions such as 8BITMIME alleviate this limitation). While POP can handle other content types besides 7-bit ASCII, POP may, under a common default setting, download all the messages stored in the mail server to the user's local computer and then remove them from the mail server. This makes it difficult for the user to read his messages from multiple computers.
+- The Multipurpose Internet Mail Extension protocol (MIME) was designed to support sending and receiving email messages in various formats, including nontext files generated by word processors, graphics files, sound files, and video clips. Moreover, MIME allows a single message to include mixed types of data in any combination of these formats.
+- The Internet Mail Access Protocol (IMAP), which operates on TCP port 143 for non-encrypted connections, stores incoming email messages in the mail server until the user deletes them deliberately (this behavior is configurable on both server and client, just as with POP). This allows the user to access his mailbox from multiple machines and download messages to a local machine without deleting them from the mailbox in the mail server.
+
+SSL/TLS
+
+- SSL uses a PKI to decide if a server’s public key is trustworthy by requiring servers to use a security certificate signed by a trusted CA.
+- When Netscape Navigator 1.0 was released, it trusted a single CA operated by the RSA Data Security corporation.
+- The server’s public RSA key used to be stored in the security certificate, which could then be used by the browser to establish a secure communication channel. The security certificates we use today still rely on the same standard (named X.509) that Netscape Navigator 1.0 used back then.
+- Netscape’s intent was to train users (though this didn’t work out later) to differentiate secure communications from insecure ones, so it put a lock icon next to the address bar.
When the lock is open, the communication is insecure. A closed lock means communication has been secured with SSL, which requires the server to provide a signed certificate. You’re familiar with this icon, as it’s been in every browser ever since. The engineers at Netscape truly created a standard for secure internet communications.
+- A year after releasing SSL 2.0, Netscape fixed several security issues and released SSL 3.0, a protocol that, albeit officially deprecated since June 2015, remains in use in certain parts of the world more than 20 years after its introduction. In an effort to standardize SSL, the Internet Engineering Task Force (IETF) created a slightly modified SSL 3.0 and, in 1999, unveiled it as Transport Layer Security (TLS) 1.0. The name change between SSL and TLS continues to confuse people today. Officially, TLS is the new SSL, but in practice, people use SSL and TLS interchangeably to talk about any version of the protocol.
+
+## Network Perimeter Security
+
+Let us see how we keep a check on the perimeter, i.e., the edges of the network, which form the first layer of protection.
+
+### General Firewall Framework
+
+- Firewalls are needed because encryption algorithms cannot effectively stop malicious packets from getting into an edge network.
+- This is because IP packets, regardless of whether they are encrypted, can always be forwarded into an edge network.
+- Firewalls, which were developed in the 1990s, are important instruments to help restrict network access. A firewall may be a hardware device, a software package, or a combination of both.
+- Packets flowing into the internal network from the outside should be evaluated before they are allowed to enter. One of the critical elements of a firewall is its ability to examine packets without imposing a negative impact on communication speed while providing security protections for the internal network.
+- The packet inspection that is carried out by firewalls can be done using several different methods. On the basis of the particular method used by the firewall, it can be characterized as a packet filter, circuit gateway, application gateway, or dynamic packet filter.
+
+### Packet Filters
+
+- A packet filter inspects ingress packets coming into an internal network from outside, and inspects egress packets going outside from an internal network.
+- Packet filtering only inspects IP headers and TCP headers, not the payloads generated at the application layer.
+- A packet filtering firewall uses a set of rules to determine whether a packet should be allowed or denied passage.
+- There are 2 types:
+    - Stateless
+        - It treats each packet as an independent object, and it does not keep track of any previously processed packets. In other words, stateless filtering inspects a packet when it arrives and makes a decision without leaving any record of the packet being inspected.
+
+    - Stateful
+        - Stateful filtering, also referred to as connection-state filtering, keeps track of connections between an internal host and an external host. A connection state (or state, for short) indicates whether it is a TCP connection or a UDP connection and whether the connection is established.
+
+### Circuit Gateways
+
+- Circuit gateways, also referred to as circuit-level gateways, typically operate at the transport layer.
+- They evaluate the information of the IP addresses and the port numbers contained in TCP (or UDP) headers and use it to determine whether to allow or to disallow an internal host and an external host to establish a connection.
+- It is common practice to combine packet filters and circuit gateways to form a dynamic packet filter (DPF).
+
+### Application Gateways (ALG)
+
+- Also known as proxy servers.
+- An Application Level Gateway (ALG) acts like a proxy for internal hosts, processing service requests from external clients.
+- An ALG performs deep inspections on each IP packet (ingress or egress).
+- In particular, an ALG inspects application program formats contained in the packet (e.g., MIME format or SQL format) and examines whether its payload is permitted.
+    - Thus, an ALG may be able to detect a computer virus contained in the payload. Because an ALG inspects packet payloads, it may be able to detect malicious code and quarantine suspicious packets, in addition to blocking packets with suspicious IP addresses and TCP ports. On the other hand, an ALG also incurs substantial computation and space overheads.
+
+### Trusted Systems & Bastion Hosts
+
+- A Trusted Operating System (TOS) is an operating system that meets a particular set of security requirements. Whether an operating system can be trusted or not depends on a number of elements. For example, for an operating system on a particular computer to be certified trusted, one needs to validate that, among other things, the following four requirements are satisfied:
+    - Its system design contains no defects;
+    - Its system software contains no loopholes;
+    - Its system is configured properly; and
+    - Its system management is appropriate.
+
+- Bastion Hosts
+    - Bastion hosts are computers with strong defense mechanisms. They often serve as host computers for implementing application gateways, circuit gateways, and other types of firewalls. A bastion host is operated on a trusted operating system that must not contain unnecessary functionalities or programs. This measure helps to reduce error probabilities and makes it easier to conduct security checks. Only those network application programs that are absolutely necessary, for example, SSH, DNS, SMTP, and authentication programs, are installed on a bastion host.
+    - Bastion hosts are also primarily used as controlled ingress points, so that security monitoring can focus narrowly on actions happening at a single point.
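The stateless packet filter described above boils down to walking an ordered rule list and applying the first rule that matches the packet's header fields. The sketch below is a minimal illustration using Python's standard library; the `Rule` shape, the first-match-wins policy, and the default-deny fallback are our assumptions, not any particular vendor's semantics.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional

@dataclass
class Rule:
    action: str               # "allow" or "deny"
    src: str                  # source network in CIDR notation
    dst_port: Optional[int]   # destination port; None matches any port

def evaluate(rules: List[Rule], src_ip: str, dst_port: int) -> str:
    """Return the action of the first rule matching the packet's headers."""
    for rule in rules:
        if ip_address(src_ip) in ip_network(rule.src) and (
            rule.dst_port is None or rule.dst_port == dst_port
        ):
            return rule.action
    return "deny"  # default-deny when no rule matches

# Hypothetical ruleset: block telnet (port 23) from anywhere,
# then allow any traffic sourced from the internal 10.0.0.0/8 range.
rules = [
    Rule("deny", "0.0.0.0/0", 23),
    Rule("allow", "10.0.0.0/8", None),
]
```

Note that the filter sees only header fields (addresses and ports), exactly as described above: nothing here looks at the application payload.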
+
+---
+
+## Common Techniques: Scanning & Packet Capturing
+
+### Scanning Ports with Nmap
+
+- Nmap ("Network Mapper") is a free and open source utility for network discovery and security auditing. Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime.
+- Nmap is free and open source, and it is very flexible and versatile.
+- Nmap is often used to determine alive hosts in a network, open ports on those hosts, services running on those open ports, and version identification of that service on that port.
+- You can practice by scanning http://scanme.nmap.org/ (a host the Nmap project provides for test scans).
+
+```
+nmap [scan type] [options] [target specification]
+```
+
+
+Nmap uses 6 different port states:
+
+- **Open** — An open port is one that is actively accepting TCP, UDP or SCTP connections. Open ports are what interests us the most because they are the ones that are vulnerable to attacks. Open ports also show the available services on a network.
+- **Closed** — A port that receives and responds to Nmap probe packets, but with no application listening on it. Useful for identifying that the host exists and for OS detection.
+- **Filtered** — Nmap can’t determine whether the port is open because packet filtering prevents its probes from reaching the port. Filtering could come from firewalls or router rules. Often little information is given from filtered ports during scans, as the filters can drop the probes without responding or respond with useless error messages, e.g. destination unreachable.
+- **Unfiltered** — The port is accessible, but Nmap doesn’t know if it is open or closed. Only used in the ACK scan, which is used to map firewall rulesets. Other scan types can be used to identify whether the port is open.
+- **Open/filtered** — Nmap is unable to determine whether the port is open or filtered. This happens when an open port gives no response.
No response could mean that the probe was dropped by a packet filter or that any response was blocked.
+- **Closed/filtered** — Nmap is unable to determine whether a port is closed or filtered. Only used in the IP ID idle scan.
+
+### Types of Nmap Scans
+
+1. TCP Connect
+    - TCP Connect scan completes the 3-way handshake.
+    - If a port is open, the operating system completes the TCP three-way handshake and the port scanner immediately closes the connection to avoid a denial of service. This is “noisy” because the services can log the sender IP address and might trigger Intrusion Detection Systems.
+2. UDP Scan
+    - This scan checks to see if there are any UDP ports listening.
+    - Since UDP does not provide a positive acknowledgment like TCP, and a closed port typically elicits an ICMP port-unreachable response while an open port gives none, UDP scanning is slower and less reliable than TCP scanning.
+
+3. SYN Scan
+    - SYN scan is another form of TCP scanning.
+    - This scan type is also known as “half-open scanning” because it never actually opens a full TCP connection.
+    - The port scanner generates a SYN packet. If the target port is open, it will respond with a SYN-ACK packet. The scanner host responds with an RST packet, closing the connection before the handshake is completed.
+    - If the port is closed but unfiltered, the target will instantly respond with an RST packet.
+    - SYN scan has the advantage that the individual services never actually receive a connection.
+
+4. FIN Scan
+    - This is a stealthy scan, like the SYN scan, but sends a TCP FIN packet instead.
+
+5. ACK Scan
+    - ACK scanning determines whether the port is filtered or not.
+6. Null Scan
+    - Another very stealthy scan that sets all the TCP header flags to off or null.
+    - This is not normally a valid packet, and some hosts will not know what to do with it.
+7. XMAS Scan
+    - Similar to the NULL scan, except that all the flags in the TCP header are set to on.
+8. RPC Scan
+    - This special type of scan looks for machines answering RPC (Remote Procedure Call) services.
+9. 
IDLE Scan
+    - It is a super stealthy method whereby the scan packets are bounced off an external host.
+    - You don’t need to have control over the other host, but it does have to be set up and meet certain requirements. You must input the IP address of the “zombie” host and what port number to use. It is one of the more controversial options in Nmap, since it really only has a use for malicious attacks.
+
+Scan Techniques
+
+A couple of scan techniques can be used to gain more information about a system and its ports.
+
+### OpenVAS
+
+- OpenVAS is a full-featured vulnerability scanner.
+- OpenVAS is a framework of services and tools that provides a comprehensive and powerful vulnerability scanning and management package.
+- OpenVAS, which is an open-source program, began as a fork of Nessus, the once more popular scanning program.
+- OpenVAS is made up of three main parts. These are:
+    - a regularly updated feed of Network Vulnerability Tests (NVTs);
+    - a scanner, which runs the NVTs; and
+    - a SQLite 3 database for storing both your test configurations and the NVTs’ results and configurations.
+
+### Wireshark
+
+- Wireshark is a protocol analyzer.
+- This means Wireshark is designed to decode not only packet bits and bytes but also the relations between packets and protocols.
+- Wireshark understands protocol sequences.
+
+A simple demo of Wireshark filters:
+
+1. Capture only udp packets:
+    - Capture filter = “udp”
+
+2. Capture only tcp packets
+    - Capture filter = “tcp”
+
+3. TCP/IP 3-way Handshake
+    ![image17](images/image17.png)
+
+4. Filter by IP address: displays all traffic from IP, be it source or destination
+    - ip.addr == 192.168.1.1
+5. Filter by source address: display traffic only from IP source
+    - ip.src == 192.168.0.1
+
+6. Filter by destination: display traffic only from IP destination
+    - ip.dst == 192.168.0.1
+
+7. Filter by IP subnet: display traffic from subnet, be it source or destination
+    - ip.addr == 192.168.0.1/24
+
+8. 
Filter by protocol: filter traffic by protocol name
+    - dns
+    - http
+    - ftp
+    - arp
+    - ssh
+    - telnet
+    - icmp
+
+9. Exclude IP address: remove traffic to and from an IP address
+    - !(ip.addr == 192.168.0.1)
+
+10. Display traffic between two specific subnets
+    - ip.addr == 192.168.0.1/24 and ip.addr == 192.168.1.1/24
+
+11. Display traffic between two specific workstations
+    - ip.addr == 192.168.0.1 and ip.addr == 192.168.0.2
+12. Filter by MAC
+    - eth.addr == 00:50:7f:c5:b6:78
+
+13. Filter TCP port
+    - tcp.port == 80
+14. Filter TCP port source
+    - tcp.srcport == 80
+15. Filter TCP port destination
+    - tcp.dstport == 80
+16. Find user agents
+    - http.user_agent contains Firefox
+    - !(http.user_agent contains Firefox || http.user_agent contains Chrome)
+17. Filter broadcast traffic
+    - !(arp or icmp or dns)
+18. Filter IP address and port
+    - tcp.port == 80 && ip.addr == 192.168.0.1
+
+19. Filter all http get requests
+    - http.request
+20. Filter all http get requests and responses
+    - http.request or http.response
+21. Filter three-way handshake
+    - tcp.flags.syn==1 or (tcp.seq==1 and tcp.ack==1 and tcp.len==0 and tcp.analysis.initial_rtt)
+22. Find files by type
+    - frame matches “(attachment|tar|exe|zip|pdf)”
+23. Find traffic based on keyword
+    - tcp contains facebook
+    - frame contains facebook
+24. Detecting SYN Floods
+    - tcp.flags.syn == 1 and tcp.flags.ack == 0
+
+**Wireshark Promiscuous Mode**
+    - By default, Wireshark only captures packets going to and from the computer where it runs. By checking the box to run Wireshark in Promiscuous Mode in the Capture Settings, you can capture most of the traffic on the LAN.
+
+### Dumpcap
+
+- Dumpcap is a network traffic dump tool. It captures packet data from a live network and writes the packets to a file. Dumpcap’s native capture file format is pcapng, which is also the format used by Wireshark.
+- By default, Dumpcap uses the pcap library to capture traffic from the first available network interface and writes the received raw packet data, along with the packets’ time stamps, into a pcapng file. The capture filter syntax follows the rules of the pcap library.
+- The Wireshark command line utility called 'dumpcap.exe' can be used to capture LAN traffic over an extended period of time.
+- Wireshark itself can also be used, but dumpcap does not significantly utilize the computer's memory while capturing for long periods of time.
+
+### DaemonLogger
+
+- Daemonlogger is a packet logging application designed specifically for use in Network Security Monitoring (NSM) environments.
+- The biggest benefit Daemonlogger provides is that, like Dumpcap, it is simple to use for capturing packets. In order to begin capturing, you need only invoke the command and specify an interface.
+    - `daemonlogger -i eth1`
+    - This option, by default, will begin capturing packets and logging them to the current working directory.
+    - Packets will be collected until the capture file size reaches 2 GB, and then a new file will be created. This will continue indefinitely until the process is halted.
+
+### NetSniff-NG
+
+- Netsniff-NG is a high-performance packet capture utility.
+- While the utilities we’ve discussed to this point rely on libpcap for capture, Netsniff-NG utilizes zero-copy mechanisms to capture packets. This is done with the intent to support full packet capture over high throughput links.
+- In order to begin capturing packets with Netsniff-NG, we have to specify an input and an output. In most cases, the input will be a network interface, and the output will be a file or folder on disk.
+
+    `netsniff-ng -i eth1 -o data.pcap`
+
+### NetFlow
+
+- NetFlow is a feature that was introduced on Cisco routers around 1996 that provides the ability to collect IP network traffic as it enters or exits an interface.
By analyzing the data provided by NetFlow, a network administrator can determine things such as the source and destination of traffic, class of service, and the causes of congestion. A typical flow monitoring setup (using NetFlow) consists of three main components:
+
+    - Flow exporter: aggregates packets into flows and exports flow records towards one or more flow collectors.
+    - Flow collector: responsible for reception, storage and pre-processing of flow data received from a flow exporter.
+    - Analysis application: analyzes received flow data in the context of intrusion detection or traffic profiling, for example.
+
+- Routers and switches that support NetFlow can collect IP traffic statistics on all interfaces where NetFlow is enabled, and later export those statistics as NetFlow records toward at least one NetFlow collector—typically a server that does the actual traffic analysis.
+
+### IDS
+
+An IDS is a security solution that detects security-related events in your environment but does not block them.
+IDS sensors can be software- or hardware-based and are used to collect and analyze network traffic. These sensors are available in two varieties, network IDS and host IDS.
+
+- A host IDS is a server-specific agent running on a server with a minimum of overhead to monitor the operating system.
+- A network IDS can be embedded in a networking device, a standalone appliance, or a module monitoring the network traffic.
+
+Signature Based IDS
+
+- The signature-based IDS monitors the network traffic or observes the system and raises an alarm if a known malicious event is happening.
+- It does so by comparing the data flow against a database of known attack patterns.
+- These signatures explicitly define what traffic or activity should be considered malicious.
+- Signature-based detection has been the bread and butter of network-based defensive security for over a decade, partially because it is very similar to how malicious activity is detected at the host level with antivirus utilities.
+- The formula is fairly simple: an analyst observes a malicious activity, derives indicators from the activity and develops them into signatures, and then those signatures will alert whenever the activity occurs again.
+
+- Examples: Snort and Suricata
+
+Policy Based IDS
+
+- The policy-based IDSs (mainly host IDSs) trigger an alarm whenever a violation occurs against the configured policy.
+- This configured policy is, or should be, a representation of the security policies.
+- This type of IDS is flexible and can be customized to a company's network requirements, because it knows exactly what is permitted and what is not.
+- The signature-based systems, on the other hand, rely on vendor specifics and default settings.
+
+
+Anomaly Based IDS
+
+- The anomaly-based IDS looks for traffic that deviates from the normal, but the definition of what is a normal network traffic pattern is the tricky part.
+- Two types of anomaly-based IDS exist: statistical and nonstatistical anomaly detection.
+    - Statistical anomaly detection learns the traffic patterns interactively over a period of time.
+    - In the nonstatistical approach, the IDS has a predefined configuration of the supposedly acceptable and valid traffic patterns.
+
+Host Based IDS & Network Based IDS
+
+- A host IDS can be described as a distributed agent residing on each server of the network that needs protection. These distributed agents are tied very closely to the underlying operating system.
+
+- Network IDSs, on the other hand, can be described as intelligent sniffing devices. Data (raw packets) is captured from the network by a network IDS, whereas host IDSs capture the data from the host on which they are installed.
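The signature-matching formula above (observe activity, derive a pattern, alert on every recurrence) reduces to a pattern search over the payload. Here is a minimal sketch with a hypothetical two-entry signature database; real engines such as Snort use a much richer rule language (header conditions, offsets, regular expressions), but the core check looks like this:

```python
# Hypothetical signature database: name -> byte pattern considered malicious.
SIGNATURES = {
    "nop-sled": b"\x90" * 8,                    # x86 NOP sled fragment
    "php-eval-inject": b"eval(base64_decode(",  # common PHP injection marker
}

def match_signatures(payload: bytes) -> list:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]
```

A payload carrying `eval(base64_decode(` trips the second signature, while benign traffic matches nothing; this is also precisely why signature-based IDSs miss attacks for which they have no pattern.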
+
+Honeypots
+
+- The use of decoy machines to direct intruders' attention away from the machines under protection is a major technique to preclude intrusion attacks. Any device, system, directory, or file used as a decoy to lure attackers away from important assets and to collect intrusion or abusive behaviors is referred to as a honeypot.
+- A honeypot may be implemented as a physical device or as an emulation system. The idea is to set up decoy machines in a LAN, or decoy directories/files in a file system, and make them appear important, but with several exploitable loopholes, to lure attackers to attack these machines or directories/files, so that other machines, directories, and files can escape intruders' attention. A decoy machine may be a host computer or a server computer. Likewise, we may also set up decoy routers or even decoy LANs.
+
+---
+
+## Chinks In The Armour (TCP/IP Security Issues)
+
+![image18](images/image18.png)
+
+### IP Spoofing
+
+- In this type of attack, the attacker replaces the IP address of the sender, or in some rare cases the destination, with a different address.
+- IP spoofing is normally used to exploit a target host. In other cases, it is used to start a denial-of-service (DoS) attack.
+    - In a DoS attack, an attacker modifies the IP packet to mislead the target host into accepting the original packet as a packet sourced at a trusted host. The attacker must know the IP address of the trusted host to modify the packet headers (source IP address) so that it appears that the packets are coming from that host.
+
+IP Spoofing Detection Techniques
+
+- Direct TTL Probes
+    - In this technique, we send a packet that triggers a reply to the host of the suspect spoofed IP, and compare the TTL in the reply with that of the suspect packet; if the two TTLs differ, it is a spoofed packet.
+
+    - This technique is successful when the attacker is in a different subnet from the victim.
+    ![image19](images/image19.png)
+
+- IP Identification Number
+    - Send a probe that triggers a reply to the host of the suspect spoofed traffic, and compare the IP ID with that of the suspect traffic.
+    - If the IP IDs are not close to the value in the packet being checked, the suspect traffic is spoofed.
+
+- TCP Flow Control Method
+    - Attackers sending spoofed TCP packets will not receive the target’s SYN-ACK packets.
+    - Attackers cannot therefore be responsive to changes in the congestion window size.
+    - When the receiver still receives traffic even after the window size is exhausted, the packets are most probably spoofed.
+
+### Covert Channel
+
+- A covert or clandestine channel can be best described as a pipe or communication channel between two entities that can be exploited by a process or application transferring information in a manner that violates the system's security specifications.
+- More specifically for TCP/IP, in some instances, covert channels are established, and data can be secretly passed between two end systems.
+    - Ex: ICMP resides at the Internet layer of the TCP/IP protocol suite and is implemented in all TCP/IP hosts. Based on the specifications of the ICMP protocol, an ICMP Echo Request message has an 8-byte header and a 56-byte payload, and the payload is not meant to carry application data. However, these packets are often used to carry secret information. The ICMP packets are altered slightly to carry secret data in the payload. This makes the size of the packet larger, but no control exists in the protocol stack to defeat this behavior. The alteration of ICMP packets gives intruders the opportunity to program specialized client-server pairs. These small pieces of code export confidential information without alerting the network administrator.
+    - ICMP can be leveraged for more than data exfiltration. For example, some C&C tools, such as Loki, used an ICMP channel to establish encrypted interactive sessions back in 1996.
+
+    - Deep packet inspection has since come a long way. Many IDS/IPS products detect ICMP tunneling.
+        - Check for echo responses that do not contain the same payload as the request.
+        - Check the volume of ICMP traffic, especially for volumes beyond an acceptable threshold.
+
+### IP Fragmentation Attack
+
+- The TCP/IP protocol suite, or more specifically IP, allows the fragmentation of packets (this is a feature, not a bug).
+- The IP fragmentation offset is used to keep track of the different parts of a datagram.
+- The information or content in this field is used at the destination to reassemble the datagrams.
+- All such fragments have the same Identification field value, and the fragmentation offset indicates the position of the current fragment in the context of the original packet.
+
+- Many access routers and firewalls do not perform packet reassembly. In normal operation, IP fragments do not overlap, but attackers can create artificially fragmented packets to mislead the routers or firewalls. Usually, these packets are small and almost impractical for end systems because of data and computational overhead.
+- A good example of an IP fragmentation attack is the Ping of Death attack. The Ping of Death attack sends fragments that, when reassembled at the end station, create a larger packet than the maximum permissible length.
+
+TCP Flags
+
+- Data exchange using TCP does not happen until a three-way handshake has been successfully completed. This handshake uses different flags to influence the way TCP segments are processed.
+- There are 6 bits in the TCP header that are often called flags: Urgent pointer field (URG), Acknowledgment field (ACK), Push function (PSH), Reset the connection (RST), Synchronize sequence numbers (SYN), and sender is finished with this connection (FIN).
+    ![image20](images/image20.png)
+
+    - Abuse of the normal operation or settings of these flags can be used by attackers to launch DoS attacks. This causes network servers or web servers to crash or hang.
+
+| SYN | FIN | PSH | RST | Validity |
+|-----|-----|-----|-----|----------|
+| 1   | 1   | 0   | 0   | Illegal combination |
+| 1   | 1   | 1   | 0   | Illegal combination |
+| 1   | 1   | 0   | 1   | Illegal combination |
+| 1   | 1   | 1   | 1   | Illegal combination |
+
+- The attacker's ultimate goal is to write special programs or pieces of code that are able to construct these illegal combinations, resulting in an efficient DoS attack.
+
+SYN FLOOD
+
+- The timers (or lack of certain timers) in the three-way handshake are often used and exploited by attackers to disable services or even to enter systems.
+- After step 2 of the three-way handshake, no limit is set on the time to wait after receiving a SYN. The attacker initiates many connection requests to the web server of Company XYZ (almost certainly with a spoofed IP address).
+- The SYN+ACK packets (Step 2) sent by the web server back to the originating source IP address are not replied to. This leaves a TCP session half-open on the web server. Multiple packets cause multiple TCP sessions to stay open.
+- Based on the hardware limitations of the server, a limited number of TCP sessions can stay open, and as a result, the web server refuses further connection establishment attempts from any host as soon as a certain limit is reached. These half-open connections need to be completed or timed out before new connections can be established.
+
+FIN Attack
+
+- In normal operation, the sender sets the TCP FIN flag indicating that no more data will be transmitted and the connection can be closed down.
+- This is a four-way handshake mechanism, with both sender and receiver expected to send an acknowledgement on a received FIN packet.
+
+- During an attack that is trying to kill connections, a spoofed FIN packet is constructed.
This packet also has the correct sequence number, so the packets are seen as valid by the targeted host. These sequence numbers are easy to predict. This process is referred to as TCP sequence number prediction, whereby the attacker either sniffs the current Sequence and Acknowledgment (SEQ/ACK) numbers of the connection or can algorithmically predict these numbers. + +### Connection Hijacking + + ![image22](images/image22.png) + +- An authorized user (Employee X) sends HTTP requests over a TCP session with the web server. +- The web server accepts the packets from Employee X only when the packet has the correct SEQ/ACK numbers. As seen previously, these numbers are important for the web server to distinguish between different sessions and to make sure it is still talking to Employee X. Imagine that the cracker starts sending packets to the web server spoofing the IP address of Employee X, using the correct SEQ/ACK combination. The web server accepts the packet and increments the ACK number. +- In the meantime, Employee X continues to send packets but with incorrect SEQ/ACK numbers. As a result of sending unsynchronized packets, all data from Employee X is discarded when received by the web server. The attacker pretends to be Employee X using the correct numbers. This finally results in the cracker hijacking the connection, whereby Employee X is completely confused and the web server replies assuming the cracker is sending correct synchronized data. + +STEPS: + +1. The attacker examines the traffic flows with a network monitor and notices traffic from Employee X to a web server. +2. The web server returns or echoes data back to the origination station (Employee X). +3. Employee X acknowledges the packet. +4. The cracker launches a spoofed packet to the server. +5. The web server responds to the cracker. The cracker starts verifying SEQ/ACK numbers to double-check success. 
At this time, the cracker takes over the session from Employee X, which results in the session hanging for Employee X.
+6. The cracker can start sending traffic to the web server.
+7. The web server returns the requested data to confirm delivery with the correct ACK number.
+8. The cracker can continue to send data (keeping track of the correct SEQ/ACK numbers) until eventually setting the FIN flag to terminate the session.
+
+### Buffer Overflow
+
+- A buffer is a temporary data storage area used to store program code and data.
+- When a program or process tries to store more data in a buffer than it was originally anticipated to hold, a buffer overflow occurs.
+- Buffers are temporary storage locations in memory (memory or buffer sizes are often measured in bytes) that can store a fixed amount of data. When more data is written than can be stored in a buffer location, the additional data spills into an adjacent buffer, overwriting the valid data held there.
+
+Mechanism:
+
+- Buffer overflow vulnerabilities come in different forms, but the overall goal of all buffer overflow attacks is to take over the control of a privileged program and, if possible, the host. The attacker has two tasks to achieve this goal. First, the malicious (“dirty”) code needs to be available in the program’s code address space. Second, the privileged program should jump to that particular part of the code, which ensures that the proper parameters are loaded into memory.
+- The first task can be achieved in two ways: by injecting the code into the right address space, or by using the existing code and modifying certain parameters slightly. The second task is a little more complex, because the program’s control flow needs to be modified to make the program jump to the dirty code.
+
+Countermeasures:
+
+- The most important approach is to have a concerted focus on writing correct code.
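The bounds check that "writing correct code" boils down to can be seen concretely in a memory-safe runtime. As a small, hedged illustration (a sketch, not a hardening guide), Python's `ctypes` fixed-size buffers enforce at write time exactly the check that C's `strcpy` omits:

```python
import ctypes

# A fixed-size 16-byte buffer, analogous to `char buffer[16]` in C.
buf = ctypes.create_string_buffer(16)

buf.value = b"A" * 15  # fits: 15 bytes plus the terminating NUL

try:
    buf.value = b"A" * 32  # too large: the runtime bounds check rejects the write
except ValueError as err:
    print("oversized write rejected:", err)

print(buf.value)  # the buffer still holds the original 15 bytes
```

In C, the same oversized copy via `strcpy` would silently overwrite adjacent memory instead of raising an error, which is precisely the condition a buffer overflow attack exploits.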
+- A second method is to make the data buffers (memory locations) of the program’s address space non-executable. This type of address space makes it impossible to execute any code that might be injected into the program’s buffers during an attack.
+
+### More Spoofing
+
+#### Address Resolution Protocol (ARP) Spoofing
+
+- The Address Resolution Protocol (ARP) provides a mechanism to resolve, or map, a known IP address to a MAC sublayer address.
+- Using ARP spoofing, the cracker can exploit this hardware address authentication mechanism by spoofing the hardware address of Host B. Basically, the attacker can convince any host or network device on the local network that the cracker’s workstation is the host to be trusted. This is a common method used in a switched environment.
+ - ARP spoofing can be prevented with the implementation of static ARP tables in all the hosts and routers of your network. Alternatively, you can implement an ARP server that responds to ARP requests on behalf of the target host.
+
+#### DNS Spoofing
+
+- DNS spoofing is the method whereby the hacker convinces the target machine that the system it wants to connect to is the machine of the cracker.
+- The cracker modifies some records so that name entries of hosts correspond to the attacker’s IP address. There have been instances in which a complete DNS server was compromised by an attack.
+- DNS spoofing can be countered with reverse lookups, which detect these attacks. The reverse lookup is a mechanism to verify the IP address against a name.
The IP address and name files are usually kept on different servers to make compromise much more difficult.
diff --git a/courses/security/threats_attacks_defences.md b/courses/security/threats_attacks_defences.md
new file mode 100644
index 0000000..593cb5c
--- /dev/null
+++ b/courses/security/threats_attacks_defences.md
@@ -0,0 +1,218 @@
+# Part III: Threats, Attacks & Defense
+
+## DNS Protection
+
+### Cache Poisoning Attack
+
+- Since DNS responses are cached, a quick response can be provided for repeated translations. Negative DNS responses (e.g., for misspelled names) are also cached, and all cached data periodically times out.
+- Cache poisoning is an issue in what is known as pharming, a term used to describe a hacker’s attack in which a website’s traffic is redirected to a bogus website by forging the DNS mapping. In this case, an attacker attempts to insert a fake address record for an Internet domain into the DNS.
+- If the server accepts the fake record, the cache is poisoned and subsequent requests for the address of the domain are answered with the address of a server controlled by the attacker. As long as the fake entry is cached by the server, browsers or e-mail servers will automatically go to the address provided by the compromised DNS server.
+- The typical time to live (TTL) for cached entries is a couple of hours, permitting ample time for numerous users to be affected by the attack.
+
+### DNSSEC (Security Extension)
+
+- The long-term solution to these DNS problems is authentication. If a resolver cannot distinguish between valid and invalid data in a response, source authentication must be added so the resolver can verify that the data received in a response equals the data entered by the zone administrator.
+- DNS Security Extensions (DNSSEC) protect against data spoofing and corruption, providing mechanisms to authenticate servers and requests, as well as mechanisms to establish authenticity and integrity.
+- When authenticating DNS responses, each DNS zone signs its data using a private key. It is recommended that this signing be done offline and in advance. The query for a particular record returns the requested resource record set (RRset) and the signature (RRSIG) of the requested resource record set. The resolver then authenticates the response using a public key, which is pre-configured or learned via a sequence of key records in the DNS hierarchy.
+- The goal of DNSSEC is to provide authentication and integrity for DNS responses; it does not provide confidentiality or DDoS protection.
+
+### BGP
+
+- BGP stands for Border Gateway Protocol. It is a routing protocol that exchanges routing information among multiple Autonomous Systems (AS).
+ - An Autonomous System is a collection of routers or networks with the same network policy, usually under a single administrative control.
+- BGP tells routers which next hop to use in order to reach the destination network.
+- BGP is used both for communicating information among routers within an AS (interior) and between multiple ASes (exterior).
+
+ ![image23](images/image23.png)
+
+## How BGP Works
+
+- BGP is responsible for finding a path to a destination router, and the path it chooses should be the shortest and most reliable one.
+- Within an AS, this decision is typically made through a link-state protocol (BGP itself, which runs between ASes, is a path-vector protocol). With a link-state protocol, each router broadcasts to all other routers in the network the state of its links and IP subnets. Each router then receives information from the other routers and constructs a complete topology view of the entire network. The next-hop routing table is based on this topology view.
+- The link-state protocol uses a famous algorithm in the field of computer science, Dijkstra’s shortest path algorithm:
+ - We start from our router, considering the path cost to all our direct neighbors.
+ - The shortest path is then taken.
+ - We then look again at all the neighbors we can reach and update our link-state table with the cost information, continuing to take the shortest path until every router has been visited.
+
+## BGP Vulnerabilities
+
+- By corrupting the BGP routing table, we are able to influence the direction traffic flows on the Internet. This action is known as BGP hijacking.
+- Injecting bogus route advertisement information into the BGP-distributed routing database, whether maliciously or accidentally, can disrupt Internet backbone operations.
+- Blackholing traffic:
+ - A blackhole route is a network route (i.e., a routing table entry) that goes nowhere; packets matching the route prefix are dropped or ignored. Blackhole routes can only be detected by monitoring the lost traffic.
+ - Blackhole routes are the best defence against many common viral attacks, where traffic from infected machines to/from command and control masters is dropped.
+ - An infamous BGP injection attack on YouTube:
+
+- Example: In 2008, Pakistan decided to block YouTube by creating a BGP route that led into a black hole. Instead, this routing information got transmitted to a Hong Kong ISP and from there accidentally propagated to the rest of the world, meaning millions of users were routed into this black hole and were therefore unable to access YouTube.
+- Potentially, the greatest risk to BGP occurs in a denial of service attack in which a router is flooded with more packets than it can handle. Network overload and router resource exhaustion happen when the network begins carrying an excessive number of BGP messages, overloading the router’s control processor, memory, and routing table, and reducing the bandwidth available for data traffic.
+- Refer :
+- Route flapping is another type of attack. It refers to repetitive changes to the BGP routing table, often several times a minute.
Withdrawing and re-advertising at a high rate can cause a serious problem for routers, since they propagate the announcements of routes. If these route flaps happen fast enough, e.g., 30 to 50 times per second, the router becomes overloaded, which eventually prevents convergence on valid routes. The potential impact for Internet users is a slowdown in message delivery; in some cases packets may not be delivered at all.
+
+## BGP Security
+
+- Recommended BGP security practice is to use BGP peer authentication, since it is one of the strongest mechanisms for preventing malicious activity.
+ - The authentication mechanisms are Internet Protocol Security (IPsec) or BGP MD5.
+- Another method, known as prefix limits, can be used to avoid filling router tables. In this approach, routers should be configured to disable or terminate a BGP peering session, and issue warning messages to administrators, when a neighbor sends in excess of a preset number of prefixes.
+- The IETF is currently working on improvements in this space.
+
+## Web Based Attacks
+
+### HTTP Response Splitting Attacks
+
+- An HTTP response splitting attack may happen when the server script embeds user data in HTTP response headers without appropriate sanitisation.
+- This typically happens when the script embeds user data in the redirection URL of a redirection response (HTTP status code 3xx), or when the script embeds user data in a cookie value or name when the response sets a cookie.
+- HTTP response splitting attacks can be used to perform web cache poisoning and cross-site scripting attacks.
+- HTTP response splitting is the attacker’s ability to send a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one.
+
+### Cross-Site Request Forgery (CSRF or XSRF)
+
+- A Cross-Site Request Forgery attack tricks the victim’s browser into issuing a command to a vulnerable web application.
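The forgery works because the browser attaches the victim's credentials automatically, so the server cannot tell a forged request from a real one by credentials alone. A minimal sketch of the per-session token defense in Python (the function names and in-memory session store are hypothetical, not from any specific framework):

```python
import hmac
import secrets

# Hypothetical server-side store: session id -> CSRF token.
_session_tokens = {}

def issue_csrf_token(session_id):
    """Generate an unguessable token, remember it server-side, and
    embed it in the HTML form served to the legitimate user."""
    token = secrets.token_hex(32)
    _session_tokens[session_id] = token
    return token

def is_request_valid(session_id, submitted_token):
    """A forged cross-site request carries the victim's cookies
    automatically, but it cannot know the per-session token."""
    expected = _session_tokens.get(session_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected, submitted_token)

token = issue_csrf_token("session-abc")
print(is_request_valid("session-abc", token))           # legitimate form post
print(is_request_valid("session-abc", "forged-guess"))  # attacker's forged request
```

Because the attacker's page cannot read the token out of the victim's session, the forged request fails validation even though it carries valid cookies.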
+- The vulnerability is caused by browsers automatically including user authentication data (session ID, IP address, Windows domain credentials, etc.) with each request.
+- Attackers typically use CSRF to initiate transactions such as transferring funds, logging a user in or out, closing an account, accessing sensitive data, and changing account details.
+- The vulnerability is caused by web browsers that automatically include credentials with each request, even for requests caused by a form, script, or image on another site. CSRF can also be dynamically constructed as part of a payload for a cross-site scripting attack.
+- All sites relying on automatic credentials are vulnerable, and popular browsers cannot prevent cross-site request forgery. Logging out of high-value sites as soon as possible can mitigate CSRF risk. It is recommended that a high-value website require a client to manually provide authentication data in the same HTTP request used to perform any operation with security implications. Limiting the lifetime of session cookies can also reduce the chance of their being used by other malicious sites.
+- OWASP recommends that website developers include a required security token in HTTP requests associated with sensitive business functions in order to mitigate CSRF attacks.
+
+### Cross-Site Scripting (XSS) Attacks
+
+- Cross-Site Scripting occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page and then execute the script on the machine of any user that views the site.
+- If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests that can be mistaken for those of a valid user, compromise confidential information, or execute malicious code on end user systems.
+- Cross-Site Scripting (XSS or CSS) attacks involve the execution of malicious scripts on the victim’s browser.
The victim is simply a user’s host, not the server. XSS results from a failure of a web-based application to validate user input.
+
+### Document Object Model (DOM) XSS Attacks
+
+- DOM-based XSS does not require the web server to receive the XSS payload for a successful attack. The attacker abuses the runtime by embedding their data on the client side, forcing the client (browser) to render the page with parts of the DOM controlled by the attacker.
+- When the page is rendered and the data is processed by the page, typically by a client-side HTML-embedded script such as JavaScript, the page’s code may insecurely embed the data in the page itself, thus delivering the cross-site scripting payload. Several DOM objects can serve as an attack vehicle for delivering malicious script to the victim’s browser.
+
+### Clickjacking
+
+- The technique works by hiding malicious links or scripts under the cover of the content of a legitimate site.
+- Buttons on a website actually contain invisible links, placed there by the attacker. So an individual who clicks on an object they can visually see is actually being duped into visiting a malicious page or executing a malicious script.
+- When mouseover is used together with clickjacking, the outcome is devastating. Since Memorial Day 2010, Facebook users have been hit by a clickjacking attack that tricks people into “liking” a particular Facebook page, enabling the attack to spread.
+- There is no fully effective defense against clickjacking yet, and disabling JavaScript is the only viable method.
+
+## Database Attacks & Defenses
+
+### SQL Injection Attacks
+
+- SQL injection exploits improper input validation in database queries.
+- A successful exploit will allow attackers to access, modify, or delete information in the database.
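The improper input validation described above can be reproduced end to end with Python's built-in `sqlite3`. The table name, credentials, and queries below are made up for illustration; the parameterized query at the end shows the standard fix:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 's3cret')")

malicious = "' OR '1'='1"

# Vulnerable: user input is concatenated directly into the SQL text.
query = ("SELECT username FROM users WHERE username='%s' AND password='%s'"
         % (malicious, malicious))
print(db.execute(query).fetchall())  # alice's row comes back: login bypassed

# Safe: placeholders keep the input as data, never as SQL syntax.
rows = db.execute(
    "SELECT username FROM users WHERE username=? AND password=?",
    (malicious, malicious),
).fetchall()
print(rows)  # no rows: the injection is defused
```

With concatenation, the injected quotes turn the attacker's input into SQL syntax; with `?` placeholders, the database driver treats the same bytes as a literal (and very unlikely) password.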
+- It permits attackers to steal sensitive information stored within the backend databases of affected websites, which may include such things as user credentials, email addresses, personal information, and credit card numbers.
+
+```sql
+SELECT USERNAME,PASSWORD from USERS where USERNAME='' AND PASSWORD='';
+
+-- Here the username and password are the inputs provided by the user.
+-- Suppose an attacker gives the input ' OR '1'='1 in both fields.
+-- The SQL query will then look like:
+
+SELECT USERNAME,PASSWORD from USERS where USERNAME='' OR '1'='1' AND PASSWORD='' OR '1'='1';
+
+-- This query evaluates to true and the user gets logged in.
+-- This example depicts the most basic type of SQL injection.
+```
+
+### SQL Injection Attack Defenses
+
+- SQL injection can be mitigated by filtering queries to eliminate malicious syntax, which involves employing tools to scan the source code.
+- In addition, input fields should be restricted to the absolute minimum, typically anywhere from 7 to 12 characters, and any data should be validated; e.g., if a user inputs an age, make sure the input is an integer with a maximum of 3 digits.
+
+## VPN
+
+A virtual private network (VPN) is a service that offers a secure, reliable connection over a shared public infrastructure such as the Internet. Cisco defines a VPN as an encrypted connection between private networks over a public network. To date, there are three types of VPNs:
+
+- Remote access
+- Site-to-site
+- Firewall-based
+
+## Security Breach
+
+In spite of the most aggressive steps to protect computers from attacks, attackers sometimes get through. Any event that results in a violation of any of the confidentiality, integrity, or availability (CIA) security tenets is a security breach.
+
+### Denial of Service Attacks
+
+- Denial of service (DoS) attacks result in downtime or the inability of a user to access a system. DoS attacks impact the availability tenet of information systems security.
A DoS attack is a coordinated attempt to deny service by occupying a computer with large amounts of unnecessary work. This excessive activity makes the system unavailable to perform legitimate operations.
+- Two common types of DoS attacks are as follows:
+ - Logic attacks—Logic attacks use software flaws to crash or seriously hinder the performance of remote servers. You can prevent many of these attacks by installing the latest patches to keep your software up to date.
+ - Flooding attacks—Flooding attacks overwhelm the victim computer’s CPU, memory, or network resources by sending large numbers of useless requests to the machine.
+- Most DoS attacks target weaknesses in the overall system architecture rather than a software bug or security flaw.
+- One popular technique for launching a packet flood is a SYN flood.
+- One of the best defenses against DoS attacks is to use intrusion prevention system (IPS) software or devices to detect and stop the attack.
+
+### Distributed Denial of Service Attacks
+
+- DDoS attacks differ from regular DoS attacks in their scope. In a DDoS attack, attackers hijack hundreds or even thousands of Internet computers, planting automated attack agents on those systems. The attacker then instructs the agents to bombard the target site with forged messages. This overloads the site and blocks legitimate traffic. The key here is strength in numbers: the attacker does more damage by distributing the attack across multiple computers.
+
+### Wiretapping
+
+- Although the term wiretapping is generally associated with voice telephone communications, attackers can also use wiretapping to intercept data communications.
+
+- Attackers can tap telephone lines and data communication lines. Wiretapping can be active, where the attacker makes modifications to the line. It can also be passive, where an unauthorized user simply listens to the transmission without changing the contents.
Passive intrusion can include the copying of data for a subsequent active attack.
+- Two methods of active wiretapping are as follows:
+ - Between-the-lines wiretapping—This type of wiretapping does not alter the messages sent by the legitimate user but inserts additional messages into the communication line when the legitimate user pauses.
+ - Piggyback-entry wiretapping—This type of wiretapping intercepts and modifies the original message by breaking the communications line and routing the message to another computer that acts as a host.
+
+### Backdoors
+
+- Software developers sometimes include hidden access methods, called backdoors, in their programs. Backdoors give developers or support personnel easy access to a system without having to struggle with security controls. The problem is that backdoors don’t always stay hidden. When an attacker discovers a backdoor, he or she can use it to bypass existing security controls such as passwords, encryption, and so on. Where legitimate users log on through front doors using a user ID and password, attackers use backdoors to bypass these normal access controls.
+
+## Malicious Attacks
+
+### Birthday Attack
+
+- Once an attacker compromises a hashed password file, a birthday attack is performed. A birthday attack is a type of cryptographic attack that is used to make brute-force attacks on one-way hashes easier. It is a mathematical exploit based on the birthday problem in probability theory.
+- Further Reading:
+ -
+ -
+
+### Brute-Force Password Attacks
+
+- In a brute-force password attack, the attacker tries different passwords on a system until one of them is successful. Usually the attacker employs a software program to try all possible combinations of a likely password, user ID, or security code until it locates a match. This occurs rapidly and in sequence. This type of attack is called a brute-force password attack because the attacker simply hammers away at the code.
There is no skill or stealth involved—just brute force that eventually breaks the code. +- Further Reading: + - + - + +### Dictionary Password Attacks + +- A dictionary password attack is a simple attack that relies on users making poor password choices. In a dictionary password attack, a simple password-cracker program takes all the words from a dictionary file and attempts to log on by entering each dictionary entry as a password. +- Further Reading: +https://capec.mitre.org/data/definitions/16.html + +### Replay Attacks + +- Replay attacks involve capturing data packets from a network and retransmitting them to produce an unauthorized effect. The receipt of duplicate, authenticated IP packets may disrupt service or have some other undesired consequence. Systems can be broken through replay attacks when attackers reuse old messages or parts of old messages to deceive system users. This helps intruders to gain information that allows unauthorized access into a system. +- Further reading: + + +### Man-in-the-Middle Attacks + +- A man-in-the-middle attack takes advantage of the multihop process used by many types of networks. In this type of attack, an attacker intercepts messages between two parties before transferring them on to their intended destination. +- Web spoofing is a type of man-in-the-middle attack in which the user believes a secure session exists with a particular web server. In reality, the secure connection exists only with the attacker, not the web server. The attacker then establishes a secure connection with the web server, acting as an invisible go-between. The attacker passes traffic between the user and the web server. In this way, the attacker can trick the user into supplying passwords, credit card information, and other private data. +- Further Reading: + - + +### Masquerading + +- In a masquerade attack, one user or computer pretends to be another user or computer. 
Masquerade attacks usually include one of the other forms of active attacks, such as IP address spoofing or replaying. Attackers can capture authentication sequences and then replay them later to log on again to an application or operating system. For example, an attacker might monitor usernames and passwords sent to a weak web application. The attacker could then use the intercepted credentials to log on to the web application and impersonate the user.
+- Further Reading:
+
+### Eavesdropping
+
+- Eavesdropping, or sniffing, occurs when a host sets its network interface to promiscuous mode and copies packets that pass by for later analysis. Promiscuous mode enables a network device to intercept and read each network packet on its segment of the network (given certain conditions), even if the packet’s address doesn’t match the network device. It is possible to attach hardware and software to monitor and analyze all packets on that segment of the transmission media without alerting any other users. Candidates for eavesdropping include satellite, wireless, mobile, and other transmission methods.
+
+### Social Engineering
+
+- Attackers often use a deception technique called social engineering to gain access to resources in an IT infrastructure. In nearly all cases, social engineering involves tricking authorized users into carrying out actions for unauthorized users. The success of social engineering attacks depends on the basic tendency of people to want to be helpful.
+
+### Phreaking
+
+- Phone phreaking, or simply phreaking, is a slang term that describes the activity of a subculture of people who study, experiment with, or explore telephone systems, telephone company equipment, and systems connected to public telephone networks. Phreaking is the art of exploiting bugs and glitches that exist in the telephone system.
+
+### Phishing
+
+- Phishing is a type of fraud in which an attacker attempts to trick the victim into providing private information such as credit card numbers, passwords, dates of birth, bank account numbers, automated teller machine (ATM) PINs, and Social Security numbers.
+
+### Pharming
+
+- Pharming is another type of attack that seeks to obtain personal or private financial information through domain spoofing. A pharming attack doesn’t use messages to trick victims into visiting spoofed websites that appear legitimate, however. Instead, pharming “poisons” a domain name on the domain name server (DNS), a process known as DNS poisoning. The result is that when a user enters the poisoned server’s web address into his or her address bar, that user navigates to the attacker’s site. The user’s browser still shows the web address the user entered, which makes pharming difficult to detect, and therefore more serious. Where phishing attempts to scam people one at a time with an email or instant message, pharming enables scammers to target large groups of people at one time through domain spoofing.
diff --git a/courses/security/writing_secure_code.md b/courses/security/writing_secure_code.md
new file mode 100644
index 0000000..8fa3244
--- /dev/null
+++ b/courses/security/writing_secure_code.md
@@ -0,0 +1,56 @@
+# PART IV: Writing Secure Code & More
+
+The first and most important step in reducing security and reliability issues is to educate developers. However, even the best-trained engineers make mistakes: security experts can write insecure code, and SREs can miss reliability issues. It’s difficult to keep the many considerations and tradeoffs involved in building secure and reliable systems in mind simultaneously, especially if you’re also responsible for producing software.
+
+## Use frameworks to enforce security and reliability while writing code
+
+- A better approach is to handle security and reliability in common frameworks, languages, and libraries.
Ideally, libraries only expose an interface that makes writing code with common classes of security vulnerabilities impossible.
+- Multiple applications can use each library or framework. When domain experts fix an issue, they remove it from all the applications the framework supports, allowing this engineering approach to scale better.
+
+## Common Security Vulnerabilities
+
+- In large codebases, a handful of vulnerability classes account for the majority of security vulnerabilities, despite ongoing efforts to educate developers and introduce code review. OWASP and SANS publish lists of common vulnerability classes.
+
+ ![image26](images/image26.png)
+
+## Write Simple Code
+
+Try to keep your code clean and simple.
+
+### Avoid Multi-Level Nesting
+
+- Multi-level nesting is a common anti-pattern that can lead to simple mistakes. If the error is in the most common code path, it will likely be captured by the unit tests. However, unit tests don’t always check error handling paths in multi-level nested code. The error might result in decreased reliability (for example, if the service crashes when it mishandles an error) or a security vulnerability (like a mishandled authorization check error).
+
+### Eliminate YAGNI Smells
+
+- Sometimes developers overengineer solutions by adding functionality that may be useful in the future, “just in case.” This goes against the YAGNI (You Aren’t Gonna Need It) principle, which recommends implementing only the code that you need. YAGNI code adds unnecessary complexity because it needs to be documented, tested, and maintained.
+- To summarize, avoiding YAGNI code leads to improved reliability, and simpler code leads to fewer security bugs, fewer opportunities to make mistakes, and less developer time spent maintaining unused code.
+
+### Repay Technical Debt
+
+- It is a common practice for developers to mark places that require further attention with TODO or FIXME annotations.
In the short term, this habit can accelerate delivery velocity for the most critical functionality and allow a team to meet early deadlines—but it also incurs technical debt. Still, it’s not necessarily a bad practice, as long as you have a clear process (and allocate time) for repaying such debt.
+
+### Refactoring
+
+- Refactoring is the most effective way to keep a codebase clean and simple. Even a healthy codebase occasionally needs to be refactored.
+- Regardless of the reasons behind refactoring, you should always follow one golden rule: never mix refactoring and functional changes in a single commit to the code repository. Refactoring changes are typically significant and can be difficult to understand.
+- If a commit also includes functional changes, there’s a higher risk that an author or reviewer might overlook bugs.
+
+### Unit Testing
+
+- Unit testing can increase system security and reliability by pinpointing a wide range of bugs in individual software components before a release. This technique involves breaking software components into smaller, self-contained “units” that have no external dependencies, and then testing each unit.
+
+### Fuzz Testing
+
+- Fuzz testing is a technique that complements the previously mentioned testing techniques. Fuzzing involves using a fuzz engine to generate a large number of candidate inputs that are then passed through a fuzz driver to the fuzz target. The fuzzer then analyzes how the system handles the input. Complex inputs handled by all kinds of software are popular targets for fuzzing: for example, file parsers, compression algorithms, network protocol implementations, and audio codecs.
+
+### Integration Testing
+
+- Integration testing moves beyond individual units and abstractions, replacing fake or stubbed-out implementations of abstractions like databases or network services with real implementations. As a result, integration tests exercise more complete code paths.
Because you must initialize and configure these other dependencies, integration testing may be slower and flakier than unit testing; executing the test also incorporates real-world variables like network latency as services communicate end-to-end. As you move from testing individual low-level units of code to testing how they interact when composed together, the net result is a higher degree of confidence that the system is behaving as expected.
+
+### Last but not least
+
+- Code Reviews
+- Rely on Automation
+- Don’t check in Secrets
+- Verifiable Builds
diff --git a/courses/systems_design/availability.md b/courses/systems_design/availability.md
index 48c1ad8..a4f9c65 100644
--- a/courses/systems_design/availability.md
+++ b/courses/systems_design/availability.md
@@ -1,4 +1,4 @@
-## HA - Availability - Common “Nines”
+# HA - Availability - Common “Nines”
 Availability is generally expressed as “Nines”, common ‘Nines’ are listed below.
 | Availability % | Downtime per year | Downtime per month | Downtime per week | Downtime per day |
diff --git a/courses/systems_design/conclusion.md b/courses/systems_design/conclusion.md
index 9c9f3ba..5d182ac 100644
--- a/courses/systems_design/conclusion.md
+++ b/courses/systems_design/conclusion.md
@@ -1,3 +1,3 @@
-## Conclusion
+# Conclusion
 Armed with these principles, we hope the course will give a fresh perspective to design software systems. It might be over engineering to get all this on day zero. But some are really important from day 0 like eliminating single points of failure, making scalable services by just increasing replicas. As a bottleneck is reached, we can split code by services, shard data to scale. As the organisation matures, bringing in [chaos engineering](https://en.wikipedia.org/wiki/Chaos_engineering) to measure how systems react to failure will help in designing robust software systems.
diff --git a/courses/systems_design/fault-tolerance.md b/courses/systems_design/fault-tolerance.md index d33003a..bc97d45 100644 --- a/courses/systems_design/fault-tolerance.md +++ b/courses/systems_design/fault-tolerance.md @@ -1,4 +1,4 @@ -## Fault Tolerance +# Fault Tolerance Failures are not avoidable in any system and will happen all the time, hence we need to build systems that can tolerate failures or recover from them. diff --git a/courses/systems_design/intro.md b/courses/systems_design/intro.md index 4f74620..be222b2 100644 --- a/courses/systems_design/intro.md +++ b/courses/systems_design/intro.md @@ -1,27 +1,30 @@ # Systems Design -## Pre - Requisites +## Prerequisites Fundamentals of common software system components: - Operating Systems - Networking - Databases RDBMS/NoSQL -## What to expect from this training +## What to expect from this course Thinking about and designing for scalability, availability, and reliability of large scale software systems. -## What is not covered under this training +## What is not covered under this course Individual software components’ scalability and reliability concerns like e.g. Databases, while the same scalability principles and thinking can be applied, these individual components have their own specific nuances when scaling them and thinking about their reliability. 
More light will be shed on concepts rather than on setting up and configuring components like Loadbalancers to achieve scalability, availability and reliability of systems -## Training Content -- Introduction -- Scalability -- High Availability -- Fault Tolerance +## Course Content + +### Table of Contents + +- [Introduction](https://linkedin.github.io/school-of-sre/systems_design/intro/#backstory) +- [Scalability](https://linkedin.github.io/school-of-sre/systems_design/scalability/) +- [High Availability](https://linkedin.github.io/school-of-sre/systems_design/availability/) +- [Fault Tolerance](https://linkedin.github.io/school-of-sre/systems_design/fault-tolerance/) ## Introduction diff --git a/img/sos.png b/img/sos.png new file mode 100644 index 0000000..584c1b4 Binary files /dev/null and b/img/sos.png differ diff --git a/mkdocs.yml b/mkdocs.yml index f7a97e9..b87e7c5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,19 +1,45 @@ -site_name: school_of_sre +site_name: SchoolOfSRE docs_dir: courses +theme: + name: material + logo: img/sos.png + favicon: img/favicon.ico +plugins: [] nav: - Home: index.md -- Git: - - Git Basics: git/git-basics.md - - Working With Branches: git/branches.md - - Github and Hooks: git/github-hooks.md +- Fundamentals Series: + - Linux Basics: + - Introduction: linux_basics/intro.md + - Command Line Basics: linux_basics/command_line_basics.md + - Server Administration: linux_basics/linux_server_administration.md + - Git: + - Git Basics: git/git-basics.md + - Working With Branches: git/branches.md + - Github and Hooks: git/github-hooks.md + - Linux Networking: + - Introduction: linux_networking/intro.md + - DNS: linux_networking/dns.md + - UDP: linux_networking/udp.md + - HTTP: linux_networking/http.md + - TCP: linux_networking/tcp.md + - Routing: linux_networking/ipr.md + - Conclusion: linux_networking/conclusion.md - Python and Web: - - Intro: python_web/intro.md + - Introduction: python_web/intro.md - Some Python Concepts: 
python_web/python-concepts.md - Python, Web and Flask: python_web/python-web-flask.md - The URL Shortening App: python_web/url-shorten-app.md - - SRE Aspects of The App and Conclusion: python_web/sre-conclusion.md + - Conclusion: python_web/sre-conclusion.md +- Data: + - Big Data: + - Introduction: big_data/intro.md + - Overview of Big Data: big_data/overview.md + - Usage of Big Data techniques: big_data/usage.md + - Evolution of Hadoop: big_data/evolution.md + - Architecture of Hadoop: big_data/architecture.md + - Tasks and Conclusion: big_data/tasks.md - Systems Design: - - Intro: systems_design/intro.md + - Introduction: systems_design/intro.md - Scalability: systems_design/scalability.md - Availability: systems_design/availability.md - Fault Tolerance: systems_design/fault-tolerance.md @@ -22,3 +48,10 @@ nav: - Introduction, Overview and Usage: big_data/intro.md - Evolution and Architecture of Hadoop: big_data/evolution.md - Tasks and conclusion: big_data/tasks.md +- Security: + - Introduction: security/intro.md + - Fundamentals of Security: security/fundamentals.md + - Threats, Attacks & Defences: security/threats_attacks_defences.md + - Network Security: security/network_security.md + - Writing Secure Code: security/writing_secure_code.md +- Contribute: CONTRIBUTING.md