Merge pull request #2 from linkedin/main

Merging into main from upstream
Aditya Kamat
2020-11-24 12:50:35 +05:30
committed by GitHub
124 changed files with 824 additions and 243 deletions

View File

@@ -1,78 +0,0 @@
# Architecture of Hadoop
1. **HDFS**
1. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
2. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.
3. HDFS is part of the Apache Hadoop Core project.
![HDFS Architecture](images/hdfs_architecture.png)
1. NameNode: It is the arbitrator and central repository of the file namespace in the cluster. The NameNode executes operations such as opening, closing, and renaming files and directories.
2. DataNode: It manages the storage attached to the node on which it runs. It is responsible for serving all read and write requests. It also performs block operations such as creation, deletion, and replication upon instruction from the NameNode.
3. Client: It is responsible for getting the required metadata from the NameNode and then communicating with the DataNodes for reads and writes.
2. **YARN**
1. YARN stands for “Yet Another Resource Negotiator“. It was introduced in Hadoop 2.0 to remove the bottleneck on Job Tracker which was present in Hadoop 1.0. YARN was described as a “Redesigned Resource Manager” at the time of its launching, but it has now evolved to be known as a large-scale distributed operating system used for Big Data processing.
2. The main components of YARN architecture include:
![YARN Architecture](images/yarn_architecture.gif)
1. Client: It submits map-reduce jobs to the resource manager.
2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components:
3. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources.
4. Application Manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Master container if a task fails.
5. Node Manager: It takes care of individual nodes in the Hadoop cluster and manages the application workflow on that particular node. Its primary job is to keep up with the Resource Manager. It monitors resource usage, performs log management and also kills a container based on directions from the resource manager. It is also responsible for creating the container process and starting it at the request of the Application Master.
6. Application Master: An application is a single job submitted to a framework. The Application Master is responsible for negotiating resources with the resource manager, tracking the status and monitoring the progress of a single application. The Application Master requests the container from the Node Manager by sending a Container Launch Context (CLC), which includes everything an application needs to run. Once the application is started, it sends a health report to the resource manager from time to time.
7. Container: It is a collection of physical resources such as RAM, CPU cores and disk on a single node. The containers are invoked by Container Launch Context(CLC) which is a record that contains information such as environment variables, security tokens, dependencies etc.
# MapReduce framework
![MapReduce Framework](images/map_reduce.jpg)
1. The term MapReduce represents two separate and distinct tasks that Hadoop programs perform: the Map job and the Reduce job. Map jobs take data sets as input and process them to produce key-value pairs. The Reduce job takes the output of the Map job, i.e. the key-value pairs, and aggregates them to produce the desired result.
2. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. MapReduce helps to split the input data set into a number of parts and run a program on all the parts in parallel.
3. Please find the below Word count example demonstrating the usage of MapReduce framework:
![Word Count Example](images/mapreduce_example.jpg)
# Other tooling around hadoop
1. **Hive**
1. Uses a language called HQL which is very SQL like. Gives non-programmers the ability to query and analyze data in Hadoop. Is basically an abstraction layer on top of map-reduce.
2. Ex. HQL query:
1. _SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name);_
3. In mysql:
1. _SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name;_
2. **Pig**
1. Uses a scripting language called Pig Latin, which is more workflow driven. Don't need to be an expert Java programmer but need a few coding skills. Is also an abstraction layer on top of map-reduce.
2. Here is a quick question for you:
What is the output of running the pig queries in the right column against the data present in the left column in the below image?
![Pig Example](images/pig_example.png)
Output:
```
7,Komal,Nayak,24,9848022334,trivendram
8,Bharathi,Nambiayar,24,9848022333,Chennai
5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
6,Archana,Mishra,23,9848022335,Chennai
```
3. **Spark**
1. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to machine learning algorithms.
4. **Presto**
1. Presto is a high performance, distributed SQL query engine for Big Data.
2. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB.
3. Example presto query:
```sql
use studentDB;
show tables;
SELECT roll_no, name FROM studentDB.studentDetails WHERE section = 'A' LIMIT 5;
```
# Data Serialisation and storage
1. In order to transport data over the network or to store it on some persistent storage, we use the process of translating data structures or object state into binary or textual form. We call this process serialization.
2. Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file.
3. Apache Hive provides support to store a table as Avro and can also query data in this serialisation format.

View File

@@ -1,3 +1,83 @@
# Evolution of Hadoop
![Evolution of hadoop](images/hadoop_evolution.png)
# Architecture of Hadoop
1. **HDFS**
1. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
2. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.
3. HDFS is part of the [Apache Hadoop Core project](https://github.com/apache/hadoop).
![HDFS Architecture](images/hdfs_architecture.png)
1. NameNode: It is the arbitrator and central repository of the file namespace in the cluster. The NameNode executes operations such as opening, closing, and renaming files and directories.
2. DataNode: It manages the storage attached to the node on which it runs. It is responsible for serving all read and write requests. It also performs block operations such as creation, deletion, and replication upon instruction from the NameNode.
3. Client: It is responsible for getting the required metadata from the NameNode and then communicating with the DataNodes for reads and writes.
2. **YARN**
1. YARN stands for “Yet Another Resource Negotiator”. It was introduced in Hadoop 2.0 to remove the bottleneck on the Job Tracker which was present in Hadoop 1.0. YARN was described as a “Redesigned Resource Manager” at the time of its launch, but it has now evolved to be known as a large-scale distributed operating system used for Big Data processing.
2. The main components of YARN architecture include:
![YARN Architecture](images/yarn_architecture.gif)
1. Client: It submits map-reduce(MR) jobs to the resource manager.
2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components:
3. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources.
4. Application Manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Master container if a task fails.
5. Node Manager: It takes care of individual nodes in the Hadoop cluster and manages the application workflow on that particular node. Its primary job is to keep up with the Resource Manager. It monitors resource usage, performs log management and also kills a container based on directions from the resource manager. It is also responsible for creating the container process and starting it at the request of the Application Master.
6. Application Master: An application is a single job submitted to a framework. The Application Master is responsible for negotiating resources with the resource manager, tracking the status and monitoring the progress of a single application. The Application Master requests the container from the Node Manager by sending a Container Launch Context (CLC), which includes everything an application needs to run. Once the application is started, it sends a health report to the resource manager from time to time.
7. Container: It is a collection of physical resources such as RAM, CPU cores and disk on a single node. Containers are invoked via a Container Launch Context (CLC), which is a record that contains information such as environment variables, security tokens, dependencies, etc.
# MapReduce framework
![MapReduce Framework](images/map_reduce.jpg)
1. The term MapReduce represents two separate and distinct tasks that Hadoop programs perform: the Map job and the Reduce job. Map jobs take data sets as input and process them to produce key-value pairs. The Reduce job takes the output of the Map job, i.e. the key-value pairs, and aggregates them to produce the desired result.
2. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. MapReduce helps to split the input data set into a number of parts and run a program on all the parts in parallel.
3. Please find the below Word count example demonstrating the usage of MapReduce framework:
![Word Count Example](images/mapreduce_example.jpg)
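To complement the diagram above, here is a minimal word-count sketch in the Hadoop Streaming style, written in Python. The file name `wordcount.py` and the local pipeline shown in the docstring are illustrative assumptions, not part of the original example.

```python
#!/usr/bin/env python3
"""Minimal word-count mapper/reducer in the Hadoop Streaming style.

Simulate the MapReduce flow locally (illustrative only):
    cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce
"""
import sys


def mapper():
    # Map: emit a (word, 1) pair for every word in the input.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def reducer():
    # Reduce: input arrives sorted by key, so all counts for a word are adjacent.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

On a real cluster the two halves would be passed to Hadoop Streaming via its `-mapper` and `-reducer` options; the shuffle/sort phase between them is what delivers the key-value pairs to the reducer grouped by key.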
# Other tooling around hadoop
1. [**Hive**](https://hive.apache.org/)
1. Uses a language called HQL which is very SQL-like. It gives non-programmers the ability to query and analyze data in Hadoop. It is basically an abstraction layer on top of MapReduce.
2. Ex. HQL query:
1. _SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name);_
3. In mysql:
1. _SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name;_
2. [**Pig**](https://pig.apache.org/)
1. Uses a scripting language called Pig Latin, which is more workflow driven. Don't need to be an expert Java programmer but need a few coding skills. Is also an abstraction layer on top of map-reduce.
2. Here is a quick question for you:
What is the output of running the pig queries in the right column against the data present in the left column in the below image?
![Pig Example](images/pig_example.png)
Output:
```
7,Komal,Nayak,24,9848022334,trivendram
8,Bharathi,Nambiayar,24,9848022333,Chennai
5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
6,Archana,Mishra,23,9848022335,Chennai
```
3. [**Spark**](https://spark.apache.org/)
1. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to machine learning algorithms (see the PySpark sketch after this list).
4. [**Presto**](https://prestodb.io/)
1. Presto is a high performance, distributed SQL query engine for Big Data.
2. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB.
3. Example presto query:
```sql
use studentDB;
show tables;
SELECT roll_no, name FROM studentDB.studentDetails WHERE section = 'A' LIMIT 5;
```
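To make the in-memory computing point in the Spark item above concrete, here is a small PySpark sketch. The application name and the input path are hypothetical, and it assumes PySpark is installed.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; on a real cluster this would run against YARN.
spark = SparkSession.builder.appName("wordcount-demo").getOrCreate()

# Load a text file into an RDD and compute word counts with map/reduce primitives.
lines = spark.sparkContext.textFile("input.txt")  # hypothetical input path
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# cache() keeps the RDD in cluster memory, so repeated queries avoid recomputation.
counts.cache()
print(counts.take(5))   # first action materialises and caches the data
print(counts.count())   # subsequent actions are served from memory

spark.stop()
```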
# Data Serialisation and storage
1. In order to transport data over the network or to store it on some persistent storage, we use the process of translating data structures or object state into binary or textual form. We call this process serialization.
2. Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file.
3. Apache Hive provides support to store a table as Avro and can also query data in this serialisation format.
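As a concrete illustration of the Avro container format described above, the sketch below writes and reads a small `.avro` file. It assumes the third-party `fastavro` package; the schema and records are invented for the example.

```python
from fastavro import parse_schema, reader, writer

# A hypothetical Avro schema (this is the kind of definition a .avsc file holds).
schema = parse_schema({
    "namespace": "example.avro",
    "type": "record",
    "name": "Student",
    "fields": [
        {"name": "roll_no", "type": "int"},
        {"name": "name", "type": "string"},
    ],
})

records = [{"roll_no": 1, "name": "Komal"}, {"roll_no": 2, "name": "Bharathi"}]

# The schema travels inside the container file alongside the data blocks.
with open("students.avro", "wb") as out:
    writer(out, schema, records)

with open("students.avro", "rb") as fo:
    for record in reader(fo):
        print(record)
```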

View File

@@ -13,20 +13,46 @@ This course covers the basics of Big Data and how it has evolved to become what
Writing programs to draw analytics from data.
## Course Content
## Course Contents
### Table of Contents
1. [Overview of Big Data](https://linkedin.github.io/school-of-sre/big_data/overview/)
2. [Usage of Big Data techniques](https://linkedin.github.io/school-of-sre/big_data/overview/)
1. [Overview of Big Data](https://linkedin.github.io/school-of-sre/big_data/intro/#overview-of-big-data)
2. [Usage of Big Data techniques](https://linkedin.github.io/school-of-sre/big_data/intro/#usage-of-big-data-techniques)
3. [Evolution of Hadoop](https://linkedin.github.io/school-of-sre/big_data/evolution/)
4. [Architecture of hadoop](https://linkedin.github.io/school-of-sre/big_data/architecture/)
4. [Architecture of hadoop](https://linkedin.github.io/school-of-sre/big_data/evolution/#architecture-of-hadoop)
1. HDFS
2. Yarn
5. [MapReduce framework](https://linkedin.github.io/school-of-sre/big_data/architecture/#mapreduce-framework)
6. [Other tooling around hadoop](https://linkedin.github.io/school-of-sre/big_data/architecture/#other-tooling-around-hadoop)
5. [MapReduce framework](https://linkedin.github.io/school-of-sre/big_data/evolution/#mapreduce-framework)
6. [Other tooling around hadoop](https://linkedin.github.io/school-of-sre/big_data/evolution/#other-tooling-around-hadoop)
1. Hive
2. Pig
3. Spark
4. Presto
7. [Data Serialisation and storage](https://linkedin.github.io/school-of-sre/big_data/architecture/#data-serialisation-and-storage)
7. [Data Serialisation and storage](https://linkedin.github.io/school-of-sre/big_data/evolution/#data-serialisation-and-storage)
# Overview of Big Data
1. Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques and frameworks.
2. Big Data could consist of
1. Structured data
2. Unstructured data
3. Semi-structured data
3. Characteristics of Big Data:
1. Volume
2. Variety
3. Velocity
4. Variability
4. Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.
# Usage of Big Data Techniques
1. Take the example of the traffic lights problem.
1. There are more than 300,000 traffic lights in the US as of 2018.
2. Let us assume that we placed a device on each of them to collect metrics and send it to a central metrics collection system.
3. If each of the IoT devices sends 10 events per minute, we have 300,000 x 10 x 60 x 24 = 432 x 10^7 events per day (a quick arithmetic check follows at the end of this section).
4. How would you go about processing that and telling me how many of the signals were “green” at 10:45 am on a particular day?
2. Consider the next example on Unified Payments Interface (UPI) transactions:
1. We had about 1.15 billion UPI transactions in the month of October, 2019 in India.
2. If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?
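A quick arithmetic check of the traffic-lights numbers above (no assumptions beyond the figures already stated):

```python
devices = 300_000        # traffic lights, one IoT device each
events_per_minute = 10
events_per_day = devices * events_per_minute * 60 * 24
print(events_per_day)    # 4,320,000,000 events per day, i.e. 432 x 10^7
```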

View File

@@ -1,13 +0,0 @@
# Overview of Big Data
1. Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques and frameworks.
2. Big Data could consist of
1. Structured data
2. Unstructured data
3. Semi-structured data
3. Characteristics of Big Data:
1. Volume
2. Variety
3. Velocity
4. Variability
4. Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.

View File

@@ -1,10 +0,0 @@
# Usage of Big Data techniques
1. Take the example of the traffic lights problem.
1. There are more than 300,000 traffic lights in the US as of 2018.
2. Let us assume that we placed a device on each of them to collect metrics and send it to a central metrics collection system.
3. If each of the IOT devices sends 10 events per minute, we have 300000x10x60x24 = 432x10^7 events per day.
4. How would you go about processing that and telling me how many of the signals were “green” at 10:45 am on a particular day?
2. Consider the next example on Unified Payments Interface (UPI) transactions:
1. We had about 1.15 billion UPI transactions in the month of October, 2019 in India.
2. If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?

View File

@@ -0,0 +1,25 @@
# Further reading:
NoSQL:
[https://hostingdata.co.uk/nosql-database/](https://hostingdata.co.uk/nosql-database/)
[https://www.mongodb.com/nosql-explained](https://www.mongodb.com/nosql-explained)
[https://www.mongodb.com/nosql-explained/nosql-vs-sql](https://www.mongodb.com/nosql-explained/nosql-vs-sql)
CAP Theorem
[http://www.julianbrowne.com/article/brewers-cap-theorem](http://www.julianbrowne.com/article/brewers-cap-theorem)
Scalability
[http://www.slideshare.net/jboner/scalability-availability-stability-patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns)
Eventual Consistency
[https://www.allthingsdistributed.com/2008/12/eventually_consistent.html](https://www.allthingsdistributed.com/2008/12/eventually_consistent.html)
[https://www.toptal.com/big-data/consistent-hashing](https://www.toptal.com/big-data/consistent-hashing)
[https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf](https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf)


View File

@@ -0,0 +1,217 @@
# NoSQL Concepts
## What to expect from this course
At the end of this course, you will have an understanding of what a NoSQL database is and what advantages or disadvantages it has over a traditional RDBMS, learn about the different types of NoSQL databases, and understand some of the underlying concepts & trade-offs w.r.t. NoSQL.
## What is not covered under this course
We will not be deep diving into any specific NoSQL Database.
## Course Contents
* [Introduction to NoSQL](https://linkedin.github.io/school-of-sre/databases_nosql/intro/#introduction)
* [CAP Theorem](https://linkedin.github.io/school-of-sre/databases_nosql/key_concepts/#cap-theorem)
* [Data versioning](https://linkedin.github.io/school-of-sre/databases_nosql/key_concepts/#versioning-of-data-in-distributed-systems)
* [Partitioning](https://linkedin.github.io/school-of-sre/databases_nosql/key_concepts/#partitioning)
* [Hashing](https://linkedin.github.io/school-of-sre/databases_nosql/key_concepts/#hashing)
* [Quorum](https://linkedin.github.io/school-of-sre/databases_nosql/key_concepts/#quorum)
## Introduction
When people use the term “NoSQL database”, they typically use it to refer to any non-relational database. Some say the term “NoSQL” stands for “non SQL” while others say it stands for “not only SQL.” Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables.
A common misconception is that NoSQL databases or non-relational databases don't store relationship data well. NoSQL databases can store relationship data; they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be _easier_, because related data doesn't have to be split between tables.
Such databases have existed since the late 1960s, but the name "NoSQL" was only coined in the early 21st century. NASA used a NoSQL database to track inventory for the Apollo mission. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity. With the rise of Agile development methodology, NoSQL databases were developed with a focus on scaling, fast performance and at the same time allowed for frequent application changes and made programming easier.
### Types of NoSQL databases:
Over time, due to the way these NoSQL databases were developed to suit requirements at different companies, we ended up with quite a few types of them. However, they can be broadly classified into 4 types. Some databases can overlap between different types. They are:
1. **Document databases:** They store data in documents similar to [JSON](https://www.json.org/json-en.html) (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including strings, numbers, booleans, arrays, or objects, and their structures typically align with objects developers are working with in code. The advantages include an intuitive data model & flexible schemas. Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general purpose database. They can horizontally scale out to accommodate large data volumes (a small document sketch follows this list). Ex: MongoDB, Couchbase
2. **Key-Value databases:** These are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don't need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Ex: [Redis](https://redis.io/), [DynamoDB](https://aws.amazon.com/dynamodb/), [Voldemort](https://www.project-voldemort.com/voldemort/)/[Venice](https://engineering.linkedin.com/blog/2017/04/building-venice--a-production-software-case-study) (LinkedIn)
3. **Wide-Column stores:** They store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. [Cassandra](https://cassandra.apache.org/) and [HBase](https://hbase.apache.org/) are two of the most popular wide-column stores.
4. **Graph databases:** These databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and “store” the graph data in a table (although a table is a logical element, therefore this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored). Others use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Ex: [Neo4j](https://neo4j.com/)
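To make the document model concrete, here is a tiny sketch using plain Python structures. The collection, field names and values are invented for illustration; a real document database such as MongoDB exposes a similar JSON-like model through its own client API.

```python
# A "document": a self-contained, JSON-like record with no fixed schema required.
student = {
    "_id": 1,
    "name": "Komal",
    "city": "Trivandrum",
    "subjects": ["maths", "physics"],                      # arrays are first-class values
    "guardian": {"name": "Nayak", "phone": "9848022334"},  # nested objects, no join needed
}

# A toy in-memory "collection" and a query by field value.
collection = [student, {"_id": 2, "name": "Archana", "city": "Chennai"}]
chennai_students = [doc for doc in collection if doc.get("city") == "Chennai"]
print(chennai_students)
```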
### **Comparison**
| | Performance | Scalability | Flexibility | Complexity | Functionality |
|---|---|---|---|---|---|
| Key-value | High | High | High | None | Variable |
| Document stores | High | Variable (high) | High | Low | Variable (low) |
| Column DB | High | High | Moderate | Low | Minimal |
| Graph | Variable | Variable | High | High | Graph theory |
### Differences between SQL and NoSQL
The table below summarizes the main differences between SQL and NoSQL databases.
| | SQL Databases | NoSQL Databases |
|---|---|---|
| Data Storage Model | Tables with fixed rows and columns | Document: JSON documents; Key-value: key-value pairs; Wide-column: tables with rows and dynamic columns; Graph: nodes and edges |
| Primary Purpose | General purpose | Document: general purpose; Key-value: large amounts of data with simple lookup queries; Wide-column: large amounts of data with predictable query patterns; Graph: analyzing and traversing relationships between connected data |
| Schemas | Rigid | Flexible |
| Scaling | Vertical (scale-up with a larger server) | Horizontal (scale-out across commodity servers) |
| Multi-Record [ACID](https://en.wikipedia.org/wiki/ACID) Transactions | Supported | Most do not support multi-record ACID transactions. However, some, like MongoDB, do. |
| Joins | Typically required | Typically not required |
| Data to Object Mapping | Requires ORM (object-relational mapping) | Many do not require ORMs. Document DB documents map directly to data structures in most popular programming languages. |
### Advantages
* Flexible Data Models
Most NoSQL systems feature flexible schemas. A flexible schema means you can easily modify your database schema to add or remove fields to support evolving application requirements. This facilitates continuous development of new application features without database operation overhead.
* Horizontal Scaling
Most NoSQL systems allow you to scale horizontally, which means you can add cheaper, commodity hardware whenever you want to scale the system. On the other hand, SQL systems generally scale vertically (a more powerful server). NoSQL systems can also host huge data sets when compared to traditional SQL systems.
* Fast Queries
NoSQL systems can generally be a lot faster than traditional SQL systems due to data denormalization and horizontal scaling. Most NoSQL systems also tend to store similar data together, facilitating faster query responses.
* Developer productivity
NoSQL systems tend to map data based on the programming language's data structures. As a result, developers need to perform fewer data transformations, leading to increased productivity & fewer bugs.

View File

@@ -0,0 +1,274 @@
# Key Concepts
Let's look at some of the key concepts that come up when we talk about NoSQL or distributed systems.
### CAP Theorem
In a keynote titled “[Towards Robust Distributed Systems](https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/Brewer_podc_keynote_2000.pdf)” at ACM's PODC symposium in 2000, Eric Brewer came up with the so-called CAP theorem, which is widely adopted today by large web companies as well as in the NoSQL community. The CAP acronym stands for **C**onsistency, **A**vailability & **P**artition Tolerance.
* **Consistency**
It refers to how consistent a system is after an execution. A distributed system is called consistent when a write made by a source is available for all readers of that shared data. Different NoSQL systems support different levels of consistency.
* **Availability**
It refers to how a system responds to loss of functionality of different systems due to hardware and software failures. A high availability implies that a system is still available to handle operations (reads and writes) when a certain part of the system is down due to a failure or upgrade.
* **Partition Tolerance**
It is the ability of the system to continue operations in the event of a network partition. A network partition occurs when a failure causes two or more islands of networks where the systems can't talk to each other across the islands, temporarily or permanently.
Brewer argues that one can at most choose two of these three characteristics in a shared-data system. The CAP theorem states that a choice can only be made for two options out of consistency, availability and partition tolerance. A growing number of use cases in large-scale applications tend to value reliability, implying that availability & redundancy are more valuable than consistency. As a result, these systems struggle to meet ACID properties. They attain this by loosening the consistency requirement, i.e. eventual consistency.
**Eventual Consistency** means that all readers will see writes as time goes on: “In a steady state, the system will eventually return the last written value”. Clients therefore may face an inconsistent state of data while updates are in progress. For instance, in a replicated database, updates may go to one node, which replicates the latest version to all other nodes that contain a replica of the modified dataset, so that the replica nodes eventually will have the latest version.
NoSQL systems support different levels of eventual consistency models. For example:
* Read Your Own Writes Consistency
A client will see its own updates immediately after they are written. The reads can hit nodes other than the one where the write happened. However, the client might not see updates by other clients immediately.
* Session Consistency:
A client will see the updates to his data within a session scope. This generally indicates that reads & writes occur on the same server. Other clients using the same nodes will receive the same updates.
* Causal Consistency
A system provides causal consistency if the following condition holds: write operations that are related by potential causality are seen by each process of the system in order. Different processes may observe concurrent writes in different orders
Eventual consistency is useful if concurrent updates of the same partitions of data are unlikely and if clients do not immediately depend on reading updates issued by themselves or by other clients.
The consistency model chosen for the system (or parts of it) determines where requests are routed, e.g. to which replicas.
The table below illustrates the CAP alternatives:

| Choice | Traits | Examples |
|---|---|---|
| Consistency + Availability (forfeit partitions) | 2-phase commits; cache invalidation protocols | Single-site databases; cluster databases; LDAP; xFS file system |
| Consistency + Partition tolerance (forfeit availability) | Pessimistic locking; make minority partitions unavailable | Distributed databases; distributed locking; majority protocols |
| Availability + Partition tolerance (forfeit consistency) | Expirations/leases; optimistic conflict resolution | DNS; web caching |
### Versioning of Data in distributed systems
When data is distributed across nodes, it can be modified on different nodes at the same time (assuming strict consistency is not enforced). This raises questions about conflict resolution for concurrent updates. Some of the popular conflict resolution mechanisms are:
* **Timestamps**
This is the most obvious solution. You sort updates based on chronological order and choose the latest update. However this relies on clock synchronization across different parts of the infrastructure. This gets even more complicated when parts of systems are spread across different geographic locations.
* **Optimistic Locking**
You associate a unique value like a clock or counter with every data update. When a client wants to update data, it has to specify which version of data needs to be updated. This would mean you need to keep track of history of the data versions.
* **Vector Clocks**
A vector clock is defined as a tuple of clock values from each node. In a distributed environment, each node maintains a tuple of such clock values which represent the state of the node itself and its peers/replicas. A clock value may be a real timestamp derived from the local clock, or a version number.
<p id="gdcalert1" ><span style="color: red; font-weight: bold" images/vector_clocks.png> </span></p>
![alt_text](images/vector_clocks.png "Vector Clocks")
<p align="center"><span style="text-decoration:underline; font-weight:bold;">Vector clocks illustration</span></p>
Vector clocks have the following advantages over other conflict resolution mechanisms (a minimal sketch in Python follows this list):
1. No dependency on synchronized clocks
2. No total ordering of revision numbers required for causal reasoning
3. No need to store and maintain multiple versions of the data on different nodes.
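A minimal vector-clock sketch in Python, assuming each node increments its own entry on a local update and clocks are merged when replicas exchange state; the node names are illustrative.

```python
def increment(clock, node):
    # Local update on `node`: bump that node's counter.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock


def merge(a, b):
    # Element-wise maximum: the clock after two histories are combined.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happened_before(a, b):
    # a -> b iff every entry of a is <= the corresponding entry of b and a != b.
    return a != b and all(a.get(n, 0) <= b.get(n, 0) for n in set(a) | set(b))


c1 = increment({}, "node1")      # {'node1': 1}
c2 = increment(c1, "node2")      # node2 saw c1, then updated locally
c3 = increment(c1, "node1")      # node1 updated again, unaware of c2
print(happened_before(c1, c2))   # True: c1 causally precedes c2
print(happened_before(c2, c3), happened_before(c3, c2))  # False False: concurrent, needs conflict resolution
print(merge(c2, c3))             # {'node1': 2, 'node2': 1}
```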
### Partitioning
When the amount of data crosses the capacity of a single node, we need to think of splitting data, creating replicas for load balancing & disaster recovery. Depending on how dynamic the infrastructure is, we have a few approaches that we can take.
1. **Memory cached**
These are partitioned in-memory databases that are primarily used for transient data. They are generally used as a front for traditional RDBMSs: the most frequently used data is replicated from an RDBMS into a memory database to facilitate fast queries and to take the load off backend DBs. Very common examples are Memcached and Couchbase.
2. **Clustering**
Traditional cluster mechanisms abstract away the cluster topology from clients. A client need not know where the actual data resides or which node it is talking to. Clustering is very commonly used in traditional RDBMSs, where it can help scale the persistence layer to a certain extent.
3. **Separating reads from writes**
In this method, you will have multiple replicas hosting the same data. The incoming writes are typically sent to a single node (leader) or multiple nodes (multi-leader), while the rest of the replicas (followers) handle read requests. The leader replicates writes asynchronously to all followers. However, the write lag can't be completely avoided. Sometimes a leader can crash before it replicates all the data to a follower. When this happens, the follower with the most consistent data can be turned into the leader. As you can realize now, it is hard to enforce full consistency in this model. You also need to consider the ratio of read vs write traffic. This model won't make sense when writes are higher than reads. The replication methods can also vary widely. Some systems do a complete transfer of state periodically, while others use a delta-state transfer approach. You could also transfer the state by transferring the operations in order. The followers can then apply the same operations as the leader to catch up.
4. **Sharding**
Sharding refers to dividing data in such a way that it is distributed evenly (both in terms of storage & processing power) across a cluster of nodes. It can also imply data locality, which means similar & related data is stored together to facilitate faster access. A shard in turn can be further replicated to meet load balancing or disaster recovery requirements. A single shard replica might take in all writes (single leader) or multiple replicas can take writes (multi-leader). Reads can be distributed across multiple replicas. Since data is now distributed across multiple nodes, clients should be able to consistently figure out where data is hosted. We will look at some of the common techniques below. The downside of sharding is that joins between shards are not possible. So an upstream/downstream application has to aggregate the results from multiple shards.
<p id="gdcalert2" ><span style="color: red; font-weight: bold" images/database_sharding.png></span></p>
![alt_text](images/database_sharding.png "Sharding")
<p align="center"><span style="text-decoration:underline; font-weight:bold;">Sharding example</span> </p>
### Hashing
A hash function is a function that maps one piece of data—typically describing some kind of object, often of arbitrary size—to another piece of data, typically an integer, known as _hash code_, or simply _hash_. In a partitioned database, it is important to consistently map a key to a server/replica.
For example, you can use a simple modulo function as the hash:

_p = k mod n_

where _p_ is the partition, _k_ is the primary key and _n_ is the number of nodes.
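A minimal sketch of this modulo scheme (the key and node counts are arbitrary):

```python
def partition(key: int, nodes: int) -> int:
    # p = k mod n: maps a primary key to one of `nodes` partitions.
    return key % nodes

# With 4 nodes, key 10 lands on partition 2; add a 5th node and it moves to partition 0.
print(partition(10, 4), partition(10, 5))  # 2 0
```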
The downside of this simple hash is that whenever the cluster topology changes, the data distribution also changes. When you are dealing with memory caches, it is easy to redistribute partitions: whenever a node joins or leaves the topology, partitions can reorder themselves, and a cache miss can be re-populated from the backend DB. However, when you look at persistent data, that is not possible, as the new node doesn't have the data needed to serve it. This brings us to consistent hashing.
#### Consistent Hashing
Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed _hash table_ by assigning them a position on an abstract circle, or _hash ring_. This allows servers and objects to scale without affecting the overall system.
Say that our hash function h() generates a 32-bit integer. Then, to determine to which server we will send a key k, we find the server s whose hash h(s) is the smallest integer that is larger than h(k). To make the process simpler, we assume the table is circular, which means that if we cannot find a server with a hash larger than h(k), we wrap around and start looking from the beginning of the array.
<p id="gdcalert3" ><span style="color: red; font-weight: bold" images/consistent_hashing.png> </span></p>
![alt_text](images/consistent_hashing.png "Consistent Hashing")
<p align="center"><span style="text-decoration:underline; font-weight:bold;">Consistent hashing illustration</span></p>
In consistent hashing when a server is removed or added then only the keys from that server are relocated. For example, if server S3 is removed then, all keys from server S3 will be moved to server S4 but keys stored on server S4 and S2 are not relocated. But there is one problem, when server S3 is removed then keys from S3 are not equally distributed among remaining servers S4 and S2. They are only assigned to server S4 which increases the load on server S4.
To evenly distribute the load among servers when a server is added or removed, it creates a fixed number of replicas ( known as virtual nodes) of each server and distributes it along the circle. So instead of server labels S1, S2 and S3, we will have S10 S11…S19, S20 S21…S29 and S30 S31…S39. The factor for a number of replicas is also known as _weight_, depending on the situation.
All keys which are mapped to replicas Sij are stored on server Si. To find a key we do the same thing, find the position of the key on the circle and then move forward until you find a server replica. If the server replica is Sij then the key is stored in server Si.
Suppose server S3 is removed; then all S3 replicas with labels S30 S31 … S39 must be removed. The object keys adjacent to the S3X labels will automatically be re-assigned to S1X, S2X and S4X. All keys originally assigned to S1, S2 & S4 will not be moved.
Similar things happen if we add a server. Suppose we want to add a server S5 as a replacement of S3 then we need to add labels S50 S51 … S59. In the ideal case, one-fourth of keys from S1, S2 and S4 will be reassigned to S5.
When applied to persistent storage, further issues arise: if a node has left the scene, data stored on this node becomes unavailable, unless it has been replicated to other nodes before; in the opposite case of a new node joining the others, adjacent nodes are no longer responsible for some pieces of data which they still store but are no longer asked for, as the corresponding objects are no longer hashed to them by requesting clients. In order to address this issue, a replication factor (r) can be introduced.
Introducing replicas in a partitioning scheme (besides reliability benefits) also makes it possible to spread the workload for read requests, which can go to any physical node responsible for a requested piece of data. Scalability doesn't work if the clients have to decide between multiple versions of the dataset, because they need to read from a quorum of servers, which in turn reduces the efficiency of load balancing.
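The sketch below implements a small consistent-hash ring with virtual nodes, to make the S10…S39 discussion above concrete. The choice of MD5 and 10 virtual nodes per server is arbitrary.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, servers, vnodes=10):
        # Place `vnodes` virtual nodes per server on the ring, sorted by hash.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_server(self, key: str) -> str:
        # Walk clockwise to the first virtual node whose hash is >= h(key); wrap around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["S1", "S2", "S3", "S4"])
print(ring.get_server("user:42"))  # removing or adding a server only relocates nearby keys
```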
### Quorum
Quorum is the minimum number of nodes in a cluster that must be online and able to communicate with each other. If any additional node failure occurs beyond this threshold, the cluster will stop running.
To attain a quorum, you need a majority of the nodes. Commonly it is (N/2 + 1), where N is the total number of nodes in the system (a small helper sketch follows the examples below). For example:
In a 3 node cluster, you need 2 nodes for a majority,
In a 5 node cluster, you need 3 nodes for a majority,
In a 6 node cluster, you need 4 nodes for a majority.
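A minimal helper for the majority calculation above, also showing the split-brain decision discussed below; the node counts are illustrative.

```python
def quorum(n: int) -> int:
    # Minimum number of nodes that must stay connected: floor(N/2) + 1.
    return n // 2 + 1


def partition_survives(partition_size: int, cluster_size: int) -> bool:
    return partition_size >= quorum(cluster_size)


print(quorum(3), quorum(5), quorum(6))  # 2 3 4
# A 5-node cluster split into {1,2,3} and {4,5}: only the 3-node side keeps running.
print(partition_survives(3, 5), partition_survives(2, 5))  # True False
```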
<p id="gdcalert4" ><span style="color: red; font-weight: bold" images/Quorum.png > </span></p>
![alt_text](images/Quorum.png "image_tooltip")
<p align="center"> <span style="text-decoration:underline; font-weight:bold;">Quorum example</span> </p>
Network problems can cause communication failures among cluster nodes. One set of nodes might be able to communicate together across a functioning part of the network but not be able to communicate with a different set of nodes in another part of the network. This is known as split-brain or cluster partitioning.
Now the partition which has quorum is allowed to continue running the application. The other partitions are removed from the cluster.
E.g., in a 5-node cluster, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 then loses communication with the other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can re-form and begin to run.
The diagram below demonstrates quorum selection on a cluster partitioned into two sets.
<p id="gdcalert5" ><span style="color: red; font-weight: bold" images/cluster_quorum.png> </span></p>
![alt_text](images/cluster_quorum.png "image_tooltip")
<p align="center"><span style="text-decoration:underline; font-weight:bold;">Cluster Quorum example</span></p>

View File

@@ -83,7 +83,7 @@ Above tree structure should make things clear. Notice a clear branch/fork on com
## Merges
Now say the feature you were working on branch `b1` is complete. And you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you will pull the latest code from upstream (eg: GitHub). Then you need to merge your code from `b1` into master. And there could be two ways this can be done.
Now say the feature you were working on branch `b1` is complete and you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you pull the latest code from upstream (eg: GitHub). Then you need to merge your code from `b1` into master. There could be two ways this can be done.
Here is the current history:
@@ -96,7 +96,7 @@ spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* df2fb7a adding file 1
```
**Option 1: Directly merge the branch.** Merging the branch b1 into master will result in a new merge commit which will merge changes from two different lines of history and create a new commit of the result.
**Option 1: Directly merge the branch.** Merging the branch b1 into master will result in a new merge commit. This will merge changes from two different lines of history and create a new commit of the result.
```bash
spatel1-mn1:school-of-sre spatel1$ git merge b1

10
courses/git/conclusion.md Normal file
View File

@@ -0,0 +1,10 @@
## What next from here?
There are a lot of git commands and features which we have not explored here. But with the base built up, be sure to explore concepts like
- Cherrypick
- Squash
- Amend
- Stash
- Reset

View File

@@ -10,15 +10,13 @@
## What to expect from this course
As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today, Git perhaps is the most used one and this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently!
As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today, like SVN, Mercurial, etc., Git is perhaps the most used one, and in this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently!
## What is not covered under this course
Advanced usage and specifics of internal implementation details of Git.
## Course Content
### Table of Contents
## Course Contents
1. [Git Basics](https://linkedin.github.io/school-of-sre/git/git-basics/#git-basics)
2. [Working with Branches](https://linkedin.github.io/school-of-sre/git/branches/)
@@ -27,7 +25,7 @@ Advanced usage and specifics of internal implementation details of Git.
## Git Basics
Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains history of the changes happened with the codebase.
Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase.
### Creating a Git Repo
@@ -92,7 +90,7 @@ spatel1-mn1:school-of-sre spatel1$ git commit -m "adding file 1"
create mode 100644 file1.txt
```
Notice how after adding the file, git status says `Changes to be commited:`. What it means is whatever is listed there, will be included in the next commit. Then we go ahead and create a commit, with an attached messaged via `-m`.
Notice how after adding the file, git status says `Changes to be committed:`. What it means is that whatever is listed there will be included in the next commit. Then we go ahead and create a commit, with an attached message via `-m`.
### More About a Commit

View File

@@ -40,13 +40,3 @@ this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION
1 file changed, 1 insertion(+)
create mode 100644 sample.txt
```
## What next from here?
There are a lot of git commands and features which we have not explored here. But with the base built-up, be sure to explore concepts like
- Cherrypick
- Squash
- Amend
- Stash
- Reset

View File

@@ -1,5 +1,7 @@
# School of SRE
![School of SRE](img/sos.png)
<img src="img/sos.png" width=200 >
Early 2019, we started visiting campuses to recruit the brightest minds to ensure LinkedIn, and all the services that it is composed of, are always available for everyone. This function at LinkedIn falls under the purview of the Site Reliability Engineering team and Site Reliability Engineers (SRE), who are software engineers specializing in reliability. SREs apply the principles of computer science and engineering to the design and development of computer systems: generally, large distributed ones.
As we continued on this journey, we started getting a lot of questions from these campuses on what exactly the site engineering role entails, and how someone could learn the skills and the disciplines involved to become a successful site engineer. Fast forward a few months, and a few of these campus students had joined LinkedIn either as interns or as full-time engineers to become a part of the Site Engineering team; we also had a few lateral hires join our organization who were not from a traditional SRE background. That's when a few of us got together and started to think about how we can onboard new graduate engineers to the site engineering team.
@@ -15,7 +17,7 @@ In this course we are focusing on building strong foundational skills. The cours
- [Python and Web](https://linkedin.github.io/school-of-sre/python_web/intro/)
- Data
- Relational databases (MySQL)
- NoSQL concepts
- [NoSQL concepts](https://linkedin.github.io/school-of-sre/databases_nosql/intro/)
- [Big Data](https://linkedin.github.io/school-of-sre/big_data/intro/)
- [Systems Design](https://linkedin.github.io/school-of-sre/systems_design/intro/)
- [Security](https://linkedin.github.io/school-of-sre/security/intro/)

View File

@@ -1,7 +1,12 @@
# Command Line Basics
## What is a command ?
## Lab Environment Setup
One can use an online bash interpreter to run all the commands that are provided as examples in this course. This will also help you get hands-on experience with various Linux commands.
[REPL](https://repl.it/languages/bash) is one of the popular online bash interpreters for running Linux commands. We will be using it for running all the commands mentioned in this course.
## What is a Command
A command is a program that tells the operating system to perform
specific work. Programs are stored as files in linux. Therefore, a
@@ -438,22 +443,3 @@ prints the unique numbers from the input.
I/O redirection -
[https://tldp.org/LDP/abs/html/io-redirection.html](https://tldp.org/LDP/abs/html/io-redirection.html)
## Applications in SRE Role
- As a SRE, you will be required to perform some general tasks on these linux servers. You will also be using the command line when you are troubleshooting issues.
- Moving from one location to another in the filesystem will require the help of ls, pwd and cd commands
- You may need to search some specific information in the log files. Grep command would be very useful here. I/O redirection will become handy if you want to store the output in a file or pass it as an input to another command.
- Tail command is very useful to view the latest data in the log file.
## Useful courses and tutorials
- [Edx linuxcourse](https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS101x+1T2020/course/) -
This video course can be very helpful in developing the basics of linux command line. This course is provided
in both free and paidmodes by edX. If you take the free course, you will not be able to access the assignments.
- [https://linuxcommand.org/lc3_learning_the_shell.php](https://linuxcommand.org/lc3_learning_the_shell.php)

View File

@@ -0,0 +1,25 @@
# Conclusion
With this, we have covered the basics of the Linux operating system along with the basic commands used in Linux.
We have also covered the Linux server administration commands.
We hope that this course will make it easier for you to operate on the command line.
## Applications in SRE Role
1. As an SRE, you will be required to perform some general tasks on Linux servers. You will also be using the command line when you are troubleshooting issues.
2. Moving from one location to another in the filesystem will require the help of the `ls`, `pwd` and `cd` commands.
3. You may need to search for some specific information in the log files. The `grep` command would be very useful here. I/O redirection will come in handy if you want to store the output in a file or pass it as input to another command.
4. The `tail` command is very useful for viewing the latest data in a log file.
5. Different users will have different permissions depending on their roles. We will also not want everyone in the company to access our servers, for security reasons. User permissions can be restricted with the `chown`, `chmod` and `chgrp` commands.
6. SSH is one of the most frequently used commands for an SRE. Logging into servers and troubleshooting, along with performing basic administration tasks, will only be possible if we are able to log in to the server.
7. What if we want to run an Apache or Nginx web server on a machine? We will first install it using the package manager. Package management commands become important here.
8. Managing services on servers is another critical responsibility of an SRE. Systemd-related commands can help in troubleshooting issues. If a service goes down, we can start it using the `systemctl start` command. We can also stop a service in case it is not needed.
9. Monitoring is another core responsibility of an SRE. Memory and CPU are two important system-level metrics which should be monitored. Commands like `top` and `free` are quite helpful here.
10. If a service is throwing an error, how do we find out the root cause? We will certainly need to check the logs to find the full stack trace of the error. The log file will also tell us the number of times the error has occurred, along with the time when it started.
## Useful Courses and tutorials
* [Edx basic linux commands course](https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS101x+1T2020/course/)
* [Edx Red Hat Enterprise Linux Course](https://courses.edx.org/courses/course-v1:RedHat+RH066x+2T2017/course/)
* [https://linuxcommand.org/lc3_learning_the_shell.php](https://linuxcommand.org/lc3_learning_the_shell.php)


After

Width:  |  Height:  |  Size: 78 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 53 KiB

After

Width:  |  Height:  |  Size: 17 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 243 KiB

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 186 KiB

After

Width:  |  Height:  |  Size: 83 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 217 KiB

After

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 79 KiB

After

Width:  |  Height:  |  Size: 38 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 179 KiB

After

Width:  |  Height:  |  Size: 80 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 71 KiB

After

Width:  |  Height:  |  Size: 36 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 268 KiB

After

Width:  |  Height:  |  Size: 113 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 95 KiB

After

Width:  |  Height:  |  Size: 42 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 141 KiB

After

Width:  |  Height:  |  Size: 67 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 288 KiB

After

Width:  |  Height:  |  Size: 126 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 302 KiB

After

Width:  |  Height:  |  Size: 135 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 628 KiB

After

Width:  |  Height:  |  Size: 188 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 233 KiB

After

Width:  |  Height:  |  Size: 107 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 160 KiB

After

Width:  |  Height:  |  Size: 115 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 164 KiB

After

Width:  |  Height:  |  Size: 109 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 114 KiB

After

Width:  |  Height:  |  Size: 56 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 47 KiB

After

Width:  |  Height:  |  Size: 45 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 99 KiB

After

Width:  |  Height:  |  Size: 46 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 64 KiB

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 24 KiB

After

Width:  |  Height:  |  Size: 14 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 41 KiB

After

Width:  |  Height:  |  Size: 52 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 131 KiB

After

Width:  |  Height:  |  Size: 64 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 189 KiB

After

Width:  |  Height:  |  Size: 88 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 59 KiB

After

Width:  |  Height:  |  Size: 50 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 189 KiB

After

Width:  |  Height:  |  Size: 90 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 56 KiB

After

Width:  |  Height:  |  Size: 30 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 47 KiB

After

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 16 KiB

After

Width:  |  Height:  |  Size: 57 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 187 KiB

After

Width:  |  Height:  |  Size: 91 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 246 KiB

After

Width:  |  Height:  |  Size: 110 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 12 KiB

After

Width:  |  Height:  |  Size: 8.3 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 45 KiB

After

Width:  |  Height:  |  Size: 23 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 87 KiB

After

Width:  |  Height:  |  Size: 41 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 103 KiB

After

Width:  |  Height:  |  Size: 49 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 148 KiB

After

Width:  |  Height:  |  Size: 69 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 27 KiB

After

Width:  |  Height:  |  Size: 43 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 353 KiB

After

Width:  |  Height:  |  Size: 120 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 216 KiB

After

Width:  |  Height:  |  Size: 99 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 40 KiB

After

Width:  |  Height:  |  Size: 56 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 42 KiB

After

Width:  |  Height:  |  Size: 56 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 147 KiB

After

Width:  |  Height:  |  Size: 72 KiB

Some files were not shown because too many files have changed in this diff Show More