Deployed 6d74e6c with MkDocs version: 1.1.2
6
404.html
@@ -529,7 +529,7 @@
<li class="md-nav__item">
<a href="/big_data/overview/" class="md-nav__link">
<a href="/big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -541,7 +541,7 @@
<li class="md-nav__item">
<a href="/big_data/usage/" class="md-nav__link">
<a href="/big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -565,7 +565,7 @@
<li class="md-nav__item">
<a href="/big_data/architecture/" class="md-nav__link">
<a href="/big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -529,7 +529,7 @@
<li class="md-nav__item">
<a href="../big_data/overview/" class="md-nav__link">
<a href="../big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -541,7 +541,7 @@
<li class="md-nav__item">
<a href="../big_data/usage/" class="md-nav__link">
<a href="../big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -565,7 +565,7 @@
<li class="md-nav__item">
<a href="../big_data/architecture/" class="md-nav__link">
<a href="../big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -538,7 +538,7 @@
<li class="md-nav__item">
<a href="../overview/" class="md-nav__link">
<a href="../overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -550,7 +550,7 @@
<li class="md-nav__item">
<a href="../usage/" class="md-nav__link">
<a href="../usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -583,7 +583,7 @@
<li class="md-nav__item">
<a href="../architecture/" class="md-nav__link">
<a href="../architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -823,6 +823,96 @@
<h1 id="evolution-of-hadoop">Evolution of Hadoop</h1>
<p><img alt="Evolution of hadoop" src="../images/hadoop_evolution.png" /></p>
<h1 id="architecture-of-hadoop">Architecture of Hadoop</h1>
<ol>
<li>
<p><strong>HDFS</strong></p>
<ol>
<li>The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences from them are significant.</li>
<li>HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.</li>
<li>HDFS is part of the <a href="https://github.com/apache/hadoop">Apache Hadoop Core project</a>.</li>
</ol>
<p><img alt="HDFS Architecture" src="../images/hdfs_architecture.png" /></p>
<ol>
<li>NameNode: the arbitrator and central repository of the file system namespace in the cluster. It executes namespace operations such as opening, closing, and renaming files and directories, and it determines the mapping of blocks to DataNodes.</li>
<li>DataNode: manages the storage attached to the node on which it runs. DataNodes serve read and write requests from clients, and they perform block creation, deletion, and replication on instruction from the NameNode.</li>
<li>Client: gets the required metadata (such as block locations) from the NameNode and then communicates directly with the DataNodes for reads and writes. </br></br></br></li>
</ol>
</li>
<li>
<p><strong>YARN</strong></p>
<ol>
<li>YARN stands for “Yet Another Resource Negotiator”. It was introduced in Hadoop 2.0 to remove the JobTracker bottleneck present in Hadoop 1.0. YARN was described as a “Redesigned Resource Manager” at the time of its launch, but it has since evolved into what is known as a large-scale distributed operating system for Big Data processing.</li>
<li>The main components of the YARN architecture are:</li>
</ol>
<p><img alt="YARN Architecture" src="../images/yarn_architecture.gif" /></p>
<ol>
<li>Client: submits applications, such as MapReduce (MR) jobs, to the Resource Manager.</li>
<li>Resource Manager: the master daemon of YARN, responsible for assigning and managing resources among all the applications. Whenever it receives a processing request, it forwards it to the corresponding Node Manager and allocates resources to complete the request. It has two major components:<ol>
<li>Scheduler: allocates resources to the running applications, subject to constraints such as capacities and queues. It is a pure scheduler: it does not monitor or track application status, and it does not guarantee a restart if a task fails. The scheduling policy is pluggable; the Capacity Scheduler and the Fair Scheduler are plugins that partition the cluster resources.</li>
<li>Applications Manager: accepts job submissions, negotiates the first container for running the application-specific Application Master, and restarts the Application Master container if it fails.</li>
</ol>
</li>
<li>Node Manager: takes care of an individual node in the Hadoop cluster and manages the applications and workflow on that node. Its primary job is to keep up with the Resource Manager, with which it registers and to which it sends heartbeats with the node's health status. It monitors resource usage, performs log management, and kills containers on instruction from the Resource Manager. It is also responsible for creating container processes and starting them at the request of the Application Master.</li>
<li>Application Master: an application is a single job submitted to a framework. The Application Master negotiates resources with the Resource Manager, tracks the status, and monitors the progress of that single application. It requests a container from the Node Manager by sending a Container Launch Context (CLC), which includes everything the application needs to run. Once the application is started, it periodically sends health reports to the Resource Manager.</li>
<li>Container: a collection of physical resources, such as RAM, CPU cores, and disk, on a single node. Containers are launched using a Container Launch Context (CLC), a record that contains information such as environment variables, security tokens, and dependencies. </br></br></li>
</ol>
</li>
</ol>
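<p>The HDFS read path described above (the client asks the NameNode for block locations, then reads the block data from DataNodes) can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: the class and function names are hypothetical, the block size is shrunk to 4 bytes, and the write path is collapsed into a helper that places replicas directly, whereas a real HDFS client streams each block through a pipeline of DataNodes chosen by the NameNode.</p>

```python
from dataclasses import dataclass, field

# Deliberately tiny so a short string is split into several blocks.
# Real HDFS blocks default to 128 MB (the dfs.blocksize setting).
BLOCK_SIZE = 4


@dataclass
class DataNode:
    """Holds raw block contents; a dict stands in for the node's local disks."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Holds only metadata: for each path, its block IDs and the DataNodes storing each block."""
    namespace: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.namespace[path]


def write_file(namenode, datanodes, path, data, replication=2):
    """Hypothetical helper: split data into blocks and put each replica on a different DataNode."""
    locations = []
    for index, start in enumerate(range(0, len(data), BLOCK_SIZE)):
        block_id = f"{path}#blk{index}"
        # Round-robin placement; real HDFS uses a rack-aware placement policy.
        targets = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        for node in targets:
            node.blocks[block_id] = data[start:start + BLOCK_SIZE]
        locations.append((block_id, targets))
    namenode.namespace[path] = locations


def read_file(namenode, path):
    """Client read path: ask the NameNode where the blocks are, then read them from DataNodes."""
    chunks = []
    for block_id, replicas in namenode.get_block_locations(path):
        # Use the first replica; a real client prefers the closest healthy one.
        chunks.append(replicas[0].read_block(block_id))
    return b"".join(chunks)


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    nn = NameNode()
    write_file(nn, nodes, "/logs/app.log", b"hello hdfs!")
    print(read_file(nn, "/logs/app.log"))  # b'hello hdfs!'
```

<p>Note that the NameNode in this sketch never touches file contents; it only answers "which blocks, on which nodes", which is what lets clients read from many DataNodes at once.</p>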
<h1 id="mapreduce-framework">MapReduce framework</h1>
<p><img alt="MapReduce Framework" src="../images/map_reduce.jpg" /></p>
<ol>
<li>The term MapReduce refers to two separate and distinct tasks that Hadoop programs perform: the Map job and the Reduce job. A Map job takes a data set as input and processes it to produce key-value pairs. A Reduce job takes the output of the Map job, i.e. the key-value pairs, and aggregates them to produce the desired result.</li>
<li>Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. MapReduce splits the input data set into a number of parts and runs the program on all the parts in parallel.</li>
<li>The word count example below demonstrates the usage of the MapReduce framework:</li>
</ol>
<p><img alt="Word Count Example" src="../images/mapreduce_example.jpg" />
</br></br></p>
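<p>The word count flow can also be sketched in plain Python to make the three steps explicit: map emits (word, 1) pairs, the shuffle groups the pairs by key, and reduce sums each group. This is a single-process illustration, not Hadoop's Java MapReduce API; the function names and the sample input are hypothetical.</p>

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between the Map and Reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all the values for one key, here by summing the counts."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, each input split would be handled by a separate mapper in parallel;
    # here the phases simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # sorted() mirrors the framework sorting keys before they reach the reducers.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
    # {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

<p>Because each call to <code>map_phase</code> looks at one line only, and each call to <code>reduce_phase</code> looks at one key only, both steps can be spread across machines; only the shuffle needs to move data between them.</p>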
<h1 id="other-tooling-around-hadoop">Other tooling around Hadoop</h1>
<ol>
<li><a href="https://hive.apache.org/"><strong>Hive</strong></a><ol>
<li>Uses a language called HQL, which is very similar to SQL. It gives non-programmers the ability to query and analyze data in Hadoop, and is essentially an abstraction layer on top of MapReduce.</li>
<li>Example HQL query: <ol>
<li><em>SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name);</em></li>
</ol>
</li>
<li>The equivalent query in MySQL: <ol>
<li><em>SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name;</em></li>
</ol>
</li>
</ol>
</li>
<li>
<p><a href="https://pig.apache.org/"><strong>Pig</strong></a></p>
<ol>
<li>Uses a scripting language called Pig Latin, which is more workflow-driven. You don't need to be an expert Java programmer, but you do need some coding skills. Pig is also an abstraction layer on top of MapReduce.</li>
<li>Here is a quick question for you: what is the output of running the Pig queries in the right column against the data in the left column of the image below?</li>
</ol>
<p><img alt="Pig Example" src="../images/pig_example.png" /></p>
<p>Output:</p>
<pre><code>7,Komal,Nayak,24,9848022334,trivendram
8,Bharathi,Nambiayar,24,9848022333,Chennai
5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
6,Archana,Mishra,23,9848022335,Chennai</code></pre>
</li>
<li>
<p><a href="https://spark.apache.org/"><strong>Spark</strong></a></p>
<ol>
<li>Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster’s memory and query it repeatedly, making it well suited to machine learning algorithms.</li>
</ol>
</li>
<li>
<p><a href="https://prestodb.io/"><strong>Presto</strong></a></p>
<ol>
<li>Presto is a high-performance, distributed SQL query engine for Big Data.</li>
<li>Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB.</li>
<li>Example Presto query:</li>
</ol>
<pre><code class="language-sql">use studentDB;
show tables;
SELECT roll_no, name FROM studentDB.studentDetails where section='A' limit 5;</code></pre>
</li>
</ol>
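<p>The Hive and MySQL queries above express the same inner join, once with explicit <code>JOIN ... ON</code> syntax and once as an implicit join in the <code>WHERE</code> clause. The sketch below checks that on a tiny in-memory database, using Python's built-in <code>sqlite3</code> module as a stand-in for both engines; it says nothing about how Hive compiles a query into MapReduce jobs. The table rows are made up, and <code>ORDER BY</code> is added only so the two results compare deterministically.</p>

```python
import sqlite3

# In-memory database with made-up rows; column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pet (name TEXT, owner TEXT);
    CREATE TABLE event (name TEXT, event_date TEXT, comment TEXT);
    INSERT INTO pet VALUES ('Fluffy', 'Harold'), ('Buffy', 'Harold'), ('Chirpy', 'Gwen');
    INSERT INTO event VALUES
        ('Fluffy', '2019-05-15', 'litter'),
        ('Buffy', '2019-06-23', 'vet visit'),
        ('Slim', '2019-08-03', 'broken rib');
    """
)

# Explicit JOIN ... ON, as in the HQL example.
explicit_join = conn.execute(
    "SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name) "
    "ORDER BY pet.name"
).fetchall()

# Implicit join expressed in the WHERE clause, as in the MySQL example.
implicit_join = conn.execute(
    "SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name "
    "ORDER BY pet.name"
).fetchall()

print(explicit_join)                   # [('Buffy', 'vet visit'), ('Fluffy', 'litter')]
print(explicit_join == implicit_join)  # True
```

<p>Rows without a match on both sides ('Chirpy' has no event, 'Slim' has no pet) are dropped by both forms, which is what makes them equivalent inner joins.</p>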
<h1 id="data-serialisation-and-storage">Data Serialisation and storage</h1>
<ol>
<li>To transport data over the network or store it on persistent storage, we translate data structures or object state into a binary or textual form. This process is called serialization.</li>
<li>Avro data is stored in a container file (a .avro file), and its schema is stored in that file together with the data; a schema can also be kept on its own in a .avsc file.</li>
<li>Apache Hive can store a table as Avro and can also query data in this serialisation format.</li>
</ol>
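<p>A minimal round-trip sketch of serialization: an object's state is turned into bytes that could be sent over the network or written to disk, then rebuilt from those bytes. It uses JSON from the Python standard library as a stand-in; JSON is a textual format, whereas Avro is a compact binary format whose schema travels with the data, so this shows only the general idea and not Avro's encoding. The <code>Student</code> record and its fields are hypothetical.</p>

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Student:
    # Hypothetical record type used only for this illustration.
    roll_no: int
    name: str
    section: str


def serialize(student):
    """Translate the object's state into bytes (here: UTF-8 encoded JSON text)."""
    return json.dumps(asdict(student)).encode("utf-8")


def deserialize(payload):
    """Rebuild an equivalent object from the bytes."""
    return Student(**json.loads(payload.decode("utf-8")))


if __name__ == "__main__":
    original = Student(roll_no=1, name="Asha", section="A")
    wire_bytes = serialize(original)  # what would be sent over the network or written to disk
    print(wire_bytes)                 # b'{"roll_no": 1, "name": "Asha", "section": "A"}'
    print(deserialize(wire_bytes) == original)  # True
```

<p>Here the reader must already know the <code>Student</code> layout to rebuild the object; storing the schema alongside the data, as Avro container files do, removes that dependency.</p>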
@@ -841,7 +931,7 @@
<div class="md-footer-nav">
<nav class="md-footer-nav__inner md-grid" aria-label="Footer">
<a href="../usage/" class="md-footer-nav__link md-footer-nav__link--prev" rel="prev">
<a href="../intro/" class="md-footer-nav__link md-footer-nav__link--prev" rel="prev">
<div class="md-footer-nav__button md-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M20 11v2H8l5.5 5.5-1.42 1.42L4.16 12l7.92-7.92L13.5 5.5 8 11h12z"/></svg>
</div>
@@ -850,19 +940,19 @@
<span class="md-footer-nav__direction">
Previous
</span>
Usage of Big Data techniques
Introduction
</div>
</div>
</a>
<a href="../architecture/" class="md-footer-nav__link md-footer-nav__link--next" rel="next">
<a href="../tasks/" class="md-footer-nav__link md-footer-nav__link--next" rel="next">
<div class="md-footer-nav__title">
<div class="md-ellipsis">
<span class="md-footer-nav__direction">
Next
</span>
Architecture of Hadoop
Tasks and conclusion
</div>
</div>
<div class="md-footer-nav__button md-icon">
@@ -610,7 +610,7 @@
<li class="md-nav__item">
<a href="../overview/" class="md-nav__link">
<a href="../overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -622,7 +622,7 @@
<li class="md-nav__item">
<a href="../usage/" class="md-nav__link">
<a href="../usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -646,7 +646,7 @@
<li class="md-nav__item">
<a href="../architecture/" class="md-nav__link">
<a href="../architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -946,23 +946,56 @@
<h2 id="course-content">Course Content</h2>
<h3 id="table-of-contents">Table of Contents</h3>
<ol>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/overview/">Overview of Big Data</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/overview/">Usage of Big Data techniques</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/intro/#overview-of-big-data">Overview of Big Data</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/intro/#usage-of-big-data-techniques">Usage of Big Data techniques</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/evolution/">Evolution of Hadoop</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/architecture/">Architecture of hadoop</a><ol>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/evolution/#architecture-of-hadoop">Architecture of Hadoop</a><ol>
<li>HDFS</li>
<li>YARN</li>
</ol>
</li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/architecture/#mapreduce-framework">MapReduce framework</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/architecture/#other-tooling-around-hadoop">Other tooling around hadoop</a><ol>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/evolution/#mapreduce-framework">MapReduce framework</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/evolution/#other-tooling-around-hadoop">Other tooling around Hadoop</a><ol>
<li>Hive</li>
<li>Pig</li>
<li>Spark</li>
<li>Presto</li>
</ol>
</li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/architecture/#data-serialisation-and-storage">Data Serialisation and storage</a></li>
<li><a href="https://linkedin.github.io/school-of-sre/big_data/evolution/#data-serialisation-and-storage">Data Serialisation and storage</a></li>
</ol>
<h1 id="overview-of-big-data">Overview of Big Data</h1>
<ol>
<li>Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or tool; rather, it has become a complete subject involving various tools, techniques, and frameworks.</li>
<li>Big Data can consist of:<ol>
<li>Structured data</li>
<li>Unstructured data</li>
<li>Semi-structured data</li>
</ol>
</li>
<li>Characteristics of Big Data:<ol>
<li>Volume</li>
<li>Variety</li>
<li>Velocity</li>
<li>Variability</li>
</ol>
</li>
<li>Examples of Big Data sources include stock exchanges, social media sites, and jet engines.</li>
</ol>
<h1 id="usage-of-big-data-techniques">Usage of Big Data Techniques</h1>
<ol>
<li>Take the example of the traffic lights problem.<ol>
<li>There were more than 300,000 traffic lights in the US as of 2018.</li>
<li>Let us assume that we placed a device on each of them to collect metrics and send them to a central metrics collection system.</li>
<li>If each IoT device sends 10 events per minute, we get 300,000 × 10 × 60 × 24 = 4.32 × 10<sup>9</sup> (about 4.3 billion) events per day.</li>
<li>How would you go about processing that and telling me how many of the signals were “green” at 10:45 am on a particular day?</li>
</ol>
</li>
<li>Consider the next example, on Unified Payments Interface (UPI) transactions:<ol>
<li>India recorded about 1.15 billion UPI transactions in October 2019.</li>
<li>If we extrapolate this data to a full year and try to find the common payments made through a particular UPI ID, how would you suggest we go about that?</li>
</ol>
</li>
</ol>
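<p>The traffic-light arithmetic, and the naive single-machine answer to the "how many were green at 10:45" question, can be sketched as follows. The event layout (<code>light_id</code>, minute, state) and the function name are hypothetical; the constants are the figures assumed in the text.</p>

```python
LIGHTS = 300_000          # traffic lights in the US (2018 figure from the text)
EVENTS_PER_MINUTE = 10    # assumed per-device rate from the text
MINUTES_PER_DAY = 60 * 24

events_per_day = LIGHTS * EVENTS_PER_MINUTE * MINUTES_PER_DAY
print(f"{events_per_day:,} events per day")  # 4,320,000,000 events per day


def green_count_at(events, minute):
    """Count lights that reported 'green' during one minute.

    `events` is an iterable of hypothetical (light_id, minute, state) tuples.
    If a light reports several times in that minute, its last report wins.
    """
    state_by_light = {light_id: state for light_id, ts, state in events if ts == minute}
    return sum(1 for state in state_by_light.values() if state == "green")


if __name__ == "__main__":
    sample = [(1, "10:45", "green"), (2, "10:45", "red"), (3, "10:45", "green")]
    print(green_count_at(sample, "10:45"))  # 2
```

<p>Run over a full day, this loop scans all of the roughly 4.3 billion events on one machine. A framework such as MapReduce or Spark would instead partition the events (for example by time or by light ID) and run the same filter on every partition in parallel, which is the kind of problem Big Data techniques target.</p>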
@@ -997,13 +1030,13 @@
</a>
<a href="../overview/" class="md-footer-nav__link md-footer-nav__link--next" rel="next">
<a href="../evolution/" class="md-footer-nav__link md-footer-nav__link--next" rel="next">
<div class="md-footer-nav__title">
<div class="md-ellipsis">
<span class="md-footer-nav__direction">
Next
</span>
Overview of Big Data
Evolution of Hadoop
</div>
</div>
<div class="md-footer-nav__button md-icon">
@@ -538,7 +538,7 @@
<li class="md-nav__item">
<a href="../overview/" class="md-nav__link">
<a href="../overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -550,7 +550,7 @@
<li class="md-nav__item">
<a href="../usage/" class="md-nav__link">
<a href="../usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -574,7 +574,7 @@
<li class="md-nav__item">
<a href="../architecture/" class="md-nav__link">
<a href="../architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -916,7 +916,7 @@
<div class="md-footer-nav">
<nav class="md-footer-nav__inner md-grid" aria-label="Footer">
<a href="../architecture/" class="md-footer-nav__link md-footer-nav__link--prev" rel="prev">
<a href="../evolution/" class="md-footer-nav__link md-footer-nav__link--prev" rel="prev">
<div class="md-footer-nav__button md-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M20 11v2H8l5.5 5.5-1.42 1.42L4.16 12l7.92-7.92L13.5 5.5 8 11h12z"/></svg>
</div>
@@ -925,7 +925,7 @@
<span class="md-footer-nav__direction">
Previous
</span>
Architecture of Hadoop
Evolution of Hadoop
</div>
</div>
</a>
@@ -14,7 +14,7 @@
<title>Overview of Big Data - SchoolOfSRE</title>
<title>Further reading: - SchoolOfSRE</title>
@@ -57,7 +57,7 @@
<div data-md-component="skip">
<a href="#overview-of-big-data" class="md-skip">
<a href="#further-reading" class="md-skip">
Skip to content
</a>
@@ -86,7 +86,7 @@
</span>
<span class="md-header-nav__topic md-ellipsis">
Overview of Big Data
Further reading:
</span>
</div>
@@ -480,12 +480,10 @@
<li class="md-nav__item md-nav__item--active md-nav__item--nested">
<li class="md-nav__item md-nav__item--nested">
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4" type="checkbox" id="nav-4" checked>
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4" type="checkbox" id="nav-4" >
<label class="md-nav__link" for="nav-4">
Data
<span class="md-nav__icon md-icon"></span>
@@ -502,12 +500,10 @@
<li class="md-nav__item md-nav__item--active md-nav__item--nested">
<li class="md-nav__item md-nav__item--nested">
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4-1" type="checkbox" id="nav-4-1" checked>
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4-1" type="checkbox" id="nav-4-1" >
<label class="md-nav__link" for="nav-4-1">
Big Data
<span class="md-nav__icon md-icon"></span>
@@ -526,7 +522,7 @@
<li class="md-nav__item">
<a href="../intro/" class="md-nav__link">
<a href="../../big_data/intro/" class="md-nav__link">
Introduction
</a>
</li>
@@ -536,20 +532,11 @@
<li class="md-nav__item md-nav__item--active">
<input class="md-nav__toggle md-toggle" data-md-toggle="toc" type="checkbox" id="__toc">
<a href="./" class="md-nav__link md-nav__link--active">
<li class="md-nav__item">
<a href="../../big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -559,7 +546,7 @@
<li class="md-nav__item">
<a href="../usage/" class="md-nav__link">
<a href="../../big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -571,7 +558,7 @@
<li class="md-nav__item">
<a href="../evolution/" class="md-nav__link">
<a href="../../big_data/evolution/" class="md-nav__link">
Evolution of Hadoop
</a>
</li>
@@ -583,7 +570,7 @@
<li class="md-nav__item">
<a href="../architecture/" class="md-nav__link">
<a href="../../big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -595,7 +582,7 @@
<li class="md-nav__item">
<a href="../tasks/" class="md-nav__link">
<a href="../../big_data/tasks/" class="md-nav__link">
Tasks and conclusion
</a>
</li>
@@ -821,24 +808,19 @@
<h1 id="overview-of-big-data">Overview of Big Data</h1>
<ol>
<li>Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques and frameworks.</li>
<li>Big Data could consist of<ol>
<li>Structured data</li>
<li>Unstructured data</li>
<li>Semi-structured data</li>
</ol>
</li>
<li>Characteristics of Big Data:<ol>
<li>Volume</li>
<li>Variety</li>
<li>Velocity</li>
<li>Variability</li>
</ol>
</li>
<li>Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.</li>
</ol>
<h1 id="further-reading">Further reading:</h1>
<p>NoSQL</p>
<p><a href="https://hostingdata.co.uk/nosql-database/">https://hostingdata.co.uk/nosql-database/</a></p>
<p><a href="https://www.mongodb.com/nosql-explained">https://www.mongodb.com/nosql-explained</a></p>
<p><a href="https://www.mongodb.com/nosql-explained/nosql-vs-sql">https://www.mongodb.com/nosql-explained/nosql-vs-sql</a></p>
<p>CAP Theorem</p>
<p><a href="http://www.julianbrowne.com/article/brewers-cap-theorem">http://www.julianbrowne.com/article/brewers-cap-theorem</a></p>
<p>Scalability</p>
<p><a href="http://www.slideshare.net/jboner/scalability-availability-stability-patterns">http://www.slideshare.net/jboner/scalability-availability-stability-patterns</a></p>
<p>Eventual Consistency</p>
<p><a href="https://www.allthingsdistributed.com/2008/12/eventually_consistent.html">https://www.allthingsdistributed.com/2008/12/eventually_consistent.html</a></p>
<p>Consistent Hashing</p>
<p><a href="https://www.toptal.com/big-data/consistent-hashing">https://www.toptal.com/big-data/consistent-hashing</a></p>
<p><a href="https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf">https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf</a></p>
@@ -854,41 +836,6 @@
<footer class="md-footer">
<div class="md-footer-nav">
<nav class="md-footer-nav__inner md-grid" aria-label="Footer">
<a href="../intro/" class="md-footer-nav__link md-footer-nav__link--prev" rel="prev">
<div class="md-footer-nav__button md-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M20 11v2H8l5.5 5.5-1.42 1.42L4.16 12l7.92-7.92L13.5 5.5 8 11h12z"/></svg>
</div>
<div class="md-footer-nav__title">
<div class="md-ellipsis">
<span class="md-footer-nav__direction">
Previous
</span>
Introduction
</div>
</div>
</a>
<a href="../usage/" class="md-footer-nav__link md-footer-nav__link--next" rel="next">
<div class="md-footer-nav__title">
<div class="md-ellipsis">
<span class="md-footer-nav__direction">
Next
</span>
Usage of Big Data techniques
</div>
</div>
<div class="md-footer-nav__button md-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M4 11v2h12l-5.5 5.5 1.42 1.42L19.84 12l-7.92-7.92L10.5 5.5 16 11H4z"/></svg>
</div>
</a>
</nav>
</div>
<div class="md-footer-meta md-typeset">
<div class="md-footer-meta__inner md-grid">
<div class="md-footer-copyright">
BIN
databases_nosql/images/Quorum.png (30 KiB)
BIN
databases_nosql/images/cluster_quorum.png (58 KiB)
BIN
databases_nosql/images/consistent_hashing.png (73 KiB)
BIN
databases_nosql/images/database_sharding.png (20 KiB)
BIN
databases_nosql/images/vector_clocks.png (50 KiB)
1124
databases_nosql/intro/index.html
1116
databases_nosql/key_concepts/index.html
@@ -576,7 +576,7 @@
<li class="md-nav__item">
<a href="../../big_data/overview/" class="md-nav__link">
<a href="../../big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -588,7 +588,7 @@
<li class="md-nav__item">
<a href="../../big_data/usage/" class="md-nav__link">
<a href="../../big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -612,7 +612,7 @@
<li class="md-nav__item">
<a href="../../big_data/architecture/" class="md-nav__link">
<a href="../../big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -928,7 +928,7 @@ spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
<p>The tree structure above should make things clear. Notice the clear branch/fork at commit 7f3b00e. This is how we create branches. Now these are two separate lines of history on which feature development can be done independently.</p>
<p><strong>To reiterate, internally, git is just a tree of commits. Branch names (human readable) are pointers to those commits in the tree. We use various git commands to work with the tree structure and references, and git modifies the contents of our repo accordingly.</strong></p>
<h2 id="merges">Merges</h2>
<p>Now say the feature you were working on branch <code>b1</code> is complete. And you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you will pull the latest code from upstream (eg: GitHub). Then you need to merge your code from <code>b1</code> into master. And there could be two ways this can be done.</p>
<p>Now say the feature you were working on in branch <code>b1</code> is complete and you need to merge it into the master branch, where the final version of all code goes. First, you check out the master branch and pull the latest code from upstream (e.g. GitHub). Then you merge your code from <code>b1</code> into master. There are two ways this can be done.</p>
<p>Here is the current history:</p>
<pre><code class="language-bash">spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 60dc441 (HEAD -> master) adding master.txt file
@@ -937,7 +937,7 @@ spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 7f3b00e adding file 2
* df2fb7a adding file 1
</code></pre>
<p><strong>Option 1: Directly merge the branch.</strong> Merging the branch b1 into master will result in a new merge commit which will merge changes from two different lines of history and create a new commit of the result.</p>
<p><strong>Option 1: Directly merge the branch.</strong> Merging the branch b1 into master results in a new merge commit. This merges changes from two different lines of history and creates a new commit of the result.</p>
<pre><code class="language-bash">spatel1-mn1:school-of-sre spatel1$ git merge b1
Merge made by the 'recursive' strategy.
b1.txt | 1 +
@@ -686,7 +686,7 @@
<li class="md-nav__item">
<a href="../../big_data/overview/" class="md-nav__link">
<a href="../../big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -698,7 +698,7 @@
<li class="md-nav__item">
<a href="../../big_data/usage/" class="md-nav__link">
<a href="../../big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -722,7 +722,7 @@
<li class="md-nav__item">
<a href="../../big_data/architecture/" class="md-nav__link">
<a href="../../big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -1097,7 +1097,7 @@
|
||||
</li>
|
||||
</ol>
|
||||
<h2 id="what-to-expect-from-this-course">What to expect from this course</h2>
|
||||
<p>As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today, Git perhaps is the most used one and this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently!</p>
|
||||
<p>As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today like SVN, Mercurial, etc, Git perhaps is the most used one and this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently!</p>
|
||||
<h2 id="what-is-not-covered-under-this-course">What is not covered under this course</h2>
|
||||
<p>Advanced usage and specifics of internal implementation details of Git.</p>
|
||||
<h2 id="course-content">Course Content</h2>
|
||||
@@ -1109,7 +1109,7 @@
|
||||
<li><a href="https://linkedin.github.io/school-of-sre/git/github-hooks/#hooks">Hooks</a></li>
|
||||
</ol>
|
||||
<h2 id="git-basics">Git Basics</h2>
|
||||
<p>Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains history of the changes happened with the codebase.</p>
|
||||
<p>Though you might be aware already, let's revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase.</p>
<h3 id="creating-a-git-repo">Creating a Git Repo</h3>
<p>Any folder can be converted into a git repository. After executing the following command, we will see a <code>.git</code> folder within the folder, which makes our folder a git repository. <strong>All the magic that git does, <code>.git</code> folder is the enabler for the same.</strong></p>
<pre><code class="language-bash"># creating an empty folder and changing current dir to it
@@ -1158,7 +1158,7 @@ spatel1-mn1:school-of-sre spatel1$ git commit -m "adding file 1"
1 file changed, 1 insertion(+)
create mode 100644 file1.txt
</code></pre>
<p>Notice how after adding the file, git status says <code>Changes to be commited:</code>. What it means is whatever is listed there, will be included in the next commit. Then we go ahead and create a commit, with an attached messaged via <code>-m</code>.</p>
<p>Notice how after adding the file, git status says <code>Changes to be committed:</code>. What it means is whatever is listed there, will be included in the next commit. Then we go ahead and create a commit, with an attached messaged via <code>-m</code>.</p>
<h3 id="more-about-a-commit">More About a Commit</h3>
<p>Commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of repo (the folder) is taken and saved. Each commit has a unique ID. (<code>df2fb7a</code> for the commit we made in the previous step). As we keep adding/changing more and more contents and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the <code>.git</code> folder. This is where all this snapshot or versions are stored. <em>In an efficient manner.</em></p>
<h3 id="adding-more-changes">Adding More Changes</h3>
@@ -583,7 +583,7 @@
<li class="md-nav__item">
<a href="../../big_data/overview/" class="md-nav__link">
<a href="../../big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -595,7 +595,7 @@
<li class="md-nav__item">
<a href="../../big_data/usage/" class="md-nav__link">
<a href="../../big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -619,7 +619,7 @@
<li class="md-nav__item">
<a href="../../big_data/architecture/" class="md-nav__link">
<a href="../../big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -543,7 +543,7 @@
<li class="md-nav__item">
<a href="big_data/overview/" class="md-nav__link">
<a href="big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -555,7 +555,7 @@
<li class="md-nav__item">
<a href="big_data/usage/" class="md-nav__link">
<a href="big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -579,7 +579,7 @@
<li class="md-nav__item">
<a href="big_data/architecture/" class="md-nav__link">
<a href="big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -227,9 +227,16 @@
</label>
<ul class="md-nav__list" data-md-scrollfix>
<li class="md-nav__item">
<a href="#lab-environment-setup" class="md-nav__link">
Lab Environment Setup
</a>
</li>
<li class="md-nav__item">
<a href="#what-is-a-command" class="md-nav__link">
What is a command ?
What is a Command
</a>
</li>
@@ -396,20 +403,6 @@
I/O Redirection
</a>
</li>
<li class="md-nav__item">
<a href="#applications-in-sre-role" class="md-nav__link">
Applications in SRE Role
</a>
</li>
<li class="md-nav__item">
<a href="#useful-courses-and-tutorials" class="md-nav__link">
Useful courses and tutorials
</a>
</li>
</ul>
@@ -754,7 +747,7 @@
<li class="md-nav__item">
<a href="../../big_data/overview/" class="md-nav__link">
<a href="../../big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -766,7 +759,7 @@
<li class="md-nav__item">
<a href="../../big_data/usage/" class="md-nav__link">
<a href="../../big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -790,7 +783,7 @@
<li class="md-nav__item">
<a href="../../big_data/architecture/" class="md-nav__link">
<a href="../../big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -1023,9 +1016,16 @@
</label>
<ul class="md-nav__list" data-md-scrollfix>
<li class="md-nav__item">
<a href="#lab-environment-setup" class="md-nav__link">
Lab Environment Setup
</a>
</li>
<li class="md-nav__item">
<a href="#what-is-a-command" class="md-nav__link">
What is a command ?
What is a Command
</a>
</li>
@@ -1192,20 +1192,6 @@
I/O Redirection
</a>
</li>
<li class="md-nav__item">
<a href="#applications-in-sre-role" class="md-nav__link">
Applications in SRE Role
</a>
</li>
<li class="md-nav__item">
<a href="#useful-courses-and-tutorials" class="md-nav__link">
Useful courses and tutorials
</a>
</li>
</ul>
@@ -1222,7 +1208,10 @@
<h1 id="command-line-basics">Command Line Basics</h1>
<h2 id="what-is-a-command">What is a command ?</h2>
<h2 id="lab-environment-setup">Lab Environment Setup</h2>
<p>One can use an online bash interpreter to run all the commands that are provided as examples in this course. This will also help you in getting a hands-on experience of various linux commands.</p>
<p><a href="https://repl.it/languages/bash">REPL</a> is one of the popular online bash interpreters for running linux commands. We will be using it for running all the commands mentioned in this course.</p>
<h2 id="what-is-a-command">What is a Command</h2>
<p>A command is a program that tells the operating system to perform
specific work. Programs are stored as files in linux. Therefore, a
command is also a file which is stored somewhere on the disk.</p>
@@ -1545,32 +1534,6 @@ prints the unique numbers from the input.</p>
<p><img alt="" src="../images/linux/commands/image28.png" /></p>
<p>I/O redirection -
<a href="https://tldp.org/LDP/abs/html/io-redirection.html">https://tldp.org/LDP/abs/html/io-redirection.html</a></p>
<h2 id="applications-in-sre-role">Applications in SRE Role</h2>
<ul>
<li>
<p>As a SRE, you will be required to perform some general tasks on these linux servers. You will also be using the command line when you are troubleshooting issues.</p>
</li>
<li>
<p>Moving from one location to another in the filesystem will require the help of ls, pwd and cd commands</p>
</li>
<li>
<p>You may need to search some specific information in the log files. Grep command would be very useful here. I/O redirection will become handy if you want to store the output in a file or pass it as an input to another command.</p>
</li>
<li>
<p>Tail command is very useful to view the latest data in the log file.</p>
</li>
</ul>
<h2 id="useful-courses-and-tutorials">Useful courses and tutorials</h2>
<ul>
<li>
<p><a href="https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS101x+1T2020/course/">Edx linuxcourse</a> -
This video course can be very helpful in developing the basics of linux command line. This course is provided
in both free and paidmodes by edX. If you take the free course, you will not be able to access the assignments.</p>
</li>
<li>
<p><a href="https://linuxcommand.org/lc3_learning_the_shell.php">https://linuxcommand.org/lc3_learning_the_shell.php</a></p>
</li>
</ul>
@@ -14,7 +14,7 @@
<title>Architecture of Hadoop - SchoolOfSRE</title>
<title>Conclusion - SchoolOfSRE</title>
@@ -57,7 +57,7 @@
<div data-md-component="skip">
<a href="#architecture-of-hadoop" class="md-skip">
<a href="#conclusion" class="md-skip">
Skip to content
</a>
@@ -86,7 +86,7 @@
</span>
<span class="md-header-nav__topic md-ellipsis">
Architecture of Hadoop
Conclusion
</span>
</div>
@@ -181,7 +181,7 @@
<li class="md-nav__item">
<a href="../../linux_basics/intro/" class="md-nav__link">
<a href="../intro/" class="md-nav__link">
Introduction
</a>
</li>
@@ -193,7 +193,7 @@
<li class="md-nav__item">
<a href="../../linux_basics/command_line_basics/" class="md-nav__link">
<a href="../command_line_basics/" class="md-nav__link">
Command Line Basics
</a>
</li>
@@ -205,7 +205,7 @@
<li class="md-nav__item">
<a href="../../linux_basics/linux_server_administration/" class="md-nav__link">
<a href="../linux_server_administration/" class="md-nav__link">
Server Administration
</a>
</li>
@@ -480,12 +480,10 @@
<li class="md-nav__item md-nav__item--active md-nav__item--nested">
<li class="md-nav__item md-nav__item--nested">
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4" type="checkbox" id="nav-4" checked>
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4" type="checkbox" id="nav-4" >
<label class="md-nav__link" for="nav-4">
Data
<span class="md-nav__icon md-icon"></span>
@@ -502,12 +500,10 @@
<li class="md-nav__item md-nav__item--active md-nav__item--nested">
<li class="md-nav__item md-nav__item--nested">
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4-1" type="checkbox" id="nav-4-1" checked>
<input class="md-nav__toggle md-toggle" data-md-toggle="nav-4-1" type="checkbox" id="nav-4-1" >
<label class="md-nav__link" for="nav-4-1">
Big Data
<span class="md-nav__icon md-icon"></span>
@@ -526,7 +522,7 @@
<li class="md-nav__item">
<a href="../intro/" class="md-nav__link">
<a href="../../big_data/intro/" class="md-nav__link">
Introduction
</a>
</li>
@@ -538,7 +534,7 @@
<li class="md-nav__item">
<a href="../overview/" class="md-nav__link">
<a href="../../big_data/overview.md" class="md-nav__link">
Overview of Big Data
</a>
</li>
@@ -550,7 +546,7 @@
<li class="md-nav__item">
<a href="../usage/" class="md-nav__link">
<a href="../../big_data/usage.md" class="md-nav__link">
Usage of Big Data techniques
</a>
</li>
@@ -562,7 +558,7 @@
<li class="md-nav__item">
<a href="../evolution/" class="md-nav__link">
<a href="../../big_data/evolution/" class="md-nav__link">
Evolution of Hadoop
</a>
</li>
@@ -572,20 +568,11 @@
<li class="md-nav__item md-nav__item--active">
<input class="md-nav__toggle md-toggle" data-md-toggle="toc" type="checkbox" id="__toc">
<a href="./" class="md-nav__link md-nav__link--active">
<li class="md-nav__item">
<a href="../../big_data/architecture.md" class="md-nav__link">
Architecture of Hadoop
</a>
</li>
@@ -595,7 +582,7 @@
<li class="md-nav__item">
<a href="../tasks/" class="md-nav__link">
<a href="../../big_data/tasks/" class="md-nav__link">
Tasks and conclusion
</a>
</li>
@@ -810,6 +797,28 @@
<label class="md-nav__title" for="__toc">
<span class="md-nav__icon md-icon"></span>
Table of contents
</label>
<ul class="md-nav__list" data-md-scrollfix>
<li class="md-nav__item">
<a href="#applications-in-sre-role" class="md-nav__link">
Applications in SRE Role
</a>
</li>
<li class="md-nav__item">
<a href="#useful-courses-and-tutorials" class="md-nav__link">
Useful Courses and tutorials
</a>
</li>
</ul>
</nav>
</div>
</div>
@@ -821,91 +830,29 @@
<h1 id="architecture-of-hadoop">Architecture of Hadoop</h1>
<h1 id="conclusion">Conclusion</h1>
<p>With this we have covered the basics of linux operating systems along with basic commands
which are used in linux. We have also covered the linux server administration commands.</p>
<p>We hope that this course will make it easier for you to operate on the command line.</p>
<h2 id="applications-in-sre-role">Applications in SRE Role</h2>
<ol>
<li>
<p><strong>HDFS</strong></p>
<ol>
<li>The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. </li>
<li>HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. </li>
<li>HDFS is part of the Apache Hadoop Core project.</li>
</ol>
<p><img alt="HDFS Architecture" src="../images/hdfs_architecture.png" /></p>
<pre><code>1. NameNode: is the arbitrator and central repository of file namespace in the cluster. The NameNode executes the operations such as opening, closing, and renaming files and directories.
2. DataNode: manages the storage attached to the node on which it runs. It is responsible for serving all the read and write requests. It performs operations on instructions on NameNode such as creation, deletion, and replications of blocks.
3. Client: Responsible for getting the required metadata from the namenode and then communicating with the datanodes for reads and writes.
</code></pre>
</li>
<li>
<p><strong>YARN</strong></p>
<ol>
<li>YARN stands for “Yet Another Resource Negotiator“. It was introduced in Hadoop 2.0 to remove the bottleneck on Job Tracker which was present in Hadoop 1.0. YARN was described as a “Redesigned Resource Manager” at the time of its launching, but it has now evolved to be known as a large-scale distributed operating system used for Big Data processing.</li>
<li>The main components of YARN architecture include:</li>
</ol>
<p><img alt="YARN Architecture" src="../images/yarn_architecture.gif" /></p>
<pre><code>1. Client: It submits map-reduce jobs to the resource manager.
2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components:
3. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources.
4. Application manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Manager container if a task fails.
5. Node Manager: It takes care of individual nodes on the Hadoop cluster and manages application and workflow and that particular node. Its primary job is to keep-up with the Node Manager. It monitors resource usage, performs log management and also kills a container based on directions from the resource manager. It is also responsible for creating the container process and starting it on the request of the Application master.
6. Application Master: An application is a single job submitted to a framework. The application manager is responsible for negotiating resources with the resource manager, tracking the status and monitoring progress of a single application. The application master requests the container from the node manager by sending a Container Launch Context(CLC) which includes everything an application needs to run. Once the application is started, it sends the health report to the resource manager from time-to-time.
7. Container: It is a collection of physical resources such as RAM, CPU cores and disk on a single node. The containers are invoked by Container Launch Context(CLC) which is a record that contains information such as environment variables, security tokens, dependencies etc.
</code></pre>
</li>
</ol>
<h1 id="mapreduce-framework">MapReduce framework</h1>
<p><img alt="MapReduce Framework" src="../images/map_reduce.jpg" /></p>
<pre><code>1. The term MapReduce represents two separate and distinct tasks Hadoop programs perform-Map Job and Reduce Job. Map jobs take data sets as input and process them to produce key value pairs. Reduce job takes the output of the Map job i.e. the key value pairs and aggregates them to produce desired results.
2. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. Mapreduce helps to split the input data set into a number of parts and run a program on all data parts parallel at once.
3. Please find the below Word count example demonstrating the usage of MapReduce framework:
</code></pre>
<p><img alt="Word Count Example" src="../images/mapreduce_example.jpg" /></p>
<h1 id="other-tooling-around-hadoop">Other tooling around hadoop</h1>
<ol>
<li><strong>Hive</strong><ol>
<li>Uses a language called HQL which is very SQL like. Gives non-programmers the ability to query and analyze data in Hadoop. Is basically an abstraction layer on top of map-reduce.</li>
<li>Ex. HQL query: <ol>
<li><em>SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name);</em></li>
</ol>
</li>
<li>In mysql: <ol>
<li><em>SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name;</em></li>
</ol>
</li>
</ol>
</li>
<li>
<p><strong>Pig</strong></p>
<ol>
<li>Uses a scripting language called Pig Latin, which is more workflow driven. Don't need to be an expert Java programmer but need a few coding skills. Is also an abstraction layer on top of map-reduce.</li>
<li>Here is a quick question for you:
What is the output of running the pig queries in the right column against the data present in the left column in the below image?</li>
</ol>
<p><img alt="Pig Example" src="../images/pig_example.png" /></p>
<p>Output:
<code>mysql
7,Komal,Nayak,24,9848022334,trivendram
8,Bharathi,Nambiayar,24,9848022333,Chennai
5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
6,Archana,Mishra,23,9848022335,Chennai</code>
3. <strong>Spark</strong>
1. Spark provides primitives for in-memory cluster computing that allows user programs to load data into a cluster’s memory and query it repeatedly, making it well suited to machine learning algorithms.
4. <strong>Presto</strong>
1. Presto is a high performance, distributed SQL query engine for Big Data.
2. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB.
3. Example presto query:
<code>mysql
use studentDB;
show tables;
SELECT roll_no, name FROM studentDB.studentDetails where section=’A’ limit 5;</code></p>
</li>
</ol>
<h1 id="data-serialisation-and-storage">Data Serialisation and storage</h1>
<ol>
<li>In order to transport the data over the network or to store on some persistent storage, we use the process of translating data structures or objects state into binary or textual form. We call this process serialization..</li>
<li>Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file.</li>
<li>Apache Hive provides support to store a table as Avro and can also query data in this serialisation format.</li>
<li>As a SRE, you will be required to perform some general tasks on these linux servers. You will also be using the command line when you are troubleshooting issues.</li>
<li>Moving from one location to another in the filesystem will require the help of ls, pwd and cd commands</li>
<li>You may need to search some specific information in the log files. Grep command would be very useful here. I/O redirection will become handy if you want to store the output in a file or pass it as an input to another command.</li>
<li>Tail command is very useful to view the latest data in the log file.</li>
<li>Different users will have different permissions depending on their roles. We will also not want everyone in the company to access our servers for security reasons. Users permissions can be restricted with chown, chmod and chgrp commands.</li>
<li>SSH is one of the most frequently used commands for a SRE. Logging into servers and troubleshooting along with performing basic administration tasks will only be possible if we are able to login into the server.</li>
<li>What if we want to run an apache server or nginx on a server ? We will first install it using the package manager. Package management commands become important here.</li>
<li>Managing services on servers is another critical responsibility of a SRE. Systemd related commands can help in troubleshooting issues. If a service goes down, we can start it using systemctl start command. We can also stop a service in case it is not needed.</li>
<li>Monitoring is another core responsibility of a SRE. Memory and CPU are two important system level metrics which should be monitored. Commands like top and free are quite helpful here.</li>
<li>If a service is throwing an error, how do we find out the root cause of the error ? We will certainly need to check logs to find out the whole stack trace of the error. The log file will also tell us the number of times the error has occurred along with time when it started.</li>
</ol>
<h2 id="useful-courses-and-tutorials">Useful Courses and tutorials</h2>
<ul>
<li><a href="https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS101x+1T2020/course/">Edx basic linux commands course</a></li>
<li><a href="https://courses.edx.org/courses/course-v1:RedHat+RH066x+2T2017/course/">Edx Red Hat Enterprise Linux Course</a></li>
<li><a href="https://linuxcommand.org/lc3_learning_the_shell.php">https://linuxcommand.org/lc3_learning_the_shell.php</a></li>
</ul>
@@ -921,41 +868,6 @@ What is the output of running the pig queries in the right column against the da
<footer class="md-footer">
<div class="md-footer-nav">
<nav class="md-footer-nav__inner md-grid" aria-label="Footer">
<a href="../evolution/" class="md-footer-nav__link md-footer-nav__link--prev" rel="prev">
<div class="md-footer-nav__button md-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M20 11v2H8l5.5 5.5-1.42 1.42L4.16 12l7.92-7.92L13.5 5.5 8 11h12z"/></svg>
</div>
<div class="md-footer-nav__title">
<div class="md-ellipsis">
<span class="md-footer-nav__direction">
Previous
</span>
Evolution of Hadoop
</div>
</div>
</a>
<a href="../tasks/" class="md-footer-nav__link md-footer-nav__link--next" rel="next">
<div class="md-footer-nav__title">
<div class="md-ellipsis">
<span class="md-footer-nav__direction">
Next
</span>
Tasks and conclusion
</div>
</div>
<div class="md-footer-nav__button md-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M4 11v2h12l-5.5 5.5 1.42 1.42L19.84 12l-7.92-7.92L10.5 5.5 16 11H4z"/></svg>
</div>
</a>
</nav>
</div>
<div class="md-footer-meta md-typeset">
<div class="md-footer-meta__inner md-grid">
<div class="md-footer-copyright">