Adding Big Data module

Aditya Kamat
2020-11-05 16:33:17 +05:30
parent 86f04a3ead
commit a8e66f6160
13 changed files with 158 additions and 0 deletions

courses/big_data/tasks.md Normal file

@@ -0,0 +1,14 @@
# Tasks and conclusion
## Post-training tasks:
1. Try setting up your own 3-node Hadoop cluster.
    1. A VM-based solution can be found [here](http://hortonworks.com/wp-content/uploads/2015/04/Import_on_VBox_4_07_2015.pdf)
2. Write a simple Spark/MapReduce job of your choice and understand how to generate analytics from data.
    1. A sample dataset can be found [here](https://grouplens.org/datasets/movielens/)
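
As a starting point for the second task, here is a minimal sketch of a MapReduce-style job in plain Python that computes the average rating per movie. It assumes rows in a `userId,movieId,rating,timestamp` layout; the sample rows and the function names (`map_rating`, `reduce_ratings`, `average_ratings`) are hypothetical and only for illustration. It runs on a single machine and is not a Hadoop or Spark program, but the same map and reduce steps carry over to either framework.

```python
from collections import defaultdict

# Hypothetical sample rows, assumed layout: userId,movieId,rating,timestamp
SAMPLE_LINES = [
    "1,10,4.0,964982703",
    "2,10,3.0,964982224",
    "1,20,5.0,964981247",
    "3,20,4.0,964983815",
    "3,30,2.5,964982931",
]


def map_rating(line):
    """Map step: turn one input row into a (movie_id, (rating, 1)) pair."""
    _user_id, movie_id, rating, _timestamp = line.split(",")
    return movie_id, (float(rating), 1)


def reduce_ratings(pairs):
    """Reduce step: sum ratings and counts per movie, then take the average."""
    totals = defaultdict(lambda: [0.0, 0])
    for movie_id, (rating, count) in pairs:
        totals[movie_id][0] += rating
        totals[movie_id][1] += count
    return {movie_id: total / n for movie_id, (total, n) in totals.items()}


def average_ratings(lines):
    """Run the map step over every row and feed the pairs to the reducer."""
    return reduce_ratings(map_rating(line) for line in lines)


if __name__ == "__main__":
    for movie_id, avg in sorted(average_ratings(SAMPLE_LINES).items()):
        print(movie_id, round(avg, 2))
```

Emitting `(rating, 1)` from the map step rather than the bare rating keeps the reducer's work to simple sums, which is what lets a framework combine partial results from different nodes before the final division.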
## References:
1. [Hadoop documentation](http://hadoop.apache.org/docs/current/)
2. [HDFS Architecture](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html)
3. [YARN Architecture](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
4. [Google GFS paper](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/035fc972c796d33122033a0614bc94cff1527999.pdf)