mirror of
https://github.com/linkedin/school-of-sre
synced 2026-01-07 17:18:03 +00:00
# Tasks and conclusion
## Post-training tasks:
1. Try setting up your own three-node Hadoop cluster.
    1. A VM-based solution can be found [here](http://hortonworks.com/wp-content/uploads/2015/04/Import_on_VBox_4_07_2015.pdf).
2. Write a simple Spark/MR job of your choice and understand how to generate analytics from data.
    1. A sample dataset can be found [here](https://grouplens.org/datasets/movielens/).
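
As a possible starting point for the second task, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It does not use the Hadoop or Spark APIs; it only mimics the data flow on a single machine. The rows in `SAMPLE_ROWS` are made up for illustration, and the `userId,movieId,rating,timestamp` column layout is an assumption based on the MovieLens `ratings.csv` file, so check it against the dataset you download. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Made-up rows, assumed to follow the layout: userId,movieId,rating,timestamp
SAMPLE_ROWS = [
    "userId,movieId,rating,timestamp",
    "1,10,4.0,964982703",
    "1,20,3.0,964981247",
    "2,10,5.0,964982224",
    "3,20,2.0,964983815",
    "3,30,4.5,964982931",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Emit one (movieId, rating) pair per valid input row."""
    for line in lines:
        fields = line.strip().split(",")
        # Skip the header row and any row without exactly four columns.
        if len(fields) != 4 or fields[0] == "userId":
            continue
        yield fields[1], float(fields[2])


def shuffle_phase(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group all values by key, as the framework would between map and reduce."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse each key's list of ratings into its average."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def average_ratings(lines: Iterable[str]) -> Dict[str, float]:
    """Run the three phases in order and return the average rating per movie."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    for movie_id, avg in sorted(average_ratings(SAMPLE_ROWS).items()):
        print(movie_id, round(avg, 2))
```

In a real MapReduce or Spark job, the framework handles the shuffle step and distributes the map and reduce work across nodes; only the per-record and per-key logic would carry over from this sketch.
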
## References:
1. [Hadoop documentation](http://hadoop.apache.org/docs/current/)
2. [HDFS Architecture](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html)
3. [YARN Architecture](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
4. [Google GFS paper](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/035fc972c796d33122033a0614bc94cff1527999.pdf)