Tasks and conclusion

Post-training tasks:

  1. Try setting up your own three-node Hadoop cluster.
    1. A VM-based solution can be found here
  2. Write a simple Spark or MapReduce job of your choice to understand how analytics are generated from data.
    1. A sample dataset can be found here
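
As a starting point for the second task, here is a minimal sketch of the classic word-count job, written in plain Python so it runs without a cluster. It only imitates the map, shuffle and reduce phases in memory; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are illustrative assumptions, not part of Hadoop or Spark.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in memory over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; replace with lines read from your own dataset.
    sample = ["hadoop stores data in HDFS", "YARN schedules jobs on Hadoop"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the framework performs the shuffle for you and the mapper and reducer run on different nodes. In Spark, the same logic is usually expressed on an RDD with `flatMap`, `map` and `reduceByKey`.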

References:

  1. Hadoop documentation
  2. HDFS Architecture
  3. YARN Architecture
  4. The Google File System (GFS) paper