Hadoop Intro Bootcamp

Following up on this week's "Hadoop Intro" bootcamp, I want to wrap things up by sharing where to find the files used during the bootcamp.

Part of the training was instructor-led and covered the core components of Hadoop. Hands-on exercises were interspersed throughout the bootcamp to give attendees practical experience with Hadoop, HDFS, and MapReduce.

I tried something new with this bootcamp: the Oracle VirtualBox hypervisor was used to launch a Linux virtual machine with Hadoop 1.0.2 already installed and configured. The feedback has been good, and folks liked this approach. It definitely leveled the playing field, since people arrived with laptops running OS X, Linux, and various flavors of Windows.

The bootcamp assumes no previous knowledge of Hadoop. If you were not able to attend but are interested in learning more, you should be able to work through the content on your own.

Try these steps:

  1. Review the bootcamp prerequisites. You'll need VirtualBox and an SSH client installed.

  2. Download the virtual disk for the Linux VM containing Hadoop. The image is almost 5GB, so I recommend using a wired connection.

    I also recommend downloading the md5sum checksum file so you can verify the integrity of the virtual disk image.

    Place hadoop1.vmdk and hadoop1.vmdk.md5sum in the same directory, then run md5sum to verify the .vmdk file, e.g.:

    md5sum -c hadoop1.vmdk.md5sum
    
  3. Download the hands-on exercises. Work through exercise 1, which creates a VirtualBox virtual machine from the .vmdk virtual disk and sets up networking options used in later exercises.

  4. Review the PowerPoint slides covering HDFS, then run through exercise 2. Review the remaining slides covering MapReduce, then run through exercises 3 and 4. If you're interested, exercise 5 gives you a jumping-off point for creating your own MapReduce programs.

    The slides are available in both PowerPoint and PDF formats.
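A note on step 2 for Mac users: OS X ships the BSD `md5` command rather than `md5sum`, so `md5sum -c` may not be available there. The function below is a minimal sketch, not part of the bootcamp materials (the name `verify_md5` is hypothetical). It reads the expected hash from the first field of the `.md5sum` file and compares it with the file's actual hash, using `md5sum` when it is installed and falling back to `md5 -q` otherwise. It assumes the checksum file holds a single entry.

```shell
#!/bin/sh
# Hypothetical helper: verify one file against a single-entry checksum file
# in the "<hash>  <filename>" format that md5sum writes.
verify_md5() {
    vm_file="$1"
    sum_file="$2"
    # The expected hash is the first whitespace-separated field of the checksum file.
    expected=$(awk '{print $1; exit}' "$sum_file")
    if command -v md5sum >/dev/null 2>&1; then
        # Linux: md5sum prints "<hash>  <filename>"; keep only the hash.
        actual=$(md5sum "$vm_file" | awk '{print $1}')
    else
        # Assumed OS X fallback: BSD md5 with -q prints only the hash.
        actual=$(md5 -q "$vm_file")
    fi
    if [ "$expected" = "$actual" ]; then
        echo "$vm_file: OK"
    else
        echo "$vm_file: FAILED (expected $expected, got $actual)"
        return 1
    fi
}
```

With both files in the current directory, you would call `verify_md5 hadoop1.vmdk hadoop1.vmdk.md5sum`. On Linux, `md5sum -c` from step 2 does the same job on its own.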
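Before starting the MapReduce exercises in step 4, it may help to see the map, shuffle/sort, and reduce phases on a single machine. The classic word-count example can be imitated with a Unix pipeline. This is only an analogy, not Hadoop and not the bootcamp's exercise code: `tr` plays the mapper by emitting one word per line, `sort` stands in for the shuffle that groups identical keys, and `uniq -c` acts as the reducer. The function name `wordcount` is hypothetical.

```shell
#!/bin/sh
# Rough single-machine analogy of MapReduce word count (not Hadoop itself).
# Reads text on stdin and prints "word<TAB>count" lines sorted by word.
wordcount() {
    tr -cs '[:alpha:]' '\n' |       # map: every run of non-letters becomes a newline, one word per line
        tr '[:upper:]' '[:lower:]' |  # normalize case so "The" and "the" share one key
        grep -v '^$' |              # drop the empty line a leading delimiter can produce
        sort |                      # shuffle/sort: bring identical words (keys) together
        uniq -c |                   # reduce: count each run of identical keys
        awk '{print $2 "\t" $1}'    # emit key<TAB>value, the layout of Hadoop's default text output
}
```

For example, `printf 'the cat\nthe dog\n' | wordcount` prints `cat 1`, `dog 1`, and `the 2`, one tab-separated pair per line. In Hadoop the same three phases run in parallel across the cluster, with HDFS holding the input and output.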

If you prefer instructor-led training, let me know by e-mail and I can work on setting that up.