Vagrant: An essential tool for the experimental developer

Nowadays, so many technologies pique my interest that it's hard to find the hours in the day to have fun and tinker with them all. Whether it's a new language, framework, tool, library, add-on, or platform, there's not enough time, and there aren't enough environments to keep everything clean and organized. I probably play with a few new technologies a week[end], and my old Mac can get a little disorganized in the chaos. Vagrant doesn't completely solve the time problem, but it does solve some of the organizational problems and speeds developer life up a little.

Vagrant, in a nutshell, is best described as a wrapper around virtual machine platforms. Break the shell, though, and you discover it's wonderfully powerful and extremely flexible. You can install the Vagrant software, type three commands, and be at the command prompt of a brand new environment. Install whatever software you want, play around, log out, and destroy it. Up, Play, Destroy. Done.

Let's walk through the steps to go from nothing to a running Ubuntu box.

  1. Download Vagrant and install it.

  2. If you don't already have one, download a virtual machine platform (Vagrant calls these "providers"). I recommend VirtualBox, because playing around with free software is better than buying something and deciding you'd rather build machines from diodes. If you decide virtual computing is fun (who wouldn't?) and VirtualBox isn't cutting it, you can pay for one of the commercial alternatives.

  3. Open a terminal (or command prompt), navigate to a directory where you'd like to start building machines, and run these commands:

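vagrant init hashicorp/precise32   # create a Vagrantfile that uses the precise32 box
vagrant up                         # download the box (first run only) and boot the VM
vagrant ssh                        # open a shell inside the running VM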

At this point, you should be sitting at an Ubuntu prompt. Yes, you just downloaded, launched, and accessed a brand new environment. That's all fine and good, but what about all that "speeding things up" talk? Go ahead and log out (type exit), and we'll get to that next.

Getting environments up and running is the first half of the greatness of Vagrant. The second half is the simple yet powerful configuration of your machines through the Vagrantfile, which vagrant init created in your current directory.

If you open the Vagrantfile, you'll see a few lines that aren't commented out, plus a load of comments explaining the options you can configure for your environment(s). The file itself is plain Ruby. Let's take a look at a sample configuration:

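VAGRANTFILE_API_VERSION = "2"             # Vagrantfile configuration syntax version

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "hashicorp/precise32"   # base box to build the VM from
  config.vm.network "public_network"      # bridge the VM onto the host's network
  config.vm.hostname = "snhadoop.local"   # hostname set inside the guest
end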

In this example, I've configured a machine built from the hashicorp/precise32 box (there are lots of boxes out there, or you can make your own!), bridged onto my public network, with the hostname snhadoop.local. In a few lines, I configured a machine and its networking.

You can go further and add other networking directives, such as port forwarding and private networks.
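For example, a single-machine Vagrantfile that forwards a port and adds a host-only private network might look like the sketch below. The port numbers and the 192.168.33.10 address are arbitrary placeholders, not values from my setup.

```ruby
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "hashicorp/precise32"
  config.vm.hostname = "snhadoop.local"

  # Forward host port 8080 to guest port 80 (placeholder ports)
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # Host-only private network with a static IP (placeholder address)
  config.vm.network "private_network", ip: "192.168.33.10"
end
```

After a vagrant up (or a vagrant reload, if the machine is already running), a service listening on port 80 in the guest would be reachable at localhost:8080 on the host.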

Let's say you want to stand up a Hadoop cluster: several machines in one configuration. How would you do that in a Vagrantfile?

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  # Hadoop NameNode
  config.vm.define "namenode" do |namenode|
    namenode.vm.box = "hashicorp/precise32"
    namenode.vm.hostname = "nnhadoop.local"
    namenode.vm.network :public_network, ip: '192.168.0.117'
  end

  # Secondary NameNode
  config.vm.define "secondary_namenode" do |secondary_namenode|
    secondary_namenode.vm.box = "hashicorp/precise32"
    secondary_namenode.vm.hostname = "snnhadoop.local"
    secondary_namenode.vm.network :public_network, ip: '192.168.0.118'
  end

  # DataNodes
  config.vm.define "datanode_1" do |datanode_1|
    datanode_1.vm.box = "hashicorp/precise32"
    datanode_1.vm.hostname = "dn1hadoop.local"
    datanode_1.vm.network :public_network, ip: '192.168.0.119'
  end

  config.vm.define "datanode_2" do |datanode_2|
    datanode_2.vm.box = "hashicorp/precise32"
    datanode_2.vm.hostname = "dn2hadoop.local"
    datanode_2.vm.network :public_network, ip: '192.168.0.120'
  end

  config.vm.define "datanode_3" do |datanode_3|
    datanode_3.vm.box = "hashicorp/precise32"
    datanode_3.vm.hostname = "dn3hadoop.local"
    datanode_3.vm.network :public_network, ip: '192.168.0.121'
  end
end

This configuration creates five machines and sets each one's network, static IP, and hostname; I use it for a Hadoop cluster. Nothing too complicated, but it's very convenient to have this environment ready to go with a couple of commands: vagrant up with no arguments boots all five, and in a multi-machine setup vagrant ssh takes a machine name (for example, vagrant ssh namenode). You can also throw the Vagrantfile in your git repo to keep it under [change] control.
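Because a Vagrantfile is plain Ruby, the five nearly identical blocks of the cluster configuration can also be generated from a table. This is a sketch of that refactor, not the configuration I actually run; the nodes hash is a name I made up for illustration, and the hostnames and IPs are copied from the cluster example.

```ruby
VAGRANTFILE_API_VERSION = "2"

# Hypothetical node table: machine name => [hostname, static IP]
nodes = {
  "namenode"           => ["nnhadoop.local",  "192.168.0.117"],
  "secondary_namenode" => ["snnhadoop.local", "192.168.0.118"],
  "datanode_1"         => ["dn1hadoop.local", "192.168.0.119"],
  "datanode_2"         => ["dn2hadoop.local", "192.168.0.120"],
  "datanode_3"         => ["dn3hadoop.local", "192.168.0.121"],
}

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # Set once in the outer scope; every defined machine inherits it
  config.vm.box = "hashicorp/precise32"

  # Define one machine per table entry
  nodes.each do |name, (hostname, ip)|
    config.vm.define name do |node|
      node.vm.hostname = hostname
      node.vm.network :public_network, ip: ip
    end
  end
end
```

With this layout, adding a fourth DataNode becomes a one-line change to the table rather than another copy-pasted block.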

So you built some machines and got them up and running. What about automagically installing software on them, or bootstrapping events or configuration? Vagrant has you covered there too:

config.vm.provision :shell, :path => "bootstrap.sh"   # run bootstrap.sh inside the guest

Add this line to each machine definition (or once, globally, if you prefer) and create bootstrap.sh in the same directory as the Vagrantfile. Vagrant will run the script inside each machine when it is provisioned (by default, on its first vagrant up), so the script can do whatever you need. It could install, for example, a specific JDK, a specific version of Hadoop, and git. Then it could pull the configuration files for your Hadoop install from git, install them, and start the cluster. Voilà: a five-node Hadoop cluster, fully configured and ready to consume some jobs.
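As a sketch of the two scopes, the fragment below runs bootstrap.sh on every machine and an extra script on the NameNode only. namenode.sh is a hypothetical file name, and both scripts are assumed to sit next to the Vagrantfile.

```ruby
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "hashicorp/precise32"

  # Global scope: runs on every machine defined below
  config.vm.provision :shell, :path => "bootstrap.sh"

  config.vm.define "namenode" do |namenode|
    namenode.vm.hostname = "nnhadoop.local"
    namenode.vm.network :public_network, ip: '192.168.0.117'
    # Machine scope: runs only on the namenode (hypothetical script)
    namenode.vm.provision :shell, :path => "namenode.sh"
  end
end
```

Vagrant runs provisioners from outer scopes before inner ones, so the NameNode would get bootstrap.sh first and namenode.sh second. To re-run provisioning on a machine that is already up, use vagrant provision.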

So, after setting up a cluster and configuring the bootstrapping, you have a fully configured environment to play with. And when you're done, how do you stop these machines?

vagrant --help

The help output lists the available subcommands: you can halt machines (vagrant halt), destroy them (vagrant destroy), check their status (vagrant status), and do a host of other helpful things with existing or new machines.

I encourage you to take Vagrant for a spin. The documentation is fantastic, and the community of developers using Vagrant is equally awesome.

Thanks for reading and happy initing!
