Exploring DynamoDB with Ruby and the Amazon Ruby SDK

DynamoDB quick intro:

DynamoDB is Amazon's latest attempt at making it easier to manage, scale, and leverage NoSQL data sources. DynamoDB handles the heavy lifting of auto scaling and guaranteeing throughput, and it meets the needs of your data with a highly flexible data model in which each item may have its own attributes. The principles behind DynamoDB's architecture underpin many of the AWS service offerings (eventual consistency across zones), so it was natural for Amazon to expose this 'database' as an additional service that their MapReduce service, Elastic MapReduce (EMR), can leverage.

So, naturally, why not leverage this very powerful service with the ease of Ruby? Fortunately, Amazon provides a great SDK for just this task (and many others). Let's look at how you can use the aws-sdk gem to quickly stand up a DynamoDB table, insert some data, and update that data.

  1. Install the gem using your preferred method. Bundler makes it easy, so why not start there, either by creating a Gemfile or by adding the aws-sdk gem to your Rails app's Gemfile. Don't forget to run 'bundle install'.
  2. For any of the APIs provided by the gem, you must supply your AWS IAM credentials when accessing the services. Depending on how you architect your system, you will either hardcode your credentials (bad) or use environment-provided credentials (better!). Where to store credentials, what those credentials are allowed to do, and whether to use the AWS Security Token Service are big decisions and outside the scope of this blog. Make sure the credentials you use in this exercise have permission to create DynamoDB tables.
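
As one way to get the "better" option, the settings can be read from environment variables when the app boots. This is only a sketch under assumptions: the helper name aws_settings_from_env is made up for this post (it is not part of the SDK), and the variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION are simply the conventional ones; any names would do.

```ruby
# Hypothetical helper (not part of aws-sdk): build the settings hash for
# AWS.config from environment variables instead of hardcoding keys.
REQUIRED_AWS_VARS = {
  access_key_id:     'AWS_ACCESS_KEY_ID',
  secret_access_key: 'AWS_SECRET_ACCESS_KEY',
  region:            'AWS_REGION'
}.freeze

def aws_settings_from_env(env = ENV)
  # Fail fast, naming every variable that is unset or blank.
  missing = REQUIRED_AWS_VARS.values.select { |name| env[name].to_s.strip.empty? }
  unless missing.empty?
    raise ArgumentError, "missing environment variables: #{missing.join(', ')}"
  end

  REQUIRED_AWS_VARS.each_with_object({}) do |(key, name), settings|
    settings[key] = env[name]
  end
end

# In an initializer, the result would be passed straight through:
#   AWS.config(aws_settings_from_env)
```

Failing at boot with the names of the missing variables tends to be easier to debug than an authentication error on the first API call.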

The Ruby Code:

Gemfile:

gem 'aws-sdk', '~> 1.0'  # the code in this post uses the version 1 API (the AWS namespace)

# in an initializer, environment file, etc.

AWS.config(  
   access_key_id: '<AWS ACCESS KEY ID>', 
   secret_access_key: '<SECRET ACCESS KEY>', 
   region: '<REGION>'
)

# instantiate a dynamodb client and pick a table name

dynamo_db = AWS::DynamoDB.new
table_name = 'Users'  

# create the table

users = dynamo_db.tables.create(
  table_name, 5, 6, 
  :hash_key => {:username => :string}, 
  :range_key => {:email => :string}
)

Explanation of the parameters used in the table create call:

  • table_name - The name of the table.
  • 5 - The number of reads per second of data up to 4 KB in size.
  • 6 - The number of writes per second of data up to 1 KB in size.
  • hash_key - The 'primary' key of the table, used by DDB to build an unordered hash of the table contents.
  • range_key - The 'range' key, used to build an ordered index of the table contents.
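
To make the difference between the two keys a little more concrete, here is a toy in-memory model. It is emphatically not how DynamoDB is implemented; the ToyTable class and the example.com addresses are invented for illustration. It only mimics the behavior described above: items are looked up by hash key, and items that share a hash key come back ordered by range key.

```ruby
# Toy model of a hash-and-range table. Illustration only; this is not
# DynamoDB's implementation and not part of the aws-sdk gem.
class ToyTable
  def initialize(hash_key, range_key)
    @hash_key   = hash_key
    @range_key  = range_key
    # hash key value => { range key value => item }
    @partitions = Hash.new { |partitions, key| partitions[key] = {} }
  end

  # An item is identified by its (hash key, range key) pair, so putting an
  # item with an existing pair replaces the earlier one.
  def put(item)
    @partitions[item.fetch(@hash_key)][item.fetch(@range_key)] = item
  end

  # Every item sharing one hash key value, ordered by range key.
  def query(hash_value)
    return [] unless @partitions.key?(hash_value)
    @partitions[hash_value].values.sort_by { |item| item[@range_key] }
  end
end

toy = ToyTable.new(:username, :email)
toy.put(username: 'jdoe', email: 'work@example.com')
toy.put(username: 'jdoe', email: 'home@example.com')
toy.query('jdoe').map { |item| item[:email] }
# => ["home@example.com", "work@example.com"]
```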

In this case, I created a Users table with a primary/hash key of the username and a range key of the user's email address. The reads/writes per second specification is an important decision: based on the values passed at table creation, Amazon provisions the resources needed to guarantee the service and performance you are expecting. For more information on the implications of decisions made during the DynamoDB creation process, check out the Amazon Developer Guide - Working with Tables.
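
For a rough feel of how those two numbers get chosen, here is a back-of-the-envelope sketch. It assumes the unit sizes quoted in the parameter list above (4 KB per read, 1 KB per write) and that larger items round up to whole units; the helper names are made up, and the Developer Guide is the place to confirm the current figures and rules.

```ruby
# Unit sizes as quoted in this post; treat them as assumptions and confirm
# against the Developer Guide before sizing a real table.
READ_UNIT_BYTES  = 4 * 1024
WRITE_UNIT_BYTES = 1 * 1024

# Hypothetical helpers: an item larger than one unit is rounded up to whole
# units, and even a tiny item costs at least one unit per request.
def read_units(item_bytes, reads_per_second)
  [(item_bytes.to_f / READ_UNIT_BYTES).ceil, 1].max * reads_per_second
end

def write_units(item_bytes, writes_per_second)
  [(item_bytes.to_f / WRITE_UNIT_BYTES).ceil, 1].max * writes_per_second
end

read_units(6 * 1024, 5)  # 6 KB items need 2 units each   => 10
write_units(1536, 6)     # 1.5 KB items need 2 units each => 12
```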

Creation of most Amazon resources is not instantaneous. DynamoDB is one of those resources that needs a little time to initialize. Because of this, you have to design your code defensively to account for the lag between when you create the table programmatically, and when you write your first object to that table.
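
One defensive habit worth considering is putting an upper bound on the wait, so a table that never becomes ready does not hang your script forever. The helper below is a generic sketch, not something the SDK provides; wait_until and WaitTimeout are names invented for this post.

```ruby
# Hypothetical helper: poll a block until it returns a truthy value, and
# raise if that has not happened by the time the timeout expires.
class WaitTimeout < StandardError; end

def wait_until(timeout: 120, interval: 1)
  # A monotonic clock is not affected by system clock adjustments.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
  true
end

# With a table object like the one created above, usage might look like:
#   wait_until(timeout: 300) { users.status != :creating }
```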

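# let's wait while the table is created and get a little feedback during the process
# (a multi-line loop is used because a trailing `while` modifier would only repeat the last statement on the line)

print "waiting for table #{table_name} to become active.."; $stdout.flush
while users.status == :creating
  sleep 1
  print '.'; $stdout.flush
end
puts 'ready!'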

# once this code finishes, it's time to load a record into the table

item = users.items.create({:username => 'jdoe', :email => '[email protected]'})

# let's update this item with a few new attributes (attributes can be added on the fly, as needed; an item only has a value for an attribute once one has been set on it).

item.attributes.update do |u|  
  u.set 'first' => 'john'
  u.set 'last' => 'doe'
  u.set 'years_of_service' => 10
end  

# dump the table contents to the console

users.items.each do |item|  
   attribs = item.attributes
   puts "#{attribs['username']} - #{attribs['email']} - #{attribs['first']} - #{attribs['last']} - #{attribs['years_of_service'].to_i}"
end  

# lastly, let's clean up the sample table by deleting it and providing a little feedback (yes, it's that easy)

print "deleting table: #{users.name}"; $stdout.flush
users.delete
begin
  while users.status == :deleting
    sleep 1
    print '.'; $stdout.flush
  end
rescue AWS::DynamoDB::Errors::ResourceNotFoundException
  # the status call fails once the table no longer exists, so the delete is done;
  # any other error is left to propagate instead of being silently swallowed
  puts 'poof, gone!'
end

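The deletion code behaves much like the creation code, monitoring the status of the table after the delete call. Once the delete succeeds, the resource is gone and asking for its status raises a 'resource not found' error, which is how the script knows it is finished.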
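
The "an error means we are done" pattern can also be pulled out into a small helper. This is a sketch rather than anything the SDK ships: wait_until_gone is a made-up name, and the error class is passed in so that only that one error counts as success while everything else still surfaces.

```ruby
# Hypothetical helper: call the block repeatedly until it raises gone_error,
# which is taken as the signal that the resource no longer exists.
# Returns true once that happens, or false after max_attempts polls.
def wait_until_gone(gone_error, interval: 1, max_attempts: 60)
  max_attempts.times do
    begin
      yield
    rescue gone_error
      return true
    end
    sleep interval
  end
  false
end

# With the table from this post, usage might look like:
#   wait_until_gone(AWS::DynamoDB::Errors::ResourceNotFoundException) { users.status }
```

Capping the number of attempts keeps a stuck delete from looping forever, and because only the named error class is rescued, a credentials or network problem is not mistaken for a successful delete.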

If you ran this code, you should see something similar to this:

waiting for table Users to become active.........ready!  
jdoe - [email protected] - john - doe - 10  
deleting table: Users.....................poof, gone!  

Although this was a contrived example, it does reflect the simplicity of using Ruby code to leverage the power of the AWS service offerings. You could spin up a DynamoDB table, load it with log files, and quickly analyze them with EMR, all with the ease of a simple script.

Ruby AWS SDK Documentation

Questions? Just drop me a line at [email protected] or @leechris