Disruptive technology is a phrase coined by a Harvard Business School (HBS) professor to describe a new technology that “unexpectedly displaces an established technology”. With the surge of open source technology and new startups pushing the boundaries of creative and innovative solutions, one trend that is hard to ignore is in the area of databases. Entrepreneurs who are chasing funding rounds from VCs and angel investors work hard to develop MVPs and validate a unique solution. When building that new solution requires an application with a beautiful front-end, one area that is usually ignored (until very late in the game) is the back-end (DBAs know this all too well).
While it’s true that I have a history with Oracle technologies (I worked there in the late 90s and early 00s), one cannot ignore the massive growth and benefits of NoSQL (“not only SQL”) database technology. While Oracle would love to solve all of your problems (usually for a price), NoSQL emerged as a simpler and more cost-efficient solution for data management. The term NoSQL first appeared in 1998, and more recently the technology has gained the support of many startups and developers who need to easily process large amounts of unstructured data without taking out a mortgage. NoSQL databases are very popular for big data and real-time web applications and, based on recent trends, are also taking significant market share from traditional relational database solutions.
Several popular flavors of NoSQL include MongoDB (document store), Cassandra / HBase (column) and DynamoDB (key-value). A key-value store provides the simplest data model, where data is indexed by a key (hash and/or range). This allows very fast read/write operations. Document stores (also known as document-oriented databases) use a “schema-free” organization, where data does not need a uniform structure (for example, JSON objects). Document stores provide the same benefits as key-value stores, and you’re not limited to querying only by key. Other types of NoSQL databases include graph, object and column-oriented, so your options are wide open based on your data/application needs.
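To make the difference between the two models concrete, here is a minimal in-memory sketch in Python. It is not how DynamoDB or MongoDB work internally; the `KeyValueTable` class, the `find_documents` helper and the sample data are all hypothetical names made up for illustration. It only shows the access patterns: a key-value table is reached through its hash key (optionally narrowed by a range key), while a document collection can be filtered on any attribute.

```python
class KeyValueTable:
    """Toy key-value table: items are reachable only through their keys."""

    def __init__(self):
        # hash key -> {range key -> item}
        self._partitions = {}

    def put(self, hash_key, range_key, item):
        self._partitions.setdefault(hash_key, {})[range_key] = item

    def get(self, hash_key, range_key):
        # Direct lookup by the full key (hash + range).
        return self._partitions[hash_key][range_key]

    def query(self, hash_key, low, high):
        # All items under one hash key whose range key lies in [low, high],
        # returned in range-key order.
        rows = self._partitions.get(hash_key, {})
        return [rows[rk] for rk in sorted(rows) if low <= rk <= high]


def find_documents(documents, **criteria):
    """Document-store style lookup: match on any attribute, not only the key."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Key-value access: hash key = customer, range key = order date.
orders = KeyValueTable()
orders.put("customer-1", "2014-01-05", {"total": 20})
orders.put("customer-1", "2014-02-11", {"total": 35, "coupon": "WINTER"})
orders.put("customer-2", "2014-01-09", {"total": 12})
january = orders.query("customer-1", "2014-01-01", "2014-01-31")

# Document access: the documents do not share a uniform structure,
# and the lookup is on a non-key attribute.
documents = [
    {"_id": 1, "name": "Ada", "city": "London"},
    {"_id": 2, "name": "Grace", "city": "New York", "languages": ["COBOL"]},
]
in_london = find_documents(documents, city="London")
```

Note that the toy document lookup scans every document; real document stores rely on secondary indexes to make this kind of query efficient.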
Recently, I was given the opportunity to tackle a unique issue where data was pulled from various sources and the requirement was to build a consolidated solution to support millions of records (with all the bells and whistles of availability, scalability, reliability and, oh yeah, performance). Forgoing my initial urge to push Oracle (it takes a while to wear off), we evaluated the following solutions based on our requirements: MongoDB, MySQL, Cassandra, and PostgreSQL (our various source DBs). In the end, we decided to go down another path: DynamoDB. Below are a few points on why we felt DynamoDB was more compelling than its peers:
Ease of Use: With our customer adopting AWS as its cloud solution, it was easy to look “in house” rather than stand up another configuration. AWS is a hosted environment, so DynamoDB offers a NoSQL solution where you do not have to worry about DBA tasks such as installation, maintenance, hardware provisioning, patching, etc. In addition, DynamoDB access via API calls, the console, SDKs, etc. is fantastic.
Availability, Scalability and Performance: As with other AWS technologies, DynamoDB has the benefits of AWS availability (data is synchronously replicated across multiple availability zones), scalability (apps can utilize elastic load balancing for scaling needs, and accessing more space is never an issue) and performance (built on solid-state drives (SSDs) for very fast read/write ops and low latency, plus you can increase the throughput if needed).
Backups: You can configure full or incremental backups using AWS Data Pipeline.
Security: Using AWS IAM policies, you can set up permissions to control who can access specific DynamoDB resources, as well as at the database level to control access to items (rows) and attributes (columns), otherwise known as fine-grained access control (no, that’s not specific to Oracle).
Integration with other AWS tools: The CloudWatch integration is fantastic for your basic monitoring needs, and Elastic MapReduce is great for handling any unstructured data needs. In addition, CloudSearch is a full-text indexing solution for DynamoDB.
Cost: The pay-for-what-you-use model is brilliant. Your costs scale in direct proportion to your demand. Whether you’re building a large enterprise application or an MVP for a new start-up, the cost model is the same. If this does not keep the folks at Oracle up at night, it should.
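As an illustration of the fine-grained access control mentioned under Security, the sketch below builds an IAM policy document as a plain Python dictionary. It is a rough example under stated assumptions, not the policy we deployed: the table ARN, account number and attribute names are hypothetical, and the `${www.amazon.com:user_id}` substitution variable assumes users sign in through Login with Amazon web identity federation. The `dynamodb:LeadingKeys` condition restricts a user to items (rows) whose hash key matches their own ID, and `dynamodb:Attributes` restricts which attributes (columns) they may read.

```python
import json


def fine_grained_policy(table_arn, allowed_attributes):
    """Build an IAM policy allowing reads of a user's own items only,
    and only for the listed attributes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # Row-level: hash key must equal the caller's user ID
                        # (assumes Login with Amazon federation).
                        "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                        # Column-level: only these attributes may be requested.
                        "dynamodb:Attributes": list(allowed_attributes),
                    },
                    # Require requests to name specific attributes
                    # rather than asking for everything.
                    "StringEqualsIfExists": {
                        "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                    },
                },
            }
        ],
    }


# Hypothetical table ARN and attribute names, for illustration only.
policy = fine_grained_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
    ["UserId", "OrderDate", "Total"],
)
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON in `policy_json` is what would be attached to an IAM role; test any such policy against your own table and identity provider before relying on it.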
For all the positive aspects, we did identify some issues with DynamoDB (it does not support altering indexes after table creation, it imposes various limits, and its support for advanced data types and advanced querying is limited, etc.). We acknowledge that DynamoDB is very young and has flaws; however, it has tremendous upside as Amazon rolls out new features to extend its capability and support within the AWS framework.