NoSQL databases: A comparison of the top contenders
NoSQL is a database technology designed to support the requirements of cloud applications and to overcome the scale, performance, data model, and data distribution limitations of relational databases (RDBMSs). But there’s much more to the decision than just choosing between NoSQL and relational databases.
NoSQL databases come in many flavors, with several leaders sporting unique characteristics. We’ll cover the top contenders—Apache Cassandra®, MongoDB, Apache HBase™, and Couchbase—across four key factors, to help you decide which is the best fit for your organization. But, first, to make sure we’re on the same page, let’s review NoSQL databases, their benefits, and how they differ from relational databases.
What is NoSQL?
NoSQL (not-only-SQL) databases are designed to store, distribute, and access data using methods that differ from relational databases (RDBMSs). NoSQL technology was originally created and used by internet leaders such as Facebook, Google, Amazon, and others who required database management systems that could write and read data anywhere in the world, while scaling and delivering performance across massive data sets and millions of users.
Today, almost every organization must deliver cloud applications that personalize interactions with its customers, and NoSQL is often the database technology of choice for powering such systems.
Compared to traditional, relational databases, NoSQL could be the way to go if any of these apply to your situation: you have a large volume and variety of data, scalability is a top priority, you need continuous availability, you’re working with big data, or you’re performing real-time analytics.
NoSQL databases have proven to be a good fit for many real-world use cases, including fraud detection, identity authentication, inventory management, personalization, IoT, financial services, payments, messaging, and many more.
Benefits of NoSQL
NoSQL databases are primarily designed for supporting decentralized systems that target cloud applications. While some myths persist, the fact is NoSQL databases, like Cassandra, typically offer the following benefits over other database management systems:
- Continuous availability: Database stays online even in the face of the most devastating infrastructure outages.
- Geographically distributed: Fully active data, everywhere you need it.
- Operationally low latency: Response times fast enough for your most intense operational cloud applications.
- Linearly scalable: Predictably scale to meet the current and future data needs of cloud applications.
- Immediately decisive: Full range of data manipulation capabilities tightly integrated into a single system.
- Functionally cohesive: Coherent integration and interoperability of mixed workloads and multiple data models.
- Operationally mature: Enterprise-ready data management for cloud applications.
- Low total cost of ownership: No requirements for specialized hardware or ancillary software.
NoSQL vs. Relational databases
Relational databases, also known as relational database management systems (RDBMSs), have a rich history dating all the way back to 1970. They are appropriately named because they are used to store data points that are related to each other. Relational databases store data in tables. Each table is made up of rows and columns, where each row is a record with a unique ID called a key. Each column holds attributes for its corresponding row or record.
Adopting this highly structured approach makes it easy to connect the dots and see relationships between data points. Structured Query Language, or SQL, was developed soon after the advent of relational databases. It provided straightforward and easy-to-learn commands to access and modify relational databases. SQL quickly boosted the popularity of relational databases and they went on to become the most widely used database management system. Today, most organizations with data management needs still use relational databases, though many use them in tandem with NoSQL databases.
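To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their columns are hypothetical and chosen only for illustration. The sketch shows rows identified by a key, columns holding attributes, and a SQL join that follows the relationship between two tables.

```python
import sqlite3

# In-memory database; the "customers" and "orders" tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)

# Each row is a record identified by its key; each column holds one attribute.
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
)

# A join follows the key relationship between the two tables.
rows = conn.execute(
    "SELECT c.name, COUNT(o.id), SUM(o.total)"
    " FROM customers c JOIN orders o ON o.customer_id = c.id"
    " GROUP BY c.id ORDER BY c.id"
).fetchall()

for name, order_count, total_spent in rows:
    print(name, order_count, total_spent)
```

The fixed schema is what makes the join straightforward: every order row is guaranteed to carry a `customer_id` that can be matched against the key of `customers`.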
How NoSQL differs from relational databases
NoSQL and relational databases have some major differences. The decision on which to implement may well come down to your company’s variety of data, plans for growth, and goals.
NoSQL and RDBMSs are designed to support different application requirements and typically co-exist in most enterprises. The key decision points on when to use which include the following:
| Use an RDBMS when you need/have... | Use NoSQL when you need/have... |
| --- | --- |
| Centralized applications (e.g. ERP) | Decentralized applications (e.g. Web, mobile and IoT) |
| Moderate to high availability | Continuous availability; no downtime |
| Moderate velocity data | High-velocity data (devices, sensors, etc.) |
| Data coming in from one/few locations | Data coming in from many locations |
| Primarily structured data | Structured, with semi/unstructured |
| Complex/nested transactions | Simple transactions |
| Primary concern is scaling reads | Concern is to scale both writes and reads |
| Philosophy of scaling up for more users/data | Philosophy of scaling out for more users/data |
| To maintain moderate data volumes with purge | To maintain high data volumes; retain forever |
If you’re already using a relational database and considering making the move to NoSQL, we have some good news. It’s easier than you might think.
How to compare different NoSQL databases
Once a decision has been made to go with NoSQL, a whole new set of things to consider opens up. There are several popular NoSQL databases and each makes its case for superiority. While they are all NoSQL, they vary in significant ways—differences that may or may not make them the right fit for your company.
First, make sure any option you’re considering handles core NoSQL features. Next, we recommend a close review of the following factors:
- Architecture: Some NoSQL databases like MongoDB are architected in a master/slave model in somewhat the same way as many RDBMSs. Others, like Cassandra, are designed in a masterless fashion where all nodes in a database cluster are the same. The architecture of a NoSQL database greatly impacts how well the database supports requirements such as constant uptime, multi-geography data replication, predictable performance, and more.
- Data model: NoSQL databases are often classified by the data model they support. Some support a wide-row tabular store, while others employ a model that is either document-oriented, key-value, or graph.
- Data distribution model: Because of their architecture differences, NoSQL databases differ on how they support the reading, writing, and distribution of data. Some NoSQL platforms like Cassandra support writes and reads on every node in a cluster and can replicate / synchronize data between many data centers and cloud providers.
- Development model: NoSQL databases differ on their development APIs with some supporting SQL-like languages (e.g. Cassandra’s CQL).
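The architecture and data distribution factors above can be illustrated with a toy model of a masterless cluster. The following is a minimal sketch, not how any particular database implements it: the `Ring` class, the node names, the use of MD5, and the `replication_factor` and `tokens_per_node` parameters are all assumptions made for illustration. It places keys on a hash ring so that every node owns a share of the data and each key has several replicas, with no node acting as a master.

```python
import bisect
import hashlib
from collections import Counter


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used here only for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy masterless cluster: every node owns slices of a token ring."""

    def __init__(self, nodes, replication_factor=3, tokens_per_node=64):
        self.replication_factor = min(replication_factor, len(nodes))
        # Give each node several ring positions so ownership evens out.
        self.ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        owners = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.replication_factor:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(10_000)]

# Count how many keys each node holds as the first replica.
primary_counts = Counter(ring.replicas(k)[0] for k in keys)
for node, count in sorted(primary_counts.items()):
    print(node, count)

print(ring.replicas("user:42"))
```

Because any node can compute the replica list for a key, any node can accept a read or write and forward it to the owners. In a single-primary design, by contrast, all writes are routed through one designated node.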
Cassandra vs. MongoDB vs. HBase vs. Couchbase
Cassandra, MongoDB, HBase, and Couchbase are four of the leading NoSQL databases. Let’s apply the evaluation criteria recommended above to see how they compare.
| | Cassandra | MongoDB | HBase | Couchbase |
| --- | --- | --- | --- | --- |
| Architecture | Fully distributed and masterless. All nodes in the cluster are the same, and every node handles a proportionate share of every activity in the system. This avoids the bottlenecks seen in systems with a single master node, enabling lower latency. If any particular node goes down, there is no downtime. | Master/slave model. Single primary node directing multiple secondary nodes. If the primary node goes down, one of the secondary nodes takes over and that could take up to a minute. The database isn’t able to respond to requests during that time. | Master/slave architecture. Single point of failure. Part of Hadoop ecosystem and stores data in HDFS. The reliance on HDFS, instead of locally-managed storage, makes this database complex and less performant. | Shared-nothing architecture. Every node is self-sufficient and independent. They don’t share memory or storage. To remove points of contention, each request is handled by a single node. Each node includes a data service, index service, query service, and cluster manager component. |
| Data model | Wide-column store based on ideas from Google’s Bigtable and Amazon’s Dynamo. Uses a table structure consisting of rows and columns, but, unlike relational databases, each row is not required to have the same columns. When the table is created, the data type to be stored in each column has to be specified. | Document store with JSON-like format. Schema is flexible and doesn’t have to be predefined. Documents are stored in collections. Collections accommodate many different data types and also allow the data to be nested. | Wide-column store based on Apache Hadoop and Google’s Bigtable. Like Cassandra, it uses a table structure with rows and columns. However, while Cassandra supports hundreds of tables, HBase "does not do well with anything above two or three column families." | Flexible data model that is a JSON-based document store, but maps data to a key/value storage API. Developed from CouchDB with a Memcached-compatible interface. |
| Data distribution model | All nodes in the cluster have the same role, so each one can write and read data. This boosts Cassandra’s writing capability. Write speed and scalability increase as more nodes are added. Updates are fast because only modified data is appended, with no data rewrites required. The database can coordinate all write operations at the same time. Data replication and synchronization across data centers and cloud providers is built in. Every node immediately shares its data with the others, making it instantly available on all nodes. This dissemination of data increases resilience and performance. | With its single master architecture, only the primary node can write and accept input. Secondary (slave) nodes can only be used for reads. This means only one write operation can be performed at a time and writing scalability is limited. Another limitation that can slow things down: The entire document must be rewritten when a field is added or updated. Replication is complex and fragile. | As with MongoDB, if the master node is down, the entire cluster will be unavailable. HBase has an integrated, log-structured storage engine, but relies on HDFS for replication instead of managing storage locally. This puts limitations on write workload scalability. Everything is written in one place, giving a clear trail to each piece of data. Cassandra's replication design is inherently more suited for delivering low latency response times, while also tolerating failures better. | Simpler replication than with MongoDB, but no more rigorous in its design. Couchbase is neither fully consistent nor fully available: it cannot serve reads during failover or network partitions, and it can also return stale data to reads. Couchbase nominally supports active/active cross-datacenter replication. However, if the same document is updated concurrently in both data centers, one of the updates will be lost. And cross-datacenter replication failure often requires manual intervention to recover. |
| Development model | Focuses on developer productivity and features its own user-friendly query language. Called Cassandra Query Language (CQL), its syntax and statements are similar to SQL, making it easy for developers to pick up. | MongoDB doesn’t have its own query language. Queries are conducted using JSON fragments. | HBase also doesn’t have its own language. Add-on technologies are needed to run queries. | Uses the declarative query language N1QL, pronounced “nickel,” to modify JSON data. Unlike SQL, which returns results in a table format, N1QL uses SQL++ which provides results in JSON. |
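The data model row of the comparison can be sketched with plain Python structures. This is only an illustration under stated assumptions: the `USER_COLUMNS` schema, the `check_row` helper, and the sample records are hypothetical, and none of the code calls a real database API. It contrasts a wide-column row (declared column types, sparse rows), a document (no predefined schema, nested values), and a key/value entry (the same document stored as an opaque value).

```python
import json

# Wide-column style: column names and types are declared when the table is
# created, but each row may fill in only some of them.
USER_COLUMNS = {"user_id": int, "name": str, "email": str, "phone": str}


def check_row(row):
    """Reject columns that were never declared or values of the wrong type."""
    for column, value in row.items():
        if column not in USER_COLUMNS:
            raise KeyError(f"undeclared column: {column}")
        if not isinstance(value, USER_COLUMNS[column]):
            raise TypeError(f"wrong type for column: {column}")
    return row


wide_rows = [
    check_row({"user_id": 1, "name": "Ada", "email": "ada@example.com"}),
    check_row({"user_id": 2, "name": "Grace", "phone": "555-0100"}),
]

# Document style: no predefined schema, and values can be nested.
document = {
    "_id": 1,
    "name": "Ada",
    "contacts": {"email": "ada@example.com", "phones": []},
    "orders": [{"sku": "A-100", "qty": 2}],
}

# Key/value style: the same document stored as an opaque value under one key.
kv_store = {"user:1": json.dumps(document)}

print(sorted(wide_rows[0]), sorted(wide_rows[1]))
print(json.loads(kv_store["user:1"])["orders"][0]["sku"])
```

The trade-off shown here is where structure is enforced: the wide-column sketch validates types on write, while the document and key/value sketches leave interpretation of the stored value to the application.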
Benchmarks run by End Point, an independent database firm, stress-tested Apache Cassandra, HBase, MongoDB, and Couchbase on operations typical of real-world applications. Results showed that Cassandra outperformed its NoSQL counterparts. In fact, for mixed operational and analytic workloads typical of modern web, mobile, and IoT applications, Cassandra performed six times faster than HBase and 195 times faster than MongoDB.
Take the next NoSQL step with Cassandra
There’s a lot to take in when comparing NoSQL database options, and the decision is a difficult one. We hope our recommended decision criteria and the comparison of Cassandra, MongoDB, HBase, and Couchbase have been helpful. At DataStax, we believe Cassandra is the clear winner and we’re betting on it being the database of the future.