GuideApr 17, 2024

Apache Cassandra - A NoSQL Database

Apache Cassandra® is the only distributed NoSQL database that delivers the always-on availability, blisteringly fast read-write performance, and unlimited linear scalability needed to meet the demands of successful modern applications.

Sign Up for Astra

What is Apache Cassandra?

Apache Cassandra is a distributed NoSQL database created at Facebook and later released as an open source project in July 2008.

Cassandra delivers the continuous availability (zero downtime), high performance, and linear scalability that modern applications require, while also offering operational simplicity and effortless replication across multiple data centers and geographies. It can handle petabytes of information and thousands of concurrent operations per second, enabling organizations to manage large amounts of structured data across hybrid and multi-cloud environments.

Apache Cassandra vs. traditional relational databases

Cassandra differs from a typical relational database in the following ways:

APACHE CASSANDRA RELATIONAL DATABASE
Handles high incoming data velocityHandles moderate incoming data velocity
Supports simple transactionsSupports complex/nested transactions
No single points of failure; constant uptimeSingle points of failure with failover
Supports very high data volumesSupports moderate data volumes
Decentralized deploymentsCentralized deployments
Data written in many locationsData written in mostly one location
Supports read and write scalabilitySupports read scalability (with consistency sacrifices)
Deployed in horizontal scale-out fashionDeployed in vertical scale-up fashion

Please check out our NoSQL primer, if you'd like to learn more about how Cassandra and other NoSQL databases compare to relational databases.

History of Apache Cassandra

Cassandra Version History

Cassandra was originally developed by Facebook to handle the massive amounts of data generated by the social networking platform. It was designed to be highly scalable and fault-tolerant, with a decentralized architecture. In 2008, Facebook open sourced Cassandra, releasing it under the Apache License. This move allowed other organizations to benefit from Cassandra's capabilities and contribute to its development.

In 2009, Cassandra became an Apache Incubator project, indicating its potential as an open source technology. During this time, the project received contributions from various developers and organizations, further enhancing its features and stability.

In February 2010, Cassandra graduated from the Apache Incubator and became a top-level Apache project. This milestone solidified Cassandra's position as a mature and widely adopted database solution. Over the years, Cassandra has seen numerous version releases, each introducing new features, improvements, and bug fixes.

Download our free Apache Cassandra timeline infographic to learn more about Cassandra’s history, and how it’s changed over time.

Apache Cassandra’s key features and benefits

Whether you need to process server logs, emails, social media posts, or PDFs, Cassandra has you covered. You’ll be able to make better-informed decisions without leaving any of your data on the table.

Here are some of Cassandra's key benefits and features:

Open source

Modern software development organizations have overwhelmingly moved to adopt open source technologies, starting with the Linux operating system and progressing to infrastructure for managing data. Open source technologies are attractive because of their affordability and extensibility, as well as the flexibility to avoid vendor lock-in. Organizations adopting open source report higher speed of innovation and faster adoption.

Want to run open source Cassandra with peace of mind? Learn about DataStax Luna, enterprise support for Apache Cassandra.

Flexible, familiar interface

The Cassandra Query Language (CQL) is similar to SQL. That means most developers should have a fairly easy time becoming familiar with it.

High performance

The majority of traditional relational databases feature a primary / secondary architecture. In these configurations, a single primary replica performs read and write operations, while secondary replicas are only able to perform read operations. Downsides to this architecture include increased latency, higher costs, and lower availability at scale. With Cassandra, no single node is in charge of replicating data across a cluster. Instead, every node is capable of performing all read and write operations. This improves performance and adds resiliency to the database.

Active everywhere with zero downtime

Since every Cassandra node is capable of performing read and write operations, data is quickly replicated across hybrid cloud environments and geographies. If a node fails, users will be automatically routed to the nearest healthy node, leaving no single point of failure. They won’t even notice that a node has been knocked offline because applications behave as designed even in the event of failure. As a result, applications are always available and data is always accessible and never lost. What’s more, Cassandra’s built-in repair services fix problems immediately when they occur — without any manual intervention. Productivity doesn’t need to take a hit if nodes fail.

Scalability

In traditional environments, scaling applications is a time-consuming and costly process typically accomplished by scaling vertically with more expensive machines. Cassandra enables you to scale horizontally by simply adding more nodes to the cluster. If, for example, four nodes can handle 200,000 transactions/second, eight nodes will be able to handle 400,000 transactions/second.

Seamless replication

Today’s leading enterprises are increasingly moving to multi-data center, hybrid cloud, and even multi-cloud deployments to take advantage of the strengths of each, without getting locked into any single provider’s ecosystem. By placing data on different machines, it proves easy data distribution with no single point of failure. Getting the most out of multi-cloud environments, however, starts with having an underlying cloud database that offers scalability, security, performance, and availability.

Understanding Cassandra’s Query Language (CQL)

Cassandra has its own query language, Cassandra Query Language (CQL). It’s a declarative language for communicating with Cassandra — developers use CQL commands to write and read data from the database.

CQL’s simple API provides an abstraction layer that keeps Cassandra’s internal storage structure and implementation details hidden. Native syntaxes for collections and other common encodings are also included.

Developers can interact with Cassandra using the CQL shell interface, cqlsh, on the command line of a node. The cqlsh prompt can be used to create and modify keyspaces and tables, make changes to data, insert and query tables, and more.

If you prefer a command line tool and use Astra DB, definitely take a look at the Astra CLI. It will install and configure CQLSH for you to work with your cloud-managed database service instance. You can also use it to manage, operate and configure your Astra DB instance via scripts or commands in a terminal.

If you prefer a graphical tool, definitely take a look at DataStax Studio. And language drivers are available for Java (JDBC), Python (DBAPI2), Node.JS (Datastax), Go (gocql), and C++. Take a look at all the drivers DataStax provides allowing CQL statements to be passed from client to cluster and back.

Get Started with Astra DB for Free

Rapidly build cloud-native applications with DataStax Astra DB, a database-as-a-service built on Apache Cassandra.

Comparing CQL vs SQL

As mentioned earlier, because of their many similarities, developers with SQL experience should be able to get to work fast in CQL.

Just like SQL, CQL stores data in tables containing rows and columns. Many interactions and implementations are the same, such as retrieving all the rows in a table, and how permissions and resources of entities are controlled. If you have a handle on SQL statements like SELECT, INSERT, UPDATE, and DELETE, CQL should be a snap.

That said, there are some differences. For example, in SQL any column can be included in the WHERE clause, whereas in CQL only columns that are strictly declared in the primary key can be used as a restricting column. Also, each query must have a partition key defined at a minimum in CQL.

Learn more about similarities and differences between SQL and CQL

CQL’s major data types

CQL comes with many built-in data types. Let’s review some of the key ones.

String types

Two data types are included to represent text. Either of the data types, text or varchar, can be used to create an UTF-8-character string—the more recent, commonly used text standard that supports interna. To handle legacy data in ASCII format, CQL also includes the ascii type.

Numeric types

Numeric types in CQL are similar to those found in Java and other languages. They include integers, decimals, and floating-point numbers.

Date and time types

CQL provides data types for dates, date ranges, time, timestamps, and duration (in months, days, and nanoseconds).

Geo-spatial types

Data types to handle spatial and geospatial searches are provided. They can be used to index and query latitude and longitude location data, as well as point and shape data.

Collection types

CQL supports collections — storing multiple values in a single column. Use collections to store or denormalize small amounts of data, such as phone numbers, tags, or addresses. Collections are not appropriate for data that is expected to grow unbounded, such as all events for a particular user. In those cases, use a table with clustering columns.

Other CQL data types

Several other data types are available to handle special situations. For example, CQL includes the blob type for storing binary data, boolean for true or false values, and counter to define counter columns. It also includes unique identifier types.

Learn more about CQL data types at the DataStax Docs

Get a feel for the cqlsh interface and use some of CQL’s essential commands, in a quick five-minute test drive. Try the CQL demo.

Take a look at this handy guide of the top 10 most frequently used CQL commands. 

Download the DataStax Quick Reference Guide for CQL

Where is Apache Cassandra headed next?

At DataStax, we’re working hard with the open-source community to build on Cassandra’s decade-plus maturity to solidify its position as the leading database for cloud-native applications.

Cassandra has traditionally been known as an extremely powerful database that stands up to the most demanding use cases, but also as difficult to learn and operate. DataStax is committed to working with the Cassandra community to make it easier to use, adopt, and extend for your needs.

Here are some of the ideas we’re exploring:

Adding more SQL-like capabilities into CQL:

Providing API compatibility for Dynamo DB with Stargate

Making the storage engine pluggable

How can I get started?

If you’d like to learn more about Apache Cassandra, we have several resources here to get you started:

Get Started with Astra DB for Free

Rapidly build cloud-native applications with DataStax Astra DB, a database-as-a-service built on Apache Cassandra.

One-stop Data API for Production GenAI

Astra DB gives JavaScript developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.