Company · July 6, 2022

Apache Cassandra 4.1: Building the Database Your Kids Will Use

Apache Cassandra® 4.1 will be released sometime in July, so now’s a good time to look back over the last year and reflect on where we are as a project. We committed to a yearly cadence, and guess what? We kept the promise!

If you haven’t been watching the Cassandra community for a while, it’s been a busy place. I know many of you have Cassandra clusters you installed years ago and completely forgot about.

By nature, a good database should be boring and forgettable—something that’s always there and ready. So we invite you to come back and see what’s new. Tell us about your absolutely boring database. Maybe an upgrade is in your future? Let me tell you what’s coming.

Stability as a feature

Cassandra 4.0 marked a significant milestone for a 10-year-old database. The project put enormous resources into building tools that define and validate a stable database, then integrated those tools into the build and release pipeline to ensure thorough testing of future changes and prevent regressions.

The result was one of the most stable databases ever shipped. Teams have been deploying 4.0 clusters over the past year at a furious pace, and the reports so far are overwhelmingly positive. Stability is a killer feature for a database, and if the world depends on Cassandra for storing critical data, we’re delighted that it’s living up to this promise.

Do you know another killer feature of a database? Shipping new features. There was a regrettable length of time between 3.11 and 4.0, but with a solid foundation built, we’re now moving at a regular cadence. Cassandra 4.1 is currently in pre-release, on track with our goal for major yearly releases. In addition, all the validation work done in 4.0 is paying off with a build pipeline that gives contributors confidence when building new functionality in Cassandra.

4.1’s shiny new thing: pluggability

So, what do we get from building on a stable core? The theme for Cassandra 4.1 is enabling feature plug-ins. Why are plug-ins a valuable new thing, you ask? They're a structured way to add features to an existing product without changing the core code. For Cassandra, that means adding important new features without actually changing Cassandra.

One of the early drivers of this idea was Instagram. They had built a version of Cassandra that used RocksDB as the underlying storage engine, called Rocksandra. It radically changed how we thought of storage with Cassandra without changing the networking and node coordination. However, it also surfaced two distinct problems that needed addressing. The first was the need for a clear interface to Cassandra internals, so there's an understood contract between the core and any outside code. Without one, the Rocksandra team had to rely on deep knowledge of Cassandra internals to make it work.

The second was having a reliable testing framework. Instagram understood its use case and had its own acceptance testing. For a general-purpose database, however, there needs to be a much wider scope of functional testing. Based in part on those lessons, both of those problems have been tackled in the project.
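To make the idea of an "understood contract" concrete, here is a minimal Python sketch of a plug-in interface. Every name in it (StorageEngine, DictStorageEngine, REGISTRY, create_engine) is hypothetical and chosen for illustration; it does not reflect Cassandra's actual Java interfaces, only the general pattern of core code depending on a contract rather than on a specific implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class StorageEngine(ABC):
    """Hypothetical contract the core depends on (not Cassandra's real API)."""

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        """Store a value under a key."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is absent."""


class DictStorageEngine(StorageEngine):
    """One possible plug-in: an in-memory engine backed by a plain dict."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


# Registry mapping a configured name to an implementation class.
# A new engine is added here without touching the core code below.
REGISTRY: Dict[str, Type[StorageEngine]] = {"dict": DictStorageEngine}


def create_engine(name: str) -> StorageEngine:
    """Core code selects an engine by name and only ever sees the contract."""
    try:
        return REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown storage engine: {name}") from None


if __name__ == "__main__":
    engine = create_engine("dict")
    engine.put("user:1", "alice")
    print(engine.get("user:1"))
```

The design choice worth noting is that the core never imports an implementation directly. That same boundary is what lets a shared test suite run against any engine that claims to honor the contract.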

Some of the new plug-in features that are available in 4.1 include:

  • Storage. The feature is described in terms of memtables, but it carries over to the underlying storage because memtables are what Cassandra writes to and reads from. Expect to see some interesting implementations focusing on specific use cases, including fast memory storage and columnar formats.
  • Network Encryption. Before this change, any SSL certificates had to be in the local file system. This change allows external key providers such as HashiCorp Vault to make key management easier in large deployments.
  • Authentication. External and centralized authentication is a desirable feature for most organizations that manage many infrastructures. This change allows the Cassandra command-line tool, CQLSH, to use LDAP, Kerberos, and others.
  • Schema. Until now, the only option for storing cluster schema was in system tables. For global coordination, especially in Kubernetes, external schema storage such as etcd is now an option.
  • Guardrails. Operators around the world rejoice! You can now restrict anti-patterns in your production environment. An example would be limiting the number of indexes you can add to a table. Guardrails are already in use in DataStax Astra DB, our managed Cassandra service, and the feature was donated to the project.
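As a rough illustration of the guardrail idea from the list above, the following Python sketch models a limit with a soft (warn) and a hard (fail) threshold, using the indexes-per-table example. The class and field names are hypothetical, and treating a negative threshold as "disabled" is an assumption of this sketch; none of this is Cassandra's implementation.

```python
from dataclasses import dataclass
from typing import Optional


class GuardrailViolation(Exception):
    """Raised when a value exceeds the hard (fail) threshold."""


@dataclass
class Threshold:
    """Hypothetical guardrail with a soft and a hard limit."""

    name: str
    warn: int = -1  # negative means disabled (assumption of this sketch)
    fail: int = -1

    def check(self, value: int) -> Optional[str]:
        """Return a warning message, None if within limits, or raise on failure."""
        if self.fail >= 0 and value > self.fail:
            raise GuardrailViolation(
                f"{self.name}: {value} exceeds fail threshold {self.fail}"
            )
        if self.warn >= 0 and value > self.warn:
            return f"{self.name}: {value} exceeds warn threshold {self.warn}"
        return None


if __name__ == "__main__":
    indexes_per_table = Threshold("indexes_per_table", warn=2, fail=3)
    for count in (2, 3, 4):
        try:
            print(count, indexes_per_table.check(count))
        except GuardrailViolation as exc:
            print(count, "rejected:", exc)
```

Separating a warning level from a failure level lets operators see an anti-pattern approaching before any request is actually rejected.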

And now, the future! ACID transactions and more.

The future of Cassandra after 4.1 will be 5.0, and those discussions are already in full swing. Cassandra is the database designed for growth and for cloud-native applications in years to come. The following 10 years of Apache Cassandra are about rising to that challenge by building on the solid foundation of the previous 10 years.

The most transformative new 5.0 feature may shock you, but here we go: Cassandra will be adding full ACID transactions.

If you're in the camp of people saying you could never use Cassandra because it "doesn't support transactions," prepare to make room for more Cassandra in your life. Cassandra 2.0 added what we called "lightweight transactions" based on Paxos.

To keep the guarantees that Cassandra delivers while not destroying performance, Paxos was the right choice at the time. Since 2013, however, other approaches to consensus have become very popular in the database world, such as those used in Spanner and Raft, but they require tradeoffs that aren't aligned with the linear scale and uptime guarantees that Cassandra users expect.

Accord is a next-generation distributed consensus protocol that reaches consensus in one round trip and doesn't require complicated leader failover mechanisms. In short, it enables Cassandra to be Cassandra while delivering full ACID transactions. We're building it now, so if you want to join in the conversation, we'd appreciate your input.

A more nonspecific but important direction for Cassandra is the move to being more cloud-native. The ability to add plugins and shape the deployment of your Cassandra cluster is a step in the right direction. Serverless databases in Kubernetes are the future, and Cassandra is well-positioned to be that database for years to come. DataStax released a white paper on a cloud-native database based on Cassandra that could be a blueprint for the future.

For end-users, it means scaling without manual effort and true multi-tenancy where multiple applications share the same infrastructure. For operators, especially site reliability engineers, native Kubernetes deployments separate compute from storage so that each can scale independently. Most importantly, you provision only what you need, asynchronously, to save on those ever-growing infrastructure bills.

Time to celebrate

Last year, we had a party for the release of 4.0, and this year will be no different! The Cassandra World Party is a one-hour, online-only event that we’ll hold at three different times on July 20 to cover as many time zones as possible. The highlight will be the five-minute lightning talk format where the user community can get on and tell their story… quickly! Make sure you register to save your spot; don’t miss out!

We hope you’ll join us and celebrate with the rest of the community. We’d love to hear your story—if you’re up to the challenge. Just tell us your topic and which time zone works for you. See you there!

Resources

  1. Apache Cassandra
  2. Cassandra World Party
  3. The End of the Beginning for Apache Cassandra
  4. Instagram Supercharges Cassandra with a Pluggable RocksDB Storage Engine
  5. Astra DB
  6. Paxos
  7. Spanner
  8. Raft
  9. DataStax Astra DB: Designing a Serverless Cloud-Native Database-as-a-Service
