March 10, 2022

Apache Cassandra Finds Its Groove

Jeffrey Carpenter, Software Engineer - Stargate

Setting the right pace of change for a technology can be a tricky thing. If it's changing too fast, it can cause churn and confusion for users as they try to keep up. On the other hand, if a technology is not actively demonstrating at least some level of change, it gets stagnant and loses momentum. 

We all evaluate open source projects according to the pace of change. Whether or not we like to admit it, there’s an assurance that comes from seeing that the latest commit on the Git repo occurred in the past few days instead of months or years ago. We use this as a proxy for evaluating the health of a project and its community.

I’ve been thinking about this pace of change lately as I had the opportunity to revisit my work on Cassandra: The Definitive Guide for O’Reilly to create a revised third edition of the book. I’d like to share a few interesting things here that I observed in that process.

A brief history of innovation on the Cassandra project

Apache Cassandra has had periods when change came too fast and periods when progress was slow enough to cause concern, but it is now approaching a great equilibrium.

Too fast

When Eben Hewitt first approached me in 2014 about updating the Cassandra book he had first written in 2010, he joked that O’Reilly should stop selling that first edition because of how out of date it was and how much Cassandra had changed. This turned out to be true – the addition of the Cassandra Query Language (CQL) and resulting architecture changes had a major impact on what it looked like to use and maintain Cassandra. As a result, that second edition (released in 2015) was a major rewrite. 

Too slow

When I had the chance to work on the third edition (released in 2020), the experience was quite different. The pace of change on the Cassandra project had slowed. On the plus side, this was an indicator that the project had reached a level of stability and maturity. On the minus side, it also showed a slowing pace of innovation and progress. The 4.0 release was in beta for a long time, longer than I expected. My attempt to time the third edition to coincide with the release of Cassandra 4.0 missed the mark as the book actually beat the 4.0 GA release by a full year!

Just right!

In working on the revised third edition that’s just been released, I had the pleasant experience of observing a great balance that has emerged in the Cassandra community. The pace of innovation on the core project is just right. Instead of features that make major API and functionality changes, the project is focused on features that enable ease of use and developer productivity. Probably the most important signs, though, are the developments in the community surrounding the project itself.

A vibrant community driving Cassandra’s future

Over the past couple of years, the Cassandra community has been revitalized in some major ways.

Innovation on the core through CEPs

The first is the introduction of the Cassandra Enhancement Proposal (CEP) process. Drawing inspiration from improvement processes in other communities, the Cassandra community now has a well-defined and well-executed process for proposing, discussing, and approving new ideas before engineers invest time in making changes. There are many of these proposals in progress. Here are a few that I find especially interesting for application developers:

  • CEP-3: Guardrails will make it easier for administrators to place limitations on usage patterns that would degrade performance and stability of Cassandra clusters. 
  • CEP-15: General Purpose Transactions will add a new transaction capability, based on the Accord protocol, that supports transactions across multiple tables, a long-requested feature.
  • CEP-7: Storage Attached Index will add a new secondary index implementation that allows indexing of multiple columns while maintaining performance.

Innovation at the ecosystem edge

Innovation in the Cassandra community isn’t confined to the Apache project itself. Projects like Cassandra Reaper and Cassandra Medusa are great tools for operating Cassandra that emerged out of collaboration between Spotify and The Last Pickle.

More recently, a number of organizations have been deploying Cassandra on Kubernetes and building their own operators. Many of these organizations shared ideas and best practices, resulting in a lot of momentum behind the open-source operator started by DataStax that is now part of the K8ssandra project.

K8ssandra is itself an example of a project that pulls many of these threads together, knitting Cass Operator, Reaper, Medusa, Prometheus, and the Stargate data gateway into an integrated stack for running Cassandra on Kubernetes as easily as possible. 

Communication for all

The Cassandra website has undergone a full refresh within the past year and features a continual feed of new content on the blog. For those who are active in the community (or want to be), the regular “Changelog” posts are especially useful summaries of what’s going on in the project.

Non-code contributors

Cassandra has long been a project that demanded a lot of technical expertise to work with the core engine, and the path to becoming a committer has been reserved for those with a pretty narrow set of specific skills in server-side Java development. I’m super excited to see how the project has recently begun to recognize committers like Lorina Poland, Erick Ramirez, and Anthony Grasso, whose primary contributions have been in the area of documentation and helping new users. 

Download the newest edition of Cassandra: The Definitive Guide from O’Reilly for the most up-to-date technical details and practical examples for deploying Apache Cassandra in a production environment.
