December 13, 2022

It’s time to upgrade to Cassandra 4.1

Database infrastructure, particularly for security- and operations-minded team members, shouldn’t actually be very exciting. In fact, it should be as boring as possible, particularly for a database that’s been powering massive-scale infrastructure for over a decade. Let’s save the excitement for less manual toil, accidents averted before they ever happen, better observability, and new tools that can help keep your production cluster up and running.

Last year’s Apache Cassandra 4.0 release was possibly the most stable database release in history. This stability is an important foundation as Cassandra begins a new cycle of innovation. And while Cassandra 4.1 is delivering on that innovation today, the 5.0 discussion is well underway. Full ACID transactions, relational-style secondary indexes, and other landmark features are in development, which makes it all the more reassuring that the Cassandra community remains strongly committed to maintaining that hard-won stability.

Cassandra 4.1 ships ready for your production clusters to upgrade. Let’s briefly look into why an upgrade could be compelling for your cluster or project. For a deep dive, watch the on-demand webinar with Jeff Carpenter, the co-author of “Cassandra: The Definitive Guide - 3rd Edition.”

Security and Operations

Operating open source Cassandra can be challenging, as can securing it. The 4.1 release ships new tools and capabilities to help you not just keep your cluster up and running, but keep it humming. One example is CEP-10: Cluster and Code Simulations, which allows for simulating operations on clusters. Operations are repeatable but pseudo-random, allowing the exploration of state space and correctness testing. This new infrastructure advances Cassandra’s state-of-the-art distributed systems testing in service of delivering higher quality open source software with fewer defects.
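To make "repeatable but pseudo-random" concrete, here is a toy Python sketch. It is not Cassandra's simulator, and the operation names and the `schedule` helper are hypothetical. It only shows the underlying idea: a seeded generator produces the same sequence of operations on every run, so a run that exposes a bug can be replayed from its seed.

```python
import random

# Hypothetical operation names, for illustration only
OPS = ("read", "write", "node_down", "node_up")

def schedule(seed, steps=5, nodes=3):
    """Return a pseudo-random but reproducible list of (operation, node) pairs."""
    rng = random.Random(seed)  # private generator: same seed, same sequence
    return [(rng.choice(OPS), rng.randrange(nodes)) for _ in range(steps)]

# Replaying a seed reproduces the exact same schedule
first = schedule(seed=42)
replay = schedule(seed=42)
print(first == replay)  # prints True
```

Different seeds explore different schedules, while any single seed can be rerun as often as needed to investigate a failure.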

  • Operators will appreciate CEP-3: Guardrails. Guardrails are disabled by default and exist to prevent common anti-patterns. They are configured in cassandra.yaml, or via JMX at runtime. Warnings and failures are logged and sent to driver clients when applicable. Learn more on the ASF Cassandra blog. This is just the beginning, with lots more to come!
  • CEP-13: Denylisting gives Cassandra operators a new tool to help reduce the effect of overloaded partition keys on nodes (and the cluster). Operators can isolate specific partition keys that are read/write overloaded, saturated with live cells/tombstones, or experiencing DDoS attacks in production clusters.
  • Do you use lightweight transactions (LWTs)? CEP-14 upgrades the Paxos implementation to v2, which in turn improves LWT performance by 50%. Data integrity and consistency are also improved in scenarios like range movements.
  • CEP-16: Auth Plugin Support for CQLSH changes the Cassandra command line tool, CQLSH, so that its authentication can source credentials from LDAP, Kerberos, and other stores.
  • CEP-9: Make SSLContext creation pluggable creates an extension point for third-party SSL/TLS providers, paving the way for ecosystem or custom-built integrations.
  • Pre-hashed passwords are now supported in CQL, eliminating plain text credentials. Learn more on the ASF Cassandra blog.
  • Improvements to nodetool, backup and restore, and GRANT/REVOKE/LIST statements have been added.

Cassandra 4.1 also introduces additions that help with security and monitoring. The ALL TABLES IN KEYSPACE resource allows you to grant permissions for all tables and user types in a keyspace, while preventing the user from using those permissions on the keyspace itself. Also, the new system.top_partitions table tracks top (Linux style!) partitions based on partition size or tombstone count per table. That data is exposed via JMX and nodetool tablestats.

Cassandra Developers

Sure, the Cassandra 4.1 release is primarily focused on the security- and operations-minded, but that doesn’t mean there aren’t a few goodies for Cassandra and CQL developers.

  • The GROUP BY clause for CQL queries has been improved with the ability to group by time range.
  • Queries can now use CONTAINS and CONTAINS KEY conditions in conditional updates for LWTs, making queries such as this possible:

UPDATE mytable SET somefield = 4 WHERE pk = 'pkv' IF set_column CONTAINS 5
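Conceptually, grouping by time range means flooring each row's timestamp to the start of a fixed-width bucket and then aggregating per bucket. The Python sketch below illustrates that idea on made-up data. It is an illustration of the concept only, not how Cassandra implements it, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(ts, width):
    """Floor a timezone-aware timestamp to the start of its fixed-width bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width

def group_by_time_range(rows, width):
    """Sum values per time bucket; rows are (timestamp, value) pairs."""
    totals = defaultdict(int)
    for ts, value in rows:
        totals[bucket_start(ts, width)] += value
    return dict(totals)

# Made-up sample data: three readings spread over two hours
rows = [
    (datetime(2022, 12, 13, 10, 5, tzinfo=timezone.utc), 1),
    (datetime(2022, 12, 13, 10, 55, tzinfo=timezone.utc), 2),
    (datetime(2022, 12, 13, 11, 10, tzinfo=timezone.utc), 4),
]
hourly = group_by_time_range(rows, timedelta(hours=1))
```

Here the first two readings fall into the 10:00 bucket and the third into the 11:00 bucket.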

Great! If you’re still here, you’ve probably found something useful for you and your team.  

So let’s talk about upgrades. The two biggest areas of change from 4.0 to 4.1 are:

  • Configuration in cassandra.yaml
  • Upgrade to Paxos v2 

As part of CASSANDRA-15234, we standardized the parameter names in cassandra.yaml while preserving backward compatibility with the old names. Parameters of type data rate, data storage, and duration no longer carry the unit in their names; the unit is now given with the value, and the old form is still accepted. There are also new flags related to parameter overloading, along with improvements and bug fixes that tighten and document the Cassandra configuration. Please carefully review the details here, and double-check your configuration on startup to ensure no corner case was missed. To do that, we recommend checking the logs for the values loaded from cassandra.yaml as well as querying the Settings Virtual Table. Some properties have null defaults that are changed programmatically later on, and those changes are visible when you query the Settings Virtual Table. New parameters added in 4.1 of type data rate, data storage, and duration accept the new format only (numeric value + unit).
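As a rough illustration of the "numeric value + unit" format, the Python sketch below parses values such as 5000ms, 32MiB, or 16MiB/s into a base unit. The unit suffixes in the tables are assumptions based on our reading of the 4.1 configuration, and `parse_setting` is a hypothetical helper, not part of Cassandra. Check the cassandra.yaml shipped with your version for the authoritative list.

```python
import re

# Assumed unit suffixes; verify against the cassandra.yaml for your version
DURATION_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
STORAGE_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# A number, a unit, and an optional "/s" marking a data rate
_VALUE = re.compile(r"(\d+)([A-Za-z]+)(/s)?")

def parse_setting(text):
    """Split a 'number + unit' string into (kind, value in base unit)."""
    match = _VALUE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected numeric value + unit, got {text!r}")
    number, unit, per_second = int(match.group(1)), match.group(2), match.group(3)
    if per_second:
        if unit not in STORAGE_BYTES:
            raise ValueError(f"unknown data rate unit in {text!r}")
        return ("data_rate_bytes_per_s", number * STORAGE_BYTES[unit])
    if unit in DURATION_MS:
        return ("duration_ms", number * DURATION_MS[unit])
    if unit in STORAGE_BYTES:
        return ("data_storage_bytes", number * STORAGE_BYTES[unit])
    raise ValueError(f"unknown unit in {text!r}")

timeout = parse_setting("5000ms")     # a duration
segment = parse_setting("32MiB")      # a data storage size
throughput = parse_setting("16MiB/s") # a data rate
```

A bare number such as 5000 is rejected by this sketch, which mirrors the statement above that parameters new in 4.1 accept only the new format.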

There’s a more complete list of considerations for the 4.0 to 4.1 upgrade (as well as Paxos v2 upgrade information for LWT users) in the Cassandra release notes on GitHub. Jump into a community forum if you are in doubt about anything!

The Cloud Native Future

Both the core Cassandra project and ecosystem are on the move. There are newly pluggable implementations released in 4.1, creating extension points for:

  • Memtable (persistent memory, tries memtables)
  • Network encryption
  • Authentication
  • Schema storage services (like etcd)
  • System wide guardrails 

The doors are open to powerful new ecosystem opportunities! With projects like Stargate providing database-as-APIs, K8ssandra harmonizing Cassandra with Kubernetes, etcd, and exploratory work around a DynamoDB API for Stargate, the Cassandra ecosystem is thriving and growing.

Energy is already building for Apache Cassandra 5.0. At the top of the list are two landmark changes that are worth getting excited about: full ACID transactions and Storage Attached Indexes (SAI), which are relational-style secondary indexes.

CEP-15: General Purpose Transactions features groundbreaking work rooted in years of research by Apple Inc. and the University of Michigan. It’s built on a next-generation distributed consensus protocol. This protocol, Accord, aligns with a leaderless architecture like Cassandra’s in a manner that options like Spanner and Raft don’t. Limited to those options, there wasn’t a way for Cassandra to provide transactions with the linear scale and uptime guarantees that Cassandra users expect. Accord accomplishes consensus in one round trip, and it doesn’t require complex leader failover mechanisms. In short, it enables Cassandra to fulfill its design promises while delivering full ACID transactions.

CEP-7: Storage Attached Indexes (relational-style secondary indexes) was previously a commercial feature available only on DataStax Astra DB and DataStax Enterprise. DataStax is now contributing it back to the OSS community. Relational-style secondary indexes are a powerful addition to your arsenal for any OLTP application. If you’re new to them in Cassandra, check out this excellent primer from Jeff Carpenter.

Taken together, there is no doubt that the pace of innovation in and around Cassandra is picking up, reborn in the cloud. Learn how to get involved in Apache open source on the web or in community discussions!

Want to learn more about Apache Cassandra? Register now for the Cassandra Summit, which takes place in San Jose, Calif., March 13-14, 2023. Use DataStax’s code, CS23DS20, to save 20% on your pass.

Cassandra in your future? Try the Apache Cassandra page

Resources

  1. Apache Cassandra
  2. Cassandra on StackOverflow
  3. DataStax Luna - Apache Cassandra® Support
  4. Cassandra 4.1 and beyond Webinar
  5. Cassandra: The Definitive Guide - 3rd Edition
  6. Stargate.io
  7. K8ssandra.io
  8. Astra DB
  9. Astra CDC
  10. Astra Streaming
  11. Join our Discord: Fellowship of the (Cassandra) Rings 
  12. DataStax Academy
  13. DataStax Certifications
  14. DataStax Workshops
