May 14, 2021

Adelphi: Upgrading Cassandra With Confidence

DataStax is creating tooling to make it easier than ever to upgrade your Apache Cassandra® cluster safely. The Adelphi project provides tools to validate your current schema before you upgrade. In addition, these tools allow you to compare how your schema performs on your current version of Cassandra and on the version you plan to upgrade to. This post introduces Adelphi and explains how it can help with the next upgrade to your cluster.

Schema matters

When a Cassandra administrator is considering upgrading to a new version of Cassandra, one of the first questions that gets asked is:

 “Will my schema work with the new version?”

Cassandra doesn’t usually introduce a large number of schema changes between versions, but changes do happen. And most administrators (understandably) aren’t enthusiastic about upgrading a large cluster just to validate that everything continues working as expected.

Or consider a slightly different case. A developer is working on a new feature for Cassandra and wants to confirm that this new feature maintains some notion of backward compatibility with prior versions. Perhaps some performance degradation is acceptable for this new feature, but how do we measure any such change? The developer eventually settles on creating a test case and manually running it against both versions, but this solution is cumbersome and error-prone.

Note that both the administrator and developer are interested in the same pair of criteria:

  • Correctness: Does the schema work on the new version of Cassandra in a way that’s compatible with (if not identical to) its behavior on the old version?
  • Performance: Do applications using the schema perform similarly on the new version of Cassandra? If there is a change in performance, how large is the change? Is the degree of change acceptable for our business needs?
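
To make the performance question concrete, here is a minimal sketch of how a change between versions might be quantified. It is not part of Adelphi: the function names, the sample latencies, and the 10% threshold are all assumptions chosen for illustration.

```python
from statistics import median


def relative_change(old_ms, new_ms):
    """Return the fractional change in median latency from old to new."""
    old_median = median(old_ms)
    new_median = median(new_ms)
    return (new_median - old_median) / old_median


def is_acceptable(old_ms, new_ms, max_regression=0.10):
    """True if the new version is at most `max_regression` slower (hypothetical threshold)."""
    return relative_change(old_ms, new_ms) <= max_regression


# Made-up latency samples in milliseconds, one list per Cassandra version.
old_latencies_ms = [4.0, 5.0, 6.0]
new_latencies_ms = [4.2, 5.2, 6.3]

change = relative_change(old_latencies_ms, new_latencies_ms)
print(f"median latency change: {change:+.1%}")
print("acceptable:", is_acceptable(old_latencies_ms, new_latencies_ms))
```

The threshold is a business decision rather than a technical one, which is why it is a parameter here and not a constant.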

Enter Adelphi

Adelphi is an open-source project developed by DataStax that attempts to help answer the questions posed above. Using Adelphi you can automate the creation of two Cassandra instances, one for each version, which are then subjected to a set of automated tests to validate that both instances contain identical data. The performance of these operations is also monitored; armed with this data, users can evaluate whether the new version performs adequately for their constraints. Adelphi consists of two distinct components:

  • A workflow orchestrated within a Kubernetes cluster responsible for instantiating the Cassandra instances and executing the various tests.
  • Supporting tooling that generates needed configuration files and performs other utility functions.

We will look at each of these components separately.

Reliability at the Helm

The heart of Adelphi is an Argo workflow that creates the necessary Cassandra instances and executes all of the operations we use to evaluate the behavior of the new version. These operations include:

  • Creating new Cassandra clusters for the source and target versions using the DataStax Kubernetes Operator for Apache Cassandra.
  • Creating the schema to be tested on each Cassandra instance once they have started and are available.
  • Executing NoSQLBench against each instance to populate data, gathering performance information while doing so.
  • Validating the consistency of the data generated by NoSQLBench using cassandra-diff.
  • Using Gemini to validate data integrity between versions by comparing the state of each server after a series of random mutations.
  • Starting a web server to make the output of the various tools above available for download.
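
To give a feel for the consistency check in the list above, here is a minimal sketch of diffing two tables in plain Python. It does not show how cassandra-diff is implemented and makes no calls to it; the `diff_tables` function and the in-memory row dictionaries are hypothetical stand-ins for data read from the source and target clusters.

```python
def diff_tables(source_rows, target_rows):
    """Compare two tables represented as {primary_key: row_dict}.

    Returns the keys missing on either side and the keys whose rows differ.
    """
    source_keys = set(source_rows)
    target_keys = set(target_rows)
    return {
        "only_in_source": sorted(source_keys - target_keys),
        "only_in_target": sorted(target_keys - source_keys),
        "mismatched": sorted(
            key
            for key in source_keys & target_keys
            if source_rows[key] != target_rows[key]
        ),
    }


# Made-up rows: key 2 differs, key 3 is missing on the target, key 4 is extra.
source = {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}
target = {1: {"name": "a"}, 2: {"name": "B"}, 4: {"name": "d"}}

report = diff_tables(source, target)
print(report)
```

An empty report on all three fields is the outcome you would hope for when the same workload has been applied to both versions.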

The workflow is available via Helm for easy integration into whatever Kubernetes environment you might use. Any version of Cassandra supported by the DataStax Kubernetes Operator is available for the source or target clusters and can be specified via a Helm configuration file or at runtime via a command-line argument. For additional information on these (and many other) details, please take a look at our Getting Started Guide!

Don’t forget your tools

The other significant component of Adelphi is a Python package that includes a tool to automate common tasks. This tool (which is also named “adelphi”) is capable of generating both Gemini and NoSQLBench configuration files from a running Cassandra instance. In fact, the Argo workflow mentioned above uses this tool internally to do just that! But the tool can do more than generate configs; it also automates the process of contributing your schema to Adelphi.

In addition to what we've already covered here, a final goal of Adelphi is to serve as a public repository for real-world schemas that operate in real-world environments. The project has created a repository where users may contribute their schemas. Because this is a public repository, we require that any contributed schema be anonymized. This process renames keyspaces, tables, columns, and other schema structures so that no potentially sensitive information is revealed. 

The “adelphi” tool automates this process by extracting an anonymized schema and creating a pull request for the anonymized content on the schema repository. With this tool in hand, users can feel comfortable about contributing their schema to Adelphi so that the entire Cassandra community can study examples of real-world schemas. Our hope is that over time this repository will become a resource that will help all Cassandra users develop and test their applications!
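
The idea behind anonymization can be sketched in a few lines of Python. This is an illustration of consistent renaming only, not the algorithm the “adelphi” tool uses; the `Anonymizer` class, the `ks_`/`tbl_`/`col_` naming scheme, and the nested-dictionary schema format are all assumptions made for this example.

```python
from itertools import count


class Anonymizer:
    """Map each original identifier to a stable, generic replacement."""

    def __init__(self):
        self._names = {}     # (kind, original) -> generic name
        self._counters = {}  # kind -> running counter

    def rename(self, kind, original):
        key = (kind, original)
        if key not in self._names:
            counter = self._counters.setdefault(kind, count())
            self._names[key] = f"{kind}_{next(counter)}"
        return self._names[key]


def anonymize_schema(schema):
    """Anonymize a schema given as {keyspace: {table: [column, ...]}}."""
    anon = Anonymizer()
    result = {}
    for keyspace, tables in schema.items():
        ks = anon.rename("ks", keyspace)
        result[ks] = {}
        for table, columns in tables.items():
            tbl = anon.rename("tbl", table)
            result[ks][tbl] = [anon.rename("col", c) for c in columns]
    return result


# A made-up schema whose names would reveal sensitive details.
schema = {
    "billing": {
        "invoices": ["customer_ssn", "amount"],
        "refunds": ["customer_ssn"],
    }
}

anonymized = anonymize_schema(schema)
print(anonymized)
```

Because the same original name always maps to the same generic name, the structure of the schema survives while the names themselves reveal nothing.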

Where we’re headed

Adelphi has come a long way, but there’s still more work to do! We’re considering ideas for improving the presentation of the results of the various tools executed by the workflow. We want to make these results simple to access and clear to understand. We would also like to increase the number of schemas contributed to our repository. To help with this goal, we’re striving to make schema contribution even easier than it is now.

Does this work sound interesting? Adelphi is an open-source project and contributors are always welcome, so if you’d like to help us improve the project, feel free to come aboard!
