Technology · January 31, 2020

Moving from RDBMS to NoSQL: Migration Best Practices for Apache Cassandra®

One of the most frequently asked questions we receive is “how do I migrate my application that was designed for a relational database to Cassandra?” This is a great question with all kinds of practical implications, so we’ve gathered some collective wisdom and best practices to share with you.

Knowing when it’s time to migrate away from a relational database

Any migration to a new platform or database is going to involve work. Before starting on a migration project, you’ll want to understand how you’ll benefit and when it’s the right time to move the key workloads that run your enterprise from a relational database to a NoSQL Cassandra database. 

Here are some of the key signs you should look for to help you know when your current relational database isn’t getting the job done anymore: 

  1. Your database queries are getting slower and more difficult to debug and maintain.
  2. You’re struggling to scale beyond a single database node, or paying high license costs for a fancy multi-node solution.
  3. Hot backups required by your disaster recovery plan waste resources and don’t guarantee high availability.
  4. You want to deploy applications in hybrid or multi-cloud architectures.

If you’ve concluded it’s time for a change, you’ll need to identify the use cases causing the most performance and scalability challenges, and begin prioritizing them for migration. Where possible, migrating functionality a bit at a time is generally lower risk than a “big bang” or “flip the switch” migration.

Migrating from RDBMS to Apache Cassandra: Step-by-Step

Once you’ve identified a use case or two to migrate, the migration process includes the following:

  1. Adapting your data model
  2. Adapting your application 
  3. Planning your deployment 
  4. Moving your data 

Let’s examine what’s involved in each of these steps.

Adapting your data model

It’s vital to understand that Cassandra data modeling is not the same as relational data modeling. While Cassandra uses familiar concepts like tables, rows, and columns, and the Cassandra Query Language (CQL) is quite similar to SQL, there are some important differences you need to be aware of.

Relational data modelers are accustomed to creating normalized schemas in order to minimize data duplication, and using joins to assemble data from multiple tables. In a relational database, you might be able to speed up some slow queries by adding indexes or by selectively denormalizing data, duplicating columns across tables to avoid joins.

Cassandra data modeling is different: denormalization is the rule, not the exception. You start by analyzing the workflows of your application to identify the queries you’ll need, and then design tables so that each query can be answered by reading from a single table. The DataStax Academy course “DS220 Practical Application Data Modeling with Apache Cassandra” is a great way to develop your expertise in designing Cassandra tables.
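To make the query-first idea concrete, here is a minimal sketch in plain Python. It needs no Cassandra cluster: the `QueryTable` class is a toy stand-in for a table, and the `Video` type, the table names, and the sample rows are all hypothetical. It assumes an application with two queries, "videos added by a user" and "videos with a tag", and shows the same record being written once per query.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    user_id: str
    tag: str
    title: str
    added_at: int  # epoch seconds


class QueryTable:
    """Toy stand-in for a Cassandra table (not a driver API).

    Rows are grouped by a partition key and kept sorted, newest first,
    by a clustering key, so one query reads one partition.
    """

    def __init__(self, partition_key: str, clustering_key: str):
        self.partition_key = partition_key
        self.clustering_key = clustering_key
        self.partitions = defaultdict(list)

    def insert(self, row) -> None:
        partition = self.partitions[getattr(row, self.partition_key)]
        partition.append(row)
        partition.sort(key=lambda r: getattr(r, self.clustering_key), reverse=True)

    def select(self, partition_value):
        return list(self.partitions.get(partition_value, []))


# One table per query, each partitioned by what that query filters on.
videos_by_user = QueryTable(partition_key="user_id", clustering_key="added_at")
videos_by_tag = QueryTable(partition_key="tag", clustering_key="added_at")


def add_video(video: Video) -> None:
    # Denormalization: the same fact is written to every query table.
    videos_by_user.insert(video)
    videos_by_tag.insert(video)


add_video(Video("v1", "alice", "cassandra", "Intro to CQL", 100))
add_video(Video("v2", "alice", "spark", "Spark basics", 200))
add_video(Video("v3", "bob", "cassandra", "Data modeling", 300))

print([v.title for v in videos_by_user.select("alice")])
print([v.title for v in videos_by_tag.select("cassandra")])
```

The cost of this design is extra writes and duplicated storage. The benefit is that each read touches a single partition and needs no join.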

Don’t underestimate the importance of creating good Cassandra data models, as this is the single most important factor in a successful migration. It’s a good idea to load test your data models to see how your write and read queries perform with non-trivial amounts of data, and to get a more concrete idea of the cluster size and configuration that will help you achieve your performance goals. You can use NoSQLBench to put a load on a target cluster for performance and scale testing, data model validation, and more.

Adapting your application 

The next step is updating your application code to write and read from the Cassandra tables you’ve designed. Whether you’re updating an existing monolith or creating entirely new microservices, there are DataStax Drivers available in the most popular languages that you can use to connect to your DataStax Enterprise or Cassandra clusters. 

New Cassandra developers need to become accustomed to the idea that Cassandra is a distributed database, so there are trade-offs to consider when storing multiple copies of data across multiple nodes or even multiple data centers or clouds. You’ll want to learn about Cassandra’s tunable consistency and the tools Cassandra gives you for managing the trade-offs between consistency and performance, including consistency levels, lightweight transactions, and batches. The DataStax Academy course “DS201 Foundations of Apache Cassandra” is a great introduction to these concepts. 
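As a rough illustration of the consistency trade-off, the sketch below works through the usual rule of thumb for a single data center: a read is guaranteed to overlap the most recent successful write when the replicas contacted for the write plus those contacted for the read exceed the replication factor. The function names are hypothetical helpers, not part of any driver, and the sketch ignores multi-data-center levels such as LOCAL_QUORUM.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a consistency level."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]
    if required > replication_factor:
        raise ValueError(
            f"{level} needs {required} replicas, but RF is {replication_factor}"
        )
    return required


def read_overlaps_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


# Compare a few write/read combinations at replication factor 3.
for write_level, read_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
    overlap = read_overlaps_write(write_level, read_level, replication_factor=3)
    print(f"write={write_level:<6} read={read_level:<6} overlap={overlap}")
```

Lower levels such as ONE respond faster and tolerate more unavailable replicas, at the price of possibly reading stale data. That is the trade-off you tune per query.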

Planning your deployment

Before you deploy your updated application, it’s important to plan out your Cassandra cluster. You’ll want to consider questions such as:

  • What are the performance metrics or service level agreements (SLAs) that will be required for queries?
  • What kind of hardware and network is available within the platform?
  • How many data centers will the application be deployed to?

Consider these questions both for your initial deployment and for how you plan to expand as your application proves successful.
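A back-of-the-envelope storage estimate can help start the planning conversation. The sketch below is only that: every input is a hypothetical figure you would replace with your own, the function name is made up for illustration, and it covers disk capacity for one data center only. Throughput and latency targets still have to be confirmed with load testing.

```python
import math


def estimate_node_count(
    raw_data_gb: float,
    replication_factor: int,
    usable_gb_per_node: float,
    growth_factor: float = 1.0,
) -> int:
    """Rough node count for one data center, based on storage alone.

    raw_data_gb:        size of one copy of the data
    replication_factor: copies kept in this data center
    usable_gb_per_node: disk you are willing to fill per node (an assumption)
    growth_factor:      multiplier for expected growth, e.g. 1.5 for +50%
    """
    total_gb = raw_data_gb * replication_factor * growth_factor
    nodes_for_storage = math.ceil(total_gb / usable_gb_per_node)
    # Use at least as many nodes as replicas so each copy lands on its own node.
    return max(nodes_for_storage, replication_factor)


# Hypothetical inputs: 2 TB of data, RF 3, 1 TB usable per node, 50% growth.
print(estimate_node_count(2000, 3, 1000, growth_factor=1.5))
```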

Moving your data 

After doing the hard work of designing Cassandra tables and writing the code to write and read from those tables, the time will come to deploy your updated application to production. In most cases you’ll have data to move from your legacy database. The DataStax Bulk Loader and other similar tools are great for performing one-time data migrations, one of several cases described in Brian Hess’s blog series.
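One common preparation step for a one-time migration is flattening normalized relational rows into the denormalized shape of a target table, then exporting them to a file that a bulk-loading tool can ingest. The sketch below illustrates that step using only the Python standard library. SQLite stands in for the legacy database, and the `users` and `videos` tables, the column names, and the `export_videos_by_user` function are hypothetical.

```python
import csv
import io
import sqlite3


def export_videos_by_user(conn: sqlite3.Connection, out) -> int:
    """Join the normalized tables and write one CSV row per target-table row."""
    cursor = conn.execute(
        """
        SELECT u.user_id  AS user_id,
               v.added_at AS added_at,
               v.video_id AS video_id,
               u.name     AS user_name,
               v.title    AS title
        FROM videos v
        JOIN users u ON u.user_id = v.user_id
        ORDER BY u.user_id, v.added_at DESC
        """
    )
    writer = csv.writer(out)
    # The header row lets a loader map CSV fields to columns by name.
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# In-memory stand-in for the legacy relational database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE videos (video_id TEXT PRIMARY KEY, user_id TEXT,
                         title TEXT, added_at INTEGER);
    INSERT INTO users VALUES ('u1', 'Alice'), ('u2', 'Bob');
    INSERT INTO videos VALUES ('v1', 'u1', 'Intro to CQL', 100),
                              ('v2', 'u1', 'Spark basics', 200),
                              ('v3', 'u2', 'Data modeling', 300);
    """
)

buffer = io.StringIO()
exported = export_videos_by_user(conn, buffer)
csv_text = buffer.getvalue()
print(f"exported {exported} rows")
print(csv_text)
```

In a real migration you would write to a file rather than an in-memory buffer, and run one export per target table.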

If you need a zero-downtime migration, you’ll also want to investigate using Apache Kafka®, the Kafka Connect framework, and the DataStax Kafka Connector to capture changes from your legacy database and write them into Cassandra tables. To validate the results of your data migration, consider using Apache Spark to compare records from the source system to those in your Cassandra cluster.
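The validation step boils down to a keyed comparison of two record sets. The sketch below shows that logic on small in-memory lists. It is not a Spark job: at production scale you would express the same comparison with a distributed tool such as Spark. The `compare_records` function and the sample rows are hypothetical.

```python
def compare_records(source_rows, target_rows, key):
    """Compare two lists of dict rows by a key column.

    Returns keys missing from the target, keys present only in the
    target, and keys whose rows differ between the two systems.
    """
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    return {
        "missing": sorted(source.keys() - target.keys()),
        "unexpected": sorted(target.keys() - source.keys()),
        "mismatched": sorted(
            k for k in source.keys() & target.keys() if source[k] != target[k]
        ),
    }


# Hypothetical rows read from the legacy database and from Cassandra.
source_rows = [
    {"video_id": "v1", "title": "Intro to CQL"},
    {"video_id": "v2", "title": "Spark basics"},
    {"video_id": "v3", "title": "Data modeling"},
]
target_rows = [
    {"video_id": "v1", "title": "Intro to CQL"},
    {"video_id": "v2", "title": "Spark basic"},  # differs from the source
    {"video_id": "v4", "title": "Orphan row"},   # not in the source
]

report = compare_records(source_rows, target_rows, key="video_id")
print(report)
```

An empty report (all three lists empty) means the two record sets agree on every key and every column that was compared.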

Pack your bags: It’s time to move to NoSQL

The thought of migrating from a relational database to a NoSQL database like Cassandra can be daunting, but it will likely be easier than you think. That’s especially true if you watch proactively for the signs that it’s time to move, and actively look for use cases at your company where NoSQL would be a better choice.

Once you’ve decided to take the plunge, break the migration down into the steps outlined above, and you should be well on your way to benefiting from all Cassandra has to offer.

Don’t worry. You won’t be alone. We have plenty of resources to help you on the journey, including:

  • DataStax Docs
  • DataStax Community: Learn from Apache Cassandra experts from DataStax and the larger community.
  • The “Migrating from SQL to NoSQL” section of our Cassandra migration and upgrade page.
