Technology | July 20, 2020

Tutorial: Cassandra Migration to Astra DBaaS

When we launched Astra, the multi-cloud Database-as-a-Service (DBaaS) for Apache Cassandra® applications, we also open-sourced portions of the architecture and tooling backing it. Today, we continue to make good on our promise to lead with code. Your first Cassandra application migration to Astra is free, and knowing how to migrate additional Cassandra applications is certainly no secret.

One-time Cassandra cluster migration

One-time migration of data from your own Cassandra cluster to Astra is the simplest way to migrate an application. (A Cassandra migration tool is currently available for early access. Sign up and try it out!)

In the meantime, we’ll guide you through the migration in four steps:

  1. Create your Keyspace:Table in the destination cluster and download the connection bundle.
  2. Locally, install Apache Spark and the latest version of DataStax's open-source Spark Cassandra Connector.
  3. Use the DataFrame API to push your data to the remote Keyspace:Table.
  4. Verify your data on the destination cluster.

Step 1: Create your Keyspace:Table

In this example, we’ll use a table from a popular NoSQLBench workload in the source cluster. The table has the following structure:


CREATE TABLE baselines.iot (
    machine_id uuid,
    sensor_name text,
    time timestamp,
    data text,
    sensor_value double,
    station_id uuid,
    PRIMARY KEY ((machine_id, sensor_name), time)
) WITH CLUSTERING ORDER BY (time DESC);

To migrate the data, create a corresponding table in the destination cluster on Astra; this can be done directly in Astra's UI via the "CQL Console" tab. (I've created it with the name test.iot.)
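
For reference, here is the destination version of the same DDL, as you might paste it into the CQL Console. This is a sketch that assumes a keyspace named test already exists on your Astra database:


CREATE TABLE IF NOT EXISTS test.iot (
    machine_id uuid,
    sensor_name text,
    time timestamp,
    data text,
    sensor_value double,
    station_id uuid,
    PRIMARY KEY ((machine_id, sensor_name), time)
) WITH CLUSTERING ORDER BY (time DESC);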

To connect to the destination database, download your access credentials as a secure connect bundle for the Cassandra database you created on DataStax Astra. (For me, it was stored as ~/secure-connect-test.zip.)

Step 2: Install Spark and Cassandra Connector

After the secure bundle is downloaded, start Spark with the Spark Cassandra Connector. Your Spark configuration can be as simple as a single Spark node in local master mode. I'm using Spark 2.4.5, compiled with Scala 2.11, so my command line looks like the following (if your Spark build uses Scala 2.12, pull the matching spark-cassandra-connector_2.12 artifact instead):


bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 \
  --files ~/secure-connect-test.zip

Step 3: Push your data to the remote Keyspace:Table

After the Spark shell has started, the standard DataFrame API can be used to read the data from the source cluster. In my case, the contact point is 10.101.34.176. To write data to the destination cluster in Astra, provide the filename of the secure connect bundle (the bare filename is resolved because the bundle was distributed with --files), as well as the username and password:


val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "baselines",
    "table" -> "iot",
    "spark.cassandra.connection.host" -> "10.101.34.176"
  ))
  .load

df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "test",
    "table" -> "iot",
    "spark.cassandra.connection.config.cloud.path" -> "secure-connect-test.zip",
    "spark.cassandra.auth.password" -> "123456", 
    "spark.cassandra.auth.username" -> "test"
  ))
  .save

Step 4: Verify data

That's all! We can check that the data successfully copied by comparing the number of rows in both tables:


val srcCount = df.count
val destDf = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "test",
    "table" -> "iot",
    "spark.cassandra.connection.config.cloud.path" -> "secure-connect-test.zip",
    "spark.cassandra.auth.password" -> "123456",
    "spark.cassandra.auth.username" -> "test"
  ))
  .load
val destCount = destDf.count
println("source count: " + srcCount)
println("destination count: " + destCount)

Try this yourself: with Spark Cassandra Connector 2.5.1 and Apache Spark, you can easily migrate your existing database to DataStax Astra!

Alternative Option: Using the RDD API

Of course, we can also use the RDD API to perform the migration, but, as you can see below, it's more verbose:


import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._

// Separate connectors for the source cluster and the Astra destination
val sourceConnection = CassandraConnector(sc.getConf
  .set("spark.cassandra.connection.host", "10.101.34.176"))
val destConnection = CassandraConnector(sc.getConf
  .set("spark.cassandra.connection.config.cloud.path", "secure-connect-test.zip")
  .set("spark.cassandra.auth.username", "test")
  .set("spark.cassandra.auth.password", "123456"))

// Read from the source cluster with its connector in implicit scope...
val rdd = {
  implicit val c = sourceConnection
  sc.cassandraTable("baselines", "iot")
}

// ...then write to the destination with the other connector in scope
{
  implicit val c = destConnection
  rdd.saveToCassandra("test", "iot")
}

Migrating data from Cassandra to Astra DBaaS is no mystery

We hope this simple, four-step tutorial provides solid ground for migrating your Cassandra data to Astra DBaaS. Things might get a little messier if you decide to use the RDD API, but at least you know you have options.

Of course, not all application migrations are easy. But don’t worry: our DataStax team is here to help. In fact, migration of your first Cassandra application is completely free. It’s on us, with no catches.

Or, learn more about Astra DB.
