Technology | January 14, 2021

Simplifying Backup and Restore in DataStax Enterprise

How backups are done in this cloud-native world needs to change. Making that change for distributed, cloud-native databases like DataStax Enterprise (DSE) means overcoming some complex challenges. One key challenge: how do you back up a consistent view of an eventually consistent, write-anywhere NoSQL database? And how do you let users do this easily without impacting performance? Well, with the release of DSE 6.8.9, we've made progress toward a solution!

DataStax is excited to introduce the DSE Backup and Restore Service, a new feature that brings backup and restore functionality directly into DataStax Enterprise. With it, operators and administrators can perform backups and restores entirely through CQL.

Let's dive right into how the new service works. We'll configure a backup store, take a backup, and then restore the data from that backup. Before going through these steps, make sure you have enabled the backup service as described in the DSE documentation.

1. DSE includes a new built-in role, dse_backup_operator, that has all the permissions required for backup operations. Grant this role to any user who needs to perform them:

GRANT dse_backup_operator TO database_manager;

2. Next, we set up where to back up our data. The backup service supports several types of storage for backup files: file systems, S3 buckets, and Google Cloud Storage. Before a user can create a backup, they need to define a store for it. Stores are created and modified with CQL statements.

CREATE BACKUP STORE fs_store
USING 'FSBlobStore' WITH settings = {'path':'/path/to/data', 
'retention_time':'1w', 'retention_number':'3'};

In this example, the backups are stored in the local file system. Because this is a distributed system, the path must exist on every node. Backups are retained for one week (retention_time), and at least three backups (retention_number) are always kept, even if some of them are older than that. For example, if two backups were made within the last week and two in the week before, the three most recent are kept: the two from the last week plus the newer of the two from the week before. Configuration options for the other store types and for authentication are described in the DSE documentation.
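The retention rule can be modeled in a few lines of Python. This is a minimal sketch, not DSE's implementation: it assumes a backup is kept if it is younger than retention_time or if it is among the retention_number most recent backups, which reproduces the example above. The function name `backups_to_keep` is hypothetical.

```python
from datetime import datetime, timedelta


def backups_to_keep(backup_times, now, retention_time, retention_number):
    """Return the backup timestamps a store would retain, newest first.

    Hypothetical model: a backup survives if it was taken after
    `now - retention_time`, or if it is one of the `retention_number`
    most recent backups.
    """
    newest_first = sorted(backup_times, reverse=True)
    cutoff = now - retention_time
    keep = []
    for rank, taken_at in enumerate(newest_first):
        within_window = taken_at >= cutoff
        among_most_recent = rank < retention_number
        if within_window or among_most_recent:
            keep.append(taken_at)
    return keep


if __name__ == "__main__":
    now = datetime(2021, 1, 14, 12, 0)
    backups = [
        now - timedelta(days=1),   # last week
        now - timedelta(days=3),   # last week
        now - timedelta(days=9),   # the week before
        now - timedelta(days=12),  # the week before
    ]
    kept = backups_to_keep(backups, now, timedelta(weeks=1), 3)
    print(len(kept))  # 3: both recent backups plus the newer older one
```

With retention_time set to one week and retention_number set to 3, the oldest backup is the only one dropped; if every backup were within the week, all of them would be kept.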

3. After a store has been created, verify that it works using the following command:

VERIFY BACKUP STORE fs_store;

4. Now we configure the backup service to take snapshots of a keyspace on a schedule, using a backup configuration. A backup configuration defines which keyspace to back up, how often to back it up, and which stores to write the backups to.

CREATE BACKUP CONFIGURATION my_conf OF keyspace1
TO STORE s3_store, fs_store
WITH frequency = '*/30 * * * *' AND enabled = true;

This backs up keyspace1 to both the S3 store and the local file-system store every 30 minutes (the frequency is given as a cron expression). The example assumes an S3 store named s3_store was created in the same way as fs_store. Note a couple of restrictions on backup configurations:

  • A backup configuration targets exactly one full keyspace; individual tables cannot be selected.
  • Two configurations cannot target the same keyspace with the same store.
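Both the schedule and the two restrictions can be modeled in Python. This is a hypothetical sketch, not part of DSE: `BackupConfig`, `check_configs`, and `cron_fires` are illustrative names, and it assumes the frequency follows standard five-field cron semantics (minute, hour, day of month, month, day of week). Under those semantics, "every 30 minutes" is `*/30 * * * *`, while a value of 30 in the hour field is out of range. The matcher only understands `*`, `*/n`, and plain numbers.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BackupConfig:
    """Hypothetical in-memory model of a backup configuration."""
    name: str
    keyspace: str     # exactly one full keyspace; tables cannot be selected
    stores: tuple     # store names, e.g. ("s3_store", "fs_store")
    frequency: str    # five-field cron expression
    enabled: bool = True


def check_configs(configs):
    """Raise ValueError if two configurations share a (keyspace, store) pair."""
    seen = {}
    for conf in configs:
        for store in conf.stores:
            key = (conf.keyspace, store)
            if key in seen:
                raise ValueError(
                    f"{conf.name} and {seen[key]} both back up "
                    f"{conf.keyspace} to {store}"
                )
            seen[key] = conf.name


# (low, high) for minute, hour, day of month, month, day of week (0 = Sunday)
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def _field_matches(field, value, low, high):
    """Match one cron field written as '*', '*/n', or a plain number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - low) % int(field[2:]) == 0
    number = int(field)
    if not low <= number <= high:
        raise ValueError(f"cron value {number} is outside {low}-{high}")
    return number == value


def cron_fires(expression, when):
    """Return True if a five-field cron expression fires at the minute `when`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    # Python's weekday() counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    values = [when.minute, when.hour, when.day, when.month, cron_weekday]
    # Evaluate every field (a list, not a generator) so out-of-range values
    # are always reported. Simplification: real cron ORs day of month and
    # day of week when both are restricted; this sketch ANDs all fields.
    matches = [
        _field_matches(f, v, low, high)
        for f, v, (low, high) in zip(fields, values, FIELD_RANGES)
    ]
    return all(matches)


if __name__ == "__main__":
    conf = BackupConfig("my_conf", "keyspace1", ("s3_store", "fs_store"), "*/30 * * * *")
    check_configs([conf])
    print(cron_fires(conf.frequency, datetime(2021, 1, 14, 10, 30)))  # True
```

Modeling the keyspace as a single string, rather than a list of tables, mirrors the first restriction; the second becomes a uniqueness check on (keyspace, store) pairs.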

5. Next, let's run a backup manually. The following statement triggers a backup job. The coordinator node that receives the statement first checks that all nodes are running, and only if they are does it forward the command to them. Each node then runs the job.

RUN BACKUP my_conf;

Here, we specify the name of the backup configuration to run (my_conf, from above). The command outputs the id assigned to the backup. To track the backup's status and see whether it succeeded or failed, run the following command:

LIST BACKUPS FROM KEYSPACE keyspace1;
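The coordinator's check-then-forward behavior from step 5 can be sketched as a small model. It is a simplification under stated assumptions, not DSE code: `Node`, `coordinate_backup`, and the `send` callback are hypothetical, and node liveness is reduced to a boolean flag.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node as seen by the coordinator."""
    address: str
    is_up: bool
    jobs: list = field(default_factory=list)  # backup jobs this node was asked to run


def coordinate_backup(nodes, config_name, send):
    """Forward a backup job to every node, but only if every node is up.

    `send(node, config_name)` delivers the job; each node then runs it
    locally. Returns the addresses the job was sent to.
    """
    down = [node.address for node in nodes if not node.is_up]
    if down:
        # Check first, forward second: nothing is sent if any node is down.
        raise RuntimeError(f"backup {config_name!r} not started; nodes down: {down}")
    for node in nodes:
        send(node, config_name)
    return [node.address for node in nodes]


def enqueue(node, config_name):
    """In-memory `send` for trying the sketch: record the job on the node."""
    node.jobs.append(config_name)


if __name__ == "__main__":
    cluster = [Node("10.0.0.1", True), Node("10.0.0.2", True), Node("10.0.0.3", False)]
    try:
        coordinate_backup(cluster, "my_conf", enqueue)
    except RuntimeError as err:
        print(err)
```

Because the liveness check runs before any job is sent, a single unavailable node means no node starts the backup.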

6. Lastly, here is how to restore our data from a particular backup, identified by its backup id, that was written to fs_store:

RESTORE keyspace1 FROM BACKUP my_conf-20210116163054
FROM STORE fs_store;
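The example id my_conf-20210116163054 appears to be the configuration name followed by a yyyyMMddHHmmss timestamp. Assuming that format (inferred from this single example, not a documented guarantee), a small helper can pick the newest backup for a configuration from a list of ids, for example ones copied from the LIST BACKUPS output, before running RESTORE. `parse_backup_id` and `latest_backup` are hypothetical names.

```python
from datetime import datetime


def parse_backup_id(backup_id):
    """Split an id like 'my_conf-20210116163054' into (config name, timestamp).

    Assumes the last '-' separates the configuration name from a
    yyyyMMddHHmmss timestamp, as in the example id.
    """
    name, _, stamp = backup_id.rpartition("-")
    if not name:
        raise ValueError(f"unexpected backup id: {backup_id!r}")
    return name, datetime.strptime(stamp, "%Y%m%d%H%M%S")


def latest_backup(backup_ids, config_name):
    """Return the newest id for `config_name`, or None if there is none."""
    matching = []
    for backup_id in backup_ids:
        name, taken_at = parse_backup_id(backup_id)
        if name == config_name:
            matching.append((taken_at, backup_id))
    return max(matching)[1] if matching else None


if __name__ == "__main__":
    ids = ["my_conf-20210115120000", "my_conf-20210116163054", "other-20210117080000"]
    newest = latest_backup(ids, "my_conf")
    print(f"RESTORE keyspace1 FROM BACKUP {newest} FROM STORE fs_store;")
```

Splitting on the last hyphen keeps configuration names that themselves contain hyphens intact.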

We covered only a subset of the service's functionality; to explore the other commands, see the DSE documentation. In a few steps, we configured a storage location, created a scheduled backup, ran a manual backup, and restored our data, all through a straightforward CQL API. As you can see, the DSE Backup and Restore Service makes backing up and restoring data easy and reliable, whether for disaster recovery or for development tasks such as debugging in a new cluster.
