Technology | May 7, 2024

Tips and Tricks for the DataStax Astra CLI

Sometimes, the process of building Generative AI (GenAI) applications requires developers to perform a few database administration tasks to help the development process. While these tasks aren’t necessarily difficult or tedious, they do tend to distract developers from working on code. Wouldn’t it be great if we could quickly perform some of these simple tasks from the command line?

Fortunately, DataStax offers a powerful command line interface (CLI) that provides easy access to many useful commands. These commands can be quickly executed from a terminal window, allowing developers to remain focused on the main task at hand.

Installation and requirements

The Astra CLI can be installed on a Mac, Linux, or Windows workstation.

Linux and Windows

If we are using a Windows workstation, it is recommended to use the Windows Subsystem for Linux (WSL). For both platforms, the following command should install the Astra CLI:

curl -Ls "https://dtsx.io/get-astra-cli" | bash

Note: With this process, we might have to install a Java Runtime Environment (JRE) separately.

Mac

It's recommended to use Homebrew to manage the Astra CLI.

brew install datastax/astra-cli/astra-cli

Note: Some Macs may require additional security configuration for Astra CLI to run properly.

After installation, we will likely need to open a new terminal window for the Astra CLI executable to be present in our PATH environment variable. For additional information and troubleshooting, see the Astra CLI documentation.
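A quick way to confirm that the executable is reachable is to ask the shell to resolve it. The following is a minimal sketch; the helper name on_path is hypothetical and is not part of the Astra CLI.

```shell
# Hypothetical helper: succeeds when the shell can resolve the named command.
# For an external program such as astra, that means it was found via PATH.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

if on_path astra; then
  echo "astra found at: $(command -v astra)"
else
  echo "astra not found on PATH; try opening a new terminal window" >&2
fi
```

If the check fails in a fresh terminal window, the installation notes in the Astra CLI documentation are the next place to look.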

Setup and configuration

Once the Astra CLI is installed, it needs to be properly configured with an Astra DB token generated with the “Organization Administrator” role. When our token is ready, we can run the following command:

astra setup --token <our_token>
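For scripted setups, one option is to read the token from an environment variable so that it does not need to be hard-coded. This is only a sketch: the variable name ASTRA_TOKEN and the function name setup_astra are assumptions made for illustration, not names defined by the Astra CLI. The only CLI call used is the astra setup --token command shown above.

```shell
# ASTRA_TOKEN and setup_astra are hypothetical names chosen for this sketch.
setup_astra() {
  # Refuse to run when the token variable is unset or empty.
  if [ -z "${ASTRA_TOKEN:-}" ]; then
    echo "ASTRA_TOKEN is not set" >&2
    return 1
  fi
  astra setup --token "$ASTRA_TOKEN"
}

# Usage (after setting ASTRA_TOKEN in the environment):
#   setup_astra
```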

Useful commands

The syntax of the CLI requires the text “astra db” to prefix all database commands. We will also need to indicate the command that we intend to run, followed by the name of the database to run it on:

astra db <command> <database_name>

Here we’ll demonstrate several commands that handle tasks frequently required during the development process.

List databases

Information about the databases that are available with the token used during setup (above) can be shown using the astra db list command:

astra db list

If the setup step was successful, we should see output in our terminal similar to this:

+------------------+----------------------+-----------+-------+---+-----------+
| Name             | id                   | Regions   | Cloud | V | Status    |
+------------------+----------------------+-----------+-------+---+-----------+
| vectorDB         | b0a99774-d031-4e3d-9 | us-east1  | gcp   | ■ | ACTIVE    |
| react-miami-2024 | 900a0727-11a0-4e89-9 | us-east-2 | aws   | ■ | HIBERNATED|
+------------------+----------------------+-----------+-------+---+-----------+

Tip: Our Astra DB Data API endpoint can be assembled from this information:

https://<database_id>-<database_region>.apps.astra.datastax.com

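Example (using the shortened database ID shown in the table above; a real endpoint needs the full database ID):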

https://b0a99774-d031-4e3d-9-us-east1.apps.astra.datastax.com
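The assembly can be wrapped in a small shell function. This is a sketch only: the function name astra_endpoint is hypothetical, and the ID passed in the example is the shortened one from the table, used purely to mirror the example URL.

```shell
# Hypothetical helper (not part of the Astra CLI): builds the Data API
# endpoint from a database ID and region, following the pattern above.
astra_endpoint() {
  db_id="$1"
  db_region="$2"
  if [ -z "$db_id" ] || [ -z "$db_region" ]; then
    echo "usage: astra_endpoint <database_id> <database_region>" >&2
    return 1
  fi
  printf 'https://%s-%s.apps.astra.datastax.com\n' "$db_id" "$db_region"
}

# Reproduces the example URL from the values in the list output.
astra_endpoint "b0a99774-d031-4e3d-9" "us-east1"
```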

Resume a hibernated database

If the database is on a free plan and it hasn’t been used in a couple of days, it can go into hibernation. Resuming a hibernated database is a simple matter with the Astra CLI. If we look at our astra db list output above, we can see that the react-miami-2024 database has been hibernated. To reactivate that database, use the astra db resume command:

astra db resume react-miami-2024

Note: The database name is case-sensitive and must be spelled exactly as shown in the list output.
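When several databases are involved, the hibernated ones can be picked out of the list output with a short filter. The sketch below assumes the output keeps the table layout shown earlier (Name in the first column, Status in the sixth); the function name hibernated_dbs is hypothetical.

```shell
# Hypothetical helper: reads "astra db list" table output on stdin and
# prints the Name of every row whose Status column is HIBERNATED.
hibernated_dbs() {
  awk -F'|' '
    NF >= 7 {
      name = $2; status = $7
      gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim padding around the name
      gsub(/^[ \t]+|[ \t]+$/, "", status)    # trim padding around the status
      if (status == "HIBERNATED") print name
    }'
}

# Demo on the sample rows from the list output above.
hibernated_dbs <<'EOF'
| Name             | id                   | Regions   | Cloud | V | Status    |
| vectorDB         | b0a99774-d031-4e3d-9 | us-east1  | gcp   | ■ | ACTIVE    |
| react-miami-2024 | 900a0727-11a0-4e89-9 | us-east-2 | aws   | ■ | HIBERNATED|
EOF
# prints: react-miami-2024
```

Against a live account, the same filter could feed the resume command, for example: astra db list | hibernated_dbs | while read -r name; do astra db resume "$name"; done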

Describe a database

Perhaps we want to see the details of our database. We can view those with the astra db describe command:

astra db describe react-miami-2024
+------------------+----------------------+
| Attribute        | Value                |
+------------------+----------------------+
| Name             | react-miami-2024     |
| id               | 900a0727-11a0-4e89-9 |
| Cloud            | AWS                  |
| Regions          | us-east-2            |
| Status           | ACTIVE               |
| Vector           | Enabled              |
| Default Keyspace | default_keyspace     |
| Creation Time    | 2024-04-18T01:52:44Z |
| Keyspaces        | [0] default_keyspace |
| Regions          | [0] us-east-2        |
+------------------+----------------------+

Create a collection

Many GenAI projects start out with the creation of a vector-enabled data collection. If we want to do that outside of our code, we can use the Astra CLI. For instance, we could create a new collection named “vector_data” to support 1536-dimensional vectors (with a cosine similarity metric) using the astra db create-collection command:

astra db create-collection vectorDB --collection vector_data -d 1536 -m cosine

List collections in a database

Want to verify which collections are in our vectorDB? We can do that using the astra db list-collections command:

astra db list-collections vectorDB
+--------------------------+-----------+-----------+
| Name                     | Dimension | Metric    |
+--------------------------+-----------+-----------+
| car_images               | 512       | cosine    |
| collection_vector_openai | 1536      | cosine    |
| vector_data              | 1536      | cosine    |
+--------------------------+-----------+-----------+

Delete a collection

Sometimes we might need to delete that collection as a part of the development process. If we want to remove a collection named “vector_data” from our vectorDB, we can do this using the astra db delete-collection command:

astra db delete-collection vectorDB --collection vector_data

Tip: If we change our application’s large language model (LLM) and the new model’s vector embedding dimensions differ from those of the previous model, we will need to delete our collection and recreate it with the new dimension.
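The delete-and-recreate step can be combined into one function built from the two commands shown in this article. This is a sketch under assumptions: the function name recreate_collection, the ASTRA_CMD override (used here only for a dry run), and the dimension 768 are all hypothetical.

```shell
# Hypothetical helper: deletes a collection and recreates it with a new
# vector dimension. ASTRA_CMD is an assumed override so the function can
# be dry-run (ASTRA_CMD="echo astra") without touching a real database.
recreate_collection() {
  db="$1"
  coll="$2"
  dim="$3"
  metric="${4:-cosine}"   # default to the cosine metric used above
  ${ASTRA_CMD:-astra} db delete-collection "$db" --collection "$coll" &&
    ${ASTRA_CMD:-astra} db create-collection "$db" --collection "$coll" -d "$dim" -m "$metric"
}

# Dry run: prints the two commands instead of executing them.
# The dimension 768 is an illustrative value only.
ASTRA_CMD="echo astra" recreate_collection vectorDB vector_data 768
```

Without the ASTRA_CMD override, the function calls the real astra executable, so the collection and all of its data are removed before the new one is created.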

cqlsh

Sometimes we may need to work with the data and storage structures inside of Astra DB. The Astra CLI provides a tool for these tasks: the Cassandra Query Language (CQL) shell, or cqlsh. The cqlsh tool goes back to Astra DB’s roots in the Apache Cassandra® project, and allows users to execute CQL commands inside their databases.

The cqlsh tool can run CQL commands inline using the -e flag. It can also be run interactively. Examples demonstrating both approaches can be seen below.

Delete a collection’s data

Often we will need to delete the data within a collection, but might want to keep the collection itself. To do this, we can use the TRUNCATE command from within cqlsh:

astra db cqlsh vectorDB -e "TRUNCATE TABLE default_keyspace.vector_data;"
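Because TRUNCATE is destructive, a script might validate the keyspace and collection names before building the statement. The sketch below wraps the command above; the function name truncate_collection and the ASTRA_CMD dry-run override are assumptions made for illustration.

```shell
# Hypothetical helper: truncates one collection via cqlsh after checking
# that both identifiers contain only letters, digits, and underscores.
# ASTRA_CMD is an assumed override for dry runs (ASTRA_CMD="echo astra").
truncate_collection() {
  db="$1"
  keyspace="$2"
  coll="$3"
  for ident in "$keyspace" "$coll"; do
    case "$ident" in
      ""|*[!A-Za-z0-9_]*)
        echo "invalid keyspace or collection name: '$ident'" >&2
        return 1
        ;;
    esac
  done
  ${ASTRA_CMD:-astra} db cqlsh "$db" -e "TRUNCATE TABLE ${keyspace}.${coll};"
}

# Dry run: prints the command that would be executed.
ASTRA_CMD="echo astra" truncate_collection vectorDB default_keyspace vector_data
```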

We could also run the cqlsh tool interactively to truncate the collection:

astra db cqlsh vectorDB

[INFO]  Cqlsh is starting, please wait for connection establishment...
Connected to cndb at 127.0.0.1:9042.
[cqlsh 6.8.0 | Cassandra 4.0.0.6816 | CQL spec 3.4.5 | Native protocol v4]

Use HELP for help.
token@cqlsh> USE default_keyspace;
token@cqlsh:default_keyspace> TRUNCATE TABLE vector_data;
token@cqlsh:default_keyspace> exit;

Tip: If we decide to use a different LLM in our app, we will need to TRUNCATE the data, because embeddings produced by different models are not comparable with one another. This holds true even if the new model creates vector embeddings with the same number of dimensions.

Conclusion

At DataStax, we’re committed to providing the best possible developer experience. The Astra CLI gives us access to powerful tools, enabling us to quickly perform essential tasks while maintaining focus on building GenAI applications.

More details can be found in the Astra CLI documentation. This includes instructions and examples on everything from simple CRUD operations to bulk-loading collections with millions of rows.

Be sure to check out the Astra CLI, and take your development process to the next level!
