Tips and Tricks for the DataStax Astra CLI
Sometimes, the process of building Generative AI (GenAI) applications requires developers to perform a few database administration tasks to help the development process. While these tasks aren’t necessarily difficult or tedious, they do tend to distract developers from working on code. Wouldn’t it be great if we could quickly perform some of these simple tasks from the command line?
Fortunately, DataStax offers a powerful command line interface (CLI) that provides easy access to many useful commands. These commands can be quickly executed from a terminal window, allowing developers to remain focused on the main task at hand.
Installation and requirements
The Astra CLI can be installed on a Mac, Linux, or Windows workstation.
Linux and Windows
If we are using a Windows workstation, it is recommended to use the Windows Subsystem for Linux (WSL). For both platforms, the following command should install the Astra CLI:
curl -Ls "https://dtsx.io/get-astra-cli" | bash
Note: With this process we might have to install Java Runtime Environment (JRE) separately.
Mac
It's recommended to use Homebrew to manage the Astra CLI.
brew install datastax/astra-cli/astra-cli
Note: Some Macs may require additional security configuration for Astra CLI to run properly.
After installation, we will likely need to open a new terminal window for the Astra CLI executable to be present in our PATH environment variable. For additional information and troubleshooting, see the Astra CLI documentation.
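A quick way to confirm that the new terminal picked up the install is to check whether the executables can be found through PATH. The following is a minimal sketch, not part of the Astra CLI itself; the helper name have_cmd is hypothetical, and it assumes the executables are called astra and java.

```shell
# Hypothetical helper: succeeds if the named command is reachable through PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report on the Astra CLI and the Java runtime it may need.
for tool in astra java; do
  if have_cmd "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
  fi
done
```

If astra is reported as missing right after installation, opening a fresh terminal window is the first thing to try.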
Setup and configuration
Once the Astra CLI is installed, it needs to be properly configured with an Astra DB token generated with the “Organization Administrator” role. When our token is ready, we can run the following command:
astra setup --token <our_token>
Useful commands
The syntax of the CLI requires the text “astra db” to prefix all database commands. We will also need to indicate the command that we intend to run, followed by the name of the database to run it on:
astra db <command> <database_name>
Here we’ll demonstrate several commands that solve tasks frequently required during the development process.
List databases
Information about the databases that are available with the token used during setup (above) can be shown using the astra db list command:
astra db list
If the setup step was successful, we should see output in our terminal similar to this:
+------------------+----------------------+-----------+-------+---+-----------+
| Name             | id                   | Regions   | Cloud | V | Status    |
+------------------+----------------------+-----------+-------+---+-----------+
| vectorDB         | b0a99774-d031-4e3d-9 | us-east1  | gcp   | ■ | ACTIVE    |
| react-miami-2024 | 900a0727-11a0-4e89-9 | us-east-2 | aws   | ■ | HIBERNATED|
+------------------+----------------------+-----------+-------+---+-----------+
Tip: Our Astra DB Data API endpoint can be assembled from this information:
https://<database_id>-<database_region>.apps.astra.datastax.com
Example:
https://b0a99774-d031-4e3d-9-us-east1.apps.astra.datastax.com
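The same assembly can be done in the shell. This is a small sketch that follows the URL pattern in the tip; the function name astra_endpoint is hypothetical. Note that the id column in the listing appears shortened to fit the table, so the complete database ID should be passed in practice.

```shell
# Hypothetical helper: build a Data API endpoint from a database ID and region,
# following the pattern https://<database_id>-<database_region>.apps.astra.datastax.com
astra_endpoint() {
  printf 'https://%s-%s.apps.astra.datastax.com\n' "$1" "$2"
}

# Values copied from the list output above (the ID is shown as it appears there).
astra_endpoint "b0a99774-d031-4e3d-9" "us-east1"
```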
Resume a hibernated database
If the database is on a free plan and it hasn’t been used in a couple of days, it can go into hibernation. Resuming a hibernated database is a simple matter with the Astra CLI. If we look at our astra db list output above, we can see that the react-miami-2024 database has been hibernated. To reactivate that database, use the astra db resume command:
astra db resume react-miami-2024
Note: The database name is case-sensitive and must be spelled exactly as shown in the list output.
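When several databases are involved, the hibernated ones can be picked out of the list output with a small filter. The sketch below assumes the table layout shown earlier (Name in the first column, Status in the sixth); the function name hibernated_dbs is hypothetical, and a different CLI version may format the table differently.

```shell
# Hypothetical helper: read "astra db list" table text on stdin and print the
# Name of every row whose Status column is HIBERNATED.
# Assumes the six-column layout shown in this article.
hibernated_dbs() {
  awk -F'|' '
    NF >= 7 {
      name = $2; status = $7
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim padding around the name
      gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim padding around the status
      if (status == "HIBERNATED") print name
    }'
}

# Example use (left as a comment so nothing runs unintentionally):
#   astra db list | hibernated_dbs | while read -r db; do astra db resume "$db"; done
```

Header and border rows are skipped automatically, because their Status field never equals HIBERNATED.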
Describe a database
Perhaps we want to see the details behind our database? We can view those with the astra db describe command:
astra db describe react-miami-2024
+------------------+----------------------+
| Attribute        | Value                |
+------------------+----------------------+
| Name             | react-miami-2024     |
| id               | 900a0727-11a0-4e89-9 |
| Cloud            | AWS                  |
| Regions          | us-east-2            |
| Status           | ACTIVE               |
| Vector           | Enabled              |
| Default Keyspace | default_keyspace     |
| Creation Time    | 2024-04-18T01:52:44Z |
| Keyspaces        | [0] default_keyspace |
| Regions          | [0] us-east-2        |
+------------------+----------------------+
Create a collection
Many GenAI projects start out with the creation of a vector-enabled data collection. If we want to do that outside of our code, we can use the Astra CLI. For instance, we could create a new collection named “vector_data” to support 1536-dimensional vectors (with a cosine similarity metric) using the astra db create-collection command:
astra db create-collection vectorDB --collection vector_data -d 1536 -m cosine
List collections in a database
Want to verify which collections are in our vectorDB? We can do that using the astra db list-collections command:
astra db list-collections vectorDB
+--------------------------+-----------+-----------+
| Name                     | Dimension | Metric    |
+--------------------------+-----------+-----------+
| car_images               | 512       | cosine    |
| collection_vector_openai | 1536      | cosine    |
| vector_data              | 1536      | cosine    |
+--------------------------+-----------+-----------+
Delete a collection
Sometimes we might need to delete that collection as a part of the development process. If we want to remove a collection named “vector_data” from our vectorDB, we can do this using the astra db delete-collection command:
astra db delete-collection vectorDB --collection vector_data
Tip: If we decide to change our application’s large language model (LLM) and its vector embedding dimensions happen to be different from the previous model, we will need to DROP our collection and recreate it.
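The drop-and-recreate step from the tip can be sketched as a small helper that prints the two commands for review before anything is run. The function name recreate_collection_cmds and the 768-dimension value are hypothetical; the flags are the ones used earlier in this article.

```shell
# Hypothetical helper: print (not run) the commands that delete a collection
# and recreate it with a new vector dimension.
# Arguments: <database> <collection> <dimension> [metric, defaults to cosine]
recreate_collection_cmds() {
  printf 'astra db delete-collection %s --collection %s\n' "$1" "$2"
  printf 'astra db create-collection %s --collection %s -d %s -m %s\n' \
    "$1" "$2" "$3" "${4:-cosine}"
}

# Example: switch vector_data to a model that produces 768-dimensional embeddings.
recreate_collection_cmds vectorDB vector_data 768
```

Printing first is a deliberate choice, since deleting a collection also removes its data. Once the output looks right, the commands can be pasted into the terminal.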
cqlsh
Sometimes we may need to work with the data and storage structures inside of Astra DB. The Astra CLI has a special tool for these tasks, known as the Cassandra Query Language (CQL) shell (cqlsh) tool. The cqlsh tool goes back to Astra DB’s roots in the Apache Cassandra® project, and allows users to execute CQL commands inside their databases.
The cqlsh tool can run CQL commands in-line using the -e flag. It can also be run interactively. Examples demonstrating both approaches can be seen below.
Delete a collection’s data
Often we will need to delete the data within a collection, but might want to keep the collection itself. To do this, we can use the TRUNCATE command from within cqlsh:
astra db cqlsh vectorDB -e "TRUNCATE TABLE default_keyspace.vector_data;"
We could also run the cqlsh tool interactively to truncate the collection:
astra db cqlsh vectorDB
[INFO] Cqlsh is starting, please wait for connection establishment...
Connected to cndb at 127.0.0.1:9042.
[cqlsh 6.8.0 | Cassandra 4.0.0.6816 | CQL spec 3.4.5 | Native protocol v4]
Use HELP for help.
token@cqlsh> USE default_keyspace;
token@cqlsh:default_keyspace> TRUNCATE TABLE vector_data;
token@cqlsh:default_keyspace> exit;
Tip: If we decide to use a different LLM in our app, we will need to TRUNCATE the data. This holds true even if the model creates vector embeddings with the same size dimensions.
Conclusion
At DataStax, we’re committed to providing the best possible developer experience. The Astra CLI gives us access to powerful tools, enabling us to quickly perform essential tasks while maintaining focus on building GenAI applications.
More details can be found in the Astra CLI documentation. This includes instructions and examples on everything from simple CRUD operations to bulk-loading collections with millions of rows.
Be sure to check out the Astra CLI, and take your development process to the next level!