Technology | January 31, 2012

Working with Apache Cassandra on Mac OS X

If you use Mac OS X as your development platform, you may be interested to know how easy it is to use Apache Cassandra on the Mac. The following shows you how to download and set up Cassandra and its utilities, and how to use DataStax OpsCenter, a browser-based visual management and monitoring tool for Cassandra.

Download the Software

DataStax makes available the DataStax Community Edition, which contains the latest community version of Apache Cassandra, the Cassandra Query Language (CQL) shell, and a free edition of DataStax OpsCenter. To get DataStax Community Edition, go to Planet Cassandra and select the tar downloads of both the DataStax Community Server and OpsCenter. You can also use the curl command on the Mac to download the files directly to your machine. For example, to download the DataStax Community Server, enter the following at a terminal prompt:

curl -OL http://downloads.datastax.com/community/dsc.tar.gz

Install Cassandra

Once your download of Cassandra finishes, move the file to whatever directory you’d like to use for testing Cassandra. Then uncompress the file (whose name will change depending on the version you’re downloading):

tar -xzf dsc-cassandra-1.2.2-bin.tar.gz

Then switch to the new Cassandra bin directory and start up Cassandra:

robinsmac:dev robin$ cd dsc-cassandra-1.2.2/bin
robinsmac:bin robin$ sudo ./cassandra
robinsmac:bin robin$  INFO 14:49:57,739 Logging initialized
INFO 14:49:57,750 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_35
INFO 14:49:57,750 Heap size: 2093809664/2093809664
INFO 14:49:57,751 Classpath:
.
.
INFO 14:49:59,208 Completed flushing /var/lib/cassandra/data/system/schema_columns/system-schema_columns-ib-2-Data.db (210 bytes) for commitlog position ReplayPosition(segmentId=1362167398602, position=53130)
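
If you want to confirm that the server is accepting connections before moving on, a plain TCP check is enough. The Python sketch below uses only the standard library; the helper name is_listening is hypothetical (not part of any Cassandra tool), and it assumes Cassandra 1.2's default Thrift port, 9160, which is the port cqlsh connects to.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection opens (and here immediately closes) a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not listening".
        return False


if __name__ == "__main__":
    # Assumption: Cassandra 1.2 accepts Thrift clients such as cqlsh on 9160.
    print("Cassandra Thrift port 9160 open:", is_listening("127.0.0.1", 9160))
```

A successful connection only proves that a process is listening on the port, not that it is Cassandra, but that is usually enough to tell whether startup finished.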

Now that Cassandra is running, the next step is to connect to the server and begin creating database objects. You do this with cqlsh, the Cassandra Query Language (CQL) shell. CQL is very similar to SQL, so you create objects much as you are likely used to doing in the RDBMS world. The cqlsh utility is in the same bin directory as the cassandra executable:

robinsmac:bin robin$ ./cqlsh
Connected to Test Cluster at localhost:9160.

[cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]

Use HELP for help.
cqlsh>

Cassandra has the concept of a keyspace, which is similar to a database in an RDBMS. A keyspace holds data objects and is the level at which you specify the replication strategy and its options (the partitioner, by contrast, is configured cluster-wide). For this brief introduction, we’ll create a basic keyspace to hold a few example objects:

cqlsh> create keyspace dev
... with replication = {'class':'SimpleStrategy','replication_factor':1};
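
To make the partitioning and replication idea concrete, here is a toy Python model of how SimpleStrategy places copies: a row’s partition key is hashed to a token on a ring of nodes, the first replica goes to the node that owns that token, and any additional replicas go to the next nodes clockwise. The node names and evenly spaced tokens are hypothetical, and MD5 is only a stand-in hash (it is what the older RandomPartitioner uses; Cassandra 1.2 defaults to Murmur3Partitioner, so real token values differ).

```python
import bisect
import hashlib
from typing import Dict, List


def token_for(key: str) -> int:
    """Hash a partition key (a string here, for simplicity) to a ring token.

    Stand-in hash: MD5, as in the older RandomPartitioner. Cassandra 1.2
    defaults to Murmur3Partitioner, so real tokens will differ.
    """
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def simple_strategy_replicas(ring: Dict[int, str], key: str, rf: int) -> List[str]:
    """Return the nodes holding copies of `key` under SimpleStrategy.

    `ring` maps each node's token to its name. A node owns the tokens up to
    and including its own, so the first replica is the first node whose token
    is >= the key's token (wrapping around); the remaining rf - 1 replicas
    are the next nodes clockwise.
    """
    tokens = sorted(ring)
    # This toy model simply rejects replication factors it cannot satisfy.
    if not 1 <= rf <= len(tokens):
        raise ValueError("replication_factor must be between 1 and the node count")
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]


if __name__ == "__main__":
    # Hypothetical four-node ring with evenly spaced tokens over MD5's range.
    ring = {i * 2**126: f"node{i + 1}" for i in range(4)}
    for empid in ("1", "2", "3"):
        print(empid, simple_strategy_replicas(ring, empid, rf=3))
```

With the single-node setup used here and a replication_factor of 1, every key maps to the one local node; the ring only starts to matter once you add nodes.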

Now that you have a keyspace, it’s time to create an object to store data. Because Cassandra’s data model derives from Google’s Bigtable, you store data in column families, which CQL 3 calls tables. Cassandra tables resemble RDBMS tables but are much more flexible and dynamic. They have rows like RDBMS tables, but they are sparse: each row stores only the columns that actually hold data for it, so rows in the same table can have different columns. Let’s create a basic table to hold employee data:

cqlsh> use dev;
cqlsh:dev> create table emp (empid int primary key,
... emp_first varchar, emp_last varchar, emp_dept varchar);
cqlsh:dev>
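
The Python sketch below is a conceptual model of a sparse table, not Cassandra’s on-disk format: the schema lists which columns a row may have, but each row stores only the columns that were actually written. The class name SparseTable and the second employee row are hypothetical.

```python
from typing import Dict, Set


class SparseTable:
    """Conceptual model of a sparse table (not Cassandra's storage format).

    The schema lists the columns a row may have, but each row stores only the
    columns that were actually written: nothing is stored for the others.
    The primary key (empid) is the dict key, not one of the stored columns.
    """

    def __init__(self, columns: Set[str]) -> None:
        self.columns = set(columns)
        self.rows: Dict[int, Dict[str, str]] = {}

    def write(self, key: int, **values: str) -> None:
        unknown = set(values) - self.columns
        if unknown:
            raise KeyError(f"undefined columns: {sorted(unknown)}")
        # Merge into the existing row; columns not supplied stay absent.
        self.rows.setdefault(key, {}).update(values)

    def stored_columns(self, key: int) -> Set[str]:
        return set(self.rows.get(key, {}))


# Mirrors the emp table; row 2 is a hypothetical employee with only a first name.
emp = SparseTable({"emp_first", "emp_last", "emp_dept"})
emp.write(1, emp_first="fred", emp_last="smith", emp_dept="eng")
emp.write(2, emp_first="jane")
print(sorted(emp.stored_columns(1)))  # ['emp_dept', 'emp_first', 'emp_last']
print(sorted(emp.stored_columns(2)))  # ['emp_first']
```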

The table is named emp and contains four columns, including the employee ID (empid), which acts as the table’s primary key. Every Cassandra table must have a primary key, because Cassandra uses it to locate rows. Now let’s insert data into the new table with the CQL INSERT command:

cqlsh:dev> insert into emp (empid, emp_first, emp_last, emp_dept)
... values (1,'fred','smith','eng');

Notice that this CQL INSERT command reads just like its SQL counterpart. So do other DML statements, such as UPDATE:

cqlsh:dev> update emp set emp_dept = 'fin' where empid = 1;
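
One difference from SQL is worth knowing: in Cassandra, both INSERT and UPDATE are upserts. An UPDATE on a primary key that does not exist creates the row, an INSERT on an existing key overwrites the columns it names, and each column value carries a write timestamp so the newest write wins. The Python sketch below models that behavior under simplifying assumptions: a counter stands in for real write timestamps, and the class name UpsertTable and row 2 are hypothetical.

```python
import itertools
from typing import Dict, Optional, Tuple

# Stand-in for Cassandra's per-write timestamps: a simple increasing counter.
_clock = itertools.count(1)


class UpsertTable:
    """Toy model of CQL upserts: every write stores (timestamp, value) per
    column, and the cell with the newest timestamp wins on read."""

    def __init__(self) -> None:
        self.cells: Dict[Tuple[int, str], Tuple[int, str]] = {}

    def write(self, key: int, ts: Optional[int] = None, **values: str) -> None:
        ts = next(_clock) if ts is None else ts
        for column, value in values.items():
            current = self.cells.get((key, column))
            # Last write wins; on an exact tie this sketch keeps the old cell.
            if current is None or ts > current[0]:
                self.cells[(key, column)] = (ts, value)

    # INSERT and UPDATE behave the same: neither checks whether the row exists.
    insert = write
    update = write

    def read(self, key: int) -> Dict[str, str]:
        return {col: val for (k, col), (_, val) in self.cells.items() if k == key}


emp = UpsertTable()
emp.insert(1, emp_first="fred", emp_last="smith", emp_dept="eng")
emp.update(1, emp_dept="fin")       # changes one column, keeps the rest
emp.update(2, emp_dept="ops")       # hypothetical key 2: UPDATE creates the row
emp.update(1, ts=0, emp_dept="hr")  # an older timestamp loses to "fin"
print(emp.read(1))  # {'emp_first': 'fred', 'emp_last': 'smith', 'emp_dept': 'fin'}
print(emp.read(2))  # {'emp_dept': 'ops'}
```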

Querying data uses the familiar SELECT statement:

cqlsh:dev> select * from emp;
empid | emp_dept | emp_first | emp_last
------+----------+-----------+----------
1     |      fin |      fred |    smith

However, look at what happens when a WHERE clause references a column other than the primary key. Filtering on empid works, but filtering on emp_dept fails:

cqlsh:dev> select * from emp where empid = 1;
empid | emp_dept | emp_first | emp_last
------+----------+-----------+----------
1     |      fin |      fred |    smith
cqlsh:dev> select * from emp where emp_dept = 'fin';
Bad Request: No indexed columns present in by-columns clause with Equal operator

In Cassandra, if you want to query columns other than the primary key, you need to create a secondary index on them:

cqlsh:dev> create index idx_dept on emp(emp_dept);
cqlsh:dev> select * from emp where emp_dept = 'fin';
empid | emp_dept | emp_first | emp_last
------+----------+-----------+----------
1     |      fin |      fred |    smith
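
Conceptually, a primary-key query is a direct lookup, while a query on any other column needs an index that maps each value back to the matching primary keys. The Python sketch below models that distinction; it is an illustration, not how Cassandra stores secondary indexes, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Set


class IndexedTable:
    """Conceptual model only (not Cassandra's implementation): rows are fetched
    directly by primary key, and any other column is searchable only through a
    secondary index mapping each value back to the primary keys that hold it."""

    def __init__(self) -> None:
        self.rows: Dict[int, Dict[str, str]] = {}
        self.indexes: Dict[str, Dict[str, Set[int]]] = {}

    def create_index(self, column: str) -> None:
        # Like CREATE INDEX, the new index also covers rows that already exist.
        index: Dict[str, Set[int]] = {}
        for key, row in self.rows.items():
            if column in row:
                index.setdefault(row[column], set()).add(key)
        self.indexes[column] = index

    def upsert(self, key: int, **values: str) -> None:
        row = self.rows.setdefault(key, {})
        for column, value in values.items():
            index = self.indexes.get(column)
            if index is not None:
                if column in row:
                    index.get(row[column], set()).discard(key)  # drop stale entry
                index.setdefault(value, set()).add(key)
            row[column] = value

    def select_by_key(self, key: int) -> Optional[Dict[str, str]]:
        return self.rows.get(key)  # direct lookup, no index needed

    def select_where(self, column: str, value: str) -> List[Dict[str, str]]:
        index = self.indexes.get(column)
        if index is None:
            # Plays the role of cqlsh's "Bad Request" for an unindexed column.
            raise LookupError(f"no secondary index on {column}")
        return [self.rows[k] for k in sorted(index.get(value, set()))]


emp = IndexedTable()
emp.upsert(1, emp_first="fred", emp_last="smith", emp_dept="fin")
try:
    emp.select_where("emp_dept", "fin")
except LookupError as err:
    print("rejected:", err)  # rejected: no secondary index on emp_dept
emp.create_index("emp_dept")  # like: create index idx_dept on emp(emp_dept);
print(emp.select_where("emp_dept", "fin"))
```

Keeping the index up to date on every write is the price of querying by a non-key column, which is why Cassandra asks you to create such indexes explicitly.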

Install and Use DataStax OpsCenter

Installing DataStax OpsCenter on Mac involves working through the following steps in a terminal window:

  1. Untar the package (tar -xzf) in the directory you want to use for OpsCenter.
  2. Change to the bin directory under the OpsCenter home directory and run the ./setup.py script.
  3. Start the primary OpsCenter process in the background by entering ./opscenter & from the bin directory.
  4. Configure the agent to monitor the Cassandra instance already running on your Mac. Change to the agent/bin directory and run the setup script, passing the localhost IP address (usually 127.0.0.1) twice: ./setup 127.0.0.1 127.0.0.1.
  5. Start the agent from the agent/bin directory: ./datastax-agent.
  6. Open Firefox, Chrome, or Safari and enter the following in the address bar: http://127.0.0.1:8888/opscenter/index.html.
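
To confirm from a script that the OpsCenter web UI from step 6 is up, you can request the page and check for a successful HTTP status. This standard-library Python sketch assumes the address from step 6; the helper name opscenter_reachable is hypothetical. It deliberately bypasses any configured HTTP proxy, since the target is the local machine.

```python
import urllib.request

# Assumption: the OpsCenter address used in step 6.
OPSCENTER_URL = "http://127.0.0.1:8888/opscenter/index.html"

# Bypass any configured HTTP proxy: the target is the local machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def opscenter_reachable(url: str = OPSCENTER_URL, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError both subclass OSError, so this covers refused
        # connections, timeouts, and non-2xx HTTP errors alike.
        return False


if __name__ == "__main__":
    print("OpsCenter UI reachable:", opscenter_reachable())
```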


Conclusion

That’s it: you’ve now got Cassandra and DataStax OpsCenter installed and running on your Mac. For other software, such as application drivers and client libraries, visit the DataStax downloads page.
