Getting started with the DataStax C/C++ driver
The DataStax C/C++ driver is one of the newest members of the DataStax drivers family, and it recently had its first release candidate. Up to now, our work has focused on reaching feature parity with the other drivers and on finalizing the API rather than on introductory documentation. The goal of this post is to provide some of that introductory documentation. More in-depth documentation can be found in the fully documented header file as well as in the examples provided with the driver. In addition to this post, we are currently working to include additional documentation as part of the final 1.0 release.
This post will not cover building the driver or setting up a Cassandra cluster. If you haven't built the driver before, the instructions for doing so can be found in the README. We have near-term plans for making this process easier and providing binary releases for major platforms. Documentation for setting up a Cassandra cluster can be found on our Documentation site or planetcassandra.com. Let's get started using the driver!
Configuring the driver
The cluster object
The first step to using the driver is to create a CassCluster object that describes your Cassandra cluster's configuration. The default cluster object is good for most clusters, and only a list of contact points needs to be configured. The list of contact points doesn't need to contain every host in your cluster. Only a small subset is required because the rest of the cluster will be automatically discovered through the control connection. It's a good idea to change the order of your contact points for each of your client hosts to prevent a single Cassandra host from becoming the control connection on every client machine in your cluster. The plan is to do this automatically in a future release. The control connection also monitors changes in your cluster's topology (automatically handling node outages, the addition of new nodes, and the removal of old nodes) and tracks schema changes.
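The per-client reordering of contact points described above can be sketched in plain C. This is a minimal sketch and not part of the driver: build_contact_points() and its offset parameter are hypothetical names, and it assumes each client host can be given a distinct index (for example, from its deployment configuration). It only builds a comma-separated string; handing that string to the cluster object is left to the application.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a driver function): writes the hosts as a
 * comma-separated list into `out`, starting at hosts[offset % count] and
 * wrapping around, so that different clients lead with different hosts.
 * Returns 0 on success or -1 if `out` is too small. */
int build_contact_points(const char* const* hosts, size_t count,
                         size_t offset, char* out, size_t out_size) {
  size_t used = 0;
  size_t i;

  assert(hosts != NULL && out != NULL && out_size > 0);
  out[0] = '\0';

  for (i = 0; i < count; ++i) {
    /* Rotate the starting position by the per-client offset. */
    const char* host = hosts[(offset % count + i) % count];
    int n = snprintf(out + used, out_size - used, "%s%s",
                     i == 0 ? "" : ",", host);
    if (n < 0 || (size_t)n >= out_size - used) {
      return -1; /* formatting failed or the output was truncated */
    }
    used += (size_t)n;
  }
  return 0;
}
```

With hosts {"10.0.0.1", "10.0.0.2", "10.0.0.3"}, a client using offset 1 would get a list that starts with 10.0.0.2. The result is in the comma-separated form that could then be passed to cass_cluster_set_contact_points().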
Other cluster settings
The cluster object can also be used to configure SSL, set authentication credentials, and tune driver performance. The full list and explanation of all the driver's cluster object settings can be found in the driver's header file.
Connecting a session and executing queries
The session object
The session object is used to execute queries. Internally, it also manages a pool of client connections to Cassandra and uses a load balancing policy to distribute requests across those connections. It's recommended that your application create only a single session object per keyspace, as a session object is designed to be created once, reused, and shared by multiple application threads. The throughput of a session can be scaled by increasing the number of I/O threads. An I/O thread handles reading and writing query request data to and from Cassandra. The number of I/O threads defaults to one per CPU core, but it can be configured using cass_cluster_set_num_threads_io(). It's generally better to create a single session with more I/O threads than multiple sessions with a smaller number of I/O threads. More DataStax driver best practices can be found in this post.
Connecting a session
The C/C++ driver's API is designed so that no operation will force your application to block. Operations that would normally cause your application to block, such as connecting to a cluster or running a query, instead return a CassFuture object that can be waited on, polled, or used to register a callback. The API can also be used synchronously by immediately attempting to get the result from a future. To demonstrate the use of CassFuture, let's create and connect a CassSession using the cluster object we created earlier.
In that example the future is waited on synchronously. It's also possible to be notified of the connection asynchronously through a callback.
Note that the driver may run the callback on a thread that's different from the application's calling thread. Any data accessed in the callback must be immutable or synchronized with a mutex, semaphore, etc. A full example using callbacks can be found here.
Running queries
The connected session can now be used to run queries. Queries are constructed using CassStatement objects. There are two types of statement objects, regular and prepared. Regular statements are most useful for ad hoc queries and applications where the query string will change often. A prepared statement caches the query on the Cassandra server and requires the extra step of preparing the query server-side first.
CassStatement objects can also be used to bind variables. The '?' marker is used to denote the bind variables in a query string. In addition to adding the bind marker to your query string, your application must also provide the number of bind variables to cass_statement_new() when constructing a new statement. If a query doesn't require any bind variables, then 0 can be used. The cass_statement_bind_*() functions are then used to bind values to the statement's variables. Bind variables can be bound by the marker's position (index) or by name. Variables can only be bound by name for prepared statements (see the prepared statement example below). This limitation exists because query metadata provided by Cassandra is required to map the variable name to the variable's marker index.
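Because the bind variable count passed to cass_statement_new() has to agree with the number of '?' markers in the query string, a small sanity check can catch mismatches early. The following is a minimal sketch and not part of the driver: count_bind_markers() is a hypothetical helper, and it assumes a simplified view of CQL in which markers inside single-quoted strings or double-quoted identifiers are ignored. It does not handle comments or other quoting forms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a driver function): counts '?' bind markers in
 * a query string, skipping any that appear inside single-quoted string
 * literals or double-quoted identifiers. A doubled quote used as an escape
 * ('' or "") closes and immediately reopens the quoted section, so it is
 * handled without special cases. */
size_t count_bind_markers(const char* query) {
  size_t count = 0;
  char quote = '\0'; /* the quote character we are inside, or '\0' if none */
  const char* p;

  assert(query != NULL);

  for (p = query; *p != '\0'; ++p) {
    if (quote != '\0') {
      if (*p == quote) {
        quote = '\0'; /* leaving the quoted section */
      }
    } else if (*p == '\'' || *p == '"') {
      quote = *p; /* entering a quoted section */
    } else if (*p == '?') {
      ++count;
    }
  }
  return count;
}
```

An application could compare this count against the value it is about to pass as the number of bind variables and fail fast in debug builds when the two disagree.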
Prepared statements
A prepared statement should be used to improve the performance of frequently executed queries. Preparing the query caches it on the Cassandra nodes and only needs to be done once. Once created, prepared statements should be reused with different bind variables.
Notice that this example also uses the cass_statement_bind_*_byname() functions instead of binding by index. Also, the CassPrepared object is immutable and can be used to prepare statements on multiple threads concurrently.
Handling results
Before, when inserting a new row, the future object didn't have any meaningful result other than an error code. Now that data has been inserted into the "examples" table, we can use a SELECT statement to retrieve the results. The code to do this looks similar to the INSERT example, except now a CassResult object can be retrieved from the query's future object.
In this example, only a single row is retrieved from Cassandra, so the convenience function cass_result_first_row() can be used to get the first and only row. If multiple rows are returned, a CassIterator object can be used to iterate over the returned rows (see the example below). Column values, of type const CassValue*, are then retrieved from the row using either cass_row_get_column() or cass_row_get_column_by_name().
Values such as CassString and CassBytes point to memory held by the result object. Those values remain valid only as long as the result object isn't freed, so they need to be copied into application memory if they must outlive the result object. Primitive types such as cass_int32_t are copied by the driver because it can be done cheaply without incurring extra allocations.
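Copying a borrowed value into application memory can be sketched as follows. This is a minimal sketch under stated assumptions: BorrowedString is a hypothetical stand-in for a pointer-plus-length value whose bytes are owned by something else (such as a result object), and copy_borrowed_string() is a hypothetical helper, not a driver function. The sketch also assumes the borrowed bytes are not NUL-terminated, so the copy adds its own terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a value that points into memory owned by
 * another object; the bytes are only valid while that owner is alive. */
typedef struct {
  const char* data;
  size_t length;
} BorrowedString;

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy of
 * the borrowed bytes, or NULL if allocation fails. The caller owns the
 * copy and releases it with free(). */
char* copy_borrowed_string(BorrowedString s) {
  char* copy = (char*)malloc(s.length + 1);
  if (copy == NULL) {
    return NULL;
  }
  if (s.length > 0) {
    memcpy(copy, s.data, s.length);
  }
  copy[s.length] = '\0';
  return copy;
}
```

The copy would be made before the owning object is freed; after that point only the copy is safe to read.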
The returned result object can be read and iterated on by multiple threads concurrently because the iterator object itself contains the position state allowing the result object to remain immutable.
Iterators
The queries in the previous examples returned a single row result, but queries often return many rows. An iterator object is used to access all the rows of a result.
Code inside the iteration loop should make a copy of the row values (or process them immediately) because cass_iterator_next() invalidates the previous row returned by cass_iterator_get_row(). In addition to iterating a result with multiple rows, there are iterators that can be used to iterate over columns and collections. The column and collection iterators have a very similar API and the same semantics as shown in the row iterator example.
Paging
Large result sets can be divided into multiple pages automatically using the driver's paging API. To do this the result object keeps track of the pagination state for the sequence of paging queries. When paging through the result set the result object is checked to see if more pages exist and then attached to the statement before re-executing the query to get the next page.
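The control flow of the paging loop described above (execute, process the page, check for more pages, attach the paging state, re-execute) can be illustrated with a small simulation. This is a sketch only: it does not call the driver, and PagingState, Query, Page, simulated_execute() and page_through() are all hypothetical names standing in for the statement, the result, and the server round trip.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are driver types. */
typedef struct { size_t next_row; } PagingState; /* position in the result set */
typedef struct { size_t page_size; PagingState state; } Query;
typedef struct {
  size_t row_count; /* rows in this page */
  int has_more;     /* nonzero if another page exists */
  PagingState next; /* state to attach before re-executing */
} Page;

/* Simulates one round trip against a result set of `total_rows` rows. */
static Page simulated_execute(const Query* q, size_t total_rows) {
  Page page;
  size_t remaining = total_rows - q->state.next_row;
  page.row_count = remaining < q->page_size ? remaining : q->page_size;
  page.next.next_row = q->state.next_row + page.row_count;
  page.has_more = page.next.next_row < total_rows;
  return page;
}

/* Pages through the whole result set; returns the number of rows seen and
 * stores the number of round trips in *pages_out (if non-NULL). */
size_t page_through(size_t total_rows, size_t page_size, size_t* pages_out) {
  Query q;
  size_t rows_seen = 0;
  size_t pages = 0;
  int has_more;

  assert(page_size > 0);
  q.page_size = page_size;
  q.state.next_row = 0;

  do {
    Page page = simulated_execute(&q, total_rows);
    rows_seen += page.row_count; /* rows would be processed here */
    ++pages;
    has_more = page.has_more;
    if (has_more) {
      q.state = page.next; /* attach the paging state, then re-execute */
    }
  } while (has_more);

  if (pages_out != NULL) {
    *pages_out = pages;
  }
  return rows_seen;
}
```

In this simulation, 10 rows with a page size of 3 take four round trips. With the real driver, the paging state is kept by the result object rather than computed by the application.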
A more complete example of paging can be found here.
Batches
Batches can be used to group multiple mutations (UPDATE, INSERT, DELETE) together into a single statement. CASS_BATCH_TYPE_LOGGED can be used to make sure that multiple mutations across multiple partitions happen atomically, that is, all the included mutations will eventually succeed. However, there is some overhead associated with using logged batches in Cassandra. Batches can also be used to group mutations for a single partition key by setting CASS_BATCH_TYPE_UNLOGGED and for counters via CASS_BATCH_TYPE_COUNTER. Unlogged batches should NOT be used as a performance optimization. More information on the use cases of batch statements can be found in this excellent post. Here's how to use batches:
A full example using batches can be found here.
Additional resources
This post covered the basic functionality provided by the DataStax C/C++ driver with the goal of helping you get started. More in-depth API documentation and example code can be found in the C/C++ driver's GitHub repository. In addition to this, we are working on substantially improving the formal documentation for the C/C++ driver over the next few releases. If you need help or have questions, please use the mailing list or IRC.
- Mailing List: https://groups.google.com/a/lists.datastax.com/forum/#!forum/cpp-driver-user
- IRC: #datastax-drivers on irc.freenode.net
- Review and contribute source code: https://github.com/datastax/cpp-driver
- Report issues on JIRA: https://datastax-oss.atlassian.net/browse/CPP