January 26, 2016

DataStax Java Driver: 3.0.0 released!

It's finally here! The Java driver team is pleased to announce that the long-awaited 3.0.0 version has just been released.

Among various new features and improvements, 3.0.0 brings full compatibility with Cassandra 2.2 and 3.0 and a major feature: custom codecs.

With the switch to using semantic versioning, we seized the opportunity of this major release to clean up the API; as a consequence, version 3.0.0 is not binary compatible with older versions and has breaking changes – all of them documented in the upgrade guide (we strongly suggest reviewing it before upgrading the driver).

  1. Compatibility with Cassandra 2.2 and 3.0+
    1. Support for new CQL types
    2. Unset values
    3. Changes to Schema Metadata API
    4. Server Warnings
    5. New Exception Types
    6. Custom payloads
  2. Custom Codecs
    1. Optional Codecs
  3. Other Major Improvements
    1. RetryPolicy enhancements
    2. Named parameters in SimpleStatement
    3. Per-statement read timeouts
    4. Additions to the Host API
  4. Getting the driver

 

Compatibility with Cassandra 2.2 and 3.0+

Thanks to JAVA-572, the driver now fully supports the native protocol version 4, which comes with interesting additions:

Support for new CQL types

JAVA-404 and JAVA-786 brought support for four new CQL types: DATE, TIME, SMALLINT and TINYINT.

Methods to set and retrieve such types have been added to all relevant classes (Row, BoundStatement, TupleValue and UDTValue):

  • getByte() / setByte(byte) for TINYINT;
  • getShort() / setShort(short) for SMALLINT;
  • getTime() / setTime(long) for TIME;
  • getDate() / setDate(LocalDate) for DATE.

Note that to remain consistent with CQL type names, the methods to retrieve and set TIMESTAMP values have been renamed to getTimestamp() and setTimestamp(). They were formerly named getDate() and setDate(), but these now represent the DATE type.

SMALLINT and TINYINT are respectively 16 and 8-bit integers, so their usage should be quite straightforward.

DATE represents a day with no corresponding time value; it is encoded as a 32-bit unsigned integer representing a number of days, with «the Epoch» (January 1st 1970) at the center of the range (2^31). TIME is the time of the day (with no specific date); it is encoded as a 64-bit signed integer representing the number of nanoseconds since midnight.
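To make the two encodings concrete, here is a minimal sketch in plain Java that does not use the driver at all. The class and method names (CqlTemporalEncoding, encodeDate, decodeDate, encodeTime) are invented for this illustration; in real code the driver's own LocalDate class hides this arithmetic.

```java
// Hypothetical helper, not part of the driver: it only illustrates the
// DATE and TIME encodings described above.
class CqlTemporalEncoding {

    // DATE: unsigned 32-bit day count with the Epoch sitting at 2^31.
    // Java has no unsigned int, so the result is the same 32 bits seen as a signed int.
    static int encodeDate(long daysSinceEpoch) {
        return (int) (daysSinceEpoch + (1L << 31));
    }

    // Reverses encodeDate by reading the 32 bits as an unsigned value.
    static long decodeDate(int encoded) {
        return Integer.toUnsignedLong(encoded) - (1L << 31);
    }

    // TIME: signed 64-bit count of nanoseconds since midnight.
    static long encodeTime(int hour, int minute, int second) {
        return ((hour * 60L + minute) * 60L + second) * 1_000_000_000L;
    }

    public static void main(String[] args) {
        // java.time.LocalDate is fully qualified on purpose: it is the JDK class,
        // not the driver's LocalDate.
        long days = java.time.LocalDate.of(2015, 1, 28).toEpochDay();
        int encoded = encodeDate(days);
        System.out.println("days since Epoch: " + days);
        System.out.println("as unsigned int:  " + Integer.toUnsignedString(encoded));
        System.out.println("as signed int:    " + encoded);
        System.out.println("decoded again:    " + decodeDate(encoded));
        System.out.println("11:47:58 in ns:   " + encodeTime(11, 47, 58));
    }
}
```

The signed view of the encoded value is negative for any date on or after the Epoch, which is exactly the kind of surprise that makes raw DATE values awkward to handle in Java.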

Here is a small example of how to use SMALLINT and TINYINT:

session.execute("CREATE TABLE IF NOT EXISTS small_ints(s smallint PRIMARY KEY, t tinyint)");
PreparedStatement pst = session.prepare("INSERT INTO small_ints (s, t) VALUES (:s, :t)");
session.execute(pst.bind(Short.MIN_VALUE, Byte.MAX_VALUE));

Row row = session.execute("SELECT * FROM small_ints").one();

short s = row.getShort("s");
byte t = row.getByte("t");

There is one minor catch: Java's integer literals default to int, which the driver serializes as CQL INTs. So the following will fail:

session.execute(pst.bind(1, 1));
// InvalidTypeException: Invalid type for value 0 of CQL type smallint,
// expecting class java.lang.Short but class java.lang.Integer provided

The workaround is simply to coerce your arguments to the correct type:

session.execute(pst.bind((short)1, (byte)1));
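The reason is ordinary Java autoboxing rather than anything specific to the driver. The stand-alone sketch below (the class LiteralBoxingDemo and its boxedTypes helper are made up for illustration) shows which wrapper classes a varargs method such as bind(Object...) actually receives:

```java
// Stand-alone illustration, no driver required.
class LiteralBoxingDemo {

    // Returns the simple class name of each boxed argument, i.e. what a
    // method declared with an Object... parameter sees at runtime.
    static String boxedTypes(Object... values) {
        StringBuilder sb = new StringBuilder();
        for (Object value : values) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(value.getClass().getSimpleName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(boxedTypes(1, 1));                 // Integer, Integer
        System.out.println(boxedTypes((short) 1, (byte) 1));  // Short, Byte
    }
}
```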

And here is a small example of all 3 CQL temporal types, TIMESTAMP, DATE and TIME:

session.execute("CREATE TABLE IF NOT EXISTS dates(ts timestamp PRIMARY KEY, d date, t time)");
session.execute("INSERT INTO dates (ts, d, t) VALUES ('2015-01-28 11:47:58', '2015-01-28', '11:47:58')");

Row row = session.execute("SELECT * FROM dates").one();
Date ts = row.getTimestamp("ts");
LocalDate d = row.getDate("d");
long t = row.getTime("t");

As you can see, TIMESTAMP is still mapped to java.util.Date, whereas TIME is mapped by the driver to primitive longs, representing the number of nanoseconds since midnight. As for DATE values, the driver encapsulates them in a new class, LocalDate. Because working with raw DATE values can be quite cumbersome (especially since Java doesn't have unsigned integers), the LocalDate class hides that complexity behind utility methods that convert LocalDate instances to and from integers representing the number of days since the Epoch.

Should the driver's default mappings for temporal types not suit your needs, we have good news: the new "extras" module – see below – contains alternative codecs to deal with DATE and TIME values, including mappings to the Java 8 Time API and to Joda Time.

And for those who prefer to keep it low-level and avoid the overhead of creating container classes:

  • SimpleDateCodec maps DATE to primitive ints representing the number of days since the Epoch; and
  • SimpleTimestampCodec maps TIMESTAMP to primitive longs representing milliseconds since the Epoch.

Unset values

For Protocol V3 or below, all variables in a statement must be bound. With Protocol V4, variables can be left "unset", in which case they will be ignored server-side (no tombstones will be generated). If you’re reusing a bound statement you can use the unset methods to unset variables that were previously set:

BoundStatement bound = ps1.bind().setString("foo", "bar");
// Unset by name
bound.unset("foo");
// Unset by index
bound.unset(0);

Note that this will not work under lower protocol versions; attempting to do so would result in an IllegalStateException urging you to explicitly set all values in your statement.
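The difference between a variable that is unset and one that is explicitly set to null can be pictured with a toy model. The sketch below is plain Java, not driver code: UnsetValuesSketch and its methods are hypothetical, and it assumes the usual Cassandra behavior where an explicit null is still written (as a deletion marker) while an unset variable is skipped entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of bound variables with three states: value, null, unset.
class UnsetValuesSketch {

    // Sentinel marking a variable that was never bound (or was unset again).
    private static final Object UNSET = new Object();

    private final String[] names;
    private final Object[] values;

    UnsetValuesSketch(String... names) {
        this.names = names;
        this.values = new Object[names.length];
        Arrays.fill(values, UNSET);
    }

    UnsetValuesSketch set(int index, Object value) {
        values[index] = value;
        return this;
    }

    UnsetValuesSketch unset(int index) {
        values[index] = UNSET;
        return this;
    }

    // Lists what would be written: unset variables are left out,
    // explicit nulls are kept.
    List<String> columnsWritten() {
        List<String> written = new ArrayList<String>();
        for (int i = 0; i < names.length; i++) {
            if (values[i] != UNSET) {
                written.add(names[i] + "=" + values[i]);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        UnsetValuesSketch bound = new UnsetValuesSketch("foo", "bar");
        bound.set(0, "x").set(1, null);
        System.out.println(bound.columnsWritten());
        bound.unset(0);
        System.out.println(bound.columnsWritten());
    }
}
```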

Changes to Schema Metadata API

As you probably already know, CASSANDRA-6717 has completely changed the way Cassandra internally stores information about schemas, while CASSANDRA-6477 introduced Materialized Views, and CASSANDRA-7395 introduced User-defined Functions and Aggregates. On top of that, secondary indexes have been deeply refactored by CASSANDRA-9459.

The driver now fully supports all these features and changes; let's see how.

Retrieving metadata on a materialized view is straightforward:

MaterializedViewMetadata mv = cluster.getMetadata()
    .getKeyspace("test").getMaterializedView("my_view");

Alternatively, you can obtain the view from its parent table:

TableMetadata table = cluster.getMetadata()
    .getKeyspace("test").getTable("my_table");
MaterializedViewMetadata mv = table.getView("my_view");
// You can also query all views in that table:
System.out.printf("Table %s has the following views: %s%n", table.getName(), table.getViews());

To illustrate the driver's support for user-defined functions and aggregates, let's consider the following example:

USE test;
CREATE FUNCTION plus(x int, y int)
    RETURNS NULL ON NULL INPUT
    RETURNS int LANGUAGE java AS 'return x + y;';
CREATE AGGREGATE sum(int)
    SFUNC plus
    STYPE int
    INITCOND 0;

To retrieve metadata on the function defined above:

FunctionMetadata plus = cluster.getMetadata()
    .getKeyspace(keyspace)
    .getFunction("plus", DataType.cint(), DataType.cint());
System.out.printf("Function %s has signature %s and body '%s'%n",
    plus.getSimpleName(), plus.getSignature(), plus.getBody());

To retrieve metadata on the aggregate defined above:

AggregateMetadata sum = cluster.getMetadata()
    .getKeyspace(keyspace)
    .getAggregate("sum", DataType.cint());
System.out.printf("%s is an aggregate that computes a result of type %s%n",
    sum.getSimpleName(), sum.getReturnType());
FunctionMetadata plus = sum.getStateFunc();
System.out.printf("%s is a function that operates on %s%n",
    plus.getSimpleName(), plus.getArguments());

Note that, in order to retrieve a function or aggregate from a keyspace, you need to specify its name and its argument types, to distinguish between overloaded versions.

The way to retrieve metadata on a secondary index has changed from 2.1: the former one-to-one relationship between a column and its (only) index has been replaced with a one-to-many relationship between a table and its many indexes. This is reflected in the driver's API by the new methods TableMetadata.getIndexes() and TableMetadata.getIndex(String name):

TableMetadata table = cluster.getMetadata()
    .getKeyspace("test")
    .getTable("my_table");
IndexMetadata index = table.getIndex("my_index");
System.out.printf("Table %s has index %s targeting %s%n", table.getName(), index.getName(), index.getTarget());

To retrieve the column an index operates on, you should now inspect the result of the getTarget() method:

IndexMetadata index = table.getIndex("my_index");
ColumnMetadata indexedColumn = table.getColumn(index.getTarget());
System.out.printf("Index %s operates on column %s%n", index.getName(), indexedColumn);

Beware however that the code above only works for built-in indexes where the index target is a single column name. If in doubt, make sure to read the upgrade guide should you need to migrate existing code.

Server Warnings

With Protocol V4, you will no longer miss the oracles emitted by Cassandra!

Joking aside, Cassandra can now send warnings along with the server response; these can include useful information such as batches being too large, too many tombstones being read, etc. With the Java driver, you can retrieve them by simply inspecting the ExecutionInfo object:

ResultSet rs = session.execute(...);
List<String> warnings = rs.getExecutionInfo().getWarnings();

New Exception Types

New exception types have been added to handle additional server-side errors introduced in Cassandra 2.2: ReadFailureException, WriteFailureException and FunctionExecutionException.

Also, note that thanks to JAVA-1006, the whole exception hierarchy has been redesigned in this version.

Custom payloads

And finally, custom payloads are generic key-value maps that can be sent alongside a query. They are used to convey additional metadata when you deploy a custom query handler on the server side.
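As a rough sketch of what such a map can look like on the client side: protocol V4 treats payload values as opaque bytes, so a payload can be modeled in plain Java as a Map<String, ByteBuffer>. The class PayloadSketch, its helper methods and the key name "trace-tag" are invented for illustration; attaching the map to a statement is deliberately left out here, so check the driver javadocs for that part.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of building and reading a payload-style map.
class PayloadSketch {

    // Builds a one-entry payload whose value is the UTF-8 bytes of the given text.
    static Map<String, ByteBuffer> buildPayload(String key, String value) {
        Map<String, ByteBuffer> payload = new HashMap<String, ByteBuffer>();
        payload.put(key, ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
        return payload;
    }

    // Decodes a value back to text, or returns null when the key is absent.
    static String readValue(Map<String, ByteBuffer> payload, String key) {
        ByteBuffer bytes = payload.get(key);
        if (bytes == null) {
            return null;
        }
        // duplicate() so that decoding does not move the stored buffer's position
        return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> payload = buildPayload("trace-tag", "checkout");
        System.out.println(readValue(payload, "trace-tag"));
    }
}
```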

Custom Codecs

JAVA-721 introduced an exciting new feature: custom codecs.

In short, where before the driver had a hard-coded set of mappings between CQL types and Java types, now it has a fully dynamic, pluggable and customizable mechanism of handling CQL-to-Java conversions.

With custom codecs, users can now define their own mappings, and the driver will use them wherever appropriate, seamlessly. The possibilities are endless: map CQL temporal types to Java 8 Time API or to Joda Time, as we already mentioned; provide transparent JSON-to-Java and XML-to-Java conversion; map CQL collections to Java arrays or – why not? – to Scala collections...

Just to give you a hint of how powerful this feature can be, imagine that you developed your own codec to convert JSON strings stored in Cassandra to Java objects; the code to retrieve your objects would be as simple as this:

// Roll your own codec
TypeCodec<MyPojo> myJsonCodec = ...;
// register it so the driver can use it
cluster.getConfiguration().getCodecRegistry().register(myJsonCodec);
// query some JSON data
Row row = session.execute("SELECT json FROM t WHERE pk = 'I CAN HAZ JSON'").one();
// Let the driver convert it for you...
MyPojo myPojo = row.get("json", MyPojo.class);

When designing new codecs for the Java driver, just make sure to read our online documentation as well as the javadocs for TypeCodec and CodecRegistry.
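To give a feel for what a codec does at its core, here is a deliberately simplified sketch in plain Java. Everything in it is hypothetical: SimpleCodec is a two-method stand-in invented for this example (the driver's real TypeCodec has a richer contract), and instead of JSON it uses a trivial "x,y" text format so that no external library is needed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CodecSketch {

    // Hypothetical, simplified contract: Java object to bytes and back.
    interface SimpleCodec<T> {
        ByteBuffer serialize(T value);
        T deserialize(ByteBuffer bytes);
    }

    // Illustrative application type.
    static final class GridPoint {
        final int x;
        final int y;

        GridPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Stores a GridPoint as UTF-8 text of the form "x,y"; null maps to null both ways.
    static final class GridPointCodec implements SimpleCodec<GridPoint> {
        @Override
        public ByteBuffer serialize(GridPoint point) {
            if (point == null) {
                return null;
            }
            return ByteBuffer.wrap((point.x + "," + point.y).getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public GridPoint deserialize(ByteBuffer bytes) {
            if (bytes == null) {
                return null;
            }
            // duplicate() leaves the caller's buffer position untouched
            String text = StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
            String[] parts = text.split(",");
            return new GridPoint(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }
    }

    public static void main(String[] args) {
        SimpleCodec<GridPoint> codec = new GridPointCodec();
        GridPoint point = codec.deserialize(codec.serialize(new GridPoint(3, -4)));
        System.out.println(point.x + "," + point.y);
    }
}
```

A real codec additionally has to declare which CQL type and which Java type it handles, so that the registry can pick it when appropriate.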

Optional Codecs

Custom codecs gave us the opportunity to introduce a new member in the Java driver family: the "extras" module.

This module has been created to host additions to the driver that, albeit useful, cannot make it into the core API, mainly for backwards-compatibility reasons, or because they target a more specific audience (e.g. they require Java 8 or higher, while the driver must remain compatible with older versions of Java).

To use this new module in your own application, simply pull the following Maven dependency:

<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-extras</artifactId>
  <version>3.0.0</version>
</dependency>

To celebrate the event, we included in this version a rich set of codecs that we hope will be useful to many of you.

Check our online documentation for more details.

Note that the mapping framework has been retrofitted to use custom codecs when appropriate. One consequence is that the @Enumerated annotation is gone, replaced with codecs from the extras module. Again, please read the upgrade guide for more details if you need to migrate existing code.

Other Major Improvements

RetryPolicy enhancements

Thanks to JAVA-819, RetryPolicy now has a new method: onRequestError(). This method gives the user the ability to decide what to do in the following cases:

  1. On a client timeout, while waiting for the server response;
  2. On a connection error (socket closed, etc.);
  3. When the contacted host replies with an unusual error, such as IS_BOOTSTRAPPING, OVERLOADED, or SERVER_ERROR.

To distinguish among these error cases, one should inspect the DriverException that is passed to the method call. Here is a summary of the possible situations:

Error                                  | DriverException
Timeout (no server response)           | OperationTimedOutException
Network failure (socket closed, etc.)  | ConnectionException
IS_BOOTSTRAPPING                       | BootstrappingException
OVERLOADED                             | OverloadedException
SERVER_ERROR                           | ServerError

Until now, the driver had a hardcoded behavior for all these cases: retry the query. But this behavior is actually dangerous if the query being executed is not idempotent; from now on, users can override the default behavior if they need to. To make this even easier, the driver provides the new IdempotenceAwareRetryPolicy, which decorates any existing RetryPolicy with idempotence awareness, based on the statement's idempotence flag; here is an example:

Cluster cluster = Cluster.builder()
        .addContactPoints("127.0.0.1")
        // by default, statements will be considered non-idempotent
        .withQueryOptions(new QueryOptions().setDefaultIdempotence(false))
        // make your retry policy idempotence-aware
        .withRetryPolicy(new IdempotenceAwareRetryPolicy(DefaultRetryPolicy.INSTANCE))
        .build();

Session session = cluster.connect();

// by default, statements like this one will not be retried
session.execute("INSERT INTO table (pk, c1) VALUES (42, 'foo')");

// but this one will
session.execute(new SimpleStatement("SELECT c1 FROM table WHERE pk = 42").setIdempotent(true));
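The decorator idea behind this can be modeled in a few lines of plain Java. The sketch below is a toy, not the driver's implementation: Decision, ErrorPolicy, ALWAYS_RETRY and idempotenceAware are names made up for illustration, and the policy only looks at the idempotence flag, whereas a real RetryPolicy also receives the statement, the consistency level and the exception.

```java
// Toy model in plain Java; none of these types are the driver's own.
class IdempotenceSketch {

    enum Decision { RETRY, RETHROW }

    // Hypothetical, stripped-down policy: reacts to a request error knowing
    // only whether the statement was flagged idempotent.
    interface ErrorPolicy {
        Decision onRequestError(boolean idempotent);
    }

    // Models the formerly hardcoded behavior described above: always retry.
    static final ErrorPolicy ALWAYS_RETRY = new ErrorPolicy() {
        @Override
        public Decision onRequestError(boolean idempotent) {
            return Decision.RETRY;
        }
    };

    // Decorator: consults the wrapped policy only for idempotent statements,
    // and rethrows the error otherwise.
    static ErrorPolicy idempotenceAware(final ErrorPolicy wrapped) {
        return new ErrorPolicy() {
            @Override
            public Decision onRequestError(boolean idempotent) {
                return idempotent ? wrapped.onRequestError(true) : Decision.RETHROW;
            }
        };
    }

    public static void main(String[] args) {
        ErrorPolicy policy = idempotenceAware(ALWAYS_RETRY);
        System.out.println("non-idempotent INSERT: " + policy.onRequestError(false));
        System.out.println("idempotent SELECT:     " + policy.onRequestError(true));
    }
}
```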

 

Named parameters in SimpleStatement

Thanks to JAVA-1037, you can now set named parameters on a SimpleStatement. Simply use the new constructor that takes a Map argument:

// Note the use of named parameters in the query
String query = "SELECT * FROM measures WHERE sensor_id=:sensor_id AND day=:day";
Map<String, Object> params = new HashMap<String, Object>();
params.put("sensor_id", 42);
params.put("day", "2016-01-28");
SimpleStatement statement = new SimpleStatement(query, params);

One caveat though: named parameters were introduced in Protocol V3, and thus require Cassandra 2.1 or higher. Check our online documentation on simple statements for more information.

Per-statement read timeouts

With JAVA-1033, you can now specify read timeouts (i.e. the amount of time the driver will wait for a response before giving up) on a per-statement basis: the new method Statement.setReadTimeoutMillis() overrides the default per-host read timeout defined by SocketOptions.setReadTimeoutMillis().

This can be useful for statements that are granted longer timeouts server-side (for example, aggregation queries). Again, our online documentation on socket options has more about this.

Additions to the Host API

JAVA-1035 and JAVA-1042 have enriched the Host API with useful new methods:

  • getBroadcastAddress() returns the node's broadcast address. This corresponds to the broadcast_address setting in cassandra.yaml.
  • getListenAddress() returns the node's listen address. This corresponds to the listen_address setting in cassandra.yaml.
  • getDseVersion() returns the DSE version the host is running (when applicable).
  • getDseWorkload() returns the DSE workload the host is running (when applicable).

Note however that these methods are provided for informational purposes only; depending on the cluster version, on the cluster type (DSE or not), and on the host the information has been fetched from, they may return null at any time.

Getting the driver

As always, the driver is available from Maven and from our downloads server.

We're also running a platform and runtime survey to improve our testing infrastructure. Your feedback would be most appreciated.
