CompanySeptember 19, 2017

New Features in the DataStax Node.js Drivers

New Features in the DataStax Node.js Drivers

Version 1.4.0 of the DataStax Enterprise Node.js Driver and version 3.3.0 of the DataStax Node.js Driver for Apache Cassandra are now available.

The main focus of these releases was to add support for speculative query executions. Additionally, we improved the performance of Murmur3 hashing and changed the query preparation logic along with other enhancements.

Speculative query executions

Speculative execution is a way to limit latency at high percentiles by preemptively starting one or more additional executions of the query against different nodes, that way the driver will yield the first response received while discarding the following ones.

Speculative executions are disabled by default. Speculative executions are controlled by an instance of SpeculativeExecutionPolicy provided when initializing the Client. This policy defines the threshold after which a new speculative execution is triggered.

The driver provides a ConstantSpeculativeExecutionPolicy that schedules a given number of speculative executions, separated by a fixed delay, the policy is exported under the {root}.policies.speculativeExecution submodule.

const client = new Client({

  contactPoints,

  policies: {

    speculativeExecution: new ConstantSpeculativeExecutionPolicy(

      200, // delay before a new execution is launched

      2) // maximum amount of additional executions

  }

});

Given the configuration above, an idempotent query would be handled this way:

  • Start the initial execution at t0
  • If no response has been received at t0 + 200 milliseconds, start a speculative execution on another node
  • if no response has been received at t0 + 400 milliseconds, start another speculative execution on a third node

As with the rest of policies in the driver, you can provide your own implementation by extending the SpeculativeExecutionPolicy prototype.

One important aspect to consider is whether queries are idempotent, (that is, whether they can be applied multiple times without changing the result beyond the initial application). If a query is not idempotent, the driver never schedules speculative executions for it, because there is no way to guarantee that only one node will apply the mutation. Examples of operations that are not idempotent are: counter increments/decrements; adding items to a list column; using non-idempotent CQL functions, like now() or uuid().

In the driver, query idempotence is determined by the isIdempotent flag in the QueryOptions, which defaults to false. You can set the default when initializing the Client or you can set it manually for each query, for example:

const query = 'SELECT * FROM users WHERE key = ?';

client.execute(query, [ 'usr1' ], { prepare: true, isIdempotent: true });

Note that enabling speculative executions causes the driver to send more individual requests, so throughput does not necessarily improve. You can read how speculative executions affect retries and other practical details in the documentation.

Improved Murmur3 hashing performance

Apache Cassandra uses Murmur3Partitioner to determine the distribution of the data across cluster partitions. The adapted version of the Murmur3 hashing algorithm used by Cassandra performs several 64-bit integer operations. As there isn't a native int64 representation in ECMAScript, previously we used to Google Closure's Long to support those operations.

To perform int64 add and multiply operations with int32 types requires you to use smaller int16 chunks to handle overflows. Google Closure's Long handles it by creating 4 uint16 chunks of each operand, performing the operations and creating a new int64 value (composed of 2 int32 values), as Long is immutable.

To improve the performance of the partitioner on Node.js, we created a custom type MutableLong that maintains 4 uint16 fields that are used to apply the operation, modifying the internal state, preventing additional allocations per operation.

Query preparation enhancements

Previously, the driver prepared the query only on the first node selected by the load-balancing policy, taking a lazy approach.

In this revision, we added fine tuning options on how the driver has to deal with query preparation, introducing 2 new options:

  • prepareOnAllHosts: That determines whether the driver should prepare the query on all hosts.
  • rePrepareOnUp: That when a node that has been down (unreachable) is considered back up, determines whether we should re-prepare all queries that have been prepared on other nodes.

Both properties are set to true by default. You can change it when creating the Client instance:

const client = new Client({

  contactPoints,

  prepareOnAllHosts: false,

  rePrepareOnUp: false

});

Expose connection pool state

The driver now provides a method to obtain a snapshot of the state of the pool per host. It provides the information of all hosts of the cluster, open connections per host and the amount of queries that are currently being executed (in-flight) through a given host.

You can check out the ClientState API docs for more information.

You can also use the string representation, that provides the information condensed in a readable format useful for debugging or periodic logging in production.

console.log('Pool state: %s', client.getState());

Wrapping up

More detailed information about all the features, improvements and fixes included in this release can be found in the changelogs: DSE driver changelog and Apache Cassandra driver changelog.

New version of the drivers are available on npm:

Your feedback is important to us and it influences what features we prioritize. To provide feedback use the following:

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.