Technology | July 20, 2015

Python Driver 2.6.0 with Cassandra 2.2 Features

Today we are happy to announce the release of the DataStax Python Driver 2.6.0 for Apache Cassandra, which includes support for the new features in Cassandra 2.2 and native protocol v4, and also some other general improvements. A full list of changes can be found in the CHANGELOG.

In this post I'll start by describing the general improvements, then move on to the features specific to the recent Cassandra 2.2 release.

Token-Aware Default Load Balancing Policy

This release includes a change to the default load balancing policy used by the driver. Load balancing policies are used to plan the order in which nodes are attempted for each query. The new default uses a nested policy which is both token- and data-center-aware. Token awareness allows it to route requests directly to nodes holding a replica of the data (possibly avoiding an extra hop). Data-center-awareness makes the driver consider nodes from a local DC before any others.

By default, the 'local' DC is chosen from the Cluster contact_points, so contact points should be for nodes local to the client instance. If you specify contact points from more than one DC, you will need to specify the local DC by initializing the policy explicitly via your Cluster load_balancing_policy.
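To make the ordering concrete, here is a minimal sketch of the kind of query plan a token- and DC-aware policy produces. It is an illustration only, not the driver's implementation: the Host tuple, the query_plan function, and the addresses are hypothetical, and round-robin rotation is left out.

```python
from collections import namedtuple

# Hypothetical stand-in for the driver's host objects.
Host = namedtuple("Host", ["address", "datacenter"])


def query_plan(hosts, replicas, local_dc):
    """Yield hosts in the order a token- and DC-aware policy might try them."""
    local = [h for h in hosts if h.datacenter == local_dc]
    remote = [h for h in hosts if h.datacenter != local_dc]

    # 1. Local replicas first: the coordinator already owns the data.
    for host in local:
        if host in replicas:
            yield host
    # 2. Then the remaining nodes in the local DC.
    for host in local:
        if host not in replicas:
            yield host
    # 3. Remote nodes last. Whether they are used at all depends on
    #    how the DC-aware policy is configured.
    for host in remote:
        yield host


hosts = [
    Host("10.0.0.1", "DC1"),
    Host("10.0.0.2", "DC1"),
    Host("10.1.0.1", "DC2"),
]
replicas = {hosts[1], hosts[2]}  # nodes holding the partition being queried
plan = [h.address for h in query_plan(hosts, replicas, "DC1")]
```

In the driver itself, the explicit equivalent is to pass TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='DC1')) from cassandra.policies as the Cluster load_balancing_policy, where 'DC1' is a placeholder for your own data center name.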

These are not new policies, just new defaults. The change was made to make use of more advanced features out-of-the-box, and to bring this default in line with our other drivers. If you already specify a load balancing policy explicitly, this change will have no effect.

New Default Protocol Version, Automatic Downgrading

The default protocol version is now 4 (previously 2). This is done to avoid confusion when using the default with newer protocol features.

Along with this, the driver now supports downgrading protocol versions when connecting to older versions of Cassandra. This protocol downgrade only happens during the initial cluster connect, when the control connection is being established. This is mostly a convenience feature to allow using driver defaults for any Cassandra version. Production applications should set the protocol version explicitly to the version supported by their cluster. This is more efficient (avoiding protocol downgrades), and will also avoid degraded states if the client ever connects to a partially-upgraded cluster supporting mixed versions.
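The downgrade behavior can be pictured as a simple retry loop. The sketch below only illustrates the idea and is not the driver's code: connect_with_downgrade, make_server, and the exception class are all hypothetical names.

```python
class UnsupportedProtocolVersion(Exception):
    """Raised by the hypothetical connect function when a version is rejected."""


def connect_with_downgrade(try_connect, start_version=4, min_version=1):
    """Try the newest protocol version first, stepping down on rejection."""
    version = start_version
    while True:
        try:
            try_connect(version)
            return version
        except UnsupportedProtocolVersion:
            if version <= min_version:
                raise  # nothing older left to try
            version -= 1


def make_server(max_supported):
    """Build a fake connect function for a server that speaks up to max_supported."""
    def try_connect(version):
        if version > max_supported:
            raise UnsupportedProtocolVersion(version)
    return try_connect


# A node limited to protocol v3 costs one failed attempt before success.
negotiated = connect_with_downgrade(make_server(3))
```

Each rejected attempt is a wasted round trip, which is part of why pinning the version explicitly is the better choice in production.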

Connect Timeout Configuration

There was previously no easy way to set the timeout for making new connections. This release includes a new Cluster configuration parameter connect_timeout.

The default timeout is 5 seconds. It covers not only TCP establishment, but also startup negotiation like options exchange, protocol negotiation, and authentication.
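As a rough illustration, the settings can be collected as keyword arguments for the Cluster constructor. The addresses and values below are placeholders rather than recommendations, and the connection itself is left commented out because it needs the driver installed and a reachable node.

```python
# Keyword arguments intended for cassandra.cluster.Cluster.
cluster_options = {
    "contact_points": ["10.0.0.1", "10.0.0.2"],  # hypothetical addresses
    "protocol_version": 4,    # pinned explicitly instead of relying on downgrade
    "connect_timeout": 10.0,  # seconds; the driver default is 5
}

# With the driver installed and a node reachable:
# from cassandra.cluster import Cluster
# cluster = Cluster(**cluster_options)
# session = cluster.connect()
```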

Cluster Schema Refresh API Updates

Cluster.refresh_schema and Cluster.submit_schema_refresh are now deprecated. As new schema elements beyond keyspace and table were added (user type, function, aggregate), the API was becoming unwieldy. Rather than continue to explain and enforce various combinations of optional parameters, this API was deprecated in favor of dedicated calls for each schema entity.

Now, Cluster.refresh_schema_metadata is used to refresh everything from the database. Other entities are refreshed using one of the methods:

  • refresh_keyspace_metadata
  • refresh_table_metadata
  • refresh_user_type_metadata
  • refresh_user_function_metadata
  • refresh_user_aggregate_metadata

The driver still refreshes these entities automatically based on schema change events from the server. These functions are useful when the driver is configured to ignore those events, and refresh is done ad hoc by the application.
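For applications that refresh ad hoc, a small helper can pick the narrowest call that covers a change. This is a sketch under assumptions: refresh_after_ddl is a hypothetical name, cluster is expected to be a connected Cluster, and I am assuming the keyspace and table names are passed positionally.

```python
def refresh_after_ddl(cluster, keyspace=None, table=None):
    """Refresh the narrowest slice of schema metadata that covers a change."""
    if keyspace is None:
        # No scope given: reload everything from the database.
        cluster.refresh_schema_metadata()
    elif table is None:
        cluster.refresh_keyspace_metadata(keyspace)
    else:
        cluster.refresh_table_metadata(keyspace, table)
```

For example, refresh_after_ddl(cluster, 'ks', 'users') would reload a single table after an ALTER TABLE issued elsewhere; the keyspace and table names here are placeholders.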

Distinguish Between NULL and UNSET Values

Cassandra 2.2 adds the ability to distinguish between null and unset parameters in native protocol v4. This represents a major improvement as it allows binding any combination of parameters in a prepared statement (as you'd expect, partition key columns are still required).

With previous versions of the protocol, a prepared statement required all of its parameters to be bound, or the request failed with an error. Combined with the fact that inserting null values creates tombstones, this could force an application to maintain a larger number of prepared statements, one for each combination of columns it writes.

When using protocol v4+, the driver will now implicitly set missing values to unset (as long as missing values are not part of the partition key). Applications can also explicitly provide unset values using cassandra.query.UNSET_VALUE.

For example, using positional binding:

from cassandra.query import UNSET_VALUE

ps = session.prepare('INSERT INTO test (key, v0, v1) VALUES (?, ?, ?)')
session.execute(ps, (0, 1))  # v1 implicitly unset
session.execute(ps, (0, UNSET_VALUE, 2))  # v0 explicitly unset

Please note that when using an earlier protocol version, the driver will revert this behavior and unspecified parameters will result in an error.
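Named binding is where this tends to pay off, since one prepared statement can serve any subset of columns. The helper below is a sketch: insert_partial is a hypothetical name, session and prepared are assumed to come from a protocol v4 connection and the INSERT statement above, and it assumes that names missing from the dict are treated as unset in the same way as short positional tuples.

```python
def insert_partial(session, prepared, key, **columns):
    """Bind only the columns supplied; the rest are left out of the dict."""
    params = {"key": key}
    params.update(columns)
    # Assumption: under protocol v4, absent names are sent as unset,
    # so no tombstones are written for the omitted columns.
    return session.execute(prepared, params)
```

A call such as insert_partial(session, ps, 0, v1=2) would then write v1 and leave v0 untouched.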

Client Warnings from the Server

Cassandra 2.2 adds client warnings to native protocol v4, as a way to surface warnings to clients that may not have access to server logs. Examples include hitting thresholds like batch_size_warn_threshold_in_kb and tombstone_warn_threshold while executing a client request.

Any warnings received by the driver are unconditionally logged via cassandra.protocol. Warnings are also attached to the request response future for programmatic access.
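Programmatic access might look like the following sketch. The helper name is hypothetical, session is assumed to be connected with protocol v4, and the code assumes the future's warnings attribute is empty or None when the server sent nothing.

```python
def execute_with_warnings(session, statement, parameters=None):
    """Run a statement and return its rows along with any server warnings."""
    future = session.execute_async(statement, parameters)
    rows = future.result()  # wait for the response before reading warnings
    warnings = list(future.warnings or [])
    return rows, warnings
```

An application could use this to surface an oversized-batch warning in its own metrics rather than relying on log output alone.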

New smallint and tinyint CQL Types

Cassandra 2.2 introduced two new integer types: smallint and tinyint. This driver release includes core support for these types (semantics the same as other signed integers, but with different ranges). The types are also supported in the cqlengine mapper column models SmallInt and TinyInt.
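The ranges follow from the widths: tinyint is an 8-bit and smallint a 16-bit signed integer. A quick way to validate values on the client before writing them is sketched below; the helper names are mine and are not part of the driver.

```python
# Bit widths of the two new signed integer types.
BITS = {"tinyint": 8, "smallint": 16}


def value_range(cql_type):
    """Return the inclusive (low, high) bounds of a signed CQL integer type."""
    bits = BITS[cql_type]
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(cql_type, value):
    """Check whether value can be stored in a column of the given type."""
    low, high = value_range(cql_type)
    return low <= value <= high
```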

New date and time CQL Types

Cassandra 2.2 also introduced new simple date and time types. These were previously supported by the core driver, discussed under the heading "New Date and Time Cassandra Types" in the last release blog.

This driver release adds support for these types to cqlengine in the form of Date and Time column types. Doing this also removed the previously-deprecated overload of Date, which used timestamp CQL under the covers and simply truncated the time component on input. Users of the removed overload can switch their models to DateTime and continue passing datetime.date values as input.

User Defined Function and Aggregate Metadata Model

Cassandra 2.2 adds User Defined Functions and Aggregates to the server. Working with these in CQL is transparent to the driver. The one driver change related to functions and aggregates was to add these entities to the metadata queries and model. The models can be accessed via the functions and aggregates attributes of the keyspace metadata.
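As a short sketch of reading that model: the helper below lists what is defined in a keyspace. The function name is hypothetical, cluster is assumed to be connected, and the code assumes both attributes are dict-like and keyed by signature.

```python
def describe_functions(cluster, keyspace):
    """Return sorted function and aggregate signatures for one keyspace."""
    ks_meta = cluster.metadata.keyspaces[keyspace]
    return {
        "functions": sorted(ks_meta.functions),
        "aggregates": sorted(ks_meta.aggregates),
    }
```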

Platform and Runtime Survey

We solicit input from our users regarding platform and runtime environments in which the driver is being used. If you haven't already (or if your environment has changed), we would appreciate your input on our platform and runtime surveys.

Wrapping Up

As always, thanks to all who provided contributions and bug reports. The continued involvement of the community is appreciated.
