TechnologyJuly 26, 2013

CQL improvements in Cassandra 2.0

CQL improvements in Cassandra 2.0

At the time of this writing, Cassandra 2.0.0-beta2 has just been released and it shouldn't be too long before the final lands. In this blog post, we will describe the improvements and new features that C* 2.0 brings on the CQL front. I'll note that none of the following improvements are backward breaking changes of the language, they are only additions to it.

ALTER DROP

CQL3 in Cassandra 1.2 does not allow to drop a CQL3 column. This is fixed in Cassandra 2.0, where if you have

    CREATE TABLE myTable (
        id text PRIMARY KEY,
        prop1 text,
        prop2 int,
        prop3 float
    )


then you are allowed to do:

    ALTER TABLE myTable DROP prop3;

 

As is expected of such statement, it will drop prop3 from the table definition but will also remove all data pertaining to that column in the database. This data removal is however performed lazily during compaction (compaction simply looks for dropped columns in the input sstables and doesn't include them in the output; note that if you want to force the removal of dropped columns without waiting for compactions to automatically kick in, you can simply call nodetool upgradesstables) and the ALTER TABLE statement (that simply updates the table metadata to register the drop) will return quickly.

Conditional updates

Cassandra 2.0 introduces some support for lightweight transactions (using Paxos underneath). On the CQL front, this is exposed through the support of the IF keyword in INSERTUPDATE and DELETE statements (the previous blog post on these lightweight transactions provides examples of this support in CQL so we don't repeat them here).

Conditional schema modifications

Schema modification statements (CREATE/DROP KEYSPACE/TABLE/INDEX) also support a form of conditionals in Cassandra 2.0. This is particularly convenient if at the start of some insertion code you don't know if a keyspace (or table, or index) exists or needs to be created. For that, you can now do

    CREATE KEYSPACE IF NOT EXISTS ks
               WITH replication = { 'class': 'SimpleStrategy',
                                    'replication_factor' : 3 };
    CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);


Similarly, you can issue conditional drops:

    DROP KEYSPACE IF EXISTS ks;

 

This syntax is merely a convenience: for instance, a (non-conditional) CREATE KEYSPACE statement will throw a specific exception if the keyspace already exists. So you can also issue such non-conditional creation and ignore the exception if it is thrown, but this new syntax offers a more concise way to achieve the same effect.

Triggers

Cassandra 2.0 also introduces experimental support for triggers and CQL offers a new syntax to register a trigger on a table, namely:

    CREATE TRIGGER myTrigger
                ON myTable
             USING 'org.apache.cassandra.triggers.InvertedIndex'


where 'org.apache.cassandra.triggers.InvertedIndex' is the Java class implementing the trigger in that example (this class is available at here if you are interested). It is of course also possible to drop such a trigger:

    DROP TRIGGER myTrigger ON myTable

 

Please note however that trigger support is currently experimental. In particular, while the CQL syntax above is unlikely to change, the Java interface that trigger need to currently implement will probably change.

Secondary indexes on PRIMARY KEY columns

In Cassandra 1.2, you can only create secondary indexes on CQL3 columns that are not part of the PRIMARY KEY definition. In other words, the following does not work in C* 1.2 but will in C* 2.0:

    CREATE TABLE timeline (
        event_id uuid,
        week_in_year int,
        created_at timeuuid,
        content blob,
        PRIMARY KEY ((event_id, week_in_year), created_at)
    );
    
    -- Invalid in Cassandra 1.2 but not in 2.0
    CREATE INDEX ON timeline (week_in_year);

 

Let me remark however that secondary indexing of collection columns is still not yet supported in C* 2.0. This will come later.

Aliases in SELECT

CQL supports a number of function calls on the column names selected by a SELECT. For instance, in cqlsh you can do:

cqlsh:ks> SELECT event_id, dateOf(created_at), blobAsText(content) 
            FROM timeline;

 event_id                | dateOf(created_at)       | blobAsText(content)
-------------------------+--------------------------+----------------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Something happened!?


While this is fine, it might not be convenient in practice to refer to a column named dateOf(created_at) in the result set, so a new alias feature as been added and you can now do:

cqlsh:ks> SELECT event_id, 
                 dateOf(created_at) AS creation_date,
                 blobAsText(content) AS content 
            FROM timeline;

 event_id                | creation_date            | content
-------------------------+--------------------------+----------------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Something happened!?

 

Preparing timestamp, ttl and limit

Cassandra 1.2 doesn't allow you to use a bind marker for the TIMESTAMP and TTL properties of update statements, nor for the LIMIT property of SELECT statements. This is now fixed and you can for instance prepare statements like:

    SELECT * FROM myTable LIMIT ?;
    UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';

 

Native protocol improvements

On top of the improvements to the CQL language described above, Cassandra 2.0 introduces the 2nd version of the native protocol for CQL, which brings a number of improvements over the first version (Let's note that client drivers will need to be modified to support that new protocol version (and its improvements) but that Cassandra 2.0 still support the first version of the protocol, so client drivers that work against C* 1.2 will work with C* 2.0).

The main improvements made in native protocol version 2 are:

  • Batching of prepared statements: this allows to execute individually prepared statements in a BATCH.
  • One-off prepare and execute statements: this allows to pass values for a statement as binary (to avoid a conversion to string for blobs for instance) even when you don't want to prepare the statement (because you don't plan on executing that query more than once).
  • Automatic/incremental paging of SELECT statements: With Cassandra 1.2, the result set for a SELECT query is always sent in its entirety, which might lead to query timeout and/or out of memory exceptions (either on the server or client side) if this result set is big. Meaning that users need to be wary of that problem and should always include a LIMIT if the select may yield too much result. This is rather inconvenient and error prone however and the native protocol version 2 fixes that by allowing incremental (and transparent for the user) fetching of a result set.
  • SASL for authentication: The first version of the native protocol has a custom and relatively limited authentication support. As a consequence, it is hard (and very inconvenient if not impossible) to provide secure authentication. Version 2 improves on that situation by replacing this custom mechanism by SASL authentication.
  • More compact result sets for prepared statements

We will describe more precisely (and with examples) those improvements in a follow up blog post so stay tuned.

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.