TechnologyMay 18, 2012

On the evolutions of CQL3

On the evolutions of CQL3

On the evolutions of CQL3

Two previous blog posts ( the evolution of schema in Cassandra and what's New in CQL 3.0) have explained the main changes in CQL3: the support of composites and the more natural support for wide rows (e.g. using transparent transposition). However, while those are the primary changes, they are by no mean the only ones that CQL3 brings to the table and this post will describe a number of those smaller changes.

But before starting, let me add that a good part of the features we will describe here have been finalized after the 1.1.0 release and so will only be present in the upcoming Cassandra 1.1.1 minor release. And in particular Cassandra 1.1.1 will include numerous bug fixes for CQL3.

This is also a good time for a reminder that currently, and for the remaining of Cassandra 1.1, CQL3 is considered beta. By that we mean that while most of the language is settled, we want to use the time until Cassandra 1.2 to finalize and improve it. So while we will not break CQL3 in any major way, we reserve the right to make small breaking changes if this will significantly improve the language.

Case sensitivity

The CQL2 story with respect to case sensitivity was a bit fuzzy. In CQL2, the PRIMARY KEY was case insensitive, but all other column identifiers where case sensitive. While not a big concern, this was inconsistent. So we have cleaned that up in CQL3. In CQL3, identifiers (keyspace/table/columns names) are always case insensitive by default. However, one can force the case by using double-quotes. In other words, if you define:

    CREATE TABLE test (
        Foo int PRIMARY KEY,
        "Bar" int
    )


then the following queries reference the defined columns:

    SELECT foo FROM ...
    SELECT Foo FROM ...
    SELECT FOO FROM ...
    SELECT "foo" FROM ... // we lowercase everything internally
    SELECT "Bar" FROM ...


but the following ones do not:

    SELECT "Foo" FROM ...
    SELECT "BAR" FROM ...
    SELECT bar FROM ...

Exclusive bounds

CQL2 had no support for query with exclusive bounds. Quoting the CQL2 documentation:

The greater-than and less-than operators (> and <) result in key ranges that are inclusive of the terms. There is no supported notion of "strictly" greater-than or less-than; these operators are merely supported as aliases to >= and <=.

This is not the case anymore in CQL3, where strict bounds are respected. So say you have:

    cqlsh:ks> CREATE TABLE scores (
          ...     name text,
          ...     score int,
          ...     date timestamp,
          ...     PRIMARY KEY (name, score)
          ... );
    ks> INSERT INTO scores (name, score, date) VALUES ('bob', 42, '2011-06-24');
    ks> INSERT INTO scores (name, score, date) VALUES ('bob', 47, '2011-06-25');
    ks> INSERT INTO scores (name, score, date) VALUES ('bob', 33, '2011-06-26');
    ks> INSERT INTO scores (name, score, date) VALUES ('bob', 40, '2011-06-27');


then you will get:

    cqlsh:ks> SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
     date                     | score
    --------------------------+-------
     2011-06-27 00:00:00+0000 |    40
     2011-06-24 00:00:00+0000 |    42
     2011-06-25 00:00:00+0000 |    47


but as expected:

    cqlsh:ks> SELECT date, score FROM scores WHERE name='bob' AND score > 40;
     date                     | score
    --------------------------+-------
     2011-06-24 00:00:00+0000 |    42
     2011-06-25 00:00:00+0000 |    47

timeuuid support

UUID are an often used tool in Cassandra. Indeed, in a fully distributed environment, the cost of having an auto-increment field to achieve unique identifiers would be prohibitive and UUID are a good replacement.

But another common use case is for time series. If you insert data keyed by their insertion time, you usually want to avoid collision if two events have the exact same time (for your time precision). An easy way to ensure this is to use Type 1 UUID.

All this was already possible in CQL2 through the uuid type, that accepts any type of UUID and sorts type 1 UUID according to their embedded timestamp. However, when you use a type 1 UUID in a time series, the important information is that this is a collision-free timestamp. And using uuid doesn't carry this intention. For that reason, we've introduced in CQL3 the timeuuid type (that uses the existing thrift comparator TimeUUIDType underneath). This type only accepts type 1 UUID (to avoid mistakenly inserting UUID that don't represent a time) and allows using the CQL date syntax (e.g. '2011-06-24 11:05:42') to input them.

Direct query of system tables

The thrift API has the describe_keyspaces function to describe the cluster schema. However, so far CQL had no such facilities (cqlsh circumvents that by using the thrift call underneath). However, thanks to the ameliorations made to internal schema management in Cassandra 1.1 and the support of composites, it is now possible to query the system tables directly in CQL3. For instance, one can query the defined keyspaces using:

    cqlsh:ks> SELECT * FROM system.schema_keyspaces;
     keyspace | durable_writes | name | strategy_class | strategy_options
    ----------+----------------+------+----------------+----------------------------
           ks |           True |   ks | SimpleStrategy | {"replication_factor":"1"}


Columns families information can as easily be retrieved by querying system.schema_columnfamilies and columns metadata through system.schema_columns.

Paging through non-ordered partitioner results

When using the RandomPartitioner, Cassandra rows are ordered by the md5 of their value and hence the order of rows is not meaningful. Despite that fact, given the following definition:

    CREATE TABLE test (
        k int PRIMARY KEY,
        v1 int,
        v2 int
    )


and assumming RandomPartitioner, CQL2 allowed queries like:

    SELECT * FROM test WHERE k > 42;

But the semantics of such queries was to query all rows for which the md5 of their key was bigger than the md5 of 42. And so this query would return results where k ≤ 42, which is unintuitive. To avoid that, CQL3 now forbids such query (though of course the query is allowed if the partitioner in use is ordered).

That being said, even with the random partitioner, it can sometimes be of use to page through all rows. In other words, it can be useful to "query all rows for which the md5 of their key is bigger than the md5 of 42". It is simply that the syntax "WHERE k > 42" is not appropriate. So we have added a new function token to allow doing just that and the following is allowed:

    SELECT * FROM test WHERE token(k) > token(42);

The end of "range ghosts"

With the thrift API, get_range_slices queries may sometimes return empty rows, a phenomenon known as range ghosts(see this FAQ entry for more details). This is often confusing for new users and range ghosts are rarely useful since one cannot tell (without doing an additional query) whether getting a range ghost means that the row exists but had no columns matching the query, or does not exist at all. Furthermore, CQL2 perpetuates that range ghost problem.

But the abstraction of transposing wide rows that CQL3 provides makes this problem much easier to tackle. By definition, a CQL3 row (i.e. a row in the SQL sense) exists only if it has at least one column set outside the PRIMARY KEY ones. And a query will return only those existing rows that match the WHERE clause and no ghost will be returned anymore.

CLUSTERING ORDER

As explained in this previous post, CQL3 allows to order query results in some cases, making use of the on-disk sorting of columns. When allowed, you can order results in ascending or descending order. By default, the ascending ordering will be slightly more efficient that the descending order. However some model have that property that most of the queries will be interested by the descending order. An example is a time series where the main query is to return the 10 more recent elements, which amounts do something along the line of:

    SELECT * FROM timeseries WHERE event_type='...'
    ORDER BY insertion_time DESC LIMIT 10;


In those cases, CQL3 allows you to specify a CLUSTERING ORDER options at table creation. This is a performance hint for Cassandra: it won't change in any way the table definition, but underneath columns will be stored on disk in the reverse order of what they would be if the default CLUSTERING ORDER was used. As a consequence, descending queries will be faster than the ascending ones. For instance, for a table where the more performance sensive query is the one above, the definition would look like:

    CREATE TABLE timeseries (
        event_type text,
        insertion_time timestamp,
        event blob,
        PRIMARY KEY (event_type, insertion_time)
    ) WITH CLUSTERING ORDER BY (insertion_time DESC);

Conclusion

On top of the "big" feature like composite types, CQL3 brings a lot of language cleanups and additional features, which we really think make the language more useful and more easy to apprehend, especially for beginners. Yet that is not the end of the road and there is a lot of features in the making, including amongst other:

  • Binary protocol for CQL (CASSANDRA-2478): currently CQL still uses thrift as a transport. This is not as efficient as can be however and this limit a number of functionality that we would like to add, so we will add a custom made binary protocol (More on that to come soon).
  • Access to the column timestamp and ttl (CASSANDRA-4217): while the timestamp and ttl of columns is currently returned in the result set, SQL-oriented drivers have not way to expose it. So we plan on adding new CQL functions to retrieve those values.
  • Filtering null values (CASSANDRA-3783)
  • Support for composite secondary indexes (CASSANDRA-3680)
  • And much more ...

Lastly, note that a draft for the documentation of CQL3 has been posted on CASSANDRA-3779 and should be publicly available soon (I'll update this post when that is the case.).

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.