TechnologyFebruary 26, 2014

CQL improvements in Cassandra 2.1

CQL improvements in Cassandra 2.1

A number of improvements/new features have been added to CQL in the upcoming Cassandra 2.1 release, and this post offers to walk through those improvements.

User Defined Types

The first addition to CQL coming in Cassandra 2.1 is the ability to create User Defined Types (or UDT for short). Let's introduce those through an example: suppose you store user profiles (with name, email, ...) and need, for each user, to store a number of addresses. In Cassandra 2.1, with UDTs, you will be able to do this the following way:

  CREATE TYPE address (
      street text,
      city text,
      zip int
  );
  
  CREATE TABLE user_profiles (
      login text PRIMARY KEY,
      first_name text,
      last_name text,
      email text,
      addresses map<text, frozen<address>>
  );
  
  // Inserts a user with a home address
  INSERT INTO user_profiles(login, first_name, last_name, email, addresses)
  VALUES ('tsmith',
          'Tom',
          'Smith',
          'tsmith@gmail.com',
          { 'home': { street: '1021 West 4th St. #202',
                      city: 'San Fransisco',
                      zip: 94110 }});

  // Adds a work address for our user
  UPDATE user_profiles
     SET addresses = addresses
                   + { 'work': { street: '3975 Freedom Circle Blvd',
                                 city: 'Santa Clara',
                                 zip: 95050 }}
   WHERE login = 'tsmith';

 

There is a relative flexibility on what the types of the fields of a UDT can be: collections are allowed as well as other UDTs. For instance, suppose that each address should store a few associated phone numbers, and that we want to allow a couple of tags with each phone number (the latter is arguably a bit far-fetched, this is just meant to demonstrate the feature). You can alter the address type above in the following way:

  CREATE TYPE phone (
      number text,
      tags set<text>
  );
  
  // Add a 'phones' field to address that is a set of the 'phone' UDT above
  ALTER TYPE address ADD phones set<frozen<phone>>;

In other words, UDTs can be nested relatively arbitrarily. However, UDTs have a main limitation in their current implementation, a limitation that one should keep in mind when using them: you cannot update only parts of a UDT value, you have to overwrite the whole thing every time. Put another way, a UDT value, no matter how complex it is, acts much like a blob would. In fact, it wouldn't be incorrect to describe the current implementation of UDTs as "syntactic sugar for blobs". Note that this limitation is not intrinsic to UDTs per-se, it's a limitation of the current implementation and it is possible that it will be somewhat loosened or even entirely lifted in future iterations. But for the time being, one should keep it in mind when using user defined types.

More concretely, we now can update the 'work' address above with

  UPDATE user_profiles
     SET addresses['work'] = {
             street: '3975 Freedom Circle Blvd',
             city: 'Santa Clara',
             zip: 95050,
             phones : {
               {number: '212 221 9165', tags: { 'preferred', 'direct line' }},
               {number: '500 310 2342', tags: { 'fax' }}
             }
         }
   WHERE login = 'tsmith';

<!--
And at this point, the example would yield:

  SELECT * FROM user_profiles;
  
  login  | addresses                                                                                                                                                                                                                                                                       | email            | first_name | last_name
  -------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+-----------
  tsmith | {'home': {street: '1021 West 4th St. #202', city: 'San Fransisco', zip: 94110}, 'work': {street: '3975 Freedom Circle Blvd', city: 'Santa Clara', zip: 95050, phones: {{number: '212 221 9165', tags: {'direct line', 'preferred'}}, {number: '500 310 2342', tags: {'fax'}}}}} | tsmith@gmail.com |        Tom |     Smith

-->

Some last remarks on user defined types:

  • A UDT value is always written in its entirety but it is also always read entirely under the hood (as of the current implementation at least).
    Note that it is possible, syntax wise, to only select some fields of a UDT, for instance

     

      CREATE TABLE location (
          id int PRIMARY KEY,
          addr frozen<address>, // Reuse the address type from above
          longitude double,
          latitude double
      );
      
      // The following is allowed
      SELECT addr.street, addr.city FROM location WHERE id=42;
        

    but such selection will still read the full address (in fact the full CQL
    row) server side. Long story short, UDTs are not meant to store large and complex "documents" as of their current implementation, but
    rather to help make the denormalization of short amount of data more convenient and flexible.

  • While you can add fields to an existing UDT as we've seen above, it is currently not possible to remove a field. It is however possible to rename a field (ALTER TYPE address RENAME street to str33t).
  • It is possible to use a UDT as type of any CQL column, including clustering ones. In that latter case, the ordering induced by the UDT is the one of it's fields in the order they have been declared. Please note however that there is relatively little advantages to be gain in using a UDT on a PRIMARY KEY column, avoid abusing such possibility just because it's available.

Secondary indexes on collections

So far, creating a secondary index on a collection column was not allowed. Cassandra 2.1 fixes that. For instance:

  CREATE TABLE products (
      id int PRIMARY KEY,
      description text,
      price int,
      categories set<text>,
      features map<text, text>
  );
  
  // This is now allowed in Cassandra 2.1
  CREATE INDEX cat_index ON products(categories);
  CREATE INDEX feat_index ON products(features);
  
  INSERT INTO products(id, description, price, categories, features)
       VALUES (34134,
               '120-inch 1080p 3D plasma TV',
               9999,
               {'tv', '3D', 'hdtv'},
               {'screen' : '120-inch', 'refresh-rate' : '400hz', 'techno' : 'plasma'});
  
  INSERT INTO products(id, description, price, categories, features)
       VALUES (29412,
               '32-inch LED HDTV (black)',
               929,
               {'tv', 'hdtv'},
               {'screen' : '32-inch', 'techno' : 'LED'});
  
  INSERT INTO products(id, description, price, categories, features)
       VALUES (38471,
               '32-inch LCD TV',
               110,
               {'tv', 'used'},
               {'screen' : '32-inch', 'techno' : 'LCD'});
  
  // You can then query those index through CONTAINS
  SELECT id, description FROM products WHERE categories CONTAINS 'hdtv';
  
   id    | description
  -------+-----------------------------
   29412 |    32-inch LED HDTV (black)
   34134 | 120-inch 1080p 3D plasma TV
  
  SELECT id, description FROM products WHERE features CONTAINS '32-inch';
  
   id    | description
  -------+--------------------------
   29412 | 32-inch LED HDTV (black)
   38471 |           32-inch LCD TV

Please note that for map columns, the index created above index the map values. If you prefer indexing the map keys, you can do so by creating a KEYS index and by using CONTAINS KEY:

  DROP INDEX feat_index;
  CREATE INDEX feat_key_index ON products(KEYS(features));
  
  SELECT id, description
    FROM products
    WHERE features CONTAINS KEY 'refresh-rate';
  
   id    | description
  -------+-----------------------------
   34134 | 120-inch 1080p 3D plasma TV
  

Please note however that a current limitation (to be lifted in the future) is that you cannot index both the keys and values of the same map simultaneously (hence the DROP INDEX call above).

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.