TechnologyOctober 23, 2015

Tuple and UDT support in DSE Search

Tuple and UDT support in DSE Search

Tuple and UDTs are convenient ways to handle certain data structures which usually go together (check the Cassandra on-line documentation for details). Version 4.8 and later supports them and in this blog post we explain how to use them best.

Set-up for the demo

 

1. Start by creating a CQL table:

CREATE KEYSPACE udt WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

 

USE udt;

 

CREATE TYPE Alias (

    known_alias text,

    use_alias_always boolean,

    alternate_alias_ frozen<map<text, text>>

);

CREATE TYPE Name (firstname text, surname text, alias_data frozen<Alias>);

 

CREATE table demo (

  "id" VARCHAR PRIMARY KEY,

  "name" frozen<Name>,

  "friends" list<frozen<Name>>,

  "magic_numbers" frozen<tuple<int, int, int>>,

  "status" VARCHAR);

 

 

2. Now create a DSE Search core against that table

dsetool create_core udt.demo generateResources=true

 

And that's it. You're ready to go.

Inserting some data

Let's insert some data via CQL:

insert into demo (id, name, friends, magic_numbers) values ('2', {firstname:'Sergio', surname:'Bossa', alias_data:{known_alias:'Sergio', use_alias_always:false}}, [{firstname:'Berenguer', surname:'Blasi'}, {firstname:'Maciej', surname:'Zasada'}], (23,543,234));

 

Now let's insert some more data by using the HTTP interface with curl:

curl http://localhost:8983/solr/udt.demo/update -H 'Content-type:application/json' -d '[{"id":"1","name":"{\"firstname\":\"Berenguer\", \"surname\": \"Blasi\", \"alias_data\":{\"know_alias\":\"Bereng\", \"use_alias_always\":true}}", "friends":"[{\"firstname\":\"Sergio\", \"surname\": \"Bossa\"}, {\"firstname\":\"Maciej\", \"surname\": \"Zasada\"}]", "magic_numbers":"{\"field1\":14, \"field2\":57, \"field3\":65}" }]'

 

The use of the HTTP interface is discouraged and the preferred method is the CQL insert one. The HTTP method has been included in this exercise for sake of completeness.

Notable points

If we take a look at the schema that was auto-generated we will see all UDT and Tuples have been exploded into their individual fields following a 'dot notation'. This dot-notation is how we will be navigating and referring to specific sub-fields in our queries.

dsetool get_core_schema udt.demo

...

<fields>

<field indexed="true" multiValued="false" name="id" stored="true" type="StrField"/>

<field indexed="true" multiValued="true" name="friends" stored="true" type="TupleField"/>

<field indexed="true" multiValued="false" name="friends.firstname" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="friends.surname" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="friends.alias_data" stored="true" type="TupleField"/>

<field indexed="true" multiValued="false" name="friends.alias_data.known_alias" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="friends.alias_data.use_alias_always" stored="true" type="BoolField"/>

<dynamicField indexed="true" multiValued="false" name="friends.alias_data.alternate_alias_*" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="name" stored="true" type="TupleField"/>

<field indexed="true" multiValued="false" name="name.firstname" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="name.surname" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="name.alias_data" stored="true" type="TupleField"/>

<field indexed="true" multiValued="false" name="name.alias_data.known_alias" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="name.alias_data.use_alias_always" stored="true" type="BoolField"/>

<dynamicField indexed="true" multiValued="false" name="name.alias_data.alternate_alias_*" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="status" stored="true" type="TextField"/>

<field indexed="true" multiValued="false" name="magic_numbers" stored="true" type="TupleField"/>

<field indexed="true" multiValued="false" name="magic_numbers.field1" stored="true" type="TrieIntField"/>

<field indexed="true" multiValued="false" name="magic_numbers.field2" stored="true" type="TrieIntField"/>

<field indexed="true" multiValued="false" name="magic_numbers.field3" stored="true" type="TrieIntField"/>

</fields>

...

 

Instead of auto-generating the schema, we could have used the 'dsetool infer_solr_schema' command to propose a schema and then fine-tune the proposed schema to your needs. This proposed schema and fine tuning give the same flexibility as with any other datatype, to later create the core with your tuned schema. UDT and Tuple subfields are treated just as any other fields only you use the dot notation when referring to them.

Let's start searching

To search over UDT and Tuple fields, simply use the {!tuple} query parser in your queries:

1. 'Basic' querying

select * from udt.demo where solr_query='*:*';

 

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status

----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------

  1 |    [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |   (14, 57, 65) |   {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

 

2. Querying a UDT subfield

select * from udt.demo where solr_query='{!tuple}name.firstname:Berenguer';

 

 id | friends                                                                                                                 | magic_numbers | name                                                                                                                        | solr_query | status

----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------

  1 | [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |  (14, 57, 65) | {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

 

3. Querying a Tuple subfield

select * from udt.demo where solr_query='{!tuple}magic_numbers.field1:14';

 

 id | friends                                                                                                                 | magic_numbers | name                                                                                                                        | solr_query | status

----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------

  1 | [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |  (14, 57, 65) | {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

Notice how Tuple subfields, lacking a field name, are referred to as fieldX where X is just the position of the field in the Tuple.

 

4. Querying collections of UDTS/Tuple

select * from udt.demo where solr_query='{!tuple}friends.surname:Zasada';

 

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status

----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------

  1 |    [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |   (14, 57, 65) |   {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

 

5. Querying nested UDTS or Tuples

select * from udt.demo where solr_query='{!tuple}name.alias_data.use_alias_always:false';

 

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status

----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------

  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

 

6. Using 'AND' for instance... Notice the query is enclosed in parenthesis.

select * from udt.demo where solr_query='({!tuple}friends.surname:Zasada AND {!tuple}friends.surname:Blasi)';

 

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status

----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------

  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

 

7. Negative queries

select * from udt.demo where solr_query='-{!tuple}name.alias_data.known_alias:*';

 

 id | friends                                                                                                                 | magic_numbers | name                                                                                                                        | solr_query | status

----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------

  1 | [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |  (14, 57, 65) | {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

 

8. Dynamic fields. Notice the map that is inserted at 'name.alias_data.alternate_alias_'

insert into demo (id, name, friends, magic_numbers) values ('3', {firstname:'Maciej', surname:'Zasada', alias_data:{known_alias:'Maciej', use_alias_always:false, alternate_alias_:{'alternate_alias_one':'Super-Maciej', 'alternate_alias_two':'The-Great-Maciej'}}}, [{firstname:'Berenguer', surname:'Blasi'}, {firstname:'Sergio', surname:'Bossa'}], (2423,23,423));

 

select * from udt.demo where solr_query='{!tuple}name.alias_data.alternate_alias_one:*';

 

 id | friends                                                                                                                   | magic_numbers   | name                                                                                                                                                                                                         | solr_query | status

----+---------------------------------------------------------------------------------------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------

  3 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Sergio', surname: 'Bossa', alias_data: null}] | (2423, 23, 423) | {firstname: 'Maciej', surname: 'Zasada', alias_data: {known_alias: 'Maciej', use_alias_always: False, alternate_alias_: {'alternate_alias_one': 'Super-Maciej', 'alternate_alias_two': 'The-Great-Maciej'}}} |       null |   null

 

 

Changing the schema

Now you decide you will no longer be indexing the 'use_alias_always' because it is of little use to you:

1. dsetool get_core_schema udt.demo current=true > currentSchema.xml

2. Edit the currentSchema.xml and remove the 'use_alias_always' fields that you want to remove

3. dsetool reload_core udt.demo schema=currentSchema.xml reindex=true

 

select * from udt.demo where solr_query='{!tuple}name.alias_data.use_alias_always:*';

ServerError: <ErrorMessage code=0000 [Server error] message="undefined field name.alias_data.use_alias_always">

 

As you can see, you have full granularity on Tuple/UDT fields.

Maybe sometime you will need to add a new Tuple/UDT column. Using the dsetool 'infer_solr_schema' gives you a proposal for a schema with the new field exploded into all subfields following the dot-notation. Following similar steps as above, you could edit the proposed schema to remove any unwanted fields, etc and reload the schema to start using your new field.

Limitations

There are only 2 limitations when using UDTs and Tuples in DSE Seach:

  1. Tuples and UDTs cannot be part of the partition key.
  2. You can't use tuples and UDTs as CQL map values. In other words you can't have a dynamic field of UDT/Tuple type (See this workaround)

Advanced:

Tuples/UDTs are read/written in one single block, not on a per field basis, so factor in the single block read and write.

 

Discover more
DSE Search
Share

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.