Tuple and UDTs are convenient ways to handle certain data structures which usually go together (check the Cassandra on-line documentation for details). Version 4.8 and later supports them and in this blog post we explain how to use them best.
Set-up for the demo
1. Start by creating a CQL table:
CREATE KEYSPACE udt WITH REPLICATION = { 'class' : 'SimpleStrategy' , 'replication_factor' : 1 };
USE udt;
CREATE TYPE Alias (
known_alias text,
use_alias_always boolean,
alternate_alias_ frozen<map<text, text>>
);
CREATE TYPE Name (firstname text, surname text, alias_data frozen<Alias>);
CREATE table demo (
"id" VARCHAR PRIMARY KEY ,
"name" frozen< Name >,
"friends" list<frozen< Name >>,
"magic_numbers" frozen<tuple< int , int , int >>,
"status" VARCHAR );
|
2. Now create a DSE Search core against that table
dsetool create_core udt.demo generateResources=true
|
And that's it. You're ready to go.
Inserting some data
Let's insert some data via CQL:
insert into demo (id, name , friends, magic_numbers) values ( '2' , {firstname: 'Sergio' , surname: 'Bossa' , alias_data:{known_alias: 'Sergio' , use_alias_always: false }}, [{firstname: 'Berenguer' , surname: 'Blasi' }, {firstname: 'Maciej' , surname: 'Zasada' }], (23,543,234));
|
Now let's insert some more data by using the HTTP interface with curl:
curl http://localhost:8983/solr/udt.demo/update -H 'Content-type:application/json' -d '[{"id":"1","name":"{\"firstname\":\"Berenguer\", \"surname\": \"Blasi\", \"alias_data\":{\"know_alias\":\"Bereng\", \"use_alias_always\":true}}", "friends":"[{\"firstname\":\"Sergio\", \"surname\": \"Bossa\"}, {\"firstname\":\"Maciej\", \"surname\": \"Zasada\"}]", "magic_numbers":"{\"field1\":14, \"field2\":57, \"field3\":65}" }]'
|
The use of the HTTP interface is discouraged and the preferred method is the CQL insert one. The HTTP method has been included in this exercise for sake of completeness.
Notable points
If we take a look at the schema that was auto-generated we will see all UDT and Tuples have been exploded into their individual fields following a 'dot notation'. This dot-notation is how we will be navigating and referring to specific sub-fields in our queries.
dsetool get_core_schema udt.demo
...
< fields >
< field indexed = "true" multiValued = "false" name = "id" stored = "true" type = "StrField" />
< field indexed = "true" multiValued = "true" name = "friends" stored = "true" type = "TupleField" />
< field indexed = "true" multiValued = "false" name = "friends.firstname" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "friends.surname" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "friends.alias_data" stored = "true" type = "TupleField" />
< field indexed = "true" multiValued = "false" name = "friends.alias_data.known_alias" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "friends.alias_data.use_alias_always" stored = "true" type = "BoolField" />
< dynamicField indexed = "true" multiValued = "false" name = "friends.alias_data.alternate_alias_*" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "name" stored = "true" type = "TupleField" />
< field indexed = "true" multiValued = "false" name = "name.firstname" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "name.surname" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "name.alias_data" stored = "true" type = "TupleField" />
< field indexed = "true" multiValued = "false" name = "name.alias_data.known_alias" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "name.alias_data.use_alias_always" stored = "true" type = "BoolField" />
< dynamicField indexed = "true" multiValued = "false" name = "name.alias_data.alternate_alias_*" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "status" stored = "true" type = "TextField" />
< field indexed = "true" multiValued = "false" name = "magic_numbers" stored = "true" type = "TupleField" />
< field indexed = "true" multiValued = "false" name = "magic_numbers.field1" stored = "true" type = "TrieIntField" />
< field indexed = "true" multiValued = "false" name = "magic_numbers.field2" stored = "true" type = "TrieIntField" />
< field indexed = "true" multiValued = "false" name = "magic_numbers.field3" stored = "true" type = "TrieIntField" />
</ fields >
...
|
Instead of auto-generating the schema, we could have used the 'dsetool infer_solr_schema' command to propose a schema and then fine-tune the proposed schema to your needs. This proposed schema and fine tuning give the same flexibility as with any other datatype, to later create the core with your tuned schema. UDT and Tuple subfields are treated just as any other fields only you use the dot notation when referring to them.
Let's start searching
To search over UDT and Tuple fields, simply use the {!tuple} query parser in your queries:
1. 'Basic' querying
select * from udt.demo where solr_query= '*:*' ;
id | friends | magic_numbers | name | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
1 | [{firstname: 'Sergio' , surname: 'Bossa' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (14, 57, 65) | {firstname: 'Berenguer' , surname: 'Blasi' , alias_data: {known_alias: null , use_alias_always: True , alternate_alias_: null }} | null | null
2 | [{firstname: 'Berenguer' , surname: 'Blasi' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (23, 543, 234) | {firstname: 'Sergio' , surname: 'Bossa' , alias_data: {known_alias: 'Sergio' , use_alias_always: False , alternate_alias_: null }} | null | null
|
2. Querying a UDT subfield
select * from udt.demo where solr_query= '{!tuple}name.firstname:Berenguer' ;
id | friends | magic_numbers | name | solr_query | status
----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------
1 | [{firstname: 'Sergio' , surname: 'Bossa' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (14, 57, 65) | {firstname: 'Berenguer' , surname: 'Blasi' , alias_data: {known_alias: null , use_alias_always: True , alternate_alias_: null }} | null | null
|
3. Querying a Tuple subfield
select * from udt.demo where solr_query= '{!tuple}magic_numbers.field1:14' ;
id | friends | magic_numbers | name | solr_query | status
----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------
1 | [{firstname: 'Sergio' , surname: 'Bossa' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (14, 57, 65) | {firstname: 'Berenguer' , surname: 'Blasi' , alias_data: {known_alias: null , use_alias_always: True , alternate_alias_: null }} | null | null
|
Notice how Tuple subfields, lacking a field name, are referred to as fieldX where X is just the position of the field in the Tuple.
4. Querying collections of UDTS/Tuple
select * from udt.demo where solr_query= '{!tuple}friends.surname:Zasada' ;
id | friends | magic_numbers | name | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
1 | [{firstname: 'Sergio' , surname: 'Bossa' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (14, 57, 65) | {firstname: 'Berenguer' , surname: 'Blasi' , alias_data: {known_alias: null , use_alias_always: True , alternate_alias_: null }} | null | null
2 | [{firstname: 'Berenguer' , surname: 'Blasi' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (23, 543, 234) | {firstname: 'Sergio' , surname: 'Bossa' , alias_data: {known_alias: 'Sergio' , use_alias_always: False , alternate_alias_: null }} | null | null
|
5. Querying nested UDTS or Tuples
select * from udt.demo where solr_query= '{!tuple}name.alias_data.use_alias_always:false' ;
id | friends | magic_numbers | name | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
2 | [{firstname: 'Berenguer' , surname: 'Blasi' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (23, 543, 234) | {firstname: 'Sergio' , surname: 'Bossa' , alias_data: {known_alias: 'Sergio' , use_alias_always: False , alternate_alias_: null }} | null | null
|
6. Using 'AND' for instance... Notice the query is enclosed in parenthesis.
select * from udt.demo where solr_query= '({!tuple}friends.surname:Zasada AND {!tuple}friends.surname:Blasi)' ;
id | friends | magic_numbers | name | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
2 | [{firstname: 'Berenguer' , surname: 'Blasi' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (23, 543, 234) | {firstname: 'Sergio' , surname: 'Bossa' , alias_data: {known_alias: 'Sergio' , use_alias_always: False , alternate_alias_: null }} | null | null
|
7. Negative queries
select * from udt.demo where solr_query= '-{!tuple}name.alias_data.known_alias:*' ;
id | friends | magic_numbers | name | solr_query | status
----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------
1 | [{firstname: 'Sergio' , surname: 'Bossa' , alias_data: null }, {firstname: 'Maciej' , surname: 'Zasada' , alias_data: null }] | (14, 57, 65) | {firstname: 'Berenguer' , surname: 'Blasi' , alias_data: {known_alias: null , use_alias_always: True , alternate_alias_: null }} | null | null
|
8. Dynamic fields. Notice the map that is inserted at 'name.alias_data.alternate_alias_'
insert into demo (id, name , friends, magic_numbers) values ( '3' , {firstname: 'Maciej' , surname: 'Zasada' , alias_data:{known_alias: 'Maciej' , use_alias_always: false , alternate_alias_:{ 'alternate_alias_one' : 'Super-Maciej' , 'alternate_alias_two' : 'The-Great-Maciej' }}}, [{firstname: 'Berenguer' , surname: 'Blasi' }, {firstname: 'Sergio' , surname: 'Bossa' }], (2423,23,423));
select * from udt.demo where solr_query= '{!tuple}name.alias_data.alternate_alias_one:*' ;
id | friends | magic_numbers | name | solr_query | status
----+---------------------------------------------------------------------------------------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------
3 | [{firstname: 'Berenguer' , surname: 'Blasi' , alias_data: null }, {firstname: 'Sergio' , surname: 'Bossa' , alias_data: null }] | (2423, 23, 423) | {firstname: 'Maciej' , surname: 'Zasada' , alias_data: {known_alias: 'Maciej' , use_alias_always: False , alternate_alias_: { 'alternate_alias_one' : 'Super-Maciej' , 'alternate_alias_two' : 'The-Great-Maciej' }}} | null | null
|
Changing the schema
Now you decide you will no longer be indexing the 'use_alias_always' because it is of little use to you:
1. dsetool get_core_schema udt.demo current=true > currentSchema.xml
2. Edit the currentSchema.xml and remove the 'use_alias_always' fields that you want to remove
3. dsetool reload_core udt.demo schema=currentSchema.xml reindex=true
select * from udt.demo where solr_query='{!tuple}name.alias_data.use_alias_always:*';
ServerError: <ErrorMessage code=0000 [Server error] message="undefined field name.alias_data.use_alias_always">
|
As you can see, you have full granularity on Tuple/UDT fields.
Maybe sometime you will need to add a new Tuple/UDT column. Using the dsetool 'infer_solr_schema' gives you a proposal for a schema with the new field exploded into all subfields following the dot-notation. Following similar steps as above, you could edit the proposed schema to remove any unwanted fields, etc and reload the schema to start using your new field.
Limitations
There are only 2 limitations when using UDTs and Tuples in DSE Seach:
- Tuples and UDTs cannot be part of the partition key.
- You can't use tuples and UDTs as CQL map values. In other words you can't have a dynamic field of UDT/Tuple type (See this workaround)
Advanced:
Tuples/UDTs are read/written in one single block, not on a per field basis, so factor in the single block read and write.