Collections in Cassandra
This is an excerpt from the DataStax whitepaper "Data Modeling in Apache Cassandra®," which delves into how to choose the right data model for your Apache Cassandra® application in 5 easy steps.
When modeling the database, some developers might be tempted to store the tags associated with videos in a separate table. When the list of anticipated tags is small, however, a collection data type that stores the tags inside the video record itself can be more efficient. This simplifies the database design and reduces the number of tables required.
The five collection data types in Cassandra are:
- Set – a collection of unique values of the same data type.
- List – an ordered collection of non-unique values of the same data type.
- Map – a set of key-value pairs, where keys are unique, and both keys and values have associated data types.
- Tuple – a fixed-length list of non-unique values that may be of different data types.
- Nested collection – a collection (i.e., set, list, map, or tuple) that is nested inside of another collection.
When defining a collection, the user needs to provide a data type for its elements. A simplified version of our videos table is provided below for illustration.
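A minimal sketch of such a table is given here. The column names (video_id, name, description, tags, added_date) are assumptions chosen for illustration rather than the whitepaper's exact schema, and a keyspace is assumed to have been selected already with USE.

```cql
-- Hypothetical simplified videos table; column names are illustrative.
CREATE TABLE videos (
    video_id    uuid,
    name        text,
    description text,
    tags        set<text>,   -- collection column: every element is of type text
    added_date  timestamp,
    PRIMARY KEY (video_id)
);
```

Declaring tags as set<text> is what supplies the element data type mentioned above; a list<text> or map<text, text> column would be declared the same way.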
A sample row in this table may look as shown:
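One way to picture such a row is through the statement that creates it. This sketch assumes a videos table with the hypothetical columns video_id, name, description, tags, and added_date; the UUID, description, tag values, and date are made up for illustration.

```cql
-- Insert one illustrative row; all values are invented for this example.
INSERT INTO videos (video_id, name, description, tags, added_date)
VALUES (
    99051fe9-6a9c-46c2-b949-38ef78858dd0,
    'My Funny Cat Video',
    'A cat discovers the bathtub',
    {'cats', 'funny'},                -- set literal: braces, unique values
    '2015-06-01 08:30:00+0000'
);

-- Read the row back, including its tags collection.
SELECT name, tags
FROM videos
WHERE video_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```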
CQL provides convenient syntax to insert, update, or delete items in collections. For example, a user can update the record for “My Funny Cat Video” and add a tag “wet cat” as shown:
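A sketch of that update, under the same assumptions as before (a videos table keyed by a hypothetical video_id column, with tags declared as set<text>): because the row is located by its primary key rather than by its title, the UUID here stands in for the “My Funny Cat Video” record.

```cql
-- Add one tag to the existing set without rewriting the other tags.
UPDATE videos
SET tags = tags + {'wet cat'}
WHERE video_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0;

-- Remove a tag from the set in the same style.
UPDATE videos
SET tags = tags - {'funny'}
WHERE video_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```

The + and - operators modify the set in place, so the application does not need to read the current tags first.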
User-defined data types
Another data type in Cassandra that provides flexibility is a user-defined type (UDT). UDTs can attach multiple data fields—each named and typed—to a single column.
Let’s assume that the designers of KillrVideo decide to store an optional mailing address for each user. Rather than adding multiple address-related columns to every table that needs them, the designers can create an address type once and reuse it across multiple Cassandra tables.
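A minimal sketch of such a type is given here. The field names (street, city, state, postal_code) are assumptions for illustration; the whitepaper's own definition may differ.

```cql
-- Hypothetical address UDT; each field is named and typed.
CREATE TYPE address (
    street      text,
    city        text,
    state       text,
    postal_code text
);
```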
The user-defined address type can now be included in the users table. The frozen keyword is required to use a UDT inside of a collection. It forces Cassandra to treat the address as a single value. The fields of a frozen address cannot be updated individually; rather, the entire address must be overwritten.
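A sketch of how this might look, assuming an address type with the hypothetical fields street, city, state, and postal_code, and a users table whose column names (user_id, first_name, last_name, email) are likewise illustrative:

```cql
-- Hypothetical users table with a frozen address column.
CREATE TABLE users (
    user_id    uuid,
    first_name text,
    last_name  text,
    email      text,
    address    frozen<address>,   -- stored and written as one value
    PRIMARY KEY (user_id)
);

-- Changing any part of a frozen address means supplying the whole value.
UPDATE users
SET address = {
    street: '123 Main Street',
    city: 'Austin',
    state: 'TX',
    postal_code: '78701'
}
WHERE user_id = 4b516a7c-2c4e-4d6a-9a40-0f1d3c5e7a11;
```

With a frozen column, an assignment such as SET address.city = 'Dallas' is rejected, which is why the update replaces the full address literal.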