Indexing in Cassandra with Storage Attached Indexes (SAI)
Have you ever wondered why Cassandra requires the ALLOW FILTERING keyword for some queries? It's usually because you've tried to query on a column that isn't part of the partition key and isn't indexed. Using ALLOW FILTERING is not recommended in most cases for performance reasons, and Cassandra's secondary indexes provide a better approach. The Storage Attached Index (SAI) is a new secondary index implementation now available in DataStax Astra and DataStax Enterprise. SAI provides a filtering capability that is easier to use, more efficient, and simpler to maintain than Cassandra's current indexing or add-on search solutions.
When to use SAI
Indexes allow you to query columns outside the Cassandra partition key without using the ALLOW FILTERING keyword or creating custom tables for each query pattern, as you would according to the classic best practices for Cassandra data modeling. You can create a table that is most natural for you, write to just that table, and query it any way you want. Your queries are not restricted by your primary key.
Defining SAI indexes
After creating your database, a keyspace, and one or more tables, use CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex' DDL commands to define one or more SAI indexes on the table that you wish to index.
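As a minimal sketch of what these DDL commands can look like, the example below assumes a keyspace named demo already exists. The table demo.users, its columns, and the index names are hypothetical and chosen only for illustration.

```cql
-- Hypothetical table: only id is part of the primary key.
CREATE TABLE IF NOT EXISTS demo.users (
    id      uuid PRIMARY KEY,
    name    text,
    country text,
    age     int
);

-- One SAI index per column that queries should be able to filter on.
CREATE CUSTOM INDEX IF NOT EXISTS users_country_sai
    ON demo.users (country)
    USING 'StorageAttachedIndex';

CREATE CUSTOM INDEX IF NOT EXISTS users_age_sai
    ON demo.users (age)
    USING 'StorageAttachedIndex';
```

Each index covers a single column, so a table that needs filtering on several columns gets one CREATE CUSTOM INDEX statement per column.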
Querying your table
Once the index has been created, it is simply a matter of querying the table and specifying the SAI-indexed columns.
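To illustrate, the queries below are a sketch that assumes a hypothetical table demo.users (id uuid PRIMARY KEY, name text, country text, age int) with SAI indexes already defined on the country and age columns. None of these names come from the product itself.

```cql
-- Equality match on an SAI-indexed text column; no ALLOW FILTERING needed.
SELECT id, name FROM demo.users WHERE country = 'FR';

-- Range query on an SAI-indexed numeric column.
SELECT id, name, age FROM demo.users WHERE age >= 30 AND age < 40;

-- Conditions on two SAI-indexed columns combined with AND.
SELECT id, name FROM demo.users WHERE country = 'FR' AND age >= 30;
```

None of these queries restricts the primary key column id, which is the case that would otherwise require ALLOW FILTERING or a separate query-specific table.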
SAI is supported by DataStax Enterprise 6.8.3 and later (see the DSE release notes), and you can also give it a try on your free database in DataStax Astra in the skill building section below.
SAI is on the roadmap to be added to OSS Cassandra in the near future. See the Cassandra Enhancement Proposal (CEP-7) for more details.
More Resources
Hands-on learning, articles, and documentation for SAI
SAI Quick Start
Follow this short tutorial to get started quickly with using indexes on DSE or Astra.
See the Docs
What is SAI?
Storage-Attached Indexing is a highly-scalable, globally-distributed index for Apache Cassandra®.
See the Docs
Better Cassandra Indexes for a Better Data Model: Introducing Storage-Attached Indexing
The future of indexing in Apache Cassandra is here.
Read More