September 19, 2020

Developer Newsletter: Simplify your Cassandra Data Model with Better Indexing

Jeffrey Carpenter, Software Engineer - Stargate

This issue is guest-edited by Rebecca Mills (@rebccamills) of DataStax Developer Learning:

If you have been working with databases for a while, indexing is probably a familiar concept.

Database indexes enhance your data model and make your queries more efficient. Although Cassandra has had secondary indexes for a long time, indexing has generally come with several tradeoffs and problems. Because of these tradeoffs, many Cassandra experts have recommended avoiding indexes, and as a community we have emphasized denormalization (writing the same data into a separate table for each query) to maximize query performance.
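To make that tradeoff concrete, here is a toy in-memory sketch in Python. Plain dicts stand in for tables, and the `User` record and class names are hypothetical; none of this is Cassandra code. With denormalization the application writes every row into one table per query and must clean up stale copies itself. With an index, the application writes once and the database keeps the index in sync, at the cost of extra work inside the database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str


class DenormalizedUsers:
    """Query-first model: one "table" per query; the application writes to each."""

    def __init__(self) -> None:
        self.users_by_id: dict[str, User] = {}
        self.users_by_email: dict[str, User] = {}
        self.app_writes = 0

    def upsert(self, user: User) -> None:
        old = self.users_by_id.get(user.user_id)
        if old is not None and old.email != user.email:
            # The application must also delete the stale copy itself.
            del self.users_by_email[old.email]
            self.app_writes += 1
        self.users_by_id[user.user_id] = user
        self.users_by_email[user.email] = user
        self.app_writes += 2

    def find_by_email(self, email: str) -> User | None:
        return self.users_by_email.get(email)


class IndexedUsers:
    """One base "table"; a secondary index on email is maintained internally."""

    def __init__(self) -> None:
        self.users_by_id: dict[str, User] = {}
        self._email_index: dict[str, set[str]] = {}
        self.app_writes = 0

    def upsert(self, user: User) -> None:
        old = self.users_by_id.get(user.user_id)
        if old is not None:
            # Index maintenance happens inside the "database", not in app code.
            self._email_index[old.email].discard(old.user_id)
        self.users_by_id[user.user_id] = user
        self._email_index.setdefault(user.email, set()).add(user.user_id)
        self.app_writes += 1

    def find_by_email(self, email: str) -> list[User]:
        ids = self._email_index.get(email, set())
        return [self.users_by_id[i] for i in sorted(ids)]
```

Changing one user's email costs the denormalized model three extra application writes beyond the first insert, while the indexed model needs one write per change.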

Cassandra has two earlier secondary indexing implementations: Secondary Indexes (2i for short) and SSTable Attached Secondary Indexes (SASI). Their two main challenges have been (1) write amplification and (2) index size on disk. The new indexing implementation described below is a major improvement on both of these pain points.

As Jonathan Lacefield wrote in his recent blog, the new Storage-Attached Index (SAI) addresses these issues while also making more flexible queries possible in Cassandra. SAI's format was designed to be sympathetic to Cassandra's SSTables and to use significantly less disk space. Thanks to extensive testing and optimization, SAI supports faster writes than Cassandra's existing secondary indexes or DSE Search indexes.
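One way to picture an index whose format follows the SSTables is the toy in-memory model below. It is a sketch under simplifying assumptions, not SAI's actual on-disk format or code: each immutable `Segment` stands in for an SSTable and carries its own index, built when the memtable is flushed and rebuilt when segments are compacted, so a write never touches a separate index table. A query collects candidate keys from every segment's index and then re-checks the newest version of each row, because an older segment can still hold a stale match. Deletes, tombstones, and range queries are left out, and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

Row = tuple[int, dict[str, str]]  # (write timestamp, column values)


@dataclass
class Segment:
    """An immutable data file plus the index built for it (a toy 'SSTable')."""
    rows: dict[str, Row]
    index: dict[str, dict[str, set[str]]]  # column -> value -> row keys

    @staticmethod
    def build(rows: dict[str, Row], indexed: tuple[str, ...]) -> Segment:
        index: dict[str, dict[str, set[str]]] = {c: {} for c in indexed}
        for key, (_, cols) in rows.items():
            for c in indexed:
                if c in cols:
                    index[c].setdefault(cols[c], set()).add(key)
        return Segment(dict(rows), index)


class ToyTable:
    def __init__(self, indexed: tuple[str, ...], flush_threshold: int = 2) -> None:
        self.indexed = indexed
        self.flush_threshold = flush_threshold
        self.memtable: dict[str, Row] = {}
        self.segments: list[Segment] = []
        self.clock = 0

    def write(self, key: str, cols: dict[str, str]) -> None:
        # A single write to the memtable; no separate index table is mutated.
        self.clock += 1
        self.memtable[key] = (self.clock, dict(cols))
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # The new segment's index is built at flush time, alongside its data.
        if self.memtable:
            self.segments.append(Segment.build(self.memtable, self.indexed))
            self.memtable = {}

    def compact(self) -> None:
        # Merging segments keeps the newest version of each row and rebuilds
        # the index, so index data follows the data files through compaction.
        merged: dict[str, Row] = {}
        for seg in self.segments:
            for key, version in seg.rows.items():
                if key not in merged or version[0] > merged[key][0]:
                    merged[key] = version
        self.segments = [Segment.build(merged, self.indexed)] if merged else []

    def _latest(self, key: str) -> dict[str, str] | None:
        versions = [seg.rows[key] for seg in self.segments if key in seg.rows]
        if key in self.memtable:
            versions.append(self.memtable[key])
        return max(versions, key=lambda v: v[0])[1] if versions else None

    def query(self, column: str, value: str) -> list[str]:
        if column not in self.indexed:
            raise ValueError(f"{column!r} is not indexed")
        candidates: set[str] = set()
        for seg in self.segments:
            candidates |= seg.index[column].get(value, set())
        # This toy simply scans the (small) memtable for matches.
        candidates |= {k for k, (_, c) in self.memtable.items() if c.get(column) == value}
        # A hit in an older segment may be stale, so re-check the newest version.
        return sorted(k for k in candidates if (self._latest(k) or {}).get(column) == value)
```

For example, after `alice` moves from Paris to Berlin, the first segment's index still lists her under Paris, but `query("city", "Paris")` filters her out because her newest version says Berlin.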

Give SAI a try in your free Astra cluster. SAI is also available in DataStax Enterprise 6.8.3. For a hands-on learning experience, check out the new Cassandra Indexing Skills Page on our Developer site, and read up on more details in the Astra SAI Documentation.

What’s next for SAI? DataStax has submitted a Cassandra Enhancement Proposal (CEP) to bring this functionality to Apache Cassandra. We’d love your feedback to help refine this feature for the benefit of the worldwide Cassandra community.

Example of the Week 

Our featured example this week is a quick Storage-Attached Indexing demo. Download the schema and data set, open cqlsh, and follow along with Patricia Gorla (pgorla), Solutions Architect at DataStax, as she walks you through the basics of SAI.
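If you want to script the same steps, the hypothetical Python helpers below only build CQL text; they do not connect to a cluster. They assume the `CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex'` form used in DataStax's SAI documentation, and the keyspace, table, column, and index-name convention (`<table>_<column>_sai`) are made up for illustration. Option names such as `'case_sensitive'` should be checked against the Astra SAI Documentation before use. The SELECT builder combines filters on indexed columns with `AND` and leaves values as `?` bind markers.

```python
from __future__ import annotations

import re

# Hypothetical helpers: they only build CQL strings and never talk to a cluster.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")
_OPS = {"=", "<", "<=", ">", ">="}


def _check(name: str) -> str:
    # Accept only simple lower-case identifiers so no quoting is needed.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def create_sai_index(keyspace: str, table: str, column: str,
                     options: dict[str, str] | None = None) -> str:
    ks, tbl, col = _check(keyspace), _check(table), _check(column)
    stmt = (f"CREATE CUSTOM INDEX IF NOT EXISTS {tbl}_{col}_sai "
            f"ON {ks}.{tbl} ({col}) USING 'StorageAttachedIndex'")
    if options:
        # Escape single quotes inside CQL string literals by doubling them.
        opts = ", ".join(
            f"'{k.replace(chr(39), chr(39) * 2)}': '{v.replace(chr(39), chr(39) * 2)}'"
            for k, v in options.items())
        stmt += f" WITH OPTIONS = {{{opts}}}"
    return stmt + ";"


def select_by_indexed(keyspace: str, table: str,
                      predicates: list[tuple[str, str]]) -> str:
    # Each predicate is (column, operator); values are left as ? bind markers.
    if not predicates:
        raise ValueError("at least one predicate is required")
    clauses = []
    for column, op in predicates:
        if op not in _OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{_check(column)} {op} ?")
    return (f"SELECT * FROM {_check(keyspace)}.{_check(table)} "
            f"WHERE {' AND '.join(clauses)};")
```

With the DataStax Python driver, the SELECT string would typically be passed to `session.prepare()` and then executed with bound values; the CREATE statement can be run directly with `session.execute()` or pasted into cqlsh.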

Have fun trying out this new breed of indexing, and let us know at developer@datastax.com or @DataStaxDevs if you have any questions about this example or suggestions for future examples.

Upcoming Events 

  • JHipster - Automatically generate a data access layer, an API, and an Angular/React front end for data in Cassandra! (Blueprints available in Kotlin and .NET, as well as Java)
  • Cassandra Lunch Recordings on YouTube - Weekly recordings of an informal Cassandra meetup (Cassandra & DataStax DC and Cassandra Chicago) held on Zoom. Join any Wednesday at 11 AM CST / 12 PM EST.
  • Diagnostic Collection Tool - Issues on a Cassandra or DataStax cluster can't always be analyzed online. This very useful script gathers logs and configuration files from a cluster so they can be analyzed offline.

New Podcast

DataStax’s Chief Strategy Officer, Sam Ramji (@sramji), is hosting a new podcast series called Open||Source||Data that just launched this week. He’ll explore open-source data, open-source software, data on Kubernetes, data in DevOps, and data in AI with old and new friends. Don’t miss the first episode with Patricia Boswell or upcoming episodes with Matt Asay, Rachel Chalmers, and Kelsey Hightower: subscribe on Spotify, Apple Podcasts, or Google Podcasts.


Have a suggestion or story to share? We’d love your feedback: developer@datastax.com | @DataStaxDevs

