Company | May 10, 2018

What’s New for Search in DSE 6

DataStax Enterprise (DSE) 6 is built on the best distribution of Apache Cassandra™ and DSE 6 Search is built with a production-certified version of Apache Solr™ 6.

Let me provide a quick tour of some of the enhancements and features found in DSE 6 Search. We are continuing with the themes of improving performance and stability as well as eliminating complexity for users from the previous release.

One of the most profound topics for DSE 6 is its Advanced Performance architecture. A large swath of engineering focus was on re-integration of search with the new advanced performance version of Cassandra as part of this release.

The new core architecture created an opportunity for a cleaner redesign of the search indexing pipeline for DSE. The outcome of all of this work is less configuration required to operate search and improved search data consistency, along with more throughput. The redesign includes a more synchronous write path for indexing data, with fewer moving pieces to tune and monitor and fewer back-pressure mechanisms that can become bottlenecks. This is an overall win for DataStax that also directly benefits our users by making things faster, more stable, and easier to use.

NodeSync is another big new feature for DSE 6. While no additional engineering efforts were required to integrate with this functionality outside of testing, NodeSync will bring major benefits to DSE Search. Since search data is automatically handled by DSE, repaired stored data means repaired search data — a huge win for consistency for the entire DSE data layer.

Moving on to search-specific features, DSE 5.1 introduced a series of functionalities aimed at making search even more unified with DSE. As we continue on that journey with DSE 6, native CQL queries can now leverage search indexes for a wider array of CQL query functionality and indexing support. Search queries no longer require the use of ‘solr_query’, and CQL queries that would otherwise require ‘ALLOW FILTERING’ no longer need it, because search indexes are used automatically.
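To make the difference concrete, here is a rough before-and-after sketch. It borrows the amazon.metadata table defined later in this post, and the query strings are illustrative assumptions rather than output captured from a cluster.

```cql
-- DSE 5.1 style: the search predicate is passed as a Solr query string.
SELECT asin, price, title FROM amazon.metadata WHERE solr_query = 'price:18.49';

-- DSE 6 style: the same lookup written as a plain CQL predicate. With a search
-- index on the table, neither solr_query nor ALLOW FILTERING is needed.
SELECT asin, price, title FROM amazon.metadata WHERE price = 18.49;
```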

Beyond some of the more basic Boolean queries and non-key lookups, there are a couple of newly supported keywords, and a few operators deserve special attention. With DSE 6 Search, users can now use the ‘IS NOT NULL’ and ‘!=’ operators. Furthermore, users can also apply the LIKE operator directly against search indexes, with the added benefit of tailoring the operator’s search behavior through configuration.
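The examples later in this post exercise ‘!=’ and LIKE but not ‘IS NOT NULL’, so here is a minimal sketch of that operator. It assumes the amazon.metadata table and default search index created below; the LIMIT value is arbitrary.

```cql
-- Return only rows that have a title, skipping rows where title is unset.
SELECT asin, title FROM amazon.metadata WHERE title IS NOT NULL LIMIT 10;
```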

Let’s take a look at some of the queries we can now run with pure CQL in DSE.

In the DSE Search 5.1 announcement blog, we created a search index that was more suitable for Boolean and lookup queries than for full-text search. With DSE 6, the default search index configuration has changed slightly to provide functionality more in line with ANSI SQL’s LIKE operator than with full-text analysis. This requires less processing to generate the data and less index data for this level of search. We’ll refer to the DSE 5.1 blog for creating and configuring a search index on a table.

Start with this simple CQL schema:

CREATE TABLE amazon.metadata (
   asin text PRIMARY KEY,
   also_bought set<text>,
   buy_after_viewing set<text>,
   categories set<text>,
   imurl text,
   price double,
   title text
);

Create a search index on this table:

CREATE SEARCH INDEX IF NOT EXISTS ON amazon.metadata;

Let’s try a variety of simple CQL queries, keeping in mind whether standard CQL data modeling rules would normally allow each one. First is a query on a field that is not part of the primary key. The table has a simple partition key, so without a search index any query that doesn’t restrict the ‘asin’ field would fail.

 

cqlsh> SELECT asin, price, title FROM amazon.metadata WHERE price = 18.49;

 asin       | price | title
------------+-------+------------------------------------------------
 002560810X | 18.49 | Needlepoint and Pattern: Themes and Variations

(1 rows)

 

We can also use one of the new operators on this field:

 

cqlsh> SELECT asin, price, title FROM amazon.metadata WHERE price != 0 LIMIT 10;

 asin       | price  | title
------------+--------+------------------------------------------------------
 0022841393 |  83.52 | Science: A Closer Look – Grade 6
 0028028651 |  197.2 | Business Law with UCC Applications
 B00E8M1HQM |   11.6 |
 000716467X |   8.89 | Emotional Rollercoaster
 0007044984 | 192.06 | Human Anatomy- Text Only
 0007321198 |  45.31 | Collins English Dictionary: 30th Anniversary Edition
 002560810X |  18.49 | Needlepoint and Pattern: Themes and Variations
 002391341X | 153.33 | Descriptive Geometry (9th Edition)
 0006392830 |   4.99 | Lesias Dream
 B004TMC2FQ |   6.37 |

(10 rows)

 

Because we have a simple string index on the title field, we can do exact queries against that field. Of course, if we wanted more flexibility on our search capabilities we can simply reconfigure and rebuild our index to do more advanced search queries:

 

cqlsh> SELECT asin, title FROM amazon.metadata WHERE title = 'Emotional Rollercoaster';

 asin       | title
------------+-------------------------
 000716467X | Emotional Rollercoaster

(1 rows)
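For readers who do want tokenized text search on the title field, the reconfigure-and-rebuild cycle might look roughly like the sketch below. The search index management statements come from DSE’s CQL extensions, but the field type name ‘TextField’ is an assumption: it must already be defined in the index schema, and the exact ALTER statement should be checked against the DSE documentation for your version.

```cql
-- Inspect the schema currently in use by the search index (run from cqlsh).
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON amazon.metadata;

-- Assumption: a tokenizing field type named 'TextField' exists in the schema.
ALTER SEARCH INDEX SCHEMA ON amazon.metadata SET field[@name='title']@type='TextField';

-- Make the pending schema active, then reindex the existing rows.
RELOAD SEARCH INDEX ON amazon.metadata;
REBUILD SEARCH INDEX ON amazon.metadata;
```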

 

We can query against collections as well:

 

cqlsh> SELECT count(*) FROM amazon.metadata WHERE categories CONTAINS 'Crime';

 count
-------
   440

 

Combining conditions becomes as simple as stating them in CQL terms:

 

cqlsh> SELECT asin, title FROM amazon.metadata WHERE categories CONTAINS 'Books' AND title = 'Emotional Rollercoaster';

 asin       | title
------------+-------------------------
 000716467X | Emotional Rollercoaster

 

As previously mentioned, there was a slight change to the default configuration of the search index to behave closer to ANSI SQL’s LIKE operator.

Here are a few examples that show that text search is still possible. In the first example, we are looking for any rows that are in the category ‘Books’ and contain the word ‘Emotional’ in the title, where the match on ‘Emotional’ is case sensitive:

 

cqlsh> SELECT asin, title FROM amazon.metadata WHERE categories CONTAINS 'Books' AND title LIKE '%Emotional%';

 asin       | title
------------+------------------------------------------------------------------------------------
 0007112580 | Emotional Healing in Minutes: Simple Acupressure Techniques For Your Emotions
 000716467X | Emotional Rollercoaster
 0007197772 | Sedona Method: How to Get Rid of Your Emotional Baggage and Live the Life You Want
 0028740173 | The Emotional Life of the Toddler
 002921405X | SHOULDN'T I BE HAPPY?: Emotional Problems of Pregnant and Postpartum Women

 

We can also search by the same parameters as before, but instead of containing the word ‘Emotional’, the title should start with it:

 

cqlsh> SELECT asin, title FROM amazon.metadata WHERE categories CONTAINS 'Books' AND title LIKE 'Emotional%';

 asin       | title
------------+-------------------------------------------------------------------------------
 0007112580 | Emotional Healing in Minutes: Simple Acupressure Techniques For Your Emotions
 000716467X | Emotional Rollercoaster

(2 rows)

 

While the initial search-enabled CQL functionality is limited to existing keywords, this is merely the first phase of search-enabled CQL enhancements to come. A lot of work was done behind the scenes to establish the foundation for potential query optimizers, query analysis, and more sophisticated queries in general. Subsequent enhancements will bring more search-oriented functionality such as relevance queries, spatial filters, and even facet queries! We know this direction will help make issuing search queries much easier for our users.

Some of the other work that went into this release was focused around easing some of the operational burdens that could arise with misuse of DSE Search.

Most importantly, we’ve made two key changes to add safety guardrails to the Solr HTTP API for data writes and deletes, which should be done in CQL.

First, we’ve disabled the ability to perform writes through the Solr HTTP interface. This has been branded as a very bad practice for quite some time and we were finally able to remove this capability.

And second, we’ve disabled the ability to execute deletes through the Solr HTTP interface; deletes can be properly executed in CQL. This removes some pitfalls that only caused issues. Together, these changes enforce best practices in DSE.
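In practice, this means writes and deletes go through ordinary CQL statements and DSE keeps the search index in step with the stored data. The sketch below uses the amazon.metadata table from earlier; the row values are hypothetical and exist only for illustration.

```cql
-- Hypothetical row, used only for illustration.
INSERT INTO amazon.metadata (asin, title, price, categories)
VALUES ('X000000001', 'Example Title', 9.99, {'Books'});

-- Delete by primary key in CQL rather than through the Solr HTTP interface.
DELETE FROM amazon.metadata WHERE asin = 'X000000001';
```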

More logging was also added around shard replica requests to improve our support’s troubleshooting turnaround times.

Finally, some default index behavior from Cassandra was overridden to improve native repair operations on search indexes and further reduce operational complexity.

You can download DSE 6 now and try out everything I’ve walked through above. If you’re interested in learning about additional features we introduced in DSE 6, check out our online documentation and our other blog posts.
