Introducing Vector Search: Empowering Cassandra and Astra DB Developers to Build Generative AI Applications
In the age of AI, Apache Cassandra® has emerged as a powerful and scalable distributed database solution. With its ability to handle massive amounts of data and provide high availability, Cassandra has become a go-to choice for AI applications at companies including Uber, Netflix, and Priceline. However, the rise of generative AI and large language models (LLMs) demands new query capabilities.
Enter vector search, a revolutionary new feature that empowers Cassandra with enhanced search and retrieval functionalities for generative AI applications. As a preview for our community, we’ve made it available in DataStax Astra DB to try out and provide us with feedback. Get started by signing up and then trying the demo.
What is vector search?
Vector search is a cutting-edge approach to searching and retrieving data that leverages the power of vector similarity calculations. Unlike traditional keyword-based search, which matches documents based on the occurrence of specific terms, vector search focuses on the semantic meaning and similarity of data points. By representing data as vectors in a high-dimensional space, vector search enables more accurate and intuitive search results.
For example, vector search easily identifies the semantics in these examples, which term-based search would struggle with (see the runnable sketch after this list):
- False positive: “Man bites dog” and “dog bites man” include the same words but have opposite semantics.
- False negative: “Tourism numbers are collapsing” and “Travel industry fears Covid-19 crisis will cause more companies to enter bankruptcy” have very similar meanings but different word choices and specificity.
- False negative: “I need a new phone” and “My old device is broken” have related meanings but no common words.
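To make this concrete, here is a minimal sketch that scores the pairs above with cosine similarity over sentence embeddings. It uses the open-source sentence-transformers library and the all-MiniLM-L6-v2 model as one illustrative choice; any embedding model would do.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# One small, commonly used embedding model; any embedding model works here.
model = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [
    # Same words, opposite semantics: term-based search sees a perfect match.
    ("Man bites dog", "Dog bites man"),
    # Similar meaning, different vocabulary: term-based search sees no match.
    ("Tourism numbers are collapsing",
     "Travel industry fears Covid-19 crisis will cause more companies to enter bankruptcy"),
    # Related meaning, no common words at all.
    ("I need a new phone", "My old device is broken"),
]

for a, b in pairs:
    emb_a, emb_b = model.encode([a, b])
    score = util.cos_sim(emb_a, emb_b).item()  # cosine similarity in [-1, 1]
    print(f"{score:.2f}  {a!r} / {b!r}")
```

Cosine similarity is the standard metric for comparing embeddings because it measures direction (meaning) independently of vector magnitude.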
Integrating vector search with Cassandra
The integration of vector search with Cassandra (for details, see CEP-30) offers several benefits. It opens up exciting possibilities for applications that require similarity-based queries—and not just for text. Applications as diverse as recommendation systems, fraud detection, image recognition, and natural language processing can all benefit from vector search.
Here are some key advantages of incorporating vector search into Cassandra:
Unstructured data queries
Prior to vector search, Cassandra was limited to searching structured data (floats, integers, or full strings). Vector search opens up the possibility of querying unstructured data, including text, audio, pictures, and videos, by storing and comparing their embeddings. This makes Cassandra a one-stop shop for high-scale database applications.
Enhanced search accuracy
Vector search allows for similarity-based queries, enabling more accurate and relevant search results. By considering the semantic meaning of data points, it can uncover hidden relationships and patterns that traditional keyword searches might miss.
Efficient query processing
With vector search, Cassandra can perform similarity calculations and ranking directly within the database. This eliminates the need to transfer large amounts of data to external systems, reducing latency and improving overall query performance. Furthermore, you can combine vector search with other Cassandra indexes for even more powerful queries to find exactly the data you need.
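As a sketch of what this looks like in practice, the snippet below uses the Python Cassandra driver with the preview CQL from CEP-30: a vector column type, a storage-attached index (SAI), and an ORDER BY ... ANN OF clause for approximate-nearest-neighbor ranking. The connection details, table, and column names are illustrative.

```python
# pip install cassandra-driver
from cassandra.cluster import Cluster

# Illustrative local connection; an Astra DB app would use a secure connect bundle.
session = Cluster(["127.0.0.1"]).connect("demo_ks")

# A table with a 5-dimensional vector column (preview syntax from CEP-30).
session.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id int PRIMARY KEY,
        category text,
        description_embedding vector<float, 5>
    )
""")

# Vector queries are served by a storage-attached index on the vector column.
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS products_ann_idx
    ON products (description_embedding) USING 'StorageAttachedIndex'
""")

# Approximate nearest-neighbor search: rank rows by similarity to a query vector.
rows = session.execute(
    "SELECT id, category FROM products "
    "ORDER BY description_embedding ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 3"
)
for row in rows:
    print(row.id, row.category)
```

Because the ranking happens inside the database, only the top LIMIT rows cross the network, and a WHERE clause on another indexed column can combine the vector ranking with ordinary predicates.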
Scalability and distributed processing
Cassandra's distributed architecture aligns perfectly with vector search requirements. As data volumes grow, vector search can leverage Cassandra's scalability and distributed processing capabilities to handle large-scale similarity queries efficiently.
Broad applicability
Vector search provides the flexibility to compute similarity across various types of data, including text, numerical values, images, and embeddings. This versatility enables developers to build advanced applications that span multiple domains and data types, all within the Cassandra ecosystem.
Vector search use cases
Hardly a day goes by without a new, innovative application of generative AI appearing. Almost all generative AI use cases are enhanced by vector search because it allows developers to create more relevant prompts. Use cases of vector search for generative AI include:
Question answering
Converting documents to text embeddings, combined with modern natural language processing (NLP), delivers full-text answers to questions. This approach spares users from studying lengthy manuals and empowers your teams to provide answers more quickly. A "question answering" generative AI model takes the text-embedding representation of both the knowledge base of documents and the current question, and delivers the closest match as an "answer." (code)
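A minimal sketch of that retrieval step, again assuming the sentence-transformers library for embeddings; in a production app the document embeddings would live in the database and the lookup would be a vector-search query.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

# 1. Embed the knowledge base once, up front.
docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first of each month.",
    "Contact support via the in-app chat widget.",
]
doc_vectors = model.encode(docs)

# 2. Embed the incoming question and retrieve the closest document.
question_vec = model.encode("How do I change my password?")
scores = util.cos_sim(question_vec, doc_vectors)[0]
best = docs[int(scores.argmax())]
print(best)

# 3. In a full pipeline, `best` would be placed in the LLM prompt as the
#    grounding context for generating the final answer.
```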
Semantic search
Vector search powers semantic or similarity search. Because the meaning and context is captured in the embedding, vector search finds what users mean, without requiring an exact keyword match. It works with textual data (documents), images, and audio, to, for example, easily and quickly help users find products that are similar or related to their query.
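For instance, here is a small, illustrative product-search sketch (same assumed embedding library as above). The query shares no keywords with the catalog entries, yet embedding similarity should still surface the audio products first.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

products = [
    "Wireless noise-cancelling over-ear headphones",
    "Bluetooth earbuds with charging case",
    "Mechanical keyboard with RGB backlight",
    "USB-C fast-charging wall adapter",
]
product_vecs = model.encode(products)

# The query has no words in common with any product description.
query_vec = model.encode("something to listen to music on")
scores = util.cos_sim(query_vec, product_vecs)[0]

# Print the two most semantically similar products.
for idx in scores.argsort(descending=True)[:2]:
    print(products[int(idx)])
```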
Semantic caching
As your generative AI application grows in popularity and encounters higher traffic, the expenses related to LLM API calls can become substantial. Additionally, LLM services might exhibit slow response times, especially when dealing with a significant number of requests. Caching LLM responses can significantly improve response times and lower the cost of using generative AI. However, matching a new input to previous inputs requires a semantic match rather than an exact match, and vector search provides exactly that ability. (code)
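A minimal sketch of the idea, with an illustrative similarity threshold and the same assumed embedding library; the in-memory list of (embedding, response) pairs stands in for what would be a vector-search table in the database.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
cache = []        # list of (prompt embedding, cached LLM response) pairs
THRESHOLD = 0.9   # illustrative; tune for your workload

def cached_llm_call(prompt, llm_call):
    vec = model.encode(prompt)
    # Reuse the response of any semantically close earlier prompt.
    for cached_vec, response in cache:
        if util.cos_sim(vec, cached_vec).item() >= THRESHOLD:
            return response          # cache hit: no LLM API call, no cost
    response = llm_call(prompt)      # cache miss: pay for one LLM call
    cache.append((vec, response))
    return response

def fake_llm(prompt):                # stand-in for a real LLM API call
    return f"(answer to: {prompt})"

print(cached_llm_call("What is vector search?", fake_llm))
print(cached_llm_call("What's vector search?", fake_llm))  # likely served from the cache
```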
A brief overview of transformers—and their challenges
To gain deeper insight into the value of vector search in the domain of generative AI, it’s important to understand what transformers are and what their limitations are. Transformers are designed to understand the context and semantics of language by taking in a sequence of tokens (words, or parts of words) and outputting a corresponding sequence. These models pay attention to each input token and the relationships between them, using a mechanism known as self-attention or scaled dot-product attention. This enables them to understand complex linguistic constructs and generate coherent and contextually accurate responses.
Despite their capabilities, transformers face a significant challenge: the token limit. This constraint arises from memory limitations in the computational hardware. The self-attention mechanism stores a relationship between every pair of tokens, so memory use grows quadratically with input length and can quickly exhaust available memory. As a result, transformers like GPT-3 or GPT-4 have a limit on the number of tokens they can process in a single pass, typically a few thousand.
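To see where this limit comes from, here is scaled dot-product attention in a few lines of NumPy; the toy dimensions are illustrative. The n x n score matrix is the culprit: its memory footprint grows with the square of the sequence length.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # scores is (n, n): every token attends to every other token,
    # so memory grows quadratically with sequence length n.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

n, d = 4096, 64  # 4,096 tokens already means a 4096 x 4096 score matrix
Q = K = V = np.random.rand(n, d).astype(np.float32)
out = scaled_dot_product_attention(Q, K, V)
```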
Vector search solves this problem by retrieving the most semantically relevant data, so we can get the most value possible from the limited token window. Take, for example, a Q&A chatbot for a software product. Instead of passing the chatbot the entire Q&A repository, or using term-based search that can easily retrieve unrelated information, vector search lets you selectively query for semantically relevant content and use the token limit effectively. (code)
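One way to picture this is a token-budget packer: a hypothetical helper that fills the prompt with the highest-ranked chunks returned by vector search until the window is full. The 4-characters-per-token heuristic is a rough assumption; a real tokenizer would give exact counts.

```python
def pack_context(ranked_chunks, token_budget):
    """Fill the prompt with the most relevant chunks first, stopping at the budget.

    ranked_chunks is assumed sorted by vector-search similarity (best first).
    Token counting uses a rough 4-characters-per-token heuristic; a real
    tokenizer would give exact counts.
    """
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk) // 4 + 1
        if used + cost > token_budget:
            break  # the window is full; lower-ranked chunks are dropped
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)

# Chunks come back from vector search already ordered by relevance.
context = pack_context(
    ["Most relevant doc chunk...", "Second most relevant...", "Third..."],
    token_budget=3000,
)
```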
Another example is chat history. LLMs use chat history to give context to what the user has discussed in past interactions. Simply keeping a fixed number of the most recent exchanges to stay under the token limit would cause the LLM to forget the earlier context of the conversation. Vector search allows only the relevant parts of the conversation history to be passed to the LLM. (code)
Beyond the technical constraint, there's a cost aspect as well. Each token processed by the transformer uses computational resources, which directly translates into financial cost. Therefore, when using transformers in your applications, it's important to maximize the value extracted from each token within the given limit. Vector search reduces this cost by first optimizing the content that populates the prompt, and then by intelligently caching responses. (code)
CassIO: integrating vector search into your generative AI app
To accelerate the integration of vector search into your app, we’ve also created a library called CassIO. This software framework integrates seamlessly with popular LLM software such as LangChain, making it easy to leverage vector search in your database. CassIO can maintain chat history, create prompt templates, and cache LLM responses. To learn more, check out the CassIO website.
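As a taste of the developer experience, here is a sketch using the LangChain Cassandra vector store that CassIO backs. Class and parameter names follow the mid-2023 LangChain API and may differ in later versions; the session setup, keyspace, and table name are illustrative.

```python
# pip install cassio langchain openai cassandra-driver
from cassandra.cluster import Cluster
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Cassandra

# Illustrative connection; an Astra DB app would use a secure connect bundle.
session = Cluster(["127.0.0.1"]).connect()

store = Cassandra(
    embedding=OpenAIEmbeddings(),  # requires OPENAI_API_KEY in the environment
    session=session,
    keyspace="demo_ks",
    table_name="docs_vectors",
)

# Texts are embedded and stored, then retrieved by semantic similarity.
store.add_texts(["Reset your password from the account settings page."])
hits = store.similarity_search("How do I change my password?", k=1)
print(hits[0].page_content)
```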
Get started!
Cassandra and Astra DB developers can take a great leap forward with vector search. Today, Cassandra is the number one database for querying both structured and unstructured data. Understanding the strengths and limitations of transformers and using advanced data management technologies like DataStax Astra DB can greatly enhance the effectiveness and cost-efficiency of generative AI applications. Try out a preview of vector search by signing up now, and register for our June 15 webinar on vector search.
Want to go deeper with vector search, LLMs and GenAI? Join us on July 11, 2023, for a free virtual GenAI summit for architects and practitioners: Agent X: Architecture for GenAI. In two hours, we'll unpack and demonstrate how you can craft inspiring AI agents and GenAI experiences with your unique datasets.