Vector Search Added to Astra DB on Google Cloud for Building Real-Time GenAI Apps | DataStax

DataStax and Google Cloud Collaborate to Evolve Open Source Apache Cassandra for Generative AI workloads

On June 07, 2023


Santa Clara, Calif. – June 7, 2023 – DataStax, the real-time AI company, announced today that Astra DB – its popular database-as-a-service (DBaaS) built on the open source Apache Cassandra® database – now supports vector search, a key capability for letting databases provide long-term memory for AI applications using large language models (LLMs) and other AI use cases.

Coming on the heels of the introduction of vector search into Cassandra, the availability of this new tool in the pay-as-you-go Astra DB service will enable developers to easily leverage the massively scalable Cassandra database for their LLM, AI assistant, and real-time generative AI projects. Goldman Sachs Research estimates that the generative AI software market could grow to $150 billion, compared to $685 billion for the global software industry.

DataStax is working closely with the Google Cloud AI/ML Center of Excellence as part of the Built with Google AI program to enable the best of Google Cloud’s generative AI offerings to enhance the capabilities and experience of customers using DataStax.

Vector search enables developers to search a database by context or meaning rather than keywords or literal values. This is done by using “embeddings”, generated for example with Google Cloud’s API for text embedding, which represent semantic concepts as vectors that can be used to search unstructured datasets such as text and images.

Embeddings are a powerful tool that enables natural-language search across a large corpus of data in different formats and extracts the most relevant pieces of data.
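As a rough illustration of searching by meaning, the sketch below ranks a few documents by cosine similarity between their embedding vectors and a query vector. The three-dimensional vectors, the `documents` dictionary, and the `search` helper are hypothetical and hand-made for this example; real embeddings come from an embedding model and have hundreds of dimensions, and this is not the Astra DB or Cassandra API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: a real system would compute
# these with an embedding model rather than writing them by hand).
documents = {
    "refund policy for cancelled flights": [0.9, 0.1, 0.0],
    "hotel check-in times": [0.1, 0.9, 0.1],
    "how to get money back for a trip": [0.8, 0.2, 0.1],
}


def search(query_vector, k=2):
    """Return the k document texts whose vectors are closest to the query."""
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical embedding of a question about refunds.
query_vector = [0.85, 0.15, 0.05]
top = search(query_vector)
print(top)
```

The two refund-related documents rank above the hotel document even though they share few keywords, which is the behavior the paragraphs above describe.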

Vector stores are required to enable extremely low-latency search across databases. Together, embeddings, vector stores, and generative AI models like Google PaLM 2 can create powerful capabilities that dynamically combine the right information for the right customer in their expected language. And now that Cassandra can search by meaning, it will play a key role in building generative AI applications.

“Vector search is a key part of the new AI stack; every developer building for AI needs to make their data easily queryable by AI agents,” said Ed Anuff, CPO, DataStax. “Unlike many other vector databases, Astra DB is not only built for global scale and availability, but supports the most stringent enterprise-level requirements for managing sensitive data including HIPAA, PCI, and PII regulations. It’s therefore an ideal option for both startups and enterprises that manage sensitive user information and want to build impactful generative AI applications.”

DataStax is launching the new vector search tool and other new features via a NoSQL copilot – a Google Cloud Gen AI-powered chatbot that helps DataStax customers develop AI applications on Astra DB. DataStax and Google Cloud are releasing CassIO – an open source plugin to LangChain that makes it easy to combine Google Cloud’s Vertex AI service with Cassandra for caching, vector search, and chat history retrieval.

“Our customers consistently ask us for ways to tightly integrate data and AI capabilities,” said Stephen Orban, VP of Migrations and GenAI Ecosystem at Google Cloud. “By integrating Google Cloud’s generative AI capabilities into Astra DB, DataStax is adding natural language capabilities into a suite of already powerful database capabilities, and giving customers a complete and unified data and AI solutions approach.”

“Priceline has been at the forefront of using machine learning for many years,” said Martin Brodbeck, CTO, Priceline. “Vector search gives us the ability to semantically query the billions of real-time signals we receive as part of our checkout experience that flow back to Astra DB. We plan to use Google Cloud’s generative AI capabilities alongside Astra DB’s vector search to power our real time data infrastructure and generative AI experiences.”

New Capabilities

In addition, DataStax has partnered with Google Cloud on several significant new capabilities to further Apache Cassandra and Astra DB as the database of choice for AI applications:

  • CassIO - The CassIO open source library makes it easy to add Cassandra into popular generative AI SDKs such as LangChain. Built in close partnership with Google Cloud, the new integration has several key features for building sophisticated AI assistants: semantic caching for generative AI, LLM chat history, Cassandra prompt templates, and a new Google Cloud Gen AI integration.
  • Google Cloud BigQuery Integration - New integration enables Google Cloud users to seamlessly move data between Cassandra and BigQuery straight from their Google Cloud Console to create and serve ML features in real time.
  • Google Cloud Dataflow Integration - New integration pipes real-time data to and from Cassandra for serving real-time features to ML models, integrating with other analytics systems like BigQuery, and real-time monitoring of generative AI model performance.
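To give a sense of what semantic caching for generative AI means, the sketch below stores LLM responses keyed by the embedding of the prompt and returns a cached response when a new prompt's embedding is similar enough. The `SemanticCache` class, its `threshold` parameter, and the two-dimensional vectors are hypothetical and exist only for illustration; this is not CassIO's API, and a real deployment would keep the entries in a vector-enabled database rather than in a Python list.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache that matches prompts by meaning, not by exact text."""

    def __init__(self, threshold=0.95):
        # Minimum similarity for a stored response to count as a cache hit
        # (assumption: 0.95 is an arbitrary illustrative value).
        self.threshold = threshold
        self.entries = []  # list of (prompt_embedding, response) pairs

    def put(self, embedding, response):
        """Remember the response generated for a prompt with this embedding."""
        self.entries.append((embedding, response))

    def get(self, embedding):
        """Return the most similar cached response, or None on a cache miss."""
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put([1.0, 0.0], "answer A")

hit = cache.get([0.99, 0.05])   # nearly the same meaning as the stored prompt
miss = cache.get([0.0, 1.0])    # unrelated prompt, so the LLM would be called
print(hit, miss)
```

A cache hit avoids a second call to the language model for a question that was already answered in different words.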

Additionally, DataStax has entered into new IT services partnerships to help accelerate using Cassandra for building real-time generative AI applications:

  • SpringML provides advanced data science and AI services. Their close partnership with Google Cloud and early access to Google Cloud’s latest generative AI services ensure the delivery of high-quality AI applications to the largest companies in the world.

Vector search is available today as a public preview for non-production use in the serverless Astra DB cloud database. It will initially be available exclusively on Google Cloud, with availability on other public clouds to follow. Developers can get started immediately by signing up for Astra.

For more details on how Cassandra is evolving to be an even more powerful database for AI and ML, read the DataStax blog post here.

###

About DataStax

DataStax is the real-time AI company. With DataStax, any enterprise can mobilize real-time data and quickly build smart, high-growth applications at unlimited scale, on any cloud. DataStax delivers the Astra DB cloud database built on Apache Cassandra® and the Astra Streaming event streaming technology built on Apache Pulsar™. Hundreds of the world’s leading enterprises, including Audi, ESL Gaming, and many more, rely on DataStax to unleash the power of real-time data to win new markets and change industries. Learn more at DataStax.com.

© 2023 DataStax Inc., All Rights Reserved. DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. 

Apache, Apache Cassandra, and Cassandra are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States, and/or other countries.

 

Contacts

Ariel Roop

press@datastax.com
