Technology | September 11, 2024

Now in LangChain: Graph Vector Store, a Simple Way to Add Structured Data to RAG Applications

We’re excited to show off Graph Vector Store, a new feature we recently developed in collaboration with the LangChain team that improves the relevance of retrieval-augmented generation (RAG) applications. Graph Vector Store combines the strengths of both semantic similarity and knowledge graphs to retrieve more accurate and complete context.

The problem with vector similarity

Vector similarity is a cornerstone in many RAG applications, enabling systems to find documents that are semantically similar to a given query. However, vector similarity has its limitations. In many cases, semantic similarity is insufficient to retrieve the correct documents, and additional types of information are needed.

For example, a document might include a hyperlink to another document describing a related topic that’s relevant to understanding the first document. Since the two documents describe different topics, they may be “far apart” in vector space.

This can cause a vector search to miss one of the documents simply because it isn’t close enough to the query vector, even though the author clearly indicated that the two documents are related. Vector similarity based on semantic embedding models is very powerful, but it isn’t a silver bullet.
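To make this concrete, here is a minimal sketch that uses hand-written three-dimensional vectors in place of real embeddings. The document names, vectors, link, and query are all invented for illustration. The only point is that a document the author explicitly linked can fall outside the top results.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; a real model would produce hundreds of dimensions.
docs = {
    "install_guide": [0.9, 0.1, 0.0],
    "license_terms": [0.0, 0.2, 0.9],
    "release_notes": [0.6, 0.7, 0.1],
}

# The install guide hyperlinks to the license terms (an invented example link).
links = {"install_guide": ["license_terms"]}

query = [1.0, 0.2, 0.0]

# Rank all documents by similarity to the query and keep the best two.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
top_2 = ranked[:2]

# Linked documents that similarity search alone did not return.
missed = [
    target
    for source in top_2
    for target in links.get(source, [])
    if target not in top_2
]

print("top 2:", top_2)
print("linked but missed:", missed)
```

Here the license terms sit far from the query in vector space, so a top-2 search drops them even though the install guide points straight at them.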

The rise of knowledge graphs

Recently, there’s been growing interest in knowledge graphs, which represent information as a network of entities and their interconnections. Knowledge graphs excel at capturing different kinds of relationships between different pieces of information, making it easier to find relevant data through their connections. However, building and maintaining a comprehensive knowledge graph from scratch can be a daunting task, and the conversion from unstructured content to knowledge graphs often leaves out important information and nuance. 

A simpler solution: Graph overlay on vector databases

Rather than building a knowledge graph from scratch, we realized there was a simpler and more efficient solution. By overlaying graph connections onto your existing vector database, you can harness the benefits of both vector similarity and knowledge graph connectivity. This hybrid approach enables your application to find documents related to your search and to follow the graph connections to find additional, missing context.

Graph connections might incorporate many kinds of relationships; we’ve already contributed tools to create relationships based on HTML hyperlinks, document structure, topic keywords, and named entities, with more on the way.

Best of all, this new type of document retrieval doesn’t require a specialized graph database. In many cases, graph connections can be saved in your existing vector store. 
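The idea can be sketched in a few lines of plain Python. This is a simplified illustration under our own assumptions, not the algorithm Graph Vector Store actually runs: the function name `traversal_search_sketch`, the toy two-dimensional vectors, and the in-memory `links` dictionary are all hypothetical stand-ins for what a real vector store would hold.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def traversal_search_sketch(query_vec, embeddings, links, k=2, depth=1):
    """Pick the k most similar docs, then follow links up to `depth` hops."""
    seeds = sorted(
        embeddings,
        key=lambda doc_id: cosine(query_vec, embeddings[doc_id]),
        reverse=True,
    )[:k]

    found = list(seeds)
    seen = set(seeds)
    frontier = deque((doc_id, 0) for doc_id in seeds)

    # Breadth-first walk over the link overlay, stopping at the depth limit.
    while frontier:
        doc_id, hops = frontier.popleft()
        if hops == depth:
            continue
        for neighbour in links.get(doc_id, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                frontier.append((neighbour, hops + 1))
    return found


# Invented data: four documents and two directed links between them.
embeddings = {
    "faq": [1.0, 0.0],
    "setup": [0.8, 0.6],
    "pricing": [0.0, 1.0],
    "legal": [-1.0, 0.0],
}
links = {"faq": ["pricing"], "pricing": ["legal"]}
query = [1.0, 0.1]

for d in range(3):
    print(d, traversal_search_sketch(query, embeddings, links, k=1, depth=d))
```

With `depth=0` the sketch behaves like an ordinary similarity search. Each extra hop pulls in documents that are connected to the results but may be nowhere near the query in vector space.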

The benefits of Graph Vector Store

Graph Vector Store is a drop-in enhancement to traditional RAG, meaning it can be easily integrated into your existing system. By combining the strengths of semantic similarity and knowledge graphs, Graph Vector Store offers a more robust solution for data retrieval, ensuring that relevant information is not overlooked.

Example

To illustrate how Graph Vector Store works, let's walk through a simple example. Suppose you're building a Q&A application and you want to leverage Graph Vector Store to improve the relevance of the answers retrieved.

First, you need to initialize the Graph Vector Store with your existing vector database:

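from langchain_community.graph_vectorstores import CassandraGraphVectorStore
from langchain_openai import OpenAIEmbeddings

# Embedding model used to vectorize both documents and queries.
embeddings = OpenAIEmbeddings()

# Assumption: a Cassandra or Astra DB connection has already been configured
# (for example via cassio.init), since no session is passed here.
graph_vector_store = CassandraGraphVectorStore(embeddings)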

Next, you can add graph connections between documents. For instance, you can connect documents related to the same keyword by linking each document to the keywords it contains:

 

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.graph_vectorstores.extractors import (
    KeybertLinkExtractor,
    LinkExtractorTransformer,
)

# Load the PDF as one Document per page.
pages = PyPDFLoader("Tourbook.pdf").load()

# Extract keywords from each page and attach them to the page as links.
pipeline = LinkExtractorTransformer([KeybertLinkExtractor()])
pages = pipeline.transform_documents(pages)

graph_vector_store.add_documents(pages)
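Conceptually, keyword links turn shared keywords into edges between pages. The following is a rough sketch of that idea in plain Python, not a description of what KeybertLinkExtractor does internally. The helper `keyword_neighbours`, the page names, and the keywords are invented for illustration; a real extractor would derive the keywords from the page text.

```python
from collections import defaultdict


def keyword_neighbours(doc_keywords):
    """Map each doc id to the set of other docs sharing at least one keyword."""
    # Invert the mapping: keyword -> all documents that mention it.
    by_keyword = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            by_keyword[keyword].add(doc_id)

    # Two documents are neighbours if they appear under the same keyword.
    neighbours = {doc_id: set() for doc_id in doc_keywords}
    for members in by_keyword.values():
        for doc_id in members:
            neighbours[doc_id] |= members - {doc_id}
    return neighbours


# Hypothetical keywords per page.
page_keywords = {
    "page_1": {"castle", "tour"},
    "page_2": {"castle", "history"},
    "page_3": {"tickets"},
}

graph = keyword_neighbours(page_keywords)
for page, linked in sorted(graph.items()):
    print(page, "->", sorted(linked))
```

Pages 1 and 2 become connected through the shared keyword, while page 3 stays isolated until some other page mentions one of its keywords.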

When a query is issued, Graph Vector Store uses a hybrid graph/vector retrieval algorithm to find the most relevant documents. Here's how to run a search:

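# k is the number of documents fetched by the initial vector search;
# depth is the maximum number of graph links to follow from them.
results = graph_vector_store.traversal_search(
    query="To be, or not to be?",
    k=10,
    depth=3,
)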

By following the graph connections, Graph Vector Store can retrieve answers that might be semantically distant in vector space but are logically connected through the graph.

Try it out!

Graph Vector Store provides a powerful tool for improving the relevance of retrieved information. By combining vector similarity with the connectivity of knowledge graphs, it helps your application find more accurate and complete answers.

Check out the LangChain documentation to learn more about Graph Vector Store and try it out yourself!
