RAG is like giving your LLM a personalized knowledge base. Instead of relying solely on pre-trained knowledge, RAG systems fetch relevant information from your data in real-time. This means more accurate, up-to-date responses and fewer hallucinations.
RAG-enabled pipelines are a cost-effective approach to developing LLM applications that provide current, relevant information. RAG pipelines can also handle diverse use cases by accessing different knowledge bases, from internal documents to external sources. This flexibility makes them suitable for a wide variety of business applications.
Building a RAG-enabled pipeline
A simple RAG system has five building blocks: a text embedding model, an LLM, information retrieval, ranking, and response generation. All of these steps rely on a vector database, so that's where we'll start.
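To make those pieces concrete, here is a minimal Python sketch of how the stages hand off to one another. Every function below is a hypothetical stub for illustration only, not the Langflow implementation we build in this tutorial:

```python
# Conceptual outline only: every function here is a hypothetical stub,
# not the Langflow implementation used in this tutorial.

def embed(text: str) -> list[float]:
    """Stage 1: a text embedding model turns text into a vector."""
    return [0.0, 0.0, 0.0]  # placeholder vector

def retrieve(query_vector: list[float]) -> list[str]:
    """Stage 2: information retrieval pulls candidate chunks from the vector database."""
    return ["chunk about iPhone revenue", "chunk about services revenue"]

def rank(question: str, chunks: list[str]) -> list[str]:
    """Stage 3: ranking orders the candidates by relevance (placeholder: keep the given order)."""
    return chunks

def generate(question: str, context: list[str]) -> str:
    """Stages 4-5: an LLM generates the final response from the question plus retrieved context."""
    return f"Answer to {question!r}, grounded in {len(context)} retrieved chunks."

question = "What were Apple's top selling products in 2024?"
print(generate(question, rank(question, retrieve(embed(question)))))
```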
With the vector database in place, we can now use Langflow to create a RAG-enabled pipeline. Sign in to Langflow and choose the "Vector Store RAG" template:
Data preparation
The foundation of any RAG system is good data. Before we can start asking our LLM about our documents, we need to load them into the vector database we created earlier.
Zoom into the template and find the "Load Data" flow. Then, upload your file(s) into the "File" block. For this demonstration, we're using the 10-K annual report for AAPL (download here).
The uploaded files then feed into a "Split Text" block, which chunks the documents. Chunking divides large documents into smaller pieces of a bounded size (typically measured in tokens or characters) so that each piece can be embedded, indexed, and retrieved on its own.
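If you'd like to prototype the chunking step in code, the same idea looks roughly like this with LangChain's text splitter (the chunk size, overlap, and file name below are illustrative assumptions, not the Langflow defaults):

```python
# pip install langchain-text-splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Illustrative settings; tune chunk_size and chunk_overlap for your documents.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Hypothetical plain-text export of the uploaded 10-K.
with open("aapl_10k.txt", encoding="utf-8") as f:
    report = f.read()

chunks = splitter.split_text(report)
print(f"Produced {len(chunks)} chunks; first chunk starts with: {chunks[0][:120]!r}")
```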
Embedding model
Once the documents have been loaded and chunked, the next step is to write them to the vector database and store their vector representations. Because Astra DB can generate embeddings for us via Astra Vectorize, we can remove the "OpenAI Embeddings" component.
On the "Astra DB" component, choose the database and collection created earlier, then choose "Astra Vectorize" and "NVIDIA." Then, click the "Run" button in the Astra DB component to run all the steps.
Going back to Astra DB and exploring the data, we can see the updated collection:
If you’ve made it this far, great! We’re done with the most tedious part of setting up a RAG pipeline. In a production environment, these initial steps of loading documents and turning them into vectors are usually the most challenging part. Thankfully, Astra DB and Langflow make it extremely easy to get started.
User query handling
Back in Langflow, we’ll now make a few edits to the "Retriever" flow. This flow controls what happens when a user asks a question. Our objective is to create a pipeline that:
accepts the user's question through the "Chat Input" component
retrieves the most relevant document chunks from the Astra DB collection we populated earlier
generates a response using large language models (LLMs) from Groq (grab an API key here)
As before, we can remove the "OpenAI Embeddings" component and update the Astra DB component with the database and collection names defined earlier. This time, however, we’ll connect the "Chat Input" edge to the "Search Input" edge. Internally, this tells the execution environment to pass whatever the user asks into the retrieval step, find the relevant document chunks, and provide them on the "Search Results" edge.
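In code terms, that retrieval step is a vector similarity search. A rough astrapy equivalent, assuming the collection and "text" field created during data preparation, looks like this:

```python
# pip install astrapy
import os
from astrapy import DataAPIClient

client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
db = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
collection = db.get_collection("annual_reports")  # placeholder name from the load step

question = "What were Apple's top selling products in 2024?"

# Sorting on $vectorize embeds the question with the collection's NVIDIA model
# and returns the chunks whose vectors are most similar to it.
hits = collection.find({}, sort={"$vectorize": question}, limit=4)
context_chunks = [doc["text"] for doc in hits]
print(context_chunks)
```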
The "Search Results" edge then connects to the "Parse Data" and "Prompt" components, where we can optionally do some prompt engineering. Next, we set up our LLM. If you want to use an open-source LLM, find the "Groq" component under "Models" and place it between the "Prompt" and "Chat Output" blocks. Paste your "GROQ_API_KEY" and choose a model.
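Behind the Groq component is an ordinary chat completion call. A minimal sketch with the Groq Python SDK is shown below; the model name, prompt wording, and sample context are assumptions, so swap in whichever model you selected in Langflow:

```python
# pip install groq
import os
from groq import Groq

groq_client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Stand-ins for the chunks returned by the retrieval step.
context_chunks = ["Example retrieved chunk...", "Another retrieved chunk..."]
question = "What were Apple's top selling products in 2024?"

# Simple grounding prompt: answer only from the retrieved context.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n---\n".join(context_chunks) + f"\n\nQuestion: {question}"
)

completion = groq_client.chat.completions.create(
    model="llama-3.1-8b-instant",  # assumption: any Groq-hosted Llama 3.1 model works here
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)
```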
Langflow offers multiple large language models, so you can swap them out depending on your use case.
RAG pipeline testing
To test the pipeline, let's ask the model about Apple's top-selling products in 2024 and how they compare to the 2023 numbers:
As we can see, the model provides a detailed answer from the 10-K we uploaded earlier. Congratulations, you've just built a RAG-enabled pipeline using open-source models.
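Once the flow behaves as expected in the playground, you can also call it from your own application through Langflow's run API. Here's a rough sketch; the base URL, flow ID, and payload fields are placeholders, so check the API pane in your Langflow instance for the exact snippet it generates for your flow:

```python
# pip install requests
import requests

# Placeholders: point this at your Langflow instance and flow ID.
LANGFLOW_RUN_URL = "http://localhost:7860/api/v1/run/YOUR-FLOW-ID"

payload = {
    "input_value": "What were Apple's top selling products in 2024, compared to 2023?",
    "input_type": "chat",
    "output_type": "chat",
}

# Add authentication headers here if your Langflow deployment requires them.
response = requests.post(LANGFLOW_RUN_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```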
Start building apps with RAG-enabled pipelines
Building a RAG system with Astra DB and an open-source tool like Langflow is a powerful, cost-effective way to create intelligent applications. While the initial setup might seem complex, the combination of Langflow's visual interface and Astra DB's managed service significantly simplifies the development process.
What is a retrieval-augmented generation (RAG) system, and why is it useful?
A RAG-enabled system combines a large language model (LLM) with a vector database to improve the accuracy of generated responses by retrieving relevant information. Instead of relying solely on the model's pre-trained knowledge, the system uses an embedding model to generate vector embeddings of your data.
These embeddings are stored in a vector search index, enabling efficient similarity searches when a user query is processed. The retrieved relevant documents are then used as context for the language model to produce a more accurate and contextualized response. RAG systems are especially useful for applications requiring information retrieval from large datasets.
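For intuition, the similarity search in that description boils down to comparing vectors. Here's a toy example with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.0])  # made-up embedding of the user query
documents = {
    "chunk about iPhone sales": np.array([0.8, 0.2, 0.1]),
    "chunk about office leases": np.array([0.1, 0.2, 0.9]),
}

# Rank chunks by similarity to the query; the closest ones become LLM context.
for name, vec in sorted(documents.items(), key=lambda kv: cosine_similarity(query, kv[1]), reverse=True):
    print(f"{cosine_similarity(query, vec):.3f}  {name}")
```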
What are the key steps to build a RAG pipeline with open-source LLMs?
Building a RAG pipeline involves the following steps:
Data preparation and ingestion: Break down raw data sources into manageable chunks and pre-process them into embeddings using an embedding model. To ensure your RAG system performs optimally, you can:
monitor vector database performance and regularly update your knowledge base
consider fine-tuning the embedding model for domain-specific applications
implement proper error-handling and rate-limiting
combine multiple specialized large language models (LLMs) to handle complex queries that a single model might struggle with
implement a two-step retrieval process to improve the precision and relevance of retrieved documents (see the reranking sketch after these steps)
Query processing: When a user query is received, convert it into a vector and use a vector search index to retrieve the most relevant documents.
Response generation: Feed the retrieved documents as context to an open-source LLM, such as Llama 3.1, for response generation. This process ensures efficient data processing and high-quality answers for a given query.
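As one example of the two-step retrieval mentioned above, a common pattern is to over-fetch candidates with a cheap vector search and then rerank them with a cross-encoder. The sketch below assumes the sentence-transformers library and an Astra DB collection like the one built in this tutorial; the model name and limits are illustrative:

```python
# pip install astrapy sentence-transformers
import os
from astrapy import DataAPIClient
from sentence_transformers import CrossEncoder

client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
db = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
collection = db.get_collection("annual_reports")  # placeholder vectorize-enabled collection

question = "How did Apple's services revenue change year over year?"

# Step 1: cheap, broad vector search that over-fetches candidate chunks.
candidates = [doc["text"] for doc in collection.find({}, sort={"$vectorize": question}, limit=20)]

# Step 2: precise reranking with a cross-encoder; keep only the best few chunks.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumption: any cross-encoder works
scores = reranker.predict([(question, chunk) for chunk in candidates])
top_chunks = [chunk for _, chunk in sorted(zip(scores, candidates), reverse=True)[:4]]
print(top_chunks)
```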
What are the benefits of using open-source tools to build a RAG-enabled system?
Open-source tools provide flexibility, cost-efficiency, and the ability to customize a RAG system for specific use cases. Using open-source large language models like Llama 3.1 alongside vector storage solutions like Astra DB allows you to handle user queries and perform information retrieval without relying on proprietary APIs.
These tools enable you to build a system optimized for efficient performance, tailored to your data needs, and scalable based on your available computational resources. Plus, integrating open-source tools into a RAG pipeline allows for fine-tuning and control over retrieval processes and response generation.
Astra DB gives JavaScript developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.