Agentic RAG: What it is and how to use it
Retrieval augmented generation (RAG) has become the go-to technique for improving the accuracy of generative AI applications. There’s a good reason for this: compared to more expensive techniques, such as fine-tuning, RAG delivers greater accuracy with less time, expense, and overhead.
However, some argue that RAG is too static and too hard to adapt to changing circumstances. That's led to the development of a new technique: agentic RAG. In this article, we'll look at how agentic RAG works and how you can use it to build more complex RAG-enabled workflows that leverage multi-step reasoning and complex task management.
What is RAG?
Large Language Models (LLMs) excel as general-purpose tools for generating humanlike responses. However, their knowledge bases are limited and contain only a fraction of the information that a company or organization possesses about its own problem domain.
Enter RAG. Using RAG, companies can store information, such as product manuals, wikis, chat support logs, and PDF documents, in an external data store, such as a vector database. GenAI apps can query this information and include it as context in a prompt to an LLM. This yields more accurate and timely answers within a specific problem domain.
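To make this concrete, here's a minimal sketch of the retrieve-then-generate flow. The `embed`, `vector_db`, and `llm` names are hypothetical stand-ins for a real embedding model, vector store, and LLM client, not any particular library's API:

```python
# Minimal RAG sketch. `embed`, `vector_db`, and `llm` are hypothetical
# stand-ins for a real embedding model, vector store, and LLM client.

def answer_with_rag(question: str, vector_db, embed, llm, k: int = 4) -> str:
    # 1. Embed the user's question and retrieve the k most similar chunks.
    query_vector = embed(question)
    chunks = vector_db.search(query_vector, top_k=k)

    # 2. Pack the retrieved chunks into the prompt as grounding context.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. The LLM answers from the supplied context rather than from
    #    its training data alone.
    return llm(prompt)
```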
Over time, RAG has evolved from naive search to more advanced search techniques, including dense vector search and contextual re-ranking. Another approach, modular RAG, incorporates hybrid retrieval and external tool integration, such as calls to external APIs or Machine Learning pipelines. Meanwhile, graph RAG offers better results than standard vector RAG when a knowledge base consists of highly interlinked documents.
What is agentic RAG?
Agentic RAG adds autonomous AI agents to the RAG pipeline to dynamically manage retrieval strategies and adapt workflows. Compared to standard RAG, agentic RAG uses LLMs not just to answer a user’s question, but to create a plan for answering it.
In an LLM application, agents leverage LLMs to complete common business tasks, and they are often customer-facing. Agentic RAG, by contrast, uses autonomous agents within the GenAI application itself to create dynamic execution plans for queries based on their nature and complexity.
As described in the original paper on agentic RAG, traditional RAG pipelines are limited in their ability to respond to complex questions that require multi-step reasoning, deep contextual understanding, or access to multiple knowledge domains. Traditional RAG can also struggle to scale because, for example, it can’t intelligently determine when certain tasks can be parallelized.
Agentic RAG addresses this gap with a paradigm shift. Instead of hard-coding a GenAI app pipeline, agentic RAG uses agents capable of decision-making and iterative reasoning to tailor an execution plan to each question. In other words, agentic RAG employs the language comprehension abilities of LLMs themselves to work with LLMs in a more dynamic fashion.
How agentic RAG works
Agentic RAG uses several capabilities of LLMs to construct dynamic GenAI pipelines:
- Reflect: Iteratively evaluate and refine outputs, determining when a response appears inadequate and re-formulating prompts to better tailor them to the request (see the sketch after this list).
- Plan: Decompose a complex request into multiple smaller tasks.
- Tools: Dynamically decide when to call external APIs and other computational resources to aid in decision-making and reasoning.
- Multi-agent: Use a chain of agents that share intermediate results based on their own specializations, e.g., leveraging a stock market agent to synthesize real-time stock trends.
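As an illustration of the first capability, here is a minimal self-reflection loop. It assumes a hypothetical `llm(prompt) -> str` client; a production agent would use a real framework, but the control flow is the same idea:

```python
# A minimal reflection loop, assuming a hypothetical `llm(prompt) -> str`
# client. The agent critiques its own draft and retries until the critique
# passes or a retry budget is exhausted.

def reflect_and_answer(question: str, llm, max_rounds: int = 3) -> str:
    draft = llm(f"Answer the question: {question}")
    for _ in range(max_rounds):
        critique = llm(
            "Reply PASS if this answer is complete and correct, "
            f"otherwise explain what is wrong.\nQuestion: {question}\nAnswer: {draft}"
        )
        if critique.strip().startswith("PASS"):
            break
        # Re-formulate the answer using the critique as feedback.
        draft = llm(
            f"Improve the answer to '{question}'.\n"
            f"Previous answer: {draft}\nFeedback: {critique}"
        )
    return draft
```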
Agentic RAG strategies
An agentic RAG application can implement one or more strategies in a single app to provide adaptive responses. Some of these include:
Prompt chaining. The agent decomposes a task into multiple prompts that it runs sequentially. Responses from previous prompts are fed as inputs to subsequent prompts. While this can increase latency, it can improve accuracy by simplifying subtasks.
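A rough sketch of prompt chaining, again assuming a hypothetical `llm(prompt) -> str` client:

```python
# Prompt chaining sketch: each step's output feeds the next prompt.
# `llm(prompt) -> str` is a hypothetical LLM client.

def chained_summary_and_tweet(article: str, llm) -> str:
    # Step 1: extract the key facts from the article.
    facts = llm(f"List the key facts in this article:\n{article}")
    # Step 2: turn those facts into a short summary.
    summary = llm(f"Write a two-sentence summary from these facts:\n{facts}")
    # Step 3: compress the summary into a tweet-length post.
    return llm(f"Rewrite this summary as a tweet under 280 characters:\n{summary}")
```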
Routing. Subtasks are handed off to different LLMs that specialize in different subjects. This can also be used to increase cost efficiency by handing simpler questions to smaller models.
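A routing sketch under the same assumption, with hypothetical `small_llm` and `large_llm` clients standing in for a cheap model and a more capable, more expensive one:

```python
# Routing sketch: a cheap classifier call picks the model that handles
# the request. `small_llm` and `large_llm` are hypothetical clients.

def route(question: str, small_llm, large_llm) -> str:
    label = small_llm(
        "Classify this question as SIMPLE or COMPLEX. "
        f"Reply with one word.\nQuestion: {question}"
    ).strip().upper()
    # Simple questions go to the cheaper model; everything else escalates.
    model = small_llm if label == "SIMPLE" else large_llm
    return model(f"Answer the question: {question}")
```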
Parallelization. An agentic RAG-powered app can run known independent processes concurrently before synthesizing a response. This can reduce runtimes and greatly decrease response latency. One example is content moderation, with one model screening inputs while another creates a response.
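Here's one way the content moderation example might look, using Python's standard thread pool; `moderate` and `draft_answer` are hypothetical wrappers around LLM calls:

```python
# Parallelization sketch: independent steps run concurrently on a thread
# pool. `moderate` (returns True when the input is safe) and `draft_answer`
# are hypothetical wrappers around LLM calls.
from concurrent.futures import ThreadPoolExecutor

def answer_with_moderation(user_input: str, moderate, draft_answer) -> str:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Screening and drafting don't depend on each other,
        # so they start at the same time.
        safe = pool.submit(moderate, user_input)
        draft = pool.submit(draft_answer, user_input)
        if not safe.result():
            return "Sorry, I can't help with that request."
        return draft.result()
```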
Orchestrator-workers. Similar to parallelization, except an orchestrator dynamically dissects a workstream into subtasks and parcels each out to specialized workers. A good example is a research question that requires searching and compiling data from multiple independent sources.
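A sketch of the research example, with a hypothetical `llm` client and the assumption that the orchestrator returns one subtask per line:

```python
# Orchestrator-workers sketch: one LLM call plans the subtasks, workers
# execute them concurrently, and a final call synthesizes the results.
from concurrent.futures import ThreadPoolExecutor

def research(question: str, llm) -> str:
    # The orchestrator decomposes the question into independent searches.
    plan = llm(f"List the independent research subtasks, one per line, for: {question}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # Workers run the subtasks in parallel.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda t: llm(f"Research and report on: {t}"), subtasks))

    # The orchestrator synthesizes worker output into one answer.
    joined = "\n\n".join(findings)
    return llm(f"Using these findings, answer '{question}':\n{joined}")
```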
Evaluator-optimizer. This pattern uses an LLM, Machine Learning workflow, or a similar service to evaluate the correctness of a response from another LLM. It can then further refine its prompts to the original model based on this assessment. An example is translation, which can use multiple refinement cycles to distill an improved end product.
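The translation example might look like this, with separate hypothetical generator and evaluator clients:

```python
# Evaluator-optimizer sketch for translation, with separate generator and
# evaluator models (both hypothetical `llm(prompt) -> str` clients).

def translate_with_review(text: str, generator_llm, evaluator_llm, rounds: int = 2) -> str:
    translation = generator_llm(f"Translate to French:\n{text}")
    for _ in range(rounds):
        review = evaluator_llm(
            "Critique this translation; reply OK if it is faithful and fluent.\n"
            f"Source: {text}\nTranslation: {translation}"
        )
        if review.strip().upper().startswith("OK"):
            break
        # Feed the evaluator's critique back to the generator.
        translation = generator_llm(
            f"Revise the translation of:\n{text}\n"
            f"Current translation: {translation}\nReviewer feedback: {review}"
        )
    return translation
```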
Agentic RAG architectures
Agentic RAG can also use one of several architectures, ranging from simple to incredibly dynamic. A few of these include:
Single-agent
In this model, a single agent manages all retrieval, routing, and integration tasks. This is the easiest model to get started with. However, it might not unlock all the benefits of a more complex agentic RAG architecture, and it won’t scale to complex queries, as it can't take advantage of parallel execution across multiple agents.
Multi-agent
Multi-agent architectures are a scalable evolution that can handle more complex workflows by parceling out tasks to specialized agents. This approach is modular, scalable, and brings better results through task specialization. However, it's more complex to coordinate, and the coordination itself adds more computational overhead.
Hierarchical agentic RAG
Hierarchical agentic RAG is a form of multi-agent RAG that uses a single agent as a master coordinator. This agent creates a plan based on the main query and hands off subtasks to specialized agents dynamically. This can deliver better decision-making than multi-agent RAG in exchange for increased computational cost.
Agentic corrective RAG
Agentic corrective RAG adds an evaluator-optimizer to any of the above architectures to refine queries based on initial results. Once again, this tactic can deliver better results; however, it does so at the expense of longer processing times, as it may take several iterations to reach a desired outcome.
Adaptive agentic RAG
This model uses a query classifier to determine how complex a query is and create a plan that might involve multi-step reasoning. This is one of the most flexible strategies, as it dynamically tailors retrieval strategies based on query complexity. This results in both greater accuracy for complex questions and lower cost for answering simpler questions.
For example, a simple question about tomorrow’s weather might be answered directly by calling the appropriate weather API. By contrast, a complex question regarding population change over time in a given city may require a multi-step approach that queries and synthesizes results from multiple specialized LLMs.
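A sketch of that classification step; `get_weather` and `multi_step_research` are hypothetical helpers, and the routing structure is the point:

```python
# Adaptive sketch: a classifier decides between a direct tool call and a
# multi-step pipeline. `llm`, `get_weather`, and `multi_step_research`
# are hypothetical stand-ins.

def adaptive_answer(question: str, llm, get_weather, multi_step_research) -> str:
    label = llm(
        "Classify this question as WEATHER, COMPLEX, or OTHER. "
        f"One word only.\nQuestion: {question}"
    ).strip().upper()
    if label == "WEATHER":
        # Simple: answer directly from a single API call.
        return get_weather(question)
    if label == "COMPLEX":
        # Complex: fall back to multi-step reasoning across sources.
        return multi_step_research(question)
    return llm(f"Answer the question: {question}")
```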
Graph-based agentic RAG
This more complex approach combines one of the above approaches with structured and unstructured data sources using modular retriever banks and feedback loops. This workflow includes a critic module that validates responses for their accuracy and flags low-confidence results for reconsideration.
Agentic document workflows
An agentic document workflow is a complex document-centric process that combines document parsing and structured outputs with autonomous agents. These agents can maintain state, manage multi-step workflows, and operate on documents with domain-specific logic.
Benefits and challenges of agentic RAG
There are multiple benefits to using agentic RAG:
- Adaptability. Agentic RAG can choose the best strategy based on a given query, resulting in higher-quality responses at lower overall cost.
- Performance. Agentic RAG can parallelize automatically where possible, decreasing user wait times and increasing customer satisfaction.
- Modularity. You can build an agentic RAG solution on top of your existing RAG solutions, augmenting them instead of replacing them.
However, agentic RAG applications come with their own set of challenges. The primary one is that they can require more compute time and resources than non-agentic RAG. Agentic RAG systems that use multiple LLMs are especially susceptible to increased network latency.
It can also be challenging to build agentic RAG applications from the ground up. Once built, it can take time to verify they’re returning accurate results. If there are inaccuracies, it can be difficult to pinpoint their root cause and identify a remedy.
How to get started with agentic RAG
Despite the challenges, agentic RAG applications can unlock exciting new capabilities by creating pipelines that adapt dynamically to users' needs. By starting small, you can gradually work up to a complex system capable of multi-step reasoning and decision-making.
It's easier to get started with agentic RAG using a solid AI platform that provides basic building blocks for prototyping and creating production-ready GenAI apps. LangChain, a composable framework for building LLM-powered applications, is one such platform. LangGraph, a component of LangChain, simplifies building stateful, multi-actor applications with LLMs, including agentic and multi-agent workflows.
The LangGraph documentation shows how you can use its basic building blocks to create a retriever agent that dynamically calls tools and assesses their output before formulating an answer to a user’s question.
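Here's a compressed sketch of that pattern using LangGraph's prebuilt ReAct agent. Exact APIs vary by LangGraph version, and the retriever body is a placeholder for a real vector store query:

```python
# Sketch of a retrieval agent built on LangGraph's prebuilt ReAct agent.
# Exact APIs vary by LangGraph version; the tool body is a hypothetical
# stand-in for a real vector store lookup.
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def search_docs(query: str) -> str:
    """Search the product knowledge base for passages relevant to the query."""
    # Replace with a real vector store lookup (e.g., against Astra DB).
    return "...retrieved passages for: " + query

# The agent decides when to call search_docs, inspects the tool output,
# and only then composes its final answer.
agent = create_react_agent("openai:gpt-4o-mini", tools=[search_docs])
result = agent.invoke(
    {"messages": [{"role": "user", "content": "How do I rotate my API key?"}]}
)
print(result["messages"][-1].content)
```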
At its root, agentic RAG is still RAG, and that means it requires an easy-to-use, scalable data store for domain-specific information. The DataStax AI platform combines support for LangChain with Astra DB, our serverless, scalable vector database solution. Using Langflow, you can visually design complex agentic RAG applications and deploy them easily to production on top AI-ready cloud providers, such as NVIDIA, AWS, or Google Cloud.
See how DataStax simplifies building even the most complex GenAI applications by signing up for a free account today.