Technology | August 14, 2024

Generative AI 101: What Is Retrieval-Augmented Generation?

If you’re a developer, you might have heard of retrieval-augmented generation, or RAG. You also might not feel particularly grounded in what it does or how it might improve your generative AI application. 

You're not alone. Very smart developers often ask us about RAG and why it’s important to GenAI apps. How does it make the output of large language models (LLMs) more relevant and accurate (and as a consequence, make your GenAI apps better)? 

In a nutshell, RAG is a technique that app developers can use to give the LLM context from outside the model (from a database or an API, for example) so that it can produce better, more relevant responses.

We recently held a livestream to help answer some of the questions out there about RAG, and a replay is available.

We get into some detailed demonstrations and answer some thoughtful and important questions from attendees in the one-hour replay, but here’s a quick overview of what we covered.

A broad definition of RAG

One helpful way to think about RAG is to look at its individual elements in reverse order. “Generation” is what you get when working with an LLM with no tailoring or prompt engineering. You might give it a request like “generate an image,” and it does just that, and the response might, or might not, be what you had hoped for.

“Augmentation” is all about adding some more detailed instructions for the LLM. What are the boundaries it should adhere to? How should it respond, and how shouldn’t it?

Then the “retrieval” part is about fetching information from somewhere else, like a database or other sources. When you put the three parts together, you get better, more accurate results from your prompts.  
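To make the three parts concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production pipeline: the `DOCUMENTS` list stands in for a database or API, retrieval is simple word overlap (real systems often use embedding-based search instead), and `generate` is a hypothetical stub where a real LLM call would go.

```python
import re

# Hypothetical stand-in for an external source such as a database or API.
DOCUMENTS = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available by email on weekdays.",
    "Orders ship from the warehouse within two business days.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Retrieval: rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def augment(question: str, context: list[str]) -> str:
    """Augmentation: wrap the question with instructions and retrieved context."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


def generate(prompt: str) -> str:
    """Generation: placeholder for a real LLM call (hypothetical stub)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


question = "When are refunds accepted?"
prompt = augment(question, retrieve(question, DOCUMENTS))
print(prompt)
print(generate(prompt))
```

The LLM never sees the whole document collection. It sees only the instructions, the retrieved passage, and the question, which is what steers it toward a grounded answer.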

LLMs aren’t perfect

LLMs are one of the biggest technical innovations of the past couple of years, but they aren’t perfect. They’re trained offline on massive amounts of publicly available data by very smart people. But that training stops at some point. If you want to build apps that can access data that an LLM wasn’t trained on, the LLM alone won’t get you there.

Let’s say you wanted to build an intelligent assistant or adviser that draws on your own proprietary data, or on real-time data that’s being produced in the moment. Even the best LLMs don’t have access to that. RAG is the way that data is introduced to the LLM, enabling more relevant and accurate responses.
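As a rough illustration of the real-time case, the sketch below builds the prompt at query time from whatever a data store holds at that moment, so new records show up without retraining anything. Everything here is an assumption made for the example: `LiveStore` is a hypothetical in-memory stand-in for a real database, the order records are invented, and retrieval is reduced to "most recent records first."

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    text: str
    created_at: datetime


@dataclass
class LiveStore:
    """Hypothetical in-memory stand-in for a database receiving new data."""
    records: list[Record] = field(default_factory=list)

    def add(self, text: str, created_at: datetime) -> None:
        self.records.append(Record(text, created_at))

    def latest(self, n: int = 2) -> list[Record]:
        """Return the n most recent records, newest first."""
        return sorted(self.records, key=lambda r: r.created_at, reverse=True)[:n]


def build_prompt(question: str, store: LiveStore) -> str:
    """Build the prompt at query time so it reflects what the store holds now."""
    lines = [f"- ({r.created_at:%Y-%m-%d %H:%M}) {r.text}" for r in store.latest()]
    context = "\n".join(lines) if lines else "(no records)"
    return f"Context:\n{context}\nQuestion: {question}"


store = LiveStore()
question = "What is the current status of order 1001?"

# Before any data arrives, the prompt carries no useful context.
before = build_prompt(question, store)

# New events arrive after the model was trained; only the store changes.
store.add("Order 1001 was placed.", datetime(2024, 8, 14, 9, 0, tzinfo=timezone.utc))
store.add("Order 1001 shipped.", datetime(2024, 8, 14, 15, 30, tzinfo=timezone.utc))
after = build_prompt(question, store)

print(after)
```

The model itself is unchanged between the two prompts. Only the retrieved context differs, which is why fresh or private data can reach the LLM this way.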

Get the details

For a side-by-side comparison of an LLM query with and without RAG, a demonstration of Langflow (the open-source, drag-and-drop, low-code interface for building RAG pipelines), and a lively Q&A session, please watch the livestream replay.

And for a more detailed discussion of RAG, check out the DataStax guide, “Retrieval-Augmented Generation Explained: Understanding Key Concepts.”

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.