
Where Did Retrieval Augmented Generation Come From—And Where Is It Going?

Updated: March 19, 2025 · 4 min read

Retrieval-augmented generation (RAG) is a new wave in AI development. It lets you supply a language model with documents relevant to the use case you’re building, then retrieve from that information to answer queries. Research teams and organizations are finding that RAG lets them use language models to cover information beyond the models’ training cutoff and scope.

But how did RAG come about, and where is it headed? Let's take a look at the origins of the idea, how it works, and where it could lead in the future.

The origins of retrieval-augmented generation

RAG is built on the idea of grounding in evidence: if you can find relevant evidence and give it to a language model, then the model can generate answers based on the evidence it found rather than on its weights alone.

RAG emerged from scaling up basic question-answering projects into much more general systems. These systems weren’t limited to a fixed set of questions; they accessed a Wikipedia index and, using this external retrieval over Wikipedia, were meant to answer any question: truly open-domain question answering.

How retrieval-augmented generation works

The original RAG architecture pairs a retriever with a generative model; the two work together and are jointly optimized to solve the problems you’re interested in.
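
To make the architecture concrete, here’s a minimal sketch of the retrieve-then-generate loop in Python. Everything in it is illustrative: the toy embedder and the placeholder hand-off to an LLM are assumptions, not any specific library’s API, and the joint optimization described next is omitted.

```python
# Minimal frozen-RAG loop: embed documents once, retrieve the closest
# ones to a query, and prepend them to the prompt for a generator.
import numpy as np

def embed(texts):
    # Toy embedder that hashes words into a small vector space; a real
    # system would use a trained sentence-embedding model instead.
    vecs = np.zeros((len(texts), 64))
    for i, text in enumerate(texts):
        for word in text.lower().split():
            vecs[i, hash(word) % 64] += 1.0
    # L2-normalize so a dot product equals cosine similarity.
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)

docs = [
    "RAG pairs a retriever with a generative model.",
    "Frozen RAG keeps the retriever fixed at inference time.",
    "Embeddings map text to vectors for nearest-neighbor search.",
]
doc_vecs = embed(docs)  # indexed once, ahead of time

query = "How does RAG work?"
scores = doc_vecs @ embed([query])[0]          # cosine similarity per doc
top = [docs[i] for i in np.argsort(-scores)[:2]]

prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}\nAnswer:"
print(prompt)  # a real system would send this prompt to a language model
```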

With that system in place, you can backpropagate into your model’s query encoder: the documents stay fixed, but the query encoder learns to retrieve the documents you want the model to draw on. These retrieval-augmented architectures can also help optimize cost-quality tradeoffs in enterprise systems.

As a note, the metric you want is not necessarily similarity (a popular metric in vector databases). Rather, you often want relevance to the input, or to whatever the prompt to your language model is. To get better at that, you should optimize the retriever via backpropagation as well.
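
As a sketch of what that optimization might look like, the PyTorch snippet below trains only the query encoder against a fixed set of document embeddings, using a cross-entropy loss over retrieval scores as a stand-in for a real relevance signal. The shapes, the toy encoder, and the random data are assumptions for illustration, not the original RAG training recipe.

```python
# Train the query encoder while document embeddings stay frozen, so
# retrieval scores are shaped by relevance rather than raw similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_docs = 128, 1000

# Precomputed document embeddings, kept fixed (no gradients flow here).
doc_embs = F.normalize(torch.randn(num_docs, dim), dim=-1)

# Trainable query encoder (a stand-in for a real text encoder).
query_encoder = nn.Sequential(
    nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
)
opt = torch.optim.Adam(query_encoder.parameters(), lr=1e-4)

def train_step(query_feats, relevant_doc_idx):
    """query_feats: (batch, dim) raw query features.
    relevant_doc_idx: (batch,) index of the doc that answered well."""
    q = F.normalize(query_encoder(query_feats), dim=-1)
    scores = q @ doc_embs.T  # similarity of each query to every doc
    # Cross-entropy pushes the relevant doc's score above the rest,
    # backpropagating into the query encoder only.
    loss = F.cross_entropy(scores, relevant_doc_idx)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

loss = train_step(torch.randn(8, dim), torch.randint(0, num_docs, (8,)))
print(f"loss: {loss:.3f}")
```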

Methods in retrieval-augmented generation

Backpropagating across your system is a powerful idea in RAG. Most RAG systems don’t implement it, relying instead on “frozen” RAG, which doesn’t adapt to the provided documentation. There are many opportunities to improve on frozen RAG if you allow backpropagation through the entire model.

In AI models, there are two ways to embody knowledge: explicitly, such as rows in a database, or implicitly, as in the weights of a neural network. A complete AI system needs both parts. You can’t just say something; you have to ground it in something else. You want that grounding in your AI: the reasoning module queries the relevant information and synthesizes it into something useful.

As of now, k-nearest neighbor (KNN) search is still the standard retrieval algorithm, because it specifies a distance metric and finds the closest points. The new developments are happening in representations: over the past 10-20 years, we’ve seen incredible progress in representation learning.

The idea is to use embeddings – semantic representations of text – for retrieval. The future is about making sure you retrieve the right information in an optimized way, then synthesizing it with generative models to build dynamic AI systems.
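
For reference, here’s a hedged sketch of brute-force KNN over embeddings with the distance metric made explicit. The random vectors stand in for learned representations, and at real scale you’d reach for an approximate-nearest-neighbor index rather than a full scan.

```python
# Brute-force k-nearest-neighbor search with a pluggable distance metric.
import numpy as np

def knn(query_vec, index_vecs, k=3, metric="cosine"):
    if metric == "cosine":
        qn = query_vec / np.linalg.norm(query_vec)
        dn = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
        dist = 1.0 - dn @ qn                    # cosine distance
    elif metric == "euclidean":
        dist = np.linalg.norm(index_vecs - query_vec, axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(dist)[:k]                 # indices of the k closest docs

index_vecs = np.random.randn(10_000, 256)       # embedded documents
print(knn(np.random.randn(256), index_vecs, k=5))
```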

Where retrieval-augmented generation is headed

The general trend around augmented language models is that you don’t have to stop at a retriever. You can augment them with tools; you can augment them with computer vision pipelines.

Within that general trend, the next step is optimization. Many retrievers are kept frozen for computational reasons, but with careful distributed training, you could open up a better architecture: beyond updating the query encoder, you should be able to update the document encoder as well. That opens up many possibilities for language models to generalize to all kinds of new data sources on the fly, training the model in different ways to make sure it generalizes across modalities.
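
A rough sketch, under toy assumptions, of what updating both encoders could look like: each step re-encodes a small candidate set of documents with gradients, while the full index is only rebuilt occasionally, since re-encoding the whole corpus is expensive. The features, labels, and loss here are placeholders, not a specific published recipe.

```python
# Jointly train query and document encoders, refreshing the serving
# index periodically so it stays in sync with the document encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_docs = 128, 1000
doc_feats = torch.randn(num_docs, dim)     # raw document features (stand-in)
query_enc, doc_enc = nn.Linear(dim, dim), nn.Linear(dim, dim)
opt = torch.optim.Adam(
    [*query_enc.parameters(), *doc_enc.parameters()], lr=1e-4
)

for step in range(100):
    q = F.normalize(query_enc(torch.randn(8, dim)), dim=-1)
    # Re-encode a small candidate set with gradients enabled...
    cand_idx = torch.randint(0, num_docs, (64,))
    cands = F.normalize(doc_enc(doc_feats[cand_idx]), dim=-1)
    labels = torch.randint(0, 64, (8,))    # toy relevance labels
    loss = F.cross_entropy(q @ cands.T, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # ...and rebuild the full index only every so often.
    if step % 50 == 0:
        with torch.no_grad():
            doc_index = F.normalize(doc_enc(doc_feats), dim=-1)

print(doc_index.shape)  # queries at serving time search this index
```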

The result could be a much more flexible system in which the language model is just one small part, and that helps prevent your model from hallucinating. Rather than generating language purely from its weights, you want your language model to defer to the sources it relies on to do its job. The same goes for human beings: you want to rely on facts rather than whatever happens to come out of your mouth.

This potential makes multimodality a very important problem for RAG systems. Almost everything on the internet is multimodal, so for AI to understand humans better, it needs to interact with and learn from our multimodal data.

Conclusion

Language models are amazing first-generation technologies that will change the way we work and interact with data. There are still many hurdles to unlocking that potential – familiar problems like hallucination, attribution, compliance, data privacy, and the cost-quality trade-off. With emerging solutions like RAG, language models have the potential to change the world.

DataStax’s AI platform lets you get started developing and deploying AI apps quickly, so that you can bring ideas like RAG and more to your data systems. See what DataStax can do for your AI development by trying it for free today. 
