How RAG completes the generative AI puzzle

Retrieval-augmented generation brings to generative AI the one big thing that was holding it back in the enterprise.


Generative AI entered the global consciousness with a bang at the close of 2022 (cue: ChatGPT), but making it work in the enterprise has amounted to little more than a series of stumbles. Shadow AI use in the enterprise is sky high as employees are making day-to-day task companions out of AI chat tools. But for the knowledge-intensive workflows that are core to an organization’s mission, generative AI has yet to deliver on its lofty promise to transform the way we work.

Don’t bet on this trough of disillusionment to last very long, however. A process called retrieval-augmented generation (RAG) is unlocking the kinds of enterprise generative AI use cases that previously were not viable. Companies such as OpenAI, Microsoft, Meta, Google, and Amazon, along with a growing number of AI startups, have been aggressively rolling out enterprise-focused RAG-based solutions.

RAG brings to generative AI the one big thing that was holding it back in the enterprise: an information retrieval model. Now, generative AI tools have a way to access relevant data that is external to the data the large language model (LLM) was trained on—and they can generate output based on that information. This enhancement sounds simple, but it’s the key that unlocks the potential of generative AI tools for enterprise use cases.

To understand why, let’s first look at the problems that occur when generative AI lacks the ability to access information outside of its training data.

The limitations of language models

Generative AI tools like ChatGPT are powered by large language models trained on vast amounts of text data, such as articles, books, and online information, in order to learn the language patterns they need to generate coherent responses. However, even though the training data is massive, it’s just a snapshot of the world’s information captured at a specific point in time, limited in scope and lacking data that’s domain-specific or up to date.

An LLM generates new information based on the language patterns it learned from its training data, and in the process, it tends to invent facts that otherwise appear wholly credible. This is the “hallucination” problem with generative AI. It’s not a deal breaker for individuals using generative AI tools to help them with casual tasks throughout their day, but for enterprise workflows where accuracy is non-negotiable, the hallucination issue has been a show-stopper.

A private equity analyst can’t rely on an AI tool that fabricates supply chain entities. A legal analyst can’t rely on an AI tool that invents lawsuits. And a medical professional can’t rely on an AI tool that dreams up drug interactions. Because the tool generates output from language patterns and doesn’t cite its underlying sources, there is no way to verify the accuracy of that output or to rely on it in compliance use cases.

But it’s not just hallucinations that have frustrated success with generative AI in the enterprise. LLM training data is rich in general information, but it lacks domain-specific or proprietary data, without which the tool is of little use for knowledge-intensive enterprise use cases. The supplier data the private equity analyst needs isn’t in there. Neither is the lawsuit information for the legal analyst nor the drug interaction data for the doctor.

Enterprise AI applications typically demand access to current information, and this is another area where LLMs alone can’t deliver. Their training data is static, with a cut-off date that is often many months in the past. Even if the system had access to the kind of supplier data the private equity analyst needs, it wouldn’t be of much value to her if it were missing the last eight months of data. The legal analyst and doctor are in the same boat: even if the AI tool has access to domain-specific data, it’s of little use if that data is not up to date.

Enterprise requirements for generative AI

By laying out the shortcomings of generative AI in the enterprise, we’ve also defined the requirements for it. Enterprise generative AI tools must be:

Comprehensive and timely, by including all relevant and up-to-date domain-specific data.
Trustworthy and transparent, by citing all sources used in the output.
Credible and accurate, by basing output on specific, trusted data sets, not LLM training data.

RAG makes it possible for generative AI tools to meet these requirements. By integrating retrieval-based models with generative models, RAG-based systems can be designed to tackle knowledge-intensive workflows where it’s necessary to extract accurate summaries and insights from large volumes of imperfect, unstructured data and present them clearly and accurately in natural language.

There are four basic steps to RAG:

Vectorization. Transform relevant information from trusted sources by converting text into numerical vectors (embeddings) that the system can compare with one another.
Retrieval. Convert your query into the same kind of vector and match it to the most similar vectors from the trusted information sources.
Ranking. Choose the most useful information for you by considering what you asked, who you are, and the source of the information.
Generation. Combine the most relevant parts of those documents with your question and feed it to an LLM to produce the output.
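
The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: the documents and source names are invented, a simple word-count vector stands in for a learned embedding, ranking uses similarity alone, and the generation step stops at building the prompt rather than calling any particular LLM.

```python
import math
import re
from collections import Counter

# Hypothetical trusted corpus; source names and text are invented for illustration.
DOCUMENTS = [
    {"source": "supplier-report-2024.pdf",
     "text": "Acme Corp added two new battery suppliers in March."},
    {"source": "court-filings.csv",
     "text": "A patent lawsuit was filed against Acme Corp in April."},
    {"source": "hr-handbook.docx",
     "text": "Employees accrue vacation days every month."},
]


def vectorize(text):
    """Step 1: turn text into a vector (toy word counts, not a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=2):
    """Steps 2 and 3: score every document against the query, keep the top k."""
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vec), doc) for vec, doc in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, docs):
    """Step 4: combine the retrieved passages with the question for an LLM."""
    context = "\n".join(
        f"[{i}] ({doc['source']}) {doc['text']}"
        for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\n"
        f"Question: {query}"
    )


# Vectorize the corpus once, ahead of any query.
INDEX = [(vectorize(doc["text"]), doc) for doc in DOCUMENTS]

question = "Which suppliers did Acme Corp add?"
top_docs = retrieve(question, INDEX)
prompt = build_prompt(question, top_docs)
```

Numbering the passages and carrying the source name into the prompt is one simple way to let the generated answer cite where each claim came from. The resulting prompt would then be sent to whichever LLM the system uses.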

Unlike a generative AI tool that relies solely on an LLM to produce a response, RAG-based generative AI tools can produce output that is far more accurate, comprehensive, and relevant so long as the underlying data is properly sourced and vetted. In these cases, enterprise users can trust the output and use it for critical workflows.

RAG’s ability to retrieve new and updated information and cite sources is so critical that OpenAI began rolling out RAG functionality in ChatGPT. Newer search tools like Perplexity AI are making waves because the responses they generate cite their sources. However, these tools are still “general knowledge” tools that require time and investment to make them work for domain-specific enterprise use cases.

Readying them for the enterprise means sourcing and vetting domain-specific data for the system to retrieve from, customizing retrieval and ranking so that they return the documents most relevant to the use case, and fine-tuning the LLM used for generation so that the output uses the right terminology, tone, and formats.
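
As one hedged illustration of what customizing the ranking might look like, the sketch below re-orders retrieved candidates by combining their similarity score with a per-source trust weight and a freshness decay. Everything here is an assumption made for the example: the source categories, the trust weights, the 180-day half-life, and the choice to multiply the three factors together. A real system would tune these for its own domain and could add signals this sketch leaves out, such as who is asking.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    source: str        # hypothetical source category
    similarity: float  # score from the retrieval step, between 0 and 1
    published: date


# Hypothetical trust weights; unknown sources fall back to a low default.
SOURCE_TRUST = {"regulatory-filing": 1.0, "news-wire": 0.7, "web-forum": 0.3}
DEFAULT_TRUST = 0.1


def rank_score(candidate, today, half_life_days=180.0):
    """Combine similarity, source trust, and freshness into one score."""
    age_days = max((today - candidate.published).days, 0)
    freshness = 0.5 ** (age_days / half_life_days)  # halves every half-life
    trust = SOURCE_TRUST.get(candidate.source, DEFAULT_TRUST)
    return candidate.similarity * trust * freshness


def rerank(candidates, today, k=3):
    """Return the k highest-scoring candidates, best first."""
    ordered = sorted(candidates, key=lambda c: rank_score(c, today), reverse=True)
    return ordered[:k]


candidates = [
    Candidate("web-forum", 0.9, date(2024, 5, 30)),
    Candidate("regulatory-filing", 0.7, date(2024, 5, 1)),
    Candidate("news-wire", 0.8, date(2023, 6, 1)),
]
ranked = rerank(candidates, today=date(2024, 6, 1))
```

In this example the regulatory filing ranks first even though the forum post is the closest textual match, because the forum's low trust weight outweighs its similarity advantage, and the year-old news item drops to last because of its age.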

Despite the initial flurry of excitement around generative AI, its practical application in the enterprise has so far been underwhelming. But RAG is changing the game across industries by making it possible to deliver generative AI solutions where accuracy, trustworthiness, and domain specificity are hard requirements.

Chandini Jain is the founder and CEO of Auquan, an AI innovator transforming the world’s unstructured data into actionable intelligence for financial services customers. Prior to founding Auquan, Jain spent 10 years in global finance, working as a trader at Optiver and Deutsche Bank. She is a recognized expert and speaker in the field of using AI for investment and ESG risk management. Jain holds a master’s degree in mechanical engineering/computational science from the University of Illinois at Urbana-Champaign and a B.Tech from IIT Kanpur. For more information on Auquan, visit www.auquan.com, and follow the company @auquan_ and on LinkedIn.

—

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
