Glossary
What is Retrieval-Augmented Generation (RAG)?
Last updated: 2026-05-05
Definition
Retrieval-Augmented Generation (RAG) is a technique that grounds a large language model in a specific corpus of documents at query time. Instead of relying only on what the model learned during training, RAG retrieves relevant passages from your data and adds them to the prompt, so the model can answer from knowledge that is current and proprietary to you.
Why Retrieval-Augmented Generation matters
RAG was introduced by Lewis et al. in 2020 (Facebook AI Research) and has since become the dominant pattern for grounding generative AI in enterprise data. According to a 2025 Stanford study, RAG-augmented LLMs reduce factual hallucination on internal-data questions by 40-60% compared to base models, while keeping inference cost roughly equivalent.
How Retrieval-Augmented Generation works
1. Ingest your documents (PDFs, web pages, internal wikis) and split them into chunks of typically 200-1000 tokens.
2. Convert each chunk into a numerical embedding (a vector) using an embedding model.
3. Store the embeddings in a vector database (for example Pinecone, Qdrant, or Postgres with the pgvector extension).
4. When a user asks a question, embed the question and search the vector database for the most similar chunks (top-k retrieval).
5. Construct a prompt that includes the retrieved chunks as context, then send it to the LLM.
6. The LLM answers using the provided context, ideally citing which chunk it drew from.
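The retrieval and prompt-construction steps above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the "vector database" is a plain in-memory list, and all function names and sample chunks are hypothetical. The final call to an LLM is left out because it depends on the provider you use.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): term counts as a sparse vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embed every chunk and keep (chunk, vector) pairs in memory."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Embed the question and return the k most similar chunks (top-k retrieval)."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved chunks, numbered for citation, ahead of the question."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the context below and cite the chunk number you used.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )


# Hypothetical help-center chunks, already split.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Password resets are sent by email and expire after one hour.",
    "Enterprise plans include single sign-on and audit logging.",
]
index = build_index(chunks)
question = "How long do refunds take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the term-count vectors would be replaced by dense embeddings from an embedding model and the list scan by a vector database query. The overall shape (index, retrieve, build prompt) stays the same.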
Examples
- A support agent grounded on your help-center articles, so it can answer customer questions accurately.
- A legal-research agent grounded on the EU AI Act regulation text, so it can cite specific articles when explaining obligations.
- A sales-research agent grounded on your CRM notes, so it can summarize the last six months of customer interactions.
Related concepts
AI agent
An AI agent is a software program that uses a large language model (LLM) to autonomously plan and complete a task, combining reasoning, tool use, and memory. Unlike a one-shot prompt, an agent can break a goal into steps, call external tools or APIs, and decide what to do next based on intermediate results.
AI agent platform
An AI agent platform is software that lets organizations build, deploy, govern, and monitor AI agents at scale — typically with a workspace UI, multi-LLM access, knowledge bases, integrations, scheduling, and audit logging. The platform replaces the need for each team to assemble agent infrastructure from raw frameworks.
Multi-agent orchestration
Multi-agent orchestration is the practice of chaining multiple specialized AI agents into a single workflow, where each agent has a defined role (researcher, writer, reviewer, publisher) and outputs flow from one agent to the next. The orchestrator decides the order, handles retries, and enforces guardrails between steps.
Model Context Protocol (MCP)
Model Context Protocol (MCP) is an open standard introduced by Anthropic in 2024 that defines how AI agents connect to external data sources and tools. MCP servers expose data and capabilities; MCP clients (LLMs and agent platforms) discover and call them through a uniform interface — eliminating per-tool custom integration code.
FAQ
Retrieval-Augmented Generation — common questions
- Does RAG eliminate AI hallucinations?
- No, but it reduces them substantially on questions about your own data. The model can still hallucinate if the retrieved chunks do not contain the answer or if the model misreads them. Pair RAG with citation requirements ("cite the chunk you used") and with evaluation loops to catch failures.
- How is RAG different from fine-tuning?
- Fine-tuning bakes new knowledge into the model weights — slow, expensive, hard to update. RAG keeps knowledge external in a vector database — fast to update, easy to audit, and the model can cite which document it used. RAG is the default for most enterprise grounding use cases.
- What is the right chunk size for RAG?
- There is no universal answer. 200-500 tokens works for short Q&A; 800-1500 tokens works for longer contextual answers. Use overlap (typically 10-20%) so important context is not lost at chunk boundaries. Test against your own retrieval-quality benchmarks.
- Is RAG GDPR-compliant?
- RAG is a technique, not a compliance regime. GDPR compliance depends on what personal data you index, where it is stored, and which third parties it is sent to. AgentWorks applies PII redaction at the gateway layer before any chunk reaches a third-party LLM, supports EU data residency on all knowledge bases, and logs every retrieval for audit.
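For the chunk-size question above, the overlap idea can be illustrated with a short sliding-window splitter. This is a sketch under stated assumptions: `chunk_tokens` is a hypothetical helper, and it operates on a pre-tokenized list, so token counts only match a given model if you tokenize with that model's tokenizer (splitting on whitespace is a rough approximation).

```python
def chunk_tokens(
    tokens: list[str], size: int = 300, overlap_ratio: float = 0.15
) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing a fraction with the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(size * overlap_ratio)   # tokens repeated at each boundary
    step = max(1, size - overlap)         # how far the window advances each time

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):   # this window already reached the end
            break
    return chunks


# Small illustrative values so the overlap is easy to see.
tokens = [str(i) for i in range(10)]
windows = chunk_tokens(tokens, size=4, overlap_ratio=0.25)
for w in windows:
    print(w)
```

With a window of 4 and 25% overlap, each chunk repeats the last token of the previous one, so a sentence that straddles a boundary appears in both chunks. The defaults of 300 tokens and 15% are only placeholders inside the ranges mentioned above and should be tuned against your own retrieval-quality benchmarks.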