Glossary

What is Chunking strategy?

Last updated: 2026-05-26

Definition

A chunking strategy is the algorithm a RAG pipeline uses to split source documents into the pieces ("chunks") it will later embed, store, and retrieve. Chunking is the single biggest determinant of RAG quality: poor chunking puts the right answer in the wrong slice, and no embedding model can recover it. There is no universal optimum; the strategy must match the document type, the query pattern, and the embedding model's context window.

Why Chunking strategy matters

Practitioners spend more time tuning chunking than any other RAG knob. Recent benchmarks show a 20-40% swing in retrieval accuracy between naive fixed-size chunking and structure-aware chunking on the same corpus. For regulated content (contracts, policies, medical records), chunking errors can leak personal data across context boundaries or split a clause from its definition; both are compliance issues.

How Chunking strategy works

  1. Fixed-size chunking: split every N tokens (typical 256-1024) with M-token overlap (typical 10-20%). Fast, dumb, often good enough.
  2. Sentence- or paragraph-aware: split at natural prose boundaries; better for narrative text but breaks on tables, code, lists.
  3. Structure-aware: parse the document (markdown headings, HTML sections, PDF outline) and chunk per logical unit. Best for technical docs and contracts.
  4. Semantic chunking: use embedding similarity to detect topic shifts and split at them. Slow to build, strong for long-form essays.
  5. Parent-child (multi-granularity) chunking: store the same content at multiple chunk sizes, retrieve the small chunk first, then expand to its parent for context.
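
The first and last strategies in the list above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the input is already a list of tokens (a real pipeline would use the embedding model's own tokenizer), and the function names fixed_size_chunks and parent_child_chunks are hypothetical, chosen for this example only.

```python
def fixed_size_chunks(tokens, size=512, overlap=51):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    stride = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def parent_child_chunks(tokens, parent_size=1024, child_size=256):
    """Return (parents, children); each child is (parent_index, child_tokens)."""
    parents = fixed_size_chunks(tokens, size=parent_size, overlap=0)
    children = []
    for parent_id, parent in enumerate(parents):
        for child in fixed_size_chunks(parent, size=child_size, overlap=0):
            children.append((parent_id, child))
    return parents, children
```

With this layout, a retriever would embed and search the children, then use the stored parent index to hand the larger parent chunk to the generator. The default overlap of 51 tokens is roughly 10% of 512, in line with the typical range given above.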

Examples

  • A policy library uses heading-aware chunking: every H2 becomes a chunk, and the H1 + breadcrumb is prepended so each chunk knows which policy it belongs to.
  • A codebase RAG splits per function (tree-sitter parse), not per line — so when a developer asks "how does login() work?" the retrieval returns the full function.
  • A customer-support archive uses semantic chunking on conversation transcripts to keep each "topic shift" as its own chunk for higher-precision retrieval.
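
The first two examples above could look roughly like the sketch below. It makes several assumptions: the policy documents are markdown with "#" and "##" headings, the codebase is Python, and the standard-library ast module stands in for the tree-sitter parse mentioned above. The names chunk_by_h2 and chunk_python_functions are hypothetical.

```python
import ast
import re

# Matches H1 ("# Title") and H2 ("## Title") lines only; "###" and deeper do not match.
HEADING = re.compile(r"^(#{1,2})\s+(.+?)\s*$")


def chunk_by_h2(markdown_text):
    """Return one chunk per H2 section, each prefixed with an 'H1 > H2' breadcrumb."""
    h1 = ""
    sections = []  # each entry is [breadcrumb, body_lines]
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) == 1:
            h1 = match.group(2)  # remember the current document title
        elif match:
            sections.append([f"{h1} > {match.group(2)}", []])
        elif sections:
            sections[-1][1].append(line)  # body text, including H3+ headings
        # Text that appears before the first H2 is dropped in this sketch.
    return [crumb + "\n" + "\n".join(body).strip() for crumb, body in sections]


def chunk_python_functions(source):
    """Return {function_name: full_source_text} for every function in `source`."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
```

Two limitations worth noting: the heading matcher does not skip "#" lines inside fenced code blocks, and the function chunker keys on the bare function name, so same-named methods in different classes would overwrite each other. A production version would need to handle both.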

FAQ

Chunking strategy — common questions

What is the optimal chunk size?
There is no universal optimum. Starting point: 512 tokens with 10% overlap for prose, structure-aware for technical docs, function-level for code. Iterate based on actual retrieval evals against your query set.
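
One way to run such an eval is a hit-rate-at-k check over a small labeled query set. The sketch below is an assumption-laden illustration: hit_rate_at_k and keyword_retriever are hypothetical names, and the keyword-overlap retriever is only a toy stand-in for a real embedding search.

```python
def hit_rate_at_k(retrieve, labeled_queries, k=5):
    """Fraction of queries whose expected chunk id appears in the top-k results."""
    if not labeled_queries:
        return 0.0
    hits = sum(
        1
        for query, expected_id in labeled_queries
        if expected_id in retrieve(query)[:k]
    )
    return hits / len(labeled_queries)


def keyword_retriever(chunks):
    """Toy retriever: rank chunk ids by how many lowercase words they share with the query."""
    def retrieve(query):
        words = set(query.lower().split())
        return sorted(
            chunks,
            key=lambda cid: len(words & set(chunks[cid].lower().split())),
            reverse=True,
        )
    return retrieve
```

Re-running the same labeled queries after each chunking change (size, overlap, strategy) gives a like-for-like number to compare, which is the iteration loop described above.
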
Should chunks contain metadata?
Yes. At minimum: source URL, section path, last_modified, access level. This lets retrieval filter ("only this customer's docs"), lets answers cite sources, and lets compliance audit which document version was used.
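
As a rough sketch of what that metadata might look like in code, the record below carries the four fields listed above. The class name Chunk, the helper names, and the access-level values are assumptions made for illustration, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Chunk:
    text: str
    source_url: str       # where the chunk came from, for citations
    section_path: str     # e.g. "Refund Policy > Timelines"
    last_modified: date   # which document version was chunked
    access_level: str     # hypothetical values such as "public" or "internal"


def visible_chunks(chunks, allowed_levels):
    """Keep only the chunks the caller is allowed to see."""
    return [c for c in chunks if c.access_level in allowed_levels]


def citation(chunk):
    """Format a source citation from the chunk's metadata."""
    return f"{chunk.source_url} ({chunk.section_path}, {chunk.last_modified.isoformat()})"
```

Filtering on access_level before ranking, rather than after, keeps restricted text out of the candidate set entirely.
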
Do I need to re-chunk when I switch embedding models?
Usually yes. Different embedding models have different optimal chunk sizes (driven by context length and training distribution). Always re-evaluate retrieval quality after an embedding-model swap.