Chunking Strategies for Enterprise RAG: Where Quality Lives and Dies

Retrieval-Augmented Generation is "just" three steps: chunk documents, embed and index, retrieve and generate. The chunk step looks like the simplest. It is not. Chunking decisions made on day one set a ceiling on RAG quality that no embedding upgrade or amount of prompt engineering can fully lift. Most RAG quality complaints we see at AgentWorks trace back to chunking that was wrong for the document type.

This is a practical guide: what chunking does, where it goes wrong, and the patterns that work for the document types enterprises actually run.

What chunking actually decides

The chunking step decides:

The unit of retrieval (what gets returned when a query matches)
The context window the model receives (one chunk, several chunks, or many chunks)
The granularity of citations (you can only cite what you indexed as a chunk)
The coupling between related pieces of information (split too aggressively, related info gets separated)

Every downstream stage depends on these decisions. Chunking is not pre-processing; it is information architecture.

The patterns that produce bad chunks

Fixed-size character chunks: split documents into 500- or 1000-character chunks regardless of content structure. Easy to implement, broken for any document with headings or sections. A chunk that ends mid-sentence loses meaning. A heading separated from its content makes both useless.

Naive sentence splitting: split on sentence boundaries. Better than character-based for prose but terrible for technical content (code blocks split mid-block, table rows orphaned).

No overlap: adjacent chunks share no text. Queries that match content at a chunk boundary miss the context on either side.

Single chunking strategy for all document types: applying the same chunking rules to email, code, PDFs with tables, and structured policy documents. Each needs different handling.

The patterns that work

Structure-aware chunking

For documents with markup (Markdown, HTML, structured PDF), chunk along the document's natural structural boundaries:

Markdown documents: chunk at heading boundaries, with parent headings prepended to each chunk for context. A chunk for "Refund Policy → International Orders" includes both headings so the model knows the scope.
HTML content: chunk at section, article, or h2/h3 boundaries. Strip navigation and footer content before chunking.
PDFs with structure: extract the document outline (table of contents, heading hierarchy) and chunk at the appropriate level. For technical PDFs, that is often the second or third heading level.
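
The Markdown case above can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: the function name chunk_markdown and the " > " separator are assumptions made for the example, only ATX-style "#" headings are recognised, and a "#" line inside a fenced code block would be mistaken for a heading.

```python
import re

# ATX headings only ("# Title" through "###### Title").
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def chunk_markdown(text: str, separator: str = " > ") -> list:
    """Split Markdown at headings and prefix each chunk with its heading path."""
    chunks = []
    path = []   # (level, title) for every heading currently "open"
    body = []   # lines collected since the last heading

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            breadcrumb = separator.join(title for _, title in path)
            chunks.append(f"{breadcrumb}\n{content}" if breadcrumb else content)
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close headings at the same or a deeper level before opening this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Called on a document with a "Refund Policy" heading and an "International Orders" subheading, the second chunk starts with "Refund Policy > International Orders", so the scope travels with the text into the index.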

Semantic chunking for prose

For prose without clear structural markers (research papers, customer correspondence, long-form articles), use semantic chunking:

Split into semantic units using embedding-distance-based boundaries
A new chunk starts when the topic shifts significantly
Typical chunk size: 300-800 tokens, with topic-driven boundaries rather than fixed length

Implementation: compute embeddings of sentences or short sliding windows, detect topic shifts via cosine distance, chunk at shift points.
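
A rough sketch of that loop follows. It makes several assumptions for the sake of a self-contained example: toy_embed is a bag-of-words stand-in for a real embedding model, the 0.9 threshold is arbitrary and would need tuning per model, and only adjacent sentences are compared rather than sliding windows.

```python
import math
import re
from collections import Counter

def toy_embed(sentence: str) -> Counter:
    """Stand-in embedding (word counts). Replace with a real embedding model."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0

def semantic_chunks(sentences, threshold=0.9, embed=toy_embed):
    """Start a new chunk whenever adjacent sentences are further apart than threshold."""
    chunks = []
    current = []
    for sentence in sentences:
        if current and cosine_distance(embed(current[-1]), embed(sentence)) > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The embed parameter is there so a real model can be dropped in without touching the boundary logic. A size cap (for example the 300-800 token range above) would be layered on top and is left out here.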

Hierarchical chunking

For long structured documents (policy manuals, technical documentation, regulations) where both fine and coarse retrieval matter:

Index multiple chunk sizes from the same document
Small chunks (200-500 tokens) for precise retrieval
Medium chunks (1,000-2,000 tokens) for context
Whole-document or whole-section summaries for navigation

The retrieval step picks the right chunk size based on query type. Specific factual queries match small chunks; "tell me about" queries match medium chunks.
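
One way to picture the multi-level index is a flat list of chunks where every small chunk records the medium chunk it came from. The sketch below is illustrative only: the Chunk fields, the ID scheme, and build_hierarchy are hypothetical names, word counts stand in for tokens, and the fixed-width split_words helper is a placeholder for whichever boundary strategy suits the document.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    chunk_id: str
    level: str                 # "medium" or "small"
    parent_id: Optional[str]   # medium chunk that contains a small chunk
    text: str

def split_words(text: str, size: int) -> list:
    """Placeholder splitter: fixed word windows, used only to keep the sketch short."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_hierarchy(doc_id: str, text: str, small: int = 300, medium: int = 1500) -> list:
    """Index the same text at two granularities, linking small chunks to their parent."""
    chunks = []
    for m, medium_text in enumerate(split_words(text, medium)):
        medium_id = f"{doc_id}/m{m}"
        chunks.append(Chunk(medium_id, "medium", None, medium_text))
        for s, small_text in enumerate(split_words(medium_text, small)):
            chunks.append(Chunk(f"{medium_id}/s{s}", "small", medium_id, small_text))
    return chunks
```

With the parent link in place, a retriever can match on a small chunk for precision and then hand the model the parent medium chunk when more context is needed.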

Table and code preservation

For documents with tables and code:

Tables: keep the whole table as one chunk, with the table caption and the column headers preserved. Splitting tables across chunks destroys their meaning.
Code blocks: keep the whole code block as one chunk, with surrounding prose context as metadata. Code split mid-function is useless.

This requires the chunking pipeline to recognise tables and code blocks before splitting.
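
As a sketch of that recognition pass, the function below walks a Markdown document and emits whole blocks tagged as prose, table, or code, so a later splitter can leave the last two intact. It assumes Markdown input with fenced code blocks and pipe-delimited tables; split_blocks is a name invented for this example, and PDFs or HTML would need their own detection.

```python
FENCE = "`" * 3  # Markdown code-fence marker

def split_blocks(text: str) -> list:
    """Return (kind, text) pairs; fenced code and pipe tables are kept whole."""
    blocks = []
    buffer = []
    kind = "prose"

    def flush() -> None:
        if buffer:
            blocks.append((kind, "\n".join(buffer)))
            buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if kind == "code":
            # Inside a fence: keep everything, including blank lines, until it closes.
            buffer.append(line)
            if stripped.startswith(FENCE):
                flush()
                kind = "prose"
        elif stripped.startswith(FENCE):
            flush()
            kind = "code"
            buffer.append(line)
        elif stripped.startswith("|"):
            if kind != "table":
                flush()
                kind = "table"
            buffer.append(line)
        elif not stripped:
            flush()
            kind = "prose"
        else:
            if kind != "prose":
                flush()
                kind = "prose"
            buffer.append(line)
    flush()
    return blocks
```

Only the blocks tagged "prose" would then go through heading-based or semantic splitting; "table" and "code" blocks are indexed as single chunks.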

Per-document-type rules

The honest pattern in production: chunking is a function of document type, not a global setting. An enterprise knowledge base typically mixes several types, each needing its own rule:

Policy PDFs: heading-based with overlap
Knowledge base articles (Markdown): heading-based with parent headings prepended
Customer support transcripts: semantic chunking on conversation turns
Code repositories: function-level or class-level boundaries
Email threads: per-email with thread context as metadata
Spreadsheets: per-sheet with column headers preserved

A single chunking strategy across this mix produces uneven retrieval quality. Per-type rules are more work to maintain but materially better in practice.
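
In code, per-type rules usually reduce to a lookup from document type to chunking function. The sketch below shows the shape under stated assumptions: the type names, the two toy chunkers, and chunk_document are all hypothetical, and real entries would point at the structure-aware, semantic, or code-aware chunkers appropriate to each type.

```python
from typing import Callable, Dict, List

Chunker = Callable[[str], List[str]]

def by_paragraph(text: str) -> List[str]:
    """Toy chunker: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def whole_document(text: str) -> List[str]:
    """Toy chunker: the entire document as one chunk (e.g. a single email)."""
    return [text.strip()] if text.strip() else []

# Hypothetical registry; extend with one entry per document type in the corpus.
CHUNKERS: Dict[str, Chunker] = {
    "kb_article": by_paragraph,
    "email": whole_document,
}

def chunk_document(doc_type: str, text: str) -> List[str]:
    """Dispatch to the chunker registered for this document type."""
    try:
        chunker = CHUNKERS[doc_type]
    except KeyError:
        raise ValueError(f"no chunking rule registered for type {doc_type!r}") from None
    return chunker(text)
```

Failing loudly on an unregistered type is a deliberate choice in this sketch: a silent fallback to a generic chunker is how a new document type ends up with the wrong rules unnoticed.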

The metadata that matters

Each chunk should carry metadata that makes retrieval smarter:

Source identifier: which document, which version, which section
Document type: for type-specific retrieval boosting
Access controls: which users or roles can see this chunk
Created and modified timestamps: for freshness-aware retrieval
Parent context: for documents with hierarchical structure
Custom domain tags: regulatory area, product line, customer segment

The metadata serves both retrieval (filter, rank, control access) and citation (the answer points to the source document with the right precision).
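
A minimal sketch of such a record, with a role-based filter applied before results reach the model, might look like the following. The field names, the IndexedChunk wrapper, and visible_to are illustrative assumptions, not a schema from any particular vector store.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, FrozenSet, List, Tuple

@dataclass
class ChunkMetadata:
    source_id: str                        # which document
    version: str                          # which version of it
    section: str                          # which section
    doc_type: str                         # for type-specific boosting
    allowed_roles: FrozenSet[str]         # access control
    created_at: datetime
    modified_at: datetime                 # for freshness-aware retrieval
    parent_context: Tuple[str, ...] = ()  # heading path, outermost first
    tags: Dict[str, str] = field(default_factory=dict)  # custom domain tags

@dataclass
class IndexedChunk:
    text: str
    meta: ChunkMetadata

def visible_to(chunks: List[IndexedChunk], user_roles: FrozenSet[str]) -> List[IndexedChunk]:
    """Keep only chunks the user may see (at least one shared role)."""
    return [c for c in chunks if c.meta.allowed_roles & user_roles]
```

Filtering on allowed_roles at retrieval time, rather than after generation, keeps restricted text out of the prompt altogether.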

Overlap, and how much

Chunk overlap reduces the risk of relevant content falling on a boundary. Typical overlap:

10-15% for structure-aware chunking (the structure already provides natural boundaries)
15-25% for semantic chunking (semantic boundaries are imperfect)
0% for table and code chunks (you do not want partial tables overlapping)

Too much overlap wastes index space and produces duplicate retrieval results. Too little misses relevant context at boundaries.
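
To make the percentages concrete, here is a small sketch of overlap applied to a token sequence. The function name and the fixed window size are assumptions for illustration; in a structure-aware or semantic pipeline the same idea applies at the edges of each chunk rather than to uniform windows.

```python
def window_with_overlap(tokens: list, size: int, overlap_ratio: float) -> list:
    """Slide a window of `size` tokens, repeating `overlap_ratio` of each window in the next."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    # Step forward by the non-overlapping part of the window, at least one token.
    step = max(1, size - int(size * overlap_ratio))
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end
    return windows
```

With size=400 and overlap_ratio=0.15, each window repeats the last 60 tokens of the previous one, which is the cost side of the trade-off: roughly 15% more index entries to store and deduplicate.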

Evaluation: how you know your chunking is working

Chunking quality is best measured by retrieval quality, not by chunk inspection. Build an evaluation set:

50-200 representative queries that real users ask
For each, the gold-standard answer and the source document(s)
Run retrieval, measure recall@k (does the right source appear in top-k?) and precision (are top results relevant?)
Inspect failures to understand whether they are chunking, embedding, or query-side problems

When you change chunking strategy, re-run the evaluation. If recall@5 drops below baseline, the new strategy is worse. If recall@5 holds and precision improves, the new strategy is better.
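
The two metrics are simple enough to compute without a framework. The sketch below follows the definitions used above: recall@k is the share of queries where at least one gold source appears in the top k, and precision is the share of top-k results that are gold sources. The evaluate function and its input shapes (dicts keyed by query string) are assumptions for the example.

```python
def evaluate(results: dict, gold: dict, k: int = 5) -> dict:
    """
    results: query -> ranked list of retrieved source IDs
    gold:    query -> set of source IDs that contain the answer
    """
    if not gold:
        raise ValueError("evaluation set is empty")
    hits = 0
    precision_sum = 0.0
    for query, relevant in gold.items():
        top_k = results.get(query, [])[:k]
        if any(source in relevant for source in top_k):
            hits += 1
        if top_k:
            precision_sum += sum(source in relevant for source in top_k) / len(top_k)
    n = len(gold)
    return {"recall_at_k": hits / n, "precision_at_k": precision_sum / n}
```

Running this once on the baseline chunking and once on the candidate, over the same query set and the same k, gives the two numbers to compare.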

The common questions

"How big should my chunks be?" Wrong question. Ask "what is the right chunking strategy for each document type." Size emerges from the strategy.

"Should I use semantic chunking everywhere?" No. Semantic chunking is great for prose, wasteful for structured documents that already have structural boundaries.

"How do I handle very long documents?" Hierarchical chunking with multi-level indexes. Retrieve at the right level for the query.

"What about images and diagrams?" Extract text from images (OCR, image captioning models), index the extracted content with a reference back to the original image. For documents where images carry primary content (technical drawings, infographics) consider multimodal retrieval.

What AgentWorks ships

The knowledge-rag features in AgentWorks ship with structure-aware chunking as the default, semantic chunking as an option for prose-heavy content, and per-document-type configuration for production knowledge bases. The metadata model supports the per-chunk access controls and citation linking that production deployments need. For teams without ML engineering capacity to tune chunking strategies, the defaults work for typical knowledge bases; for teams with specific document types or quality requirements, the configuration surface lets you tune per type.

The lesson from production: spend the first week of any RAG project on chunking analysis, not on prompt engineering. The ceiling on the RAG quality you can reach is set by the chunking choices made at the start.
