Technical · May 26, 2026 · 5 min read

Chunking Strategies for Enterprise RAG: Where Quality Lives and Dies

TL;DR

Chunking patterns for enterprise RAG by document type: structure-aware for markup, semantic for prose, hierarchical for long structured documents, table and code preservation, and per-type rules. Plus the evaluation that catches bad chunking.

Retrieval-Augmented Generation is "just" three steps: chunk documents, embed and index, retrieve and generate. The chunking step looks like the simplest of the three. It is not. Chunking decisions made on day one set a ceiling on RAG quality that no amount of embedding upgrades or prompt engineering can fully lift. Most RAG quality complaints we see at AgentWorks trace back to chunking that was wrong for the document type.

This is the practical guide: what chunking does, where it goes wrong, and the patterns that work for the document types enterprises actually run.

What chunking actually decides

The chunking step decides:

  • The unit of retrieval (what gets returned when a query matches)
  • The context window the model receives (one chunk, several chunks, or many chunks)
  • The granularity of citations (you can only cite what you indexed as a chunk)
  • The coupling between related pieces of information (split too aggressively and related information ends up in separate chunks)

Every downstream stage depends on these decisions. Chunking is not pre-processing; it is information architecture.

The patterns that produce bad chunks

Fixed-size character chunks: split documents into 500- or 1000-character chunks regardless of content structure. Easy to implement, broken for any document with headings or sections. A chunk that ends mid-sentence loses meaning. A heading separated from its content makes both useless.
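To make the failure concrete, here is a minimal sketch of naive fixed-size character chunking applied to a small, made-up Markdown snippet. The function name `fixed_size_chunks` and the 40-character limit are illustrative assumptions; real pipelines typically use 500 or 1,000 characters, but the boundary problem is identical.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut `text` every `size` characters, ignoring words, sentences and headings."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# A made-up two-section Markdown snippet, used only for illustration.
doc = (
    "## Refund Policy\n"
    "Refunds are issued within 14 days of purchase.\n"
    "## International Orders\n"
    "Customs fees are not refundable.\n"
)

for n, chunk in enumerate(fixed_size_chunks(doc, 40)):
    print(n, repr(chunk))
```

With a 40-character limit, the first chunk ends mid-word ("Refunds are issued with"), and the heading "## International Orders" is cut in two across the second and third chunks, so neither half carries the scope of its section.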

Naive sentence splitting: split on sentence boundaries. Better than character-based for prose but terrible for technical content (code blocks split mid-block, table rows orphaned).

No overlap: each chunk is independent with no overlap to neighbouring chunks. Queries that match content at chunk boundaries miss the context on either side.

Single chunking strategy for all document types: applying the same chunking rules to email, code, PDFs with tables, and structured policy documents. Each needs different handling.

The patterns that work

Structure-aware chunking

For documents with markup (Markdown, HTML, structured PDF), chunk along the document's natural structural boundaries:

  • Markdown documents: chunk at heading boundaries, with parent headings prepended to each chunk for context. A chunk for "Refund Policy → International Orders" includes both headings so the model knows the scope.
  • HTML content: chunk at section, article, or h2/h3 boundaries. Strip navigation and footer content before chunking.
  • PDFs with structure: extract the document outline (table of contents, heading hierarchy) and chunk at appropriate levels. For technical PDFs, that is often the h2 or h3 level.
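For the Markdown case, heading-based chunking with parent headings prepended can be sketched in a few lines of Python. This is a minimal sketch, not the AgentWorks implementation: it assumes ATX-style `#` headings only (no setext underlines), uses " → " as the path separator to echo the "Refund Policy → International Orders" example, and skips lines inside code fences so that a `#` comment in a code sample is not mistaken for a heading.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # code-fence marker; lines inside fences are never treated as headings


def markdown_chunks(text: str, sep: str = " → ") -> list[str]:
    """Split Markdown at heading boundaries; prefix each chunk with its heading path."""
    chunks: list[str] = []
    path: list[tuple[int, str]] = []  # stack of (level, title) for the current section
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            header = sep.join(title for _, title in path)
            chunks.append(f"{header}\n\n{content}" if header else content)
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = HEADING.match(line) if not in_fence else None
        if match is None:
            body.append(line)
            continue
        flush()  # a heading closes the previous section
        level, title = len(match.group(1)), match.group(2)
        while path and path[-1][0] >= level:
            path.pop()  # leave sibling or deeper sections
        path.append((level, title))
    flush()
    return chunks
```

A chunk under "## International Orders" inside "# Refund Policy" comes out as "Refund Policy → International Orders" followed by the section text, so the embedding and the model both see the scope. Tracking heading levels on a stack, rather than just the last heading, keeps sibling sections at the same depth from being nested under each other.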

Semantic chunking for prose

For prose without clear structural markers (research papers, customer correspondence, long-form articles), use semantic chunking:

  • Split into semantic units using embedding-distance-based boundaries
  • A new chunk starts when the topic shifts significantly
  • Typical chunk size: 300-800 tokens, with topic-driven boundaries rather than fixed length

Implementation: compute embeddings of sentences or short sliding windows, detect topic shifts via cosine distance, chunk at shift points.
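The implementation described above can be sketched as follows, assuming sentence-level embeddings and a single fixed cosine-distance threshold. `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts keywords from two hand-picked topics), and the 0.5 threshold is an arbitrary illustrative value that would need tuning. The sketch also omits the 300-800 token size caps a production chunker would enforce.

```python
import math
from collections.abc import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def semantic_chunks(
    sentences: list[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.5,
) -> list[list[str]]:
    """Group consecutive sentences, starting a new chunk at each large topic shift."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev, cur) > threshold:
            chunks.append([sentence])  # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)
    return chunks


# Hypothetical toy "embedding": counts of two hand-picked keyword groups.
TOPICS = [{"refund", "refunds", "receipt"}, {"shipping", "delivery", "courier"}]


def toy_embed(sentence: str) -> list[float]:
    words = sentence.lower().replace(".", "").split()
    return [float(sum(w in topic for w in words)) for topic in TOPICS]


sentences = [
    "Refunds are issued within 14 days.",
    "A refund requires the original receipt.",
    "Shipping takes three to five days.",
    "Delivery is tracked by courier.",
]
print(semantic_chunks(sentences, toy_embed))
```

Here the two refund sentences land in one chunk and the two shipping sentences in another, because the distance between adjacent vectors only jumps at the topic change. Comparing each sentence with its immediate neighbour is the simplest variant; comparing short sliding windows instead, as the text suggests, smooths out noisy single-sentence embeddings.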

Hierarchical chunking

For long structured documents (policy manuals, technical documentation, regulations) where both fine and coarse retrieval matter:

  • Index multiple chunk sizes from the same document
  • Small chunks (200-500 tokens) for precise retrieval
  • Medium chunks (1,000-2,000 tokens) for context
  • Whole-document or whole-section summaries for navigation

The retrieval step picks the right chunk size based on query type. Specific factual queries match small chunks; "tell me about" queries match medium chunks.
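A minimal sketch of a two-level index follows. It assumes whitespace-separated words as a rough proxy for tokens, splits by length rather than by section (a real pipeline would align medium chunks with document sections), and omits the summary level. The `pick_level` router and its prefix list are hypothetical heuristics for illustration; production systems might use a classifier or retrieve at both levels and merge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    level: str                # "medium" or "small" in this sketch
    text: str
    parent_id: Optional[str]  # small chunks point at the medium chunk containing them


def split_words(text: str, size: int) -> list[str]:
    """Whitespace words as a rough stand-in for tokens."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_hierarchy(doc_id: str, text: str,
                    medium_size: int = 1500, small_size: int = 300) -> list[Chunk]:
    """Index one document at two granularities, linking each small chunk to its parent."""
    chunks: list[Chunk] = []
    for m, medium in enumerate(split_words(text, medium_size)):
        medium_id = f"{doc_id}/m{m}"
        chunks.append(Chunk(medium_id, "medium", medium, None))
        for s, small in enumerate(split_words(medium, small_size)):
            chunks.append(Chunk(f"{medium_id}/s{s}", "small", small, medium_id))
    return chunks


BROAD_PREFIXES = ("tell me about", "give me an overview", "summarise", "summarize")


def pick_level(query: str) -> str:
    """Hypothetical router: broad questions go to medium chunks, specific ones to small."""
    return "medium" if query.lower().startswith(BROAD_PREFIXES) else "small"
```

The `parent_id` link also supports a common variant: match on small chunks for precision, then hand the model the parent medium chunk for context.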

Table and code preservation

For documents with tables and code:

  • Tables: keep the whole table as one chunk, with the table caption and the column headers preserved. Splitting tables across chunks destroys their meaning.
  • Code blocks: keep the whole code block as one chunk, with surrounding prose context as metadata. Code split mid-function is useless.

This requires the chunking pipeline to recognise tables and code blocks before splitting.
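For Markdown sources, that recognition step might look like the sketch below. It assumes pipe tables (rows starting with `|`) and backtick code fences at the start of a line; HTML tables, tilde fences and PDF tables are out of scope. Table and code blocks are emitted whole, and the prose paragraph immediately before each one is kept as `context`, which approximates the table caption or surrounding prose described above. Only prose blocks would be split further downstream.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Block:
    kind: str          # "code", "table" or "prose"
    text: str
    context: str = ""  # for code and tables: the prose paragraph just before the block


def _is_table_row(line: str) -> bool:
    return line.lstrip().startswith("|")


def segment(markdown: str) -> list[Block]:
    """Split Markdown into prose, table and code blocks; tables and code stay whole."""
    lines = markdown.splitlines()
    blocks: list[Block] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(FENCE):
            end = i + 1
            while end < len(lines) and not lines[end].startswith(FENCE):
                end += 1
            kind, end = "code", min(end + 1, len(lines))  # include the closing fence
        elif _is_table_row(line):
            end = i
            while end < len(lines) and _is_table_row(lines[end]):
                end += 1
            kind = "table"
        elif line.strip():
            end = i
            while (end < len(lines) and lines[end].strip()
                   and not lines[end].startswith(FENCE) and not _is_table_row(lines[end])):
                end += 1
            kind = "prose"
        else:
            i += 1  # skip blank lines between blocks
            continue
        context = ""
        if kind != "prose" and blocks and blocks[-1].kind == "prose":
            context = blocks[-1].text
        blocks.append(Block(kind, "\n".join(lines[i:end]), context))
        i = end
    return blocks
```

Because the code branch scans to the closing fence regardless of blank lines, a function with an empty line in its body still comes out as a single block.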

Per-document-type rules

The honest pattern in production: chunking is a function of document type, not a global setting. For an enterprise knowledge base mixing:

  • Policy PDFs: heading-based with overlap
  • Knowledge base articles (Markdown): heading-based with parent headings prepended
  • Customer support transcripts: semantic chunking on conversation turns
  • Code repositories: function-level or class-level boundaries
  • Email threads: per-email with thread context as metadata
  • Spreadsheets: per-sheet with column headers preserved

A single chunking strategy across this mix produces uneven retrieval quality. Per-type rules are more work to maintain but materially better in practice.
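One way to keep per-type rules maintainable is a small registry keyed by document type, sketched below. The type keys, strategy names, field names and overlap values are illustrative assumptions (the overlap values are picked from the ranges in the overlap section), not a product configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingRule:
    strategy: str                          # "heading", "semantic", "function", "per_email", "per_sheet"
    overlap: float = 0.0                   # fraction of a chunk shared with its neighbour
    prepend_parent_headings: bool = False
    keep_column_headers: bool = False
    metadata_fields: tuple[str, ...] = ()  # extra fields copied onto every chunk


# Illustrative registry mirroring the list above; keys and values are assumptions.
RULES: dict[str, ChunkingRule] = {
    "policy_pdf": ChunkingRule("heading", overlap=0.15),
    "kb_article": ChunkingRule("heading", overlap=0.10, prepend_parent_headings=True),
    "support_transcript": ChunkingRule("semantic", overlap=0.20),
    "code": ChunkingRule("function"),
    "email_thread": ChunkingRule("per_email", metadata_fields=("thread_id", "subject")),
    "spreadsheet": ChunkingRule("per_sheet", keep_column_headers=True),
}


def rule_for(doc_type: str) -> ChunkingRule:
    """Look up the rule for a document type; unknown types fail loudly."""
    try:
        return RULES[doc_type]
    except KeyError:
        raise ValueError(f"no chunking rule for document type {doc_type!r}") from None
```

Raising on an unknown type, rather than falling back to a default, is a deliberate choice in this sketch: a newly ingested document type should get an explicit rule instead of silently inheriting a strategy designed for something else.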

The metadata that matters

Each chunk should carry metadata that makes retrieval smarter:

  • Source identifier: which document, which version, which section
  • Document type: for type-specific retrieval boosting
  • Access controls: which users or roles can see this chunk
  • Created and modified timestamps: for freshness-aware retrieval
  • Parent context: for documents with hierarchical structure
  • Custom domain tags: regulatory area, product line, customer segment

The metadata serves both retrieval (filter, rank, control access) and citation (the answer points to the source document with the right precision).
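A per-chunk record carrying those fields might look like the sketch below, together with two helpers: an access-control filter applied before ranking, and a section-level citation string. The class, field and function names are hypothetical; they map one-to-one onto the list above rather than onto any particular vector store's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable


@dataclass
class ChunkRecord:
    text: str
    source_id: str                 # which document
    version: str                   # which version of that document
    section: str                   # which section, for precise citations
    doc_type: str                  # enables type-specific boosting
    allowed_roles: frozenset[str]  # roles permitted to see this chunk
    created_at: datetime
    modified_at: datetime          # supports freshness-aware ranking
    parent_context: str = ""       # e.g. the heading path for hierarchical documents
    tags: dict[str, str] = field(default_factory=dict)  # e.g. regulatory area, product line


def visible_to(chunks: Iterable[ChunkRecord], user_roles: set[str]) -> list[ChunkRecord]:
    """Keep only chunks sharing at least one role with the user (apply before ranking)."""
    return [c for c in chunks if c.allowed_roles & user_roles]


def citation(c: ChunkRecord) -> str:
    """Render a citation at section precision."""
    return f"{c.source_id} v{c.version}, {c.section}"
```

Filtering before ranking, rather than after, keeps restricted content from influencing which chunks make the top-k at all.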

Overlap, and how much

Chunk overlap reduces the risk of relevant content falling on a boundary. Typical overlap:

  • 10-15% for structure-aware chunking (the structure already provides natural boundaries)
  • 15-25% for semantic chunking (semantic boundaries are imperfect)
  • 0% for table and code chunks (you do not want partial tables overlapping)

Too much overlap wastes index space and produces duplicate retrieval results. Too little misses relevant context at boundaries.
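The percentages translate directly into shared tokens: at 15% overlap, a 500-token chunk shares 75 tokens with its neighbour. A minimal sliding-window sketch is below; it assumes the input is already tokenised, and in a real pipeline the window would run inside each structural or semantic chunk rather than across a whole document. The `OVERLAP` values are illustrative midpoints of the ranges above, not recommendations.

```python
def window_chunks(tokens: list[str], size: int, overlap: float) -> list[list[str]]:
    """Split `tokens` into windows of `size`; neighbours share round(size * overlap) tokens."""
    if size <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("size must be positive and overlap in [0, 1)")
    step = max(1, size - round(size * overlap))
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end
    return chunks


# Illustrative midpoints of the ranges above; tables and code get no overlap.
OVERLAP = {"structure": 0.125, "semantic": 0.20, "table": 0.0, "code": 0.0}
```

The early `break` stops the loop once a window reaches the end of the input, so the tail is not emitted again as a short, fully duplicated chunk.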

Evaluation: how you know your chunking is working

Chunking quality is best measured by retrieval quality, not by chunk inspection. Build an evaluation set:

  • 50-200 representative queries that real users ask
  • For each, the gold-standard answer and the source document(s)
  • Run retrieval and measure recall@k (does the right source appear in the top k results?) and precision@k (what share of the top k results is relevant?)
  • Inspect failures to understand whether they are chunking, embedding, or query-side problems

When you change chunking strategy, re-run the evaluation. If recall@5 drops below baseline, the new strategy is worse. If recall@5 holds and precision improves, the new strategy is better.
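Both metrics are short functions. The sketch below assumes each retrieval call returns a ranked list of source-document IDs (one per retrieved chunk) and that each evaluation item pairs a query with the set of gold source IDs. With a single gold source per query, recall@k reduces to the yes/no "does the right source appear in the top k?" check. The `evaluate` helper and the `retrieve` callable are hypothetical names for illustration.

```python
from collections.abc import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the gold source documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k slots occupied by relevant sources."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def evaluate(eval_set: list[tuple[str, set[str]]],
             retrieve: Callable[[str], list[str]], k: int = 5) -> tuple[float, float]:
    """Mean recall@k and precision@k over (query, gold source IDs) pairs."""
    recalls, precisions = [], []
    for query, gold in eval_set:
        results = retrieve(query)
        recalls.append(recall_at_k(results, gold, k))
        precisions.append(precision_at_k(results, gold, k))
    n = len(eval_set)
    return sum(recalls) / n, sum(precisions) / n
```

To compare strategies, run `evaluate` once per chunking configuration against the same evaluation set and apply the rule above: a drop in mean recall@5 below the baseline rules the change out, and a held recall@5 with higher precision@5 favours it.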

The common questions

"How big should my chunks be?" Wrong question. Ask "what is the right chunking strategy for each document type." Size emerges from the strategy.

"Should I use semantic chunking everywhere?" No. Semantic chunking is great for prose, wasteful for structured documents that already have structural boundaries.

"How do I handle very long documents?" Hierarchical chunking with multi-level indexes. Retrieve at the right level for the query.

"What about images and diagrams?" Extract text from images (OCR, image captioning models) and index the extracted content with a reference back to the original image. For documents where images carry the primary content (technical drawings, infographics), consider multimodal retrieval.

What AgentWorks ships

The knowledge-rag features in AgentWorks ship with structure-aware chunking as the default, semantic chunking as an option for prose-heavy content, and per-document-type configuration for production knowledge bases. The metadata model supports the per-chunk access controls and citation linking that production deployments need. For teams without ML engineering capacity to tune chunking strategies, the defaults work for typical knowledge bases; for teams with specific document types or quality requirements, the configuration surface lets you tune per type.

The lesson from production: spend the first week of any RAG project on chunking analysis, not on prompt engineering. The ceiling on the RAG quality you can ever reach is set by chunking choices made at the start.

About the author

Erwin Berkouwer · Founder, AgentWorks

Erwin Berkouwer is the founder of AgentWorks — an AI agent platform purpose-built for European teams that need EU AI Act-ready governance, multi-LLM choice across OpenAI, Anthropic, Google and Mistral, and transparent per-token € pricing.
