Why Your RAG System Fails: The Chunking Problem
Chunking is how documents are broken into smaller, more manageable pieces for retrieval. Think of it as storing books by paragraphs, so when someone asks a question, you only respond with the relevant paragraphs—not the entire book.
- Get it wrong and your RAG system doesn't find the needed information at query time
- Get it right and you get accurate responses with lower costs and fewer hallucinations
The Fundamentals
The way you chunk your documents directly impacts retrieval quality. Too large and you waste tokens on irrelevant content. Too small and you lose context. The goal is finding the sweet spot for your specific use case.
6 Chunking Strategies
Fixed-Size Chunking
Splits text into chunks of a fixed number of characters or tokens.
- Use when: Most cases—this is your starting point
- Watch out: Loss of context by breaking up ideas
- Example: Every 500 tokens with 100-token overlap
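The mechanics fit in a few lines. Here's a minimal sketch of fixed-size chunking with overlap; it approximates tokens as whitespace-separated words (a real pipeline would use an actual tokenizer such as the one your embedding model uses):

```python
def fixed_size_chunks(text, chunk_size=500, overlap=100):
    """Split text into fixed-size chunks with overlap.

    "Tokens" are approximated by whitespace-separated words here;
    swap in a real tokenizer for production.
    """
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

The `overlap` parameter repeats the tail of each chunk at the head of the next, so an idea that straddles a boundary still appears whole in at least one chunk.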
Recursive Chunking
Splits text using a hierarchy of separators: paragraphs first, then sentences, then words, recursing until each piece fits the target chunk size.
- Use when: Documents are structured
- Watch out: Difficult to implement across multiple document types
- Example: Split by `\n\n`, then `\n`, then `.` if needed
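Here's a simplified sketch of that separator hierarchy in plain Python (LangChain's `RecursiveCharacterTextSplitter` implements a more complete version, including merging small pieces back together):

```python
def recursive_split(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Recursively split on the coarsest separator present until
    every chunk is at most max_len characters."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            chunks = []
            for piece in text.split(sep):
                chunks.extend(recursive_split(piece, max_len, separators))
            return chunks
    # No separator left: hard-split as a last resort
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Because paragraph breaks are tried before sentence breaks, chunks tend to align with the document's natural structure instead of cutting mid-thought.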
Document-Structure Chunking
Splits text based on the document's structure using headings, paragraphs, and code blocks.
- Use when: Markdown, HTML, structured PDFs
- Watch out: Sections can be different sizes
- Example: Each markdown `##` heading = one chunk
- Pro tip: Add metadata (chapter, section) to each chunk
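A minimal sketch of heading-based splitting for markdown, with the heading attached as metadata so you can filter on it at query time (a hand-rolled stand-in for tools like LangChain's `MarkdownTextSplitter`):

```python
import re

def split_by_heading(markdown, level=2):
    """One chunk per markdown heading at the given level,
    with the heading text carried along as metadata."""
    pattern = re.compile(rf"^{'#' * level} ", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    chunks = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(markdown)
        section = markdown[start:end].strip()
        heading = section.splitlines()[0].lstrip("# ").strip()
        chunks.append({"heading": heading, "text": section})
    return chunks
```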
Semantic Chunking
Splits text based on topic changes.
- Use when: Documents have complex structures, or you need to preserve meaning
- Watch out: Potential for context loss by splitting sentences; could have high computational cost
- Example: Split where sentence embeddings show topic shift
- Best for: News articles, transcripts, blog posts
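To make the idea concrete, here's a toy sketch of boundary detection between consecutive sentences. Word-overlap (Jaccard) similarity stands in for embedding cosine similarity purely so the example is self-contained; a real system would embed each sentence with a model and compare vectors:

```python
def semantic_chunks(sentences, threshold=0.2):
    """Group consecutive sentences, starting a new chunk when
    similarity to the previous sentence drops below threshold."""
    def similarity(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if similarity(prev, sent) < threshold:  # topic shift detected
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

The computational cost mentioned above comes from the embedding step: every sentence must be embedded before any boundary can be found.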
Sliding Window Chunking
Moves a fixed-size window across the document, producing overlapping chunks.
- Use when: Maintaining context and continuity across sections is crucial (legal, medical, or financial texts)
- Watch out: Creates more chunks; higher storage cost
- Example: 1000-token window, 200-token overlap
- Prevents: Missing info that spans chunk boundaries
LLM-Based Chunking
Uses an LLM to decide how to split the document.
- Use when: Higher accuracy is needed
- Watch out: Most expensive option
- Example: "Divide this contract into logical sections with complete clauses"
- Best for: Legal docs, medical records, research papers
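In practice this means building a prompt with a delimiter you can parse reliably, then splitting the model's response. A hedged sketch (the actual model call is whatever client you use and is left out here):

```python
def build_chunking_prompt(document):
    """Prompt asking an LLM to split a document into logical sections,
    separated by a delimiter we can parse afterwards."""
    return (
        "Divide the following document into logical sections with "
        "complete clauses. Separate sections with a line containing "
        "only '---'.\n\n" + document
    )

def parse_llm_chunks(response):
    """Parse the delimiter-separated LLM response back into chunks."""
    return [part.strip() for part in response.split("\n---\n") if part.strip()]
```

Note the cost implication: every document passes through the LLM once at indexing time, which is why this is the most expensive option.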
How to Choose Your Strategy
Phase 1: Establish a Baseline
Start with fixed-size chunking at 512 tokens with 10% overlap. Build a small evaluation dataset of 20-30 real user queries and measure your baseline retrieval accuracy.
Phase 2: Test Alternatives
Test strategies that match your content:
- Structured documents? Try document-structure or recursive chunking
- Complex narratives? Test semantic chunking
- Mixed content? Compare multiple approaches
A/B test against your baseline. If you see 10-20% improvement in retrieval accuracy, you've found your winner.
No improvement? Your content might work fine with fixed-size chunking, or you may need to test different chunk sizes.
Phase 3: Production Optimization
In production, different document types need different strategies:
- Code repositories: Recursive chunking (function/class boundaries)
- Legal contracts: Structure-based (clause preservation)
- Blog posts: Semantic chunking (topic coherence)
The key to building a sound chunking strategy is to monitor real-world performance and adjust as needed. There really isn't a one-size-fits-all when it comes to chunking.
Quick Start: Implementation Checklist
Optimal Chunk Sizes by Content Type
- General content: 512-1024 tokens
- Code: 200-500 tokens (function/class level)
- Legal/contracts: Clause-level (variable, 200-2000 tokens)
- FAQs: One Q&A pair per chunk
- Chat logs: Conversation turn or exchange level
Overlap Guidelines
- Standard: 10-20% overlap (50-200 tokens)
- Critical applications (legal, medical): 20-30%
- Cost-sensitive: 5-10% or none
Testing Approach
- Create 20-50 evaluation queries from real use cases
- Implement baseline (fixed-size 512 tokens, 10% overlap)
- Measure precision (relevant chunks retrieved / total retrieved)
- Test 2 alternative strategies
- Compare metrics and choose winner
- Monitor in production and iterate
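The precision metric from step 3 is simple to compute yourself before reaching for an evaluation framework. A minimal sketch, assuming you have chunk IDs for what was retrieved and what was actually relevant per query:

```python
def retrieval_precision(retrieved_ids, relevant_ids):
    """Precision = relevant chunks retrieved / total chunks retrieved."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for cid in retrieved_ids if cid in relevant)
    return hits / len(retrieved_ids)

def evaluate(queries):
    """Average precision over an evaluation set.

    `queries` maps each query to (retrieved_ids, relevant_ids);
    the retrieval step is whatever pipeline you are comparing.
    """
    scores = [retrieval_precision(got, rel) for got, rel in queries.values()]
    return sum(scores) / len(scores)
```

Run this once per chunking strategy over the same evaluation queries, and the comparison in step 5 becomes a single number per strategy.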
Tools to Get Started
- LangChain: RecursiveCharacterTextSplitter, MarkdownTextSplitter
- LlamaIndex: SentenceSplitter, SemanticSplitterNodeParser
- Evaluation: RAGAS, TruLens
Strategy Comparison at a Glance
| Strategy | Speed | Accuracy | Cost | Best For |
|---|---|---|---|---|
| Fixed-size | Fast | Medium | Low | Starting point, mixed content |
| Recursive | Medium | High | Low | Structured docs |
| Document-structure | Medium | Very High | Low | Markdown, HTML, PDFs |
| Semantic | Slow | Very High | High | Unstructured text |
| Sliding window | Medium | Very High | Medium | Context-critical apps |
| LLM-based | Slow | Highest | Very High | High-value docs |
The Bottom Line
Chunking isn't the most talked-about aspect of RAG systems, but it's the foundation a performant system is built on. Start with fixed-size chunking, then iterate on your strategy based on real-world testing.
Your RAG system is only as good as the chunks it retrieves.