
Why Your RAG System Fails: The Chunking Problem

Tags: ai · rag · chunking · llm · vector-databases

Chunking is how documents are broken into smaller, more manageable pieces for retrieval. Think of it as shelving a book paragraph by paragraph: when someone asks a question, you return only the relevant paragraphs, not the entire book.

  • Get it wrong, and your RAG system can't find the information it needs at query time
  • Get it right, and you get accurate responses, lower costs, and fewer hallucinations

The Fundamentals

The way you chunk your documents directly impacts retrieval quality. Too large and you waste tokens on irrelevant content. Too small and you lose context. The goal is finding the sweet spot for your specific use case.

6 Chunking Strategies

Fixed-Size Chunking

Splits text into chunks of a fixed number of characters or tokens.

  • Use when: Most cases—this is your starting point
  • Watch out: Loss of context by breaking up ideas
  • Example: Every 500 tokens with 100-token overlap
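A minimal sketch of the 500-token / 100-token-overlap example, approximating tokens as whitespace-separated words (production pipelines usually count real tokenizer tokens instead, e.g. with tiktoken):

```python
def fixed_size_chunks(text, chunk_size=500, overlap=100):
    """Split text into fixed-size chunks of whitespace 'tokens',
    where each chunk repeats the last `overlap` tokens of the
    previous one."""
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end
    return chunks
```

The overlap is what keeps an idea that straddles a boundary from being split across two chunks with no shared context.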

Recursive Chunking

Splits text using the document's structure, falling back from paragraphs to sentences to words until each chunk fits the target size.

  • Use when: Documents are structured
  • Watch out: Difficult to implement across multiple document types
  • Example: Split by \n\n, then \n, then . if needed
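The fallback chain above can be sketched as a small recursive function, assuming plain-text input and character-based lengths (library splitters like LangChain's RecursiveCharacterTextSplitter do essentially this with more polish):

```python
def recursive_split(text, max_len=200, separators=("\n\n", "\n", ". ")):
    """Split text with the coarsest separator first; recurse with
    finer separators only for pieces that are still too long."""
    if len(text) <= max_len:
        return [text]
    if not separators:
        # No structure left to exploit: fall back to a hard cut
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        if len(part) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(part, max_len, rest))
        elif not current:
            current = part
        elif len(current) + len(sep) + len(part) <= max_len:
            current += sep + part  # pack small parts together
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks
```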

Document-Structure Chunking

Splits text based on the document's structure using headings, paragraphs, and code blocks.

  • Use when: Markdown, HTML, structured PDFs
  • Watch out: Sections can be different sizes
  • Example: Each markdown ## heading = one chunk
  • Pro tip: Add metadata (chapter, section) to each chunk
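For markdown, the heading-per-chunk idea plus the metadata pro tip can be sketched like this (assumes headings start at column 0; the `doc_name` field is illustrative):

```python
import re

def split_markdown_by_heading(md_text, doc_name="guide.md"):
    """One chunk per ## section, with the section title attached
    as metadata so retrieval results can cite their source."""
    chunks = []
    sections = re.split(r"(?m)^## ", md_text)
    for section in sections[1:]:  # sections[0] is any preamble before the first heading
        title, _, body = section.partition("\n")
        chunks.append({
            "text": body.strip(),
            "metadata": {"source": doc_name, "section": title.strip()},
        })
    return chunks
```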

Semantic Chunking

Splits text based on topic changes.

  • Use when: Documents have complex structures, or you need to preserve meaning
  • Watch out: Context can still be lost at detected boundaries, and embedding every sentence carries a high computational cost
  • Example: Split where sentence embeddings show topic shift
  • Best for: News articles, transcripts, blog posts
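To make the "split on topic shift" idea concrete, here is a toy sketch that uses word-overlap cosine similarity in place of a real sentence-embedding model (the threshold and the bag-of-words "embedding" are both stand-ins; in practice you'd embed sentences with a model and tune the cutoff):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    num = sum(a[w] * b[w] for w in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Group consecutive sentences; start a new chunk whenever
    similarity to the previous sentence drops below the threshold."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        sim = cosine(Counter(prev.lower().split()),
                     Counter(sent.lower().split()))
        if sim < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

Swapping the toy `cosine` for real embeddings is exactly what tools like LlamaIndex's SemanticSplitterNodeParser do.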

Sliding Window Chunking

Slides a fixed-size window across the document, producing overlapping chunks.

  • Use when: Maintaining context and continuity across sections is crucial (legal, medical, or financial texts)
  • Watch out: Creates more chunks; higher storage cost
  • Example: 1000-token window, 200-token overlap
  • Prevents: Missing info that spans chunk boundaries

LLM-Based Chunking

Uses an LLM to decide where to split the document.

  • Use when: Higher accuracy is needed
  • Watch out: Most expensive option
  • Example: "Divide this contract into logical sections with complete clauses"
  • Best for: Legal docs, medical records, research papers
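Because the actual call depends on your provider, here is just the prompt-construction side as a sketch; the system instruction and the `### ` output convention are assumptions for illustration, not a known-good recipe:

```python
def build_chunking_prompt(document_text):
    """Build a chat-style prompt asking an LLM to split a document
    into logical sections. Send the result through whichever chat
    completion API you use."""
    return [
        {"role": "system",
         "content": "You split documents into self-contained sections. "
                    "Return one section at a time, each prefixed with '### '."},
        {"role": "user",
         "content": "Divide this contract into logical sections with "
                    "complete clauses:\n\n" + document_text},
    ]
```

Parsing the model's sectioned output back into chunks (and validating that no text was dropped) is where most of the real engineering effort goes.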

How to Choose Your Strategy

Phase 1: Establish a Baseline

Start with fixed-size chunking at 512 tokens with 10% overlap. Build a small evaluation dataset of 20-30 real user queries and measure your baseline retrieval accuracy.

Phase 2: Test Alternatives

Test strategies that match your content:

  • Structured documents? Try document-structure or recursive chunking
  • Complex narratives? Test semantic chunking
  • Mixed content? Compare multiple approaches

A/B test against your baseline. If you see 10-20% improvement in retrieval accuracy, you've found your winner.

No improvement? Your content might work fine with fixed-size chunking, or you may need to test different chunk sizes.

Phase 3: Production Optimization

In production, different document types need different strategies:

  • Code repositories: Recursive chunking (function/class boundaries)
  • Legal contracts: Structure-based (clause preservation)
  • Blog posts: Semantic chunking (topic coherence)

The key to building a sound chunking strategy is to monitor real-world performance and adjust as needed. There really isn't a one-size-fits-all when it comes to chunking.

Quick Start: Implementation Checklist

Optimal Chunk Sizes by Content Type

  • General content: 512-1024 tokens
  • Code: 200-500 tokens (function/class level)
  • Legal/contracts: Clause-level (variable, 200-2000 tokens)
  • FAQs: One Q&A pair per chunk
  • Chat logs: Conversation turn or exchange level

Overlap Guidelines

  • Standard: 10-20% overlap (50-200 tokens)
  • Critical applications (legal, medical): 20-30%
  • Cost-sensitive: 5-10% or none

Testing Approach

  1. Create 20-50 evaluation queries from real use cases
  2. Implement baseline (fixed-size 512 tokens, 10% overlap)
  3. Measure precision (relevant chunks retrieved / total retrieved)
  4. Test 2 alternative strategies
  5. Compare metrics and choose winner
  6. Monitor in production and iterate
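Step 3's precision metric is straightforward to compute once you've labeled which chunks are relevant for each evaluation query (a sketch; the IDs are whatever your vector store returns):

```python
def retrieval_precision(retrieved_ids, relevant_ids):
    """Precision = relevant chunks retrieved / total chunks retrieved."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0
```

Average this over your 20-50 evaluation queries to get the single number you compare across chunking strategies.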

Tools to Get Started

  • LangChain: RecursiveCharacterTextSplitter, MarkdownTextSplitter
  • LlamaIndex: SentenceSplitter, SemanticSplitterNodeParser
  • Evaluation: RAGAS, TruLens

Strategy Comparison at a Glance

| Strategy | Speed | Accuracy | Cost | Best For |
|---|---|---|---|---|
| Fixed-size | Fast | Medium | Low | Starting point, mixed content |
| Recursive | Medium | High | Low | Structured docs |
| Document-structure | Medium | Very High | Low | Markdown, HTML, PDFs |
| Semantic | Slow | Very High | High | Unstructured text |
| Sliding window | Medium | Very High | Medium | Context-critical apps |
| LLM-based | Slow | Highest | Very High | High-value docs |

The Bottom Line

Chunking isn't the most talked-about aspect of RAG systems, but it's the foundation a performant RAG system is built on. Start with fixed-size chunking, then iterate on your strategy based on real-world testing.

Your RAG system is only as good as the chunks it retrieves.