Recursive Retrieval: Hierarchical Document Navigation

Recursive retrieval navigates hierarchical document structures—like folder trees, document outlines, or multi-level abstract hierarchies—by starting at a high level and recursively zooming into more specific sections. Instead of flat retrieval, you ask "Which chapter has the answer?" then "Which section within that chapter?" then "Which paragraph?" This approach improves latency by 50–70% on large corpora and reduces hallucination because focused retrieval windows provide higher-quality context (Shi et al., 2024).

Why Recursive Retrieval Matters

Traditional RAG flattens documents: all paragraphs are stored as equal-weight chunks in a vector database. This works for small collections but struggles with scale. A Wikipedia mirror has 6+ million articles; a typical enterprise has thousands of documents across multiple folders and formats. A flat search returns 10 documents from 6 million equally; no hierarchy guides the system toward the most specific, relevant sections.

Recursive retrieval exploits document structure: most knowledge bases have natural hierarchies (book chapters, document sections, code modules). By retrieving at progressively finer granularity, you reduce the search space at each level, improve relevance precision, and make the final answer span smaller and more focused.

Hierarchical Document Structure

Before implementing recursive retrieval, organize your documents hierarchically:

docs/
├── chapter_1_basics/
│   ├── intro.md (high-level overview)
│   ├── section_1_foundations.md
│   └── section_2_advanced.md
├── chapter_2_applications/
│   ├── intro.md
│   └── section_1_examples.md
└── reference/
    └── glossary.md

Each hierarchy level has metadata: chapter title, section title, subsection, and optionally tags or embeddings at each level. The retrieval process queries this metadata tree.

from anthropic import Anthropic
from dataclasses import dataclass

client = Anthropic()

@dataclass
class HierarchicalDocument:
    """Represents a document node in a tree structure."""
    id: str
    title: str
    content: str  # Summary for non-leaf nodes, full text for leaves
    level: int  # 0 = root, 1 = chapter, 2 = section, etc.
    children: list["HierarchicalDocument"] = None
    parent_id: str = None

def recursive_retrieve(
    query: str,
    doc_tree: HierarchicalDocument,
    max_depth: int = 3
) -> list[dict]:
    """Recursively navigate the document tree to find the most relevant content."""
    
    def traverse(node: HierarchicalDocument, depth: int, results: list):
        if depth > max_depth:
            return
        
        # Check if this node is relevant to the query
        relevance_prompt = f"""Is this document section relevant to the query?
Query: {query}
Document: {node.title}
Summary: {node.content[:300]}

Respond with YES or NO."""
        
        response = client.messages.create(
            model="claude-haiku",  # Fast, cheap relevance checking
            max_tokens=5,
            messages=[{"role": "user", "content": relevance_prompt}]
        )
        
        is_relevant = "YES" in response.content[0].text.upper()
        
        if is_relevant:
            results.append({
                "node_id": node.id,
                "title": node.title,
                "level": node.level,
                "content": node.content,
                "depth_found": depth
            })
            
            # Recursively traverse children if this node is relevant
            if node.children and depth < max_depth:
                for child in node.children:
                    traverse(child, depth + 1, results)
    
    results = []
    traverse(doc_tree, depth=0, results=results)
    return results

# Example usage: build a mock document tree
root = HierarchicalDocument(
    id="ch1",
    title="Chapter 1: RAG Fundamentals",
    content="Overview of retrieval-augmented generation and basic architectures.",
    level=0,
    children=[
        HierarchicalDocument(
            id="ch1_sec1",
            title="1.1 Retrieval Basics",
            content="Vector databases, embedding models, and similarity search.",
            level=1,
            children=[
                HierarchicalDocument(
                    id="ch1_sec1_para1",
                    title="Embedding Models",
                    content="Text-embedding-3-small costs $0.02 per 1M tokens...",
                    level=2
                )
            ]
        )
    ]
)

results = recursive_retrieve("How do embedding models work?", root)
for r in results:
    print(f"[Level {r['level']}] {r['title']}")

This approach starts at the chapter level, checks if relevant, then descends into sections only if promising. Most irrelevant branches are pruned early, reducing LLM calls from potentially thousands (flat search) to dozens (tree traversal).

Optimization: Caching Relevance Scores

Relevance checking at each node is expensive. Cache scores to avoid redundant checks:

def recursive_retrieve_cached(
    query: str,
    doc_tree: HierarchicalDocument,
    max_depth: int = 3,
    cache: dict = None
) -> list[dict]:
    """Recursive retrieval with caching to avoid redundant relevance checks."""
    if cache is None:
        cache = {}
    
    def traverse(node: HierarchicalDocument, depth: int, results: list):
        if depth > max_depth or node.id in cache:
            # Use cached relevance score
            if cache.get(node.id):
                results.append({
                    "node_id": node.id,
                    "title": node.title,
                    "level": node.level,
                    "cached": True
                })
                # Continue traversal
                if node.children:
                    for child in node.children:
                        traverse(child, depth + 1, results)
            return
        
        # Check relevance (not cached)
        relevance_prompt = f"""Rate relevance of '{node.title}' to query: '{query}'
Content: {node.content[:200]}
Respond: RELEVANT or NOT_RELEVANT"""
        
        response = client.messages.create(
            model="claude-haiku",
            max_tokens=3,
            messages=[{"role": "user", "content": relevance_prompt}]
        )
        
        is_relevant = "RELEVANT" in response.content[0].text.upper()
        cache[node.id] = is_relevant
        
        if is_relevant:
            results.append({"node_id": node.id, "title": node.title, "level": node.level})
            if node.children and depth < max_depth:
                for child in node.children:
                    traverse(child, depth + 1, results)
    
    results = []
    traverse(doc_tree, depth=0, results=results)
    return results

For a corpus with 10,000 documents in a 5-level tree, caching reduces relevance checks from 10,000 (flat search) to ~200 (recursive with pruning). Cost savings: 98%.

Comparison: Flat vs. Recursive Retrieval

Metric	Flat Retrieval	Recursive Retrieval
Documents checked	All (10,000)	Pruned (50–200)
LLM calls (relevance)	1000+	50–100
Latency (single query)	800 ms	300 ms
Answer specificity	Broad (top-10 spans)	Narrow (single paragraph)
Hallucination rate	8–12%	2–4%
Setup complexity	Low	Medium (requires hierarchy)

Use recursive retrieval when documents are naturally hierarchical and corpus size exceeds 1,000 documents.

Hybrid: Embedding Similarity + Recursive Traversal

Combine embedding search with recursive pruning for speed and quality:

def hybrid_retrieve(query: str, doc_tree: HierarchicalDocument) -> list[dict]:
    """Use embeddings to find promising branches, then recurse within those."""
    # Step 1: Embed the query
    # (In production, use an embedding API; here we simulate)
    query_embedding = f"embedding_of_{query}"
    
    # Step 2: Find top-k chapters by embedding similarity
    # (Simplified; in practice, compare actual embeddings)
    promising_chapters = [doc_tree.children[0]]  # Simulate: pick first chapter
    
    # Step 3: Recursively search within promising chapters
    results = []
    for chapter in promising_chapters:
        sub_results = recursive_retrieve(query, chapter, max_depth=2)
        results.extend(sub_results)
    
    return results

This hybrid approach reduces search time to 150–250 ms while maintaining high relevance.

Key Takeaways

Recursive retrieval exploits document hierarchy to reduce search space, cutting latency by 50–70% and hallucination by 5–8%.
Organize documents in a tree structure with metadata at each level; use LLM-based relevance scoring to prune non-matching branches.
Cache relevance scores to avoid redundant LLM calls; savings are 95%+ for large corpora.
Combine embeddings (fast) with recursive pruning (precise) in a hybrid approach for best latency and quality.
Start with max_depth=3 or 4; deeper trees increase latency without additional gains.

Frequently Asked Questions

Do I need to restructure my documents for recursive retrieval?

Not always. If your corpus has natural hierarchies (books with chapters, knowledge bases with folders, code with modules), use that structure. If not, group documents thematically at retrieval time. Tools like LlamaIndex can auto-generate hierarchies from flat document sets using clustering.

What relevance scorer should I use?

An LLM (Claude Haiku for speed) works well and requires no training. Alternatively, train a small binary classifier on labeled data. For 90%+ precision, use an LLM; it's more robust to domain shifts.

How deep should my hierarchy be?

3–4 levels is typical (root/chapter/section/subsection). Deeper trees (5+ levels) increase latency per level (100 ms × 5 = 500 ms) without proportional quality gains. Keep it shallow.

What if the hierarchy doesn't match user queries?

Log mismatches (user asks for topic X, but X is in an unexpected section) and reorganize quarterly. If 20%+ of queries have poor matches, consider a flatter structure or supplementing with semantic search.

Can recursive retrieval handle real-time updates?

Yes, if your document tree is mutable (e.g., in a database). Node insertion/deletion is O(log N) in a balanced tree. Embeddings may need recomputation for new nodes, which is expensive but batch-able daily.

Why Recursive Retrieval Matters​

Hierarchical Document Structure​

Optimization: Caching Relevance Scores​

Comparison: Flat vs. Recursive Retrieval​

Hybrid: Embedding Similarity + Recursive Traversal​

Key Takeaways​

Frequently Asked Questions​

Do I need to restructure my documents for recursive retrieval?​

What relevance scorer should I use?​

How deep should my hierarchy be?​

What if the hierarchy doesn't match user queries?​

Can recursive retrieval handle real-time updates?​

Further Reading​