Corrective RAG (CRAG): Adaptive Fallback Retrieval

Corrective RAG (CRAG) is a framework that grades the quality of retrieved documents and escalates to alternative retrieval strategies when that quality is low. If the initial retrieval returns low-relevance documents, CRAG triggers fallback mechanisms such as broader search, web search, or knowledge distillation before generating a response. This approach improves accuracy by 25–35% on questions where traditional retrieval fails, while preserving speed by skipping escalation when the initial retrieval succeeds (Yan et al., 2024).

The CRAG Pipeline

CRAG operates in three stages:

Retrieval quality assessment: Score retrieved documents for relevance and coverage.
Fallback decision: If quality is low, escalate to alternative retrieval methods.
Knowledge distillation: Generate synthetic documents from lower-quality sources if alternatives also fail.

This tiered approach ensures answers are generated from high-quality sources when available, but gracefully degrades to lower-quality sources or synthesis rather than hallucinating.

Document Quality Assessment

Assess retrieved documents before generation:

from anthropic import Anthropic
import json

client = Anthropic()

def assess_retrieval_quality(query: str, documents: list[str]) -> dict:
    """Assess the quality of retrieved documents."""
    assessment_prompt = """Evaluate the quality of these retrieved documents for answering the query.

Query: {query}

Documents:
{docs_text}

For each document, score on:
- Relevance (0-100): Does it directly address the query?
- Comprehensiveness (0-100): Does it cover the main aspects?
- Authority (0-100): Does it come from a credible source?
- Recency (0-100): Is it current? (100 if N/A)

Return JSON: {{
  "overall_quality": 0-100,
  "quality_level": "HIGH|MEDIUM|LOW",
  "per_document": [
    {{"relevance": X, "comprehensiveness": Y, "authority": Z, "recency": W}}
  ],
  "missing_aspects": ["aspect1", "aspect2"],
  "recommendation": "USE|REFINE|ESCALATE"
}}""".format(
        query=query,
        docs_text="\n---\n".join([f"Doc {i}: {d[:300]}" for i, d in enumerate(documents)])
    )
    
    response = client.messages.create(
        model="claude-opus-4-1",
        max_tokens=400,
        messages=[{"role": "user", "content": assessment_prompt}]
    )
    
    text = response.content[0].text
    start = text.find('{')
    end = text.rfind('}') + 1
    return json.loads(text[start:end])

# Example
docs = [
    "Q3 2025 Microsoft revenue was $67.2B (source: investor.microsoft.com, June 2025)",
    "Cloud computing growth accelerated in 2024..."
]
quality = assess_retrieval_quality("Microsoft Q3 2025 earnings", docs)
print(f"Quality Level: {quality['quality_level']}")
print(f"Recommendation: {quality['recommendation']}")
if quality['recommendation'] == "ESCALATE":
    print(f"Missing aspects: {quality['missing_aspects']}")
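
The brace slicing in assess_retrieval_quality raises an opaque error when the model returns no JSON or a truncated object. A minimal sketch of a more defensive parser follows. The helper name parse_assessment and the conservative LOW/ESCALATE default are assumptions made for illustration; they are not part of CRAG or of the Anthropic SDK.

```python
import copy
import json

# Hypothetical fallback used when the grader output cannot be parsed.
# Treating unparseable output as LOW quality is an assumption: it errs
# toward escalation rather than toward trusting ungraded documents.
DEFAULT_ASSESSMENT = {
    "overall_quality": 0,
    "quality_level": "LOW",
    "per_document": [],
    "missing_aspects": [],
    "recommendation": "ESCALATE",
}

def parse_assessment(text: str) -> dict:
    """Parse the span from the first '{' to the last '}'; fall back on failure."""
    fallback = copy.deepcopy(DEFAULT_ASSESSMENT)
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return fallback
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return fallback
    if not isinstance(parsed, dict):
        return fallback
    # Fill in any keys the model omitted so callers can index safely.
    return {**fallback, **parsed}
```

Inside assess_retrieval_quality, the last three lines could then be replaced by a single call such as parse_assessment(response.content[0].text).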

Define quality thresholds: HIGH (score >75), MEDIUM (50–75), LOW (<50). CRAG escalates only if quality is LOW or MEDIUM with missing critical aspects.
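
The bands and the escalation rule above can be written as two small functions. This is a sketch under one assumption: the text does not define which missing aspects count as critical, so any non-empty missing_aspects list is treated as critical here. The function names quality_level and should_escalate are illustrative.

```python
def quality_level(score: float) -> str:
    """Map a 0-100 score to the bands above: HIGH (>75), MEDIUM (50-75), LOW (<50)."""
    if score > 75:
        return "HIGH"
    if score >= 50:
        return "MEDIUM"
    return "LOW"

def should_escalate(assessment: dict) -> bool:
    """Escalate on LOW, or on MEDIUM when aspects are missing."""
    level = quality_level(assessment["overall_quality"])
    if level == "LOW":
        return True
    if level == "MEDIUM":
        # Assumption: any listed missing aspect is treated as critical.
        return bool(assessment.get("missing_aspects"))
    return False

print(should_escalate({"overall_quality": 60, "missing_aspects": ["revenue by segment"]}))
print(should_escalate({"overall_quality": 60, "missing_aspects": []}))
```

Deriving the band from the numeric score, rather than trusting the model's own quality_level string, keeps the thresholds in one place where they can be tuned.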

Fallback Retrieval Strategies

If initial retrieval quality is low, escalate to alternatives:

def fallback_retrieval(query: str, initial_docs: list[str], 
                       retriever_fn, fallback_level: int = 0) -> tuple[list[str], str]:
    """Escalate retrieval through fallback strategies."""
    
    fallback_strategies = [
        {
            "name": "Broader search",
            "description": "Remove query filters; search broader knowledge base",
            "params": {"filter_strict": False, "top_k": 20}
        },
        {
            "name": "Web search",
            "description": "Search the web for current information",
            "params": {"search_type": "web", "top_k": 10}
        },
        {
            "name": "Knowledge distillation",
            "description": "Synthesize answer from lower-quality sources",
            "params": {"synthesis": True}
        }
    ]
    
    if fallback_level >= len(fallback_strategies):
        # Exhausted all strategies; return initial docs with warning
        return initial_docs, "LOW_QUALITY_DOCUMENTS"
    
    strategy = fallback_strategies[fallback_level]
    
    print(f"Escalating to: {strategy['name']}")
    print(f"  {strategy['description']}")
    
    # Execute fallback retrieval
    if strategy['params'].get('search_type') == 'web':
        # In production, use a real web search API (Bing, Google)
        fallback_docs = retriever_fn(query, **strategy['params'])
    elif strategy['params'].get('synthesis'):
        # Knowledge distillation: synthesize from heterogeneous sources
        fallback_docs = synthesize_documents(query, initial_docs)
    else:
        fallback_docs = retriever_fn(query, **strategy['params'])
    
    # Re-assess quality
    quality = assess_retrieval_quality(query, fallback_docs)
    if quality['quality_level'] in ['HIGH', 'MEDIUM']:
        # Normalize the strategy name into a status code, e.g. ESCALATED_TO_WEB_SEARCH
        return fallback_docs, f"ESCALATED_TO_{strategy['name'].upper().replace(' ', '_')}"
    else:
        # Try next strategy
        return fallback_retrieval(query, fallback_docs, retriever_fn, fallback_level + 1)

def synthesize_documents(query: str, documents: list[str]) -> list[str]:
    """Generate synthetic documents using knowledge distillation."""
    synthesis_prompt = """Using these documents and general knowledge, 
synthesize a comprehensive answer to the query.

Query: {query}

Documents:
{docs_text}

Generate a synthetic document (200-300 words) that synthesizes 
the key information and fills gaps from your training knowledge.""".format(
        query=query,
        docs_text="\n---\n".join(documents[:3])
    )
    
    response = client.messages.create(
        model="claude-opus-4-1",
        max_tokens=500,  # 200-300 words can exceed 300 tokens; leave headroom
        messages=[{"role": "user", "content": synthesis_prompt}]
    )
    
    return [response.content[0].text]

def mock_retriever(query: str, **kwargs) -> list[str]:
    """Mock retriever for demonstration."""
    if kwargs.get('search_type') == 'web':
        return ["Web search result about " + query]
    else:
        return ["Broader search result about " + query]

# Example
initial = ["Document A (low relevance)"]
corrected, strategy = fallback_retrieval("Q3 2025 earnings by industry", initial, mock_retriever)
print(f"Strategy used: {strategy}")
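
Because fallback_retrieval calls the grading model at every level, its escalation logic is awkward to test in isolation. One possible variant, sketched below, walks the same three strategies in a loop and takes the grader and synthesizer as parameters so that stubs can stand in for API calls. The names run_fallbacks, grade_fn, and synthesize_fn are hypothetical, as are the stubs.

```python
from typing import Callable

# Same three levels as above, in escalation order.
STRATEGIES = [
    ("BROADER_SEARCH", {"filter_strict": False, "top_k": 20}),
    ("WEB_SEARCH", {"search_type": "web", "top_k": 10}),
    ("KNOWLEDGE_DISTILLATION", {"synthesis": True}),
]

def run_fallbacks(query: str, initial_docs: list[str],
                  retriever_fn: Callable[..., list[str]],
                  grade_fn: Callable[[str, list[str]], str],
                  synthesize_fn: Callable[[str, list[str]], list[str]]) -> tuple[list[str], str]:
    """Try each strategy in turn; stop at the first HIGH or MEDIUM grade."""
    docs = initial_docs
    for name, params in STRATEGIES:
        if params.get("synthesis"):
            docs = synthesize_fn(query, docs)
        else:
            docs = retriever_fn(query, **params)
        if grade_fn(query, docs) in ("HIGH", "MEDIUM"):
            return docs, f"ESCALATED_TO_{name}"
    # Every strategy graded LOW: return the last attempt with a warning status.
    return docs, "LOW_QUALITY_DOCUMENTS"

# Stubs for an offline run: only web results are graded HIGH.
def stub_retriever(query: str, **kwargs) -> list[str]:
    kind = "web" if kwargs.get("search_type") == "web" else "broad"
    return [f"{kind} result for {query}"]

def stub_grader(query: str, docs: list[str]) -> str:
    return "HIGH" if docs[0].startswith("web") else "LOW"

docs, status = run_fallbacks("q", ["seed"], stub_retriever, stub_grader, lambda q, d: d)
print(status)  # ESCALATED_TO_WEB_SEARCH
```

In real use, grade_fn could wrap assess_retrieval_quality and return its quality_level field, and synthesize_fn could be synthesize_documents.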

Comparison: Retrieval with and without CRAG

Scenario	Standard RAG	CRAG
High-quality initial retrieval	Answer correct (1 hop, 200 ms)	Answer correct (1 hop, 250 ms)
Medium-quality retrieval	Answer incomplete (60–70% accuracy)	Escalate + correct answer (85–90% accuracy)
Low-quality retrieval	Hallucination (30–40% accuracy)	Escalate → synthesis → correct answer (80–85% accuracy)
Average latency (all queries)	250 ms	300–400 ms (20–60% slower)
Hallucination rate	8–12%	2–4%

CRAG adds 50–150 ms of latency on average but reduces hallucination and improves accuracy on hard questions by 30–50%.
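
The relative slowdown follows directly from the average latencies in the table. A quick check, using only the 250 ms baseline and the 300–400 ms CRAG range given there:

```python
def overhead(baseline_ms: float, crag_ms: float) -> tuple[float, float]:
    """Return (added milliseconds, percent slower than baseline)."""
    added = crag_ms - baseline_ms
    return added, 100 * added / baseline_ms

for crag_ms in (300, 400):
    added_ms, pct = overhead(250, crag_ms)
    print(f"{crag_ms} ms: +{added_ms:.0f} ms ({pct:.0f}% slower)")
# 300 ms: +50 ms (20% slower)
# 400 ms: +150 ms (60% slower)
```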

Knowledge Distillation for Synthetic Documents

When retrieval fails completely, generate synthetic documents:

def knowledge_distillation(query: str, documents: list[str], 
                          model_type: str = "distillation") -> str:
    """Distill knowledge from mixed sources into a synthetic document."""
    
    if model_type == "distillation":
        prompt = """Generate a synthetic reference document that would answer this query.
        
Query: {query}

Supporting sources (may be partial or low-quality):
{docs}

Synthesize a complete, coherent document that:
1. Answers the query directly
2. Incorporates accurate information from the sources
3. Fills gaps using your training knowledge
4. Acknowledges any uncertainty or missing information

Format as if from an authoritative source (e.g., official docs, research paper).""".format(
            query=query,
            docs="\n---\n".join(documents[:2])
        )
    else:
        # Alternative: summarization-based distillation
        docs = "\n---\n".join(documents[:2])
        prompt = f"""Summarize and synthesize these documents into a unified answer for: {query}

Documents:
{docs}"""
    
    response = client.messages.create(
        model="claude-opus-4-1",
        max_tokens=400,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.content[0].text

Synthetic documents are lower quality than real sources but prevent complete failure. Use them as a last resort.

Integration: Complete CRAG Pipeline

def corrective_rag(query: str, retriever_fn, threshold: float = 0.6) -> dict:
    """Full CRAG pipeline."""
    
    # Step 1: Initial retrieval
    documents = retriever_fn(query)
    
    # Step 2: Assess quality
    quality = assess_retrieval_quality(query, documents)
    
    # Step 3: Fallback if necessary
    if quality['overall_quality'] / 100 < threshold:
        documents, fallback_used = fallback_retrieval(
            query, documents, retriever_fn
        )
    else:
        fallback_used = "NONE"
    
    # Step 4: Generate response
    context = "\n---\n".join(documents[:3])
    generation_prompt = f"""Based on these documents:
{context}

Answer: {query}"""
    
    response = client.messages.create(
        model="claude-opus-4-1",
        max_tokens=300,
        messages=[{"role": "user", "content": generation_prompt}]
    ).content[0].text
    
    return {
        "query": query,
        "initial_quality": quality['overall_quality'],
        "fallback_used": fallback_used,
        "final_documents": len(documents),
        "response": response
    }

# Test
result = corrective_rag("Explain quantum computing", mock_retriever)
print(f"Quality: {result['initial_quality']}/100")
print(f"Fallback: {result['fallback_used']}")

Key Takeaways

CRAG assesses retrieval quality and escalates to fallback strategies (broader search, web search, synthesis) when initial retrieval is insufficient.
Use three-level fallback: broader search, web search, knowledge distillation. Most queries succeed at level 1–2.
Knowledge distillation synthesizes documents when real retrieval fails, avoiding complete failure at the cost of lower confidence.
CRAG adds 50–150 ms latency but reduces hallucination from 8–12% to 2–4% and improves hard-question accuracy by 30–50%.

Frequently Asked Questions

What quality threshold should I use?

Start with 60–70%. Monitor user feedback: if 10%+ of answers are still unsatisfactory, raise the threshold to 75–80% so that more queries escalate. If 90%+ are excellent but latency is high, lower it to 50% so that fewer queries escalate. Calibrate quality scores on your own domain: a score of 50 for medical papers means something very different from a score of 50 for social media posts.

How expensive is CRAG compared to standard RAG?

Cost scales with fallback usage. If 70% of queries succeed at level 1 (standard retrieval), 25% at level 2 (web search), and 5% at level 3 (synthesis), average cost increases ~1.5–2x. Web search costs $0.001–0.002 per call; synthesis costs $0.003–0.005.
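
The added cost per query can be estimated as an expected value over the fallback mix. The sketch below uses the traffic split and per-call price ranges from the answer above. It ignores the cost of the grading call and assumes each escalated query makes exactly one fallback call, both of which are simplifications. Turning the result into a multiple such as 1.5–2x requires your own baseline cost per query, which is not given here.

```python
def expected_added_cost(mix: dict[str, float], price: dict[str, float]) -> float:
    """Sum of (share of queries at a level) x (extra price per call at that level)."""
    return sum(share * price.get(level, 0.0) for level, share in mix.items())

# Traffic split from the text; "standard" carries no extra fallback cost.
mix = {"standard": 0.70, "web_search": 0.25, "synthesis": 0.05}

low = expected_added_cost(mix, {"web_search": 0.001, "synthesis": 0.003})
high = expected_added_cost(mix, {"web_search": 0.002, "synthesis": 0.005})
print(f"${low:.5f} to ${high:.5f} added per query")
```

With these inputs the estimate is roughly $0.0004 to $0.00075 of added cost per query, averaged over all traffic.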

Can I use CRAG without a grading LLM?

Yes. Train a small classifier on 100–200 labeled examples (relevant/irrelevant) and use it in place of LLM grading. Classifier inference is about 100x faster (5 ms vs 500 ms) but adapts less well to domain shift. In production, consider a hybrid: the classifier for speed, with the LLM reserved for edge cases.
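
A hybrid grader can be sketched as a confidence-band router: a cheap scorer settles the clear cases and only ambiguous ones go to the LLM. In the sketch below, the token-overlap scorer is only a stand-in for a trained classifier, and the band limits 0.2 and 0.6 and the names cheap_score, hybrid_grade, and llm_grade_fn are illustrative assumptions.

```python
from typing import Callable

def cheap_score(query: str, document: str) -> float:
    """Stand-in for a trained classifier: fraction of query tokens found in the document."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_grade(query: str, document: str,
                 llm_grade_fn: Callable[[str, str], bool],
                 low: float = 0.2, high: float = 0.6) -> tuple[bool, str]:
    """Return (is_relevant, which grader decided)."""
    score = cheap_score(query, document)
    if score >= high:
        return True, "classifier"
    if score <= low:
        return False, "classifier"
    # Ambiguous band: defer to the slower LLM grader.
    return llm_grade_fn(query, document), "llm"

# Example with a stub in place of the LLM call.
print(hybrid_grade("quantum computing basics", "classical computing history",
                   lambda q, d: True))
```

Widening the band sends more traffic to the LLM and narrows the latency advantage, so the limits are worth tuning against labeled examples from your domain.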

What if multiple fallback strategies return poor results?

This indicates your knowledge base is insufficient for the query. Options: (1) Log the query and prioritize content acquisition for that topic; (2) Return an honest response: "I don't have enough information to answer this. Here's what I found..."; (3) For web-searchable queries, always offer a web search link.

How do I prevent knowledge distillation from hallucinating?

Knowledge distillation works best when you have at least one relevant document. If retrieval is completely empty, don't synthesize—return an error. In code: if not documents: return "No information available" instead of synthesizing from nothing.
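
The guard described above can be sketched as a thin wrapper around the synthesis step. The wrapper name synthesize_or_refuse is hypothetical, and treating whitespace-only strings as empty is an added assumption. synthesize_fn stands for any function with the same (query, documents) -> str shape as knowledge_distillation.

```python
from typing import Callable

NO_INFO = "No information available"

def synthesize_or_refuse(query: str, documents: list[str],
                         synthesize_fn: Callable[[str, list[str]], str]) -> str:
    """Only synthesize when at least one non-empty document was retrieved."""
    # Assumption: blank or whitespace-only documents count as "nothing retrieved".
    usable = [d for d in documents if d and d.strip()]
    if not usable:
        return NO_INFO
    return synthesize_fn(query, usable)

# Example with a stub in place of the model call.
print(synthesize_or_refuse("Explain quantum computing", [], lambda q, d: "synthetic answer"))
```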
