
Iterative Deepening: Research Agent Search Strategy

Iterative deepening is the strategy that prevents research agents from halting prematurely or searching forever. It defines how the agent decides when it has gathered enough information, when to refine its approach, and when to stop. Unlike humans, who can sense when they know enough, agents must rely on explicit signals: coverage (Have I addressed all sub-questions?), consensus (Do multiple sources agree?), and confidence (Am I certain enough to report this finding?).

A research agent without iterative deepening either stops after the first search (missing important nuance) or loops endlessly chasing marginal improvements. This article teaches you to build a principled loop that adaptively deepens research until confidence thresholds are met, then exits cleanly with a well-supported report.

Defining Coverage, Consensus, and Confidence Metrics

The agent should track three metrics to decide if it's done:

  1. Coverage (0–1): Fraction of original sub-questions answered. If only 2 of 5 sub-questions are answered, coverage = 0.4 and the agent should continue.
  2. Consensus (0–1): Fraction of claims supported by 2+ independent sources. If 12 of 15 claims have consensus, consensus = 0.8.
  3. Confidence (0–1): Average certainty of all claims. Claims marked "high" contribute 0.95, "medium" 0.7, "low" 0.4. Average these across all claims.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResearchState:
    """Current state of the research loop."""
    sub_questions: list[str]
    answered_sub_questions: set[int]  # Indices of answered sub-questions
    claims: list[dict]  # Each claim has a "certainty" field
    source_consensus: dict[str, int]  # claim -> count of sources citing it
    iterations: int = 0
    max_iterations: int = 5

def compute_coverage(state: ResearchState) -> float:
    """Fraction of sub-questions answered."""
    if not state.sub_questions:
        return 1.0
    return len(state.answered_sub_questions) / len(state.sub_questions)

def compute_consensus(state: ResearchState) -> float:
    """Fraction of claims with 2+ source agreement."""
    if not state.claims:
        return 1.0

    consensus_count = sum(
        1 for claim in state.claims
        if state.source_consensus.get(claim.get("text"), 0) >= 2
    )
    return consensus_count / len(state.claims)

def compute_confidence(state: ResearchState) -> float:
    """Average certainty across all claims."""
    if not state.claims:
        return 0.0

    certainty_map = {"high": 0.95, "medium": 0.7, "low": 0.4}
    certainties = [
        certainty_map.get(claim.get("certainty", "medium"), 0.5)
        for claim in state.claims
    ]

    return sum(certainties) / len(certainties) if certainties else 0.0

def should_continue_research(
    state: ResearchState,
    min_coverage: float = 0.8,
    min_consensus: float = 0.7,
    min_confidence: float = 0.75
) -> tuple[bool, str]:
    """
    Decide if the agent should continue researching.
    Returns: (should_continue, reason)
    """

    coverage = compute_coverage(state)
    consensus = compute_consensus(state)
    confidence = compute_confidence(state)

    # Check each metric
    if coverage < min_coverage:
        return True, f"Coverage too low: {coverage:.2f} < {min_coverage}"

    if consensus < min_consensus:
        return True, f"Consensus too low: {consensus:.2f} < {min_consensus}"

    if confidence < min_confidence:
        return True, f"Confidence too low: {confidence:.2f} < {min_confidence}"

    # All metrics met
    return False, "All metrics met. Ready to synthesize report."

# Example
state = ResearchState(
    sub_questions=[
        "What are AI chip manufacturing techniques?",
        "Which companies lead?",
        "What are 2026 breakthroughs?"
    ],
    answered_sub_questions={0, 1},  # 2 of 3 answered
    claims=[
        {"text": "TSMC leads 1.4nm", "certainty": "high"},
        {"text": "Samsung at 1.5nm", "certainty": "medium"}
    ],
    source_consensus={"TSMC leads 1.4nm": 3, "Samsung at 1.5nm": 1}
)

should_continue, reason = should_continue_research(state)
print(f"Continue? {should_continue}. Reason: {reason}")
# Output: Continue? True. Reason: Coverage too low: 0.67 < 0.8
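
The example above exits at the first failing check, so it never reports the other two metrics. As a worked illustration, the standalone sketch below recomputes all three values for the same example data using plain dictionaries (the variable names are illustrative, not part of the agent code). It suggests that even with full coverage, this state would still fail the consensus threshold of 0.7.

```python
# Standalone recomputation of the three metrics for the example state.
# Variable names here are illustrative only.
num_sub_questions = 3
answered = {0, 1}
claims = [
    {"text": "TSMC leads 1.4nm", "certainty": "high"},
    {"text": "Samsung at 1.5nm", "certainty": "medium"},
]
source_counts = {"TSMC leads 1.4nm": 3, "Samsung at 1.5nm": 1}
certainty_map = {"high": 0.95, "medium": 0.7, "low": 0.4}

# Coverage: 2 of 3 sub-questions answered
coverage = len(answered) / num_sub_questions

# Consensus: only the TSMC claim has 2+ sources, so 1 of 2 claims
consensus = sum(
    1 for c in claims if source_counts.get(c["text"], 0) >= 2
) / len(claims)

# Confidence: (0.95 + 0.7) / 2
confidence = sum(certainty_map[c["certainty"]] for c in claims) / len(claims)

print(f"coverage={coverage:.2f} consensus={consensus:.2f} confidence={confidence:.3f}")
# coverage=0.67 consensus=0.50 confidence=0.825
```

Only confidence (0.825) clears its default threshold of 0.75 here; coverage and consensus both fall short.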

Adaptive Query Refinement Based on Earlier Results

As the agent gathers evidence, it should refine its search strategy. If early searches return limited results, it should broaden the query. If it finds conflicting claims, it should search for reconciliation.

class QueryRefiner:
    """Adaptively refine search queries based on research progress."""

    def __init__(self, client):
        self.client = client

    def refine_query(
        self,
        original_query: str,
        results_so_far: list[str],
        gaps: list[str]
    ) -> str:
        """
        Refine a query based on results and identified gaps.
        results_so_far: URLs/titles of results from prior searches
        gaps: List of questions that remain unanswered
        """

        system_prompt = """You are a search query optimization expert.
Given a search query, prior results, and unanswered questions,
refine the query to target the gaps.

Rules:
- Broaden if results are sparse (add synonym keywords)
- Narrow if results are off-topic (add specificity)
- Target gaps by incorporating unfound terms
- Keep query concise (5-10 words)

Return only the refined query string."""

        prompt = f"""Original query: "{original_query}"

Prior results (first 3):
{', '.join(results_so_far[:3])}

Unanswered questions:
{'; '.join(gaps)}

Refine the query to address the gaps:"""

        response = self.client.messages.create(
            model="claude-opus-4-1",
            max_tokens=50,
            system=system_prompt,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.content[0].text.strip()

# Example
from anthropic import Anthropic
client = Anthropic()
refiner = QueryRefiner(client)

original = "AI chip manufacturing TSMC"
prior_results = ["TSMC official news", "Semi industry analysis", "TSMC investor relations"]
gaps = ["Samsung's approach", "Intel's strategy", "Manufacturing yield data"]

refined = refiner.refine_query(original, prior_results, gaps)
print(f"Refined query: {refined}")
# Possible output: "AI chip manufacturing Samsung Intel yield 2026"
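
The broaden / narrow / target-gaps rules in the prompt can also be checked deterministically before calling the model, for example to log which rule applies or to skip refinement entirely. The sketch below is one possible heuristic, not part of the refiner above: `RefinementMode`, `choose_refinement_mode`, and both thresholds are hypothetical names and values chosen for illustration.

```python
from enum import Enum


class RefinementMode(Enum):
    BROADEN = "broaden"
    NARROW = "narrow"
    TARGET_GAPS = "target_gaps"
    KEEP = "keep"


def choose_refinement_mode(
    num_results: int,
    num_relevant: int,
    gaps: list[str],
    min_results: int = 3,             # hypothetical: fewer results counts as "sparse"
    min_relevant_ratio: float = 0.5,  # hypothetical: lower ratio counts as "off-topic"
) -> RefinementMode:
    """Pick a refinement rule from simple counts of prior results."""
    # Sparse results: broaden the query
    if num_results == 0 or num_results < min_results:
        return RefinementMode.BROADEN
    # Plenty of results but mostly irrelevant: narrow the query
    if num_relevant / num_results < min_relevant_ratio:
        return RefinementMode.NARROW
    # Results look fine but questions remain: aim at the gaps
    if gaps:
        return RefinementMode.TARGET_GAPS
    return RefinementMode.KEEP


print(choose_refinement_mode(1, 1, []))                     # RefinementMode.BROADEN
print(choose_refinement_mode(10, 2, ["Intel's strategy"]))  # RefinementMode.NARROW
print(choose_refinement_mode(10, 8, ["Intel's strategy"]))  # RefinementMode.TARGET_GAPS
```

How relevance is counted is left open here; it could come from a keyword match or from the claim-extraction step.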

Stopping Conditions and Graceful Exit

Explicitly define when to stop to avoid endless loops:

def get_stopping_reason(state: ResearchState) -> Optional[str]:
    """
    Check if it's time to stop researching.
    Returns stopping reason if applicable, else None.
    """

    # Max iterations reached
    if state.iterations >= state.max_iterations:
        return f"Max iterations ({state.max_iterations}) reached"

    # All metrics met: handled separately by should_continue_research()

    # No progress: at least one iteration has run but no claims were extracted
    if state.iterations > 0 and len(state.claims) == 0:
        return "No claims extracted after iteration. Stopping to avoid infinite loop."

    # No more viable queries
    # (This would be determined by query planner)

    return None

# In the main agent loop:
def research_loop(initial_question: str, max_iterations: int = 5):
    # decompose_question() is assumed to be defined elsewhere and to
    # return the sub-questions as a list of strings
    state = ResearchState(
        sub_questions=decompose_question(initial_question),
        answered_sub_questions=set(),
        claims=[],
        source_consensus={},
        max_iterations=max_iterations
    )

    while state.iterations < max_iterations:
        stopping_reason = get_stopping_reason(state)
        if stopping_reason:
            print(f"Stopping: {stopping_reason}")
            break

        should_continue, reason = should_continue_research(state)
        if not should_continue:
            print(f"Research complete: {reason}")
            break

        # Execute next research iteration (search, fetch, extract, verify)
        state.iterations += 1
        # ... (search, extract, verify code omitted for brevity)

    return state
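
The zero-claims check only catches a loop that never extracts anything. To detect a stall after some claims exist, one option is to compare the claim count between iterations. The sketch below assumes a hypothetical `ProgressTracker` helper with a `patience` parameter (how many iterations without new claims to tolerate); neither is part of the code above.

```python
from dataclasses import dataclass


@dataclass
class ProgressTracker:
    """Tracks whether recent iterations added any new claims."""
    patience: int = 2             # hypothetical: stalled iterations tolerated
    last_claim_count: int = 0
    stalled_iterations: int = 0

    def update(self, claim_count: int) -> bool:
        """Record the claim count after an iteration; return True if stalled."""
        if claim_count > self.last_claim_count:
            self.stalled_iterations = 0   # progress made, reset the counter
        else:
            self.stalled_iterations += 1  # nothing new this iteration
        self.last_claim_count = claim_count
        return self.stalled_iterations >= self.patience


tracker = ProgressTracker(patience=2)
# Claim counts after four successive iterations
history = [tracker.update(n) for n in (3, 3, 3, 5)]
print(history)  # [False, False, True, False]
```

In the loop, `tracker.update(len(state.claims))` could be called at the end of each iteration, breaking out when it returns True.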

The Complete Iterative Loop Pseudocode

Here's how all pieces fit together:

def autonomous_research_agent(user_question: str) -> dict:
    """
    Complete autonomous research loop with iterative deepening.
    """

    # STEP 0: Plan
    plan = decompose_query(user_question)
    state = ResearchState(
        sub_questions=plan["sub_questions"],
        answered_sub_questions=set(),
        claims=[],
        source_consensus={},
        max_iterations=5
    )

    # Track progress on state.iterations so get_stopping_reason() sees it
    while state.iterations < state.max_iterations:
        print(f"\n=== Iteration {state.iterations + 1} ===")

        # STEP 1: Check stopping conditions
        stopping_reason = get_stopping_reason(state)
        if stopping_reason:
            print(f"Stopping: {stopping_reason}")
            break

        should_continue, reason = should_continue_research(state)
        if not should_continue:
            print(f"Research complete: {reason}")
            break

        # STEP 2: Identify next search targets
        # (search for unanswered sub-questions with low consensus)
        next_targets = identify_gaps(state)

        if not next_targets:
            print("No more targets. Stopping.")
            break

        for target in next_targets[:2]:  # Limit to 2 per iteration
            # STEP 3: Search
            results = search(target)

            # STEP 4: Fetch and extract
            for result in results[:3]:  # Top 3 results
                source_content = fetch_and_extract(result["url"])
                claims = extract_claims(source_content, user_question)

                # STEP 5: Verify
                for claim in claims:
                    verify_result = verify_claim_in_source(claim, source_content)
                    if verify_result["verified"]:
                        seen = state.source_consensus.get(claim["text"], 0)
                        if seen == 0:
                            # Store each distinct claim once so repeats
                            # don't skew consensus and confidence
                            state.claims.append(claim)
                        state.source_consensus[claim["text"]] = seen + 1

            # NOTE: once a target sub-question has verified claims, add its
            # index to state.answered_sub_questions; otherwise coverage never rises.

        # STEP 6: Adapt
        coverage = compute_coverage(state)
        consensus = compute_consensus(state)
        confidence = compute_confidence(state)

        print(f"Metrics: Coverage={coverage:.2f}, Consensus={consensus:.2f}, "
              f"Confidence={confidence:.2f}")

        # Count the iteration only after it has actually run
        state.iterations += 1

    # STEP 7: Synthesize and return
    report = synthesize_report(state)
    return report
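
The loop calls `identify_gaps(state)` without defining it. The sketch below is one plausible reading of the STEP 2 comment: return unanswered sub-questions first, then claims backed by fewer than two sources. It is an assumption about what the helper might do, not the author's implementation, and it uses `SimpleNamespace` as a stand-in for `ResearchState` so the example runs on its own.

```python
from types import SimpleNamespace


def identify_gaps(state) -> list[str]:
    """Return search targets: unanswered sub-questions, then weakly supported claims."""
    # Sub-questions whose index is not yet marked as answered
    unanswered = [
        question
        for index, question in enumerate(state.sub_questions)
        if index not in state.answered_sub_questions
    ]
    # Claims with fewer than 2 supporting sources need corroboration
    weak_claims = [
        claim["text"]
        for claim in state.claims
        if state.source_consensus.get(claim["text"], 0) < 2
    ]
    return unanswered + weak_claims


# Stand-in for ResearchState, using the earlier example data
state = SimpleNamespace(
    sub_questions=[
        "What are AI chip manufacturing techniques?",
        "Which companies lead?",
        "What are 2026 breakthroughs?",
    ],
    answered_sub_questions={0, 1},
    claims=[
        {"text": "TSMC leads 1.4nm", "certainty": "high"},
        {"text": "Samsung at 1.5nm", "certainty": "medium"},
    ],
    source_consensus={"TSMC leads 1.4nm": 3, "Samsung at 1.5nm": 1},
)

print(identify_gaps(state))
# ['What are 2026 breakthroughs?', 'Samsung at 1.5nm']
```

Putting unanswered sub-questions first means the two-targets-per-iteration limit in the loop spends its budget on coverage before consensus.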

Key Takeaways

  • Define three metrics (coverage, consensus, confidence) and set thresholds (0.8, 0.7, 0.75) to decide when research is complete.
  • Refine search queries iteratively based on gaps and prior results to broaden or narrow the search as needed.
  • Enforce explicit stopping conditions (max iterations, metrics met, no viable queries) to prevent infinite loops.
  • Track iteration count and validate that each iteration adds new information; exit if progress stalls.

Frequently Asked Questions

What should the thresholds be?

For most research: coverage = 0.8, consensus = 0.7, confidence = 0.75. For high-stakes topics, raise all to 0.9. For exploratory research, lower to 0.6. Tune based on the question's importance.
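
These presets can be kept in one place rather than scattered as literals. The sketch below assumes a hypothetical `Thresholds` dataclass and profile names, and reads "lower to 0.6" as applying to all three metrics. The field names match the keyword arguments of `should_continue_research`, so a preset can be unpacked into that call.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Thresholds:
    min_coverage: float
    min_consensus: float
    min_confidence: float


# Profile names are illustrative; values follow the guidance above
PRESETS = {
    "standard": Thresholds(0.8, 0.7, 0.75),
    "high_stakes": Thresholds(0.9, 0.9, 0.9),
    "exploratory": Thresholds(0.6, 0.6, 0.6),
}


def thresholds_for(profile: str) -> Thresholds:
    """Look up a preset, falling back to the standard profile."""
    return PRESETS.get(profile, PRESETS["standard"])


kwargs = asdict(thresholds_for("high_stakes"))
print(kwargs)
# {'min_coverage': 0.9, 'min_consensus': 0.9, 'min_confidence': 0.9}
```

With these names, a call such as `should_continue_research(state, **kwargs)` would apply the chosen profile.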

How many iterations are typical?

Most questions resolve in 2–3 iterations. Set max_iterations = 5 as a safety valve. If the agent consistently hits the max, the thresholds are likely too high.

What if coverage stays low because a sub-question is unanswerable?

Flag it in the report as "no sources found" and move on. Don't let impossible questions block the entire agent. Adjust the coverage calculation to exclude unanswerable questions.
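
One way to make that adjustment is to drop flagged sub-questions from both the numerator and the denominator. The function below is a minimal sketch: `compute_adjusted_coverage` and the `unanswerable` index set are hypothetical additions, and how a sub-question gets flagged is left to the agent.

```python
def compute_adjusted_coverage(
    num_sub_questions: int,
    answered: set[int],
    unanswerable: set[int],
) -> float:
    """Coverage over answerable sub-questions only."""
    answerable = num_sub_questions - len(unanswerable)
    if answerable <= 0:
        # Nothing left that could be answered, so coverage cannot block the loop
        return 1.0
    # Ignore any index that was both answered and later flagged unanswerable
    return len(answered - unanswerable) / answerable


# 5 sub-questions, 3 answered, index 4 flagged as "no sources found"
raw = 3 / 5
adjusted = compute_adjusted_coverage(5, {0, 1, 2}, {4})
print(f"raw={raw:.2f} adjusted={adjusted:.2f}")
# raw=0.60 adjusted=0.75
```

In this example the adjustment moves coverage from 0.60 to 0.75, so one more answered sub-question would clear the default 0.8 threshold.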

Should I parallelize searches within an iteration?

Yes. Within an iteration, search for all gap-filling queries in parallel, then fetch and extract sequentially. This speeds up the loop without compromising citation tracking.
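
A minimal sketch of that pattern with the standard library's `ThreadPoolExecutor` follows. The `search` function here is a placeholder that fabricates an example.com URL so the block runs offline; a real agent would call its search backend instead. `parallel_search` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def search(query: str) -> list[dict]:
    """Placeholder search; a real agent would call a search API here."""
    return [{"url": f"https://example.com/{query.replace(' ', '-')}", "query": query}]


def parallel_search(queries: list[str], max_workers: int = 4) -> dict[str, list[dict]]:
    """Run all gap-filling searches concurrently and key results by query."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with queries
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))


queries = ["Samsung AI chip yield", "Intel foundry strategy"]
results_by_query = parallel_search(queries)

# Fetch and extract sequentially so each claim stays tied to its source
for query in queries:
    for result in results_by_query[query]:
        print(f"{query} -> {result['url']}")
```

Threads suit this step because searches are I/O-bound; the sequential second loop is where `fetch_and_extract` and claim verification would go.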

Further Reading