Report Synthesis: Research Agent Outputs
Report synthesis weaves scattered claims from dozens of sources into a coherent, cited narrative. It is the step that turns raw research data into a polished, publishable report. Query planning and fact-checking are comparatively mechanical; synthesis requires judgment about what matters most, how to organize ideas logically, and how to present contradictions honestly. A good research agent produces reports that are useful immediately, without forcing the reader to dig through source links.
The synthesis step takes verified claims, each carrying consensus and confidence labels, groups them by theme, and renders them as markdown. Each major claim is cited to its sources, contradictions are presented explicitly, and the report ends with a bibliography. This article shows how to build a synthesis engine that turns structured research data into professional-grade reports.
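The examples in this article pass claims around as plain dicts. As a sketch, the shape they assume can be written down with a `TypedDict` (Python 3.9+) plus a small validator that catches malformed claims before rendering. The `ClaimDict` and `validate_claim` names are hypothetical, and the field names (`text`, `source_ids`, `certainty`) are taken from the rendering examples later in the article, not from any fixed schema.

```python
from typing import TypedDict


class ClaimDict(TypedDict, total=False):
    """Hypothetical shape of one verified claim, as used in this article's examples."""
    text: str              # the claim itself, in one sentence
    source_ids: list[int]  # indexes into the list of sources
    certainty: str         # "high", "medium", or "low"


VALID_CERTAINTY = {"high", "medium", "low"}


def validate_claim(claim: dict) -> list[str]:
    """Return a list of problems found in a claim dict (empty if it looks usable)."""
    problems = []
    if not isinstance(claim.get("text"), str) or not claim["text"].strip():
        problems.append("missing or empty 'text'")
    # A missing certainty defaults to "medium", matching the renderer's default
    if claim.get("certainty", "medium") not in VALID_CERTAINTY:
        problems.append(f"unknown certainty {claim['certainty']!r}")
    if not all(isinstance(i, int) for i in claim.get("source_ids", [])):
        problems.append("'source_ids' must be integers")
    return problems


good: ClaimDict = {"text": "TSMC dominates with 65% market share", "source_ids": [0], "certainty": "high"}
print(validate_claim(good))
print(validate_claim({"text": "", "certainty": "certain"}))
```

Running the validator before synthesis keeps rendering code simple, since it can then assume every claim has text and a known certainty label.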
Organizing Claims into Thematic Sections
Verified claims arrive as an unordered list, so the synthesis step must first cluster them. The original sub-questions make natural themes:
```python
import re
from collections import defaultdict

from anthropic import Anthropic


def organize_claims_by_theme(
    claims: list[dict],
    sub_questions: list[str],
) -> dict[str, list[dict]]:
    """
    Cluster claims into thematic sections based on sub-questions.
    Uses an LLM to assign each claim to the most relevant sub-question.
    """
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    organized = defaultdict(list)
    system_prompt = (
        "You are a research organizer. "
        "Given a claim and a numbered list of topics, choose the most relevant topic. "
        "Return only the topic number, nothing else."
    )
    topic_list = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(sub_questions))
    for claim in claims:
        response = client.messages.create(
            model="claude-opus-4-1",
            max_tokens=10,
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": (
                    f"Claim: {claim['text']}\n"
                    f"Topics:\n{topic_list}\n"
                    "Which topic is most relevant?"
                ),
            }],
        )
        response_text = response.content[0].text.strip()
        # Map the returned number back to a sub-question. Matching on a number is
        # more robust than matching words, because sub-questions often share
        # words such as "what" or "which".
        match = re.search(r"\d+", response_text)
        index = int(match.group()) - 1 if match else -1
        if 0 <= index < len(sub_questions):
            matched_topic = sub_questions[index]
        else:
            matched_topic = sub_questions[0]  # Fallback: first sub-question
        organized[matched_topic].append(claim)
    return dict(organized)


# Example (requires ANTHROPIC_API_KEY; makes one API call per claim)
claims = [
    {
        "text": "TSMC produces 65% of world's AI chips",
        "source": "Industry report 2026",
        "certainty": "high",
    },
    {
        "text": "Samsung targeting 1.5nm by Q3 2026",
        "source": "Samsung press release",
        "certainty": "high",
    },
    {
        "text": "AI chip demand grew 340% year-over-year",
        "source": "Market analyst",
        "certainty": "medium",
    },
]
sub_questions = [
    "Which companies lead AI chip manufacturing?",
    "What is the market size and growth?",
    "What are recent technological breakthroughs?",
]
organized = organize_claims_by_theme(claims, sub_questions)
for theme, theme_claims in organized.items():
    print(f"\n{theme}")
    for claim in theme_claims:
        print(f"  - {claim['text']}")
```
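Calling a model once per claim is slow, costs tokens, and needs network access. For unit tests or a rough first pass, a no-LLM baseline can assign each claim to the sub-question that shares the most content words with it. This is a swapped-in heuristic, not the LLM approach above, and it will misfile claims whose wording differs from the question ("grew" versus "growth", for instance). The stopword list and the `content_words` and `assign_by_overlap` names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Illustrative stopword list; a real one would be longer
STOPWORDS = {"what", "which", "is", "are", "the", "and", "of", "a", "an", "in", "by", "to", "s"}


def content_words(text: str) -> set[str]:
    """Lowercase word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def assign_by_overlap(claims: list[dict], sub_questions: list[str]) -> dict[str, list[dict]]:
    """Assign each claim to the sub-question sharing the most content words."""
    organized = defaultdict(list)
    question_words = [content_words(q) for q in sub_questions]
    for claim in claims:
        claim_words = content_words(claim["text"])
        scores = [len(claim_words & qw) for qw in question_words]
        # max() returns the first index on ties, so ties go to the earlier sub-question
        best = max(range(len(sub_questions)), key=lambda i: scores[i])
        organized[sub_questions[best]].append(claim)
    return dict(organized)
```

For example, a claim worded "Market size reached a new high" lands under "What is the market size and growth?" because it shares "market" and "size" with that question.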
Resolving Contradictions and Presenting Multiple Perspectives
When sources disagree, don't suppress the disagreement. Present both views with source attribution:
```python
def resolve_and_present_contradictions(
    contradictions: list[tuple[dict, dict]],
) -> list[str]:
    """
    Transform contradictions into balanced prose for the report.
    Each contradiction is a pair of claims about the same fact.
    """
    sections = []
    for claim1, claim2 in contradictions:
        text = (
            f"**Disagreement on {claim1['fact']}:**\n"
            f"- {claim1['attributed_to']} ({claim1['year']}): {claim1['value']}\n"
            f"- {claim2['attributed_to']} ({claim2['year']}): {claim2['value']}\n\n"
            "This discrepancy may stem from different measurement methods or publication dates. "
            "Further research is needed to reconcile these estimates."
        )
        sections.append(text)
    return sections


# Example usage in a report: each contradiction is a (claim, claim) tuple
contradictions = [
    (
        {
            "fact": "TSMC 1.4nm yield",
            "attributed_to": "TSMC official",
            "year": 2026,
            "value": "65%",
        },
        {
            "fact": "TSMC 1.4nm yield",
            "attributed_to": "Analyst estimate",
            "year": 2026,
            "value": "58%",
        },
    ),
]
contradiction_sections = resolve_and_present_contradictions(contradictions)
for section in contradiction_sections:
    print(section)
```
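The function above expects contradictions that are already paired up. Assuming each extracted claim records the `fact` it measures and its `value` (as in the example dicts), a sketch for finding those pairs groups claims by a normalized `fact` label and pairs every two claims whose values differ. `find_contradictions` is a hypothetical helper, and exact string comparison of values is a simplification: `"65%"` and `"0.65"` would count as a disagreement.

```python
from collections import defaultdict
from itertools import combinations


def find_contradictions(claims: list[dict]) -> list[tuple[dict, dict]]:
    """Pair up claims that report different values for the same fact."""
    by_fact = defaultdict(list)
    for claim in claims:
        # Normalize the label so "TSMC 1.4nm yield" and "tsmc 1.4nm yield " group together
        by_fact[claim["fact"].strip().lower()].append(claim)
    pairs = []
    for group in by_fact.values():
        # Compare every pair within a group; identical values are agreement, not conflict
        for a, b in combinations(group, 2):
            if a["value"] != b["value"]:
                pairs.append((a, b))
    return pairs
```

Its output has the `(claim, claim)` shape that `resolve_and_present_contradictions` expects, so the two can be chained directly.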
Generating Markdown with Inline Citations
Next, render the organized claims as markdown, attaching an inline citation and a certainty label to each claim:
```python
import re
from datetime import date, datetime
from typing import Optional
from urllib.parse import urlparse


def slugify(heading: str) -> str:
    """Approximate GitHub-style anchors: lowercase, drop punctuation, spaces to hyphens."""
    slug = re.sub(r"[^\w\s-]", "", heading.lower())
    return slug.strip().replace(" ", "-")


def render_claim_with_citation(claim: dict, sources_db: list[dict]) -> str:
    """
    Render a claim as markdown with inline author-year citations.
    Each source is a dict with a 'url' and optional 'author' and 'publication_date'.
    """
    text = claim["text"]
    source_ids = claim.get("source_ids", [])
    certainty = claim.get("certainty", "medium")

    # Keep only ids that actually point into sources_db
    cited_sources = [sources_db[i] for i in source_ids if 0 <= i < len(sources_db)]
    if cited_sources:
        citations = []
        for src in cited_sources:
            author = (src.get("author") or "").strip()
            if author:
                # "Smith, J." -> "Smith"; "Jane Smith" -> "Smith"
                surname = author.split(",")[0].strip() if "," in author else author.split()[-1]
            else:
                # No author: fall back to the site name, e.g. "example" for example.com
                host = urlparse(src["url"]).hostname or ""
                surname = host.removeprefix("www.").split(".")[0] or "Unknown"
            pub_date = src.get("publication_date")
            year = pub_date.year if pub_date else "n.d."
            citations.append(f"[{surname} {year}]({src['url']})")
        citation_md = "(" + "; ".join(citations) + ")"
    else:
        citation_md = "[unverified]"

    # Plain-language certainty label
    certainty_label = {
        "high": "(well-established)",
        "medium": "(supported)",
        "low": "(preliminary)",
    }.get(certainty, "")
    return " ".join(part for part in (text, citation_md, certainty_label) if part)


def generate_markdown_report(
    organized_claims: dict[str, list[dict]],
    sources_db: list[dict],
    title: str = "Research Report",
    contradictions: Optional[list[str]] = None,
) -> str:
    """
    Generate a complete markdown report from organized claims.
    `contradictions` holds prose sections, e.g. from resolve_and_present_contradictions().
    """
    lines = [
        f"# {title}\n",
        f"*Generated on {datetime.now().strftime('%Y-%m-%d')}*\n",
    ]
    # Table of contents, with anchors matching the section headings below
    lines.append("## Contents\n")
    for i, theme in enumerate(organized_claims, 1):
        lines.append(f"{i}. [{theme}](#{slugify(theme)})")
    # Body sections: one bullet per claim
    for theme, claims in organized_claims.items():
        lines.append(f"\n## {theme}\n")
        for claim in claims:
            lines.append(f"- {render_claim_with_citation(claim, sources_db)}")
    # Contradiction section
    if contradictions:
        lines.append("\n## Disagreements and Open Questions\n")
        for contradiction in contradictions:
            lines.append(contradiction + "\n")
    # Bibliography: every source cited at least once, deduplicated by URL
    used_urls = set()
    for claims in organized_claims.values():
        for claim in claims:
            for src_id in claim.get("source_ids", []):
                if 0 <= src_id < len(sources_db):
                    used_urls.add(sources_db[src_id]["url"])
    lines.append("\n## Sources\n")
    for i, url in enumerate(sorted(used_urls), 1):
        lines.append(f"{i}. {url}")
    return "\n".join(lines)


# Example
organized_claims = {
    "Manufacturing Leaders": [
        {
            "text": "TSMC dominates with 65% market share",
            "source_ids": [0],
            "certainty": "high",
        }
    ],
    "Technological Progress": [
        {
            "text": "1.4nm process yields improved 58% in 5 months",
            "source_ids": [0, 1],  # id 1 is not in sources_db and is skipped
            "certainty": "high",
        }
    ],
}
sources_db = [
    {
        "url": "https://example.com/tsmc-2026",
        "author": "Smith, J.",
        "publication_date": date(2026, 1, 1),  # illustrative; only .year is used
    }
]
report = generate_markdown_report(
    organized_claims,
    sources_db,
    title="AI Chip Manufacturing in 2026",
)
print(report)
```
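Author-year citations read naturally, but some reports prefer numbered references such as [1], where the number matches a bibliography entry. A sketch of that alternative, assuming the same `source_ids` and `url` fields as above, numbers each source in order of first citation so the inline numbers and the source list always agree. `number_sources` and `render_numbered` are hypothetical names, and the sketch omits the table of contents and certainty labels for brevity.

```python
def number_sources(organized_claims: dict[str, list[dict]], sources_db: list[dict]) -> dict[int, int]:
    """Map source index -> reference number, in order of first citation."""
    numbering: dict[int, int] = {}
    for claims in organized_claims.values():
        for claim in claims:
            for src_id in claim.get("source_ids", []):
                if 0 <= src_id < len(sources_db) and src_id not in numbering:
                    numbering[src_id] = len(numbering) + 1
    return numbering


def render_numbered(organized_claims: dict[str, list[dict]], sources_db: list[dict]) -> str:
    """Render claims with [n] markers followed by a matching numbered source list."""
    numbering = number_sources(organized_claims, sources_db)
    lines = []
    for theme, claims in organized_claims.items():
        lines.append(f"## {theme}\n")
        for claim in claims:
            refs = sorted({numbering[i] for i in claim.get("source_ids", []) if i in numbering})
            # "[1, 2]" rather than "[1][2]", which markdown could read as a reference link
            marker = f"[{', '.join(str(n) for n in refs)}]" if refs else "[unverified]"
            lines.append(f"- {claim['text']} {marker}")
        lines.append("")
    lines.append("## Sources\n")
    # dicts keep insertion order, so sources are listed in reference-number order
    for src_id, n in numbering.items():
        lines.append(f"{n}. {sources_db[src_id]['url']}")
    return "\n".join(lines)
```

Numbering by first citation, as IEEE-style references do, keeps reference [1] near the top of the report; the sorted-URL bibliography above is simpler but its numbering is unrelated to reading order.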
Handling Incomplete or Sparse Research
Not every question gets answered. The synthesis step should be honest about gaps:
```python
def identify_research_gaps(
    sub_questions: list[str],
    organized_claims: dict[str, list[dict]],
) -> list[str]:
    """Identify sub-questions with no supporting claims."""
    gaps = []
    for sq in sub_questions:
        # A missing key and an empty list both mean nothing addressed this sub-question
        if not organized_claims.get(sq):
            gaps.append(f"No sources found for: {sq}")
    return gaps


def add_gaps_section(markdown: str, gaps: list[str]) -> str:
    """Add a 'Gaps and Limitations' section to the report."""
    if not gaps:
        return markdown
    section = "\n## Gaps and Limitations\n\nThe following areas had insufficient coverage:\n"
    for gap in gaps:
        section += f"- {gap}\n"
    section += "\n*Consider refining the search strategy or consulting additional sources.*\n"
    return markdown + section
```
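`identify_research_gaps` only catches sub-questions with zero claims, but coverage can also be thin: a single claim, or only preliminary ones. A sketch of a stricter check, assuming the `certainty` labels used earlier; the `min_claims=2` threshold and the `identify_thin_coverage` name are illustrative choices, not a standard.

```python
def identify_thin_coverage(
    sub_questions: list[str],
    organized_claims: dict[str, list[dict]],
    min_claims: int = 2,
) -> list[str]:
    """Flag sub-questions whose evidence is sparse or only low-certainty."""
    notes = []
    for sq in sub_questions:
        claims = organized_claims.get(sq, [])
        if not claims:
            continue  # fully missing sub-questions are reported by identify_research_gaps
        if len(claims) < min_claims:
            notes.append(f"Only {len(claims)} claim(s) found for: {sq}")
        elif all(c.get("certainty") == "low" for c in claims):
            notes.append(f"Only preliminary evidence for: {sq}")
    return notes
```

Because it returns plain strings, its notes can be appended to the gap list before calling `add_gaps_section`.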
Key Takeaways
- Organize verified claims by theme (matching original sub-questions) to create a logical report structure.
- Present contradictions explicitly with source attribution rather than suppressing disagreement or averaging claims.
- Render each claim with an inline citation, source URL, and certainty indicator (high/medium/low) for full transparency.
- Include a bibliography and explicitly document research gaps so readers understand what was and wasn't covered.
Frequently Asked Questions
Should I use first-person ("I found") or third-person ("The research shows") in synthesis?
Use third-person academic voice: "Research shows...", "Multiple sources indicate...", "Evidence suggests...". Avoid first-person to maintain objectivity and authority.
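If an LLM drafts the prose, first-person phrasing can slip in. A crude post-check, offered as a sketch, flags lines containing first-person pronouns so they can be revised. The regex is a heuristic assumption: it is case-sensitive so that "US" (the country) does not match "us", and it will still flag things like the Roman numeral in "Phase I".

```python
import re

# Heuristic pattern; case-sensitive on purpose so "US" does not match "us"
FIRST_PERSON = re.compile(r"\b(?:I|[Mm]e|[Mm]y|[Ww]e|[Oo]urs?|us)\b")


def find_first_person(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain first-person pronouns."""
    return [
        (n, line)
        for n, line in enumerate(text.splitlines(), 1)
        if FIRST_PERSON.search(line)
    ]


draft = "Research shows steady growth.\nI found three sources.\nUS exports rose.\nOur analysis suggests X."
for n, line in find_first_person(draft):
    print(n, line)
```

The check only locates candidate lines; rewriting them into third-person voice is still a separate step.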
How do I decide what order to present themes in?
Follow the logical flow that answers the original question. If the question is "How does X work?", order: definition, context, mechanisms, limitations. If it's "What should we do?", order: status quo, options, comparison, recommendation.
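Because `organize_claims_by_theme` builds its dict in the order claims are first assigned, themes can come out shuffled relative to the intended flow. When the sub-questions were already written in a logical order (for example definition, context, mechanisms, limitations), a small sketch can restore that order before rendering. `order_themes` is a hypothetical helper; themes not in the preferred list are kept at the end.

```python
def order_themes(
    organized_claims: dict[str, list[dict]],
    preferred_order: list[str],
) -> dict[str, list[dict]]:
    """Return a new dict whose keys follow preferred_order; unknown themes go last."""
    rank = {theme: i for i, theme in enumerate(preferred_order)}
    # sorted() is stable, so unknown themes keep their original relative order
    ordered_keys = sorted(organized_claims, key=lambda t: rank.get(t, len(preferred_order)))
    return {theme: organized_claims[theme] for theme in ordered_keys}
```

Passing the original `sub_questions` list as `preferred_order` is usually enough, since the plan already encodes the order in which the question should be answered.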
What if 80% of claims are from one source and 20% from others?
This is a red flag. It suggests heavy dependence on a single perspective. Call this out in the report: "Most findings derive from [Source A]. Additional independent verification would strengthen these conclusions."
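This situation is easy to detect automatically. A sketch, assuming claims carry `source_ids` as in the rendering examples: count how many cited claims rely on each source and flag any source behind more than a chosen share of them. The 50% default threshold and the `source_concentration_warning` name are illustrative assumptions.

```python
from collections import Counter
from typing import Optional


def source_concentration_warning(
    claims: list[dict],
    sources_db: list[dict],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return a warning if one source backs more than `threshold` of cited claims."""
    counts = Counter()
    cited_claims = 0
    for claim in claims:
        ids = {i for i in claim.get("source_ids", []) if 0 <= i < len(sources_db)}
        if ids:
            cited_claims += 1
            counts.update(ids)  # each source counts at most once per claim
    if not cited_claims:
        return None
    top_id, top_count = counts.most_common(1)[0]
    share = top_count / cited_claims
    if share > threshold:
        return (
            f"{share:.0%} of cited findings rely on {sources_db[top_id]['url']}. "
            "Additional independent verification would strengthen these conclusions."
        )
    return None
```

Counting each source once per claim keeps a claim that lists the same source twice from inflating that source's share.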
Should I cite every single claim or just the key ones?
Cite every non-trivial claim. Only facts that are common knowledge (e.g., "Python is a programming language") can go uncited. When in doubt, cite.
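To enforce this rule, a pre-publication check can list every claim without a usable source id. A sketch, assuming the `organized_claims` and `source_ids` layout used above; deciding which of the flagged claims are common knowledge is still left to a human or a reviewer model. `find_uncited_claims` is a hypothetical name.

```python
def find_uncited_claims(
    organized_claims: dict[str, list[dict]],
    sources_db: list[dict],
) -> list[tuple[str, str]]:
    """Return (theme, claim text) for claims with no valid source id."""
    uncited = []
    for theme, claims in organized_claims.items():
        for claim in claims:
            # Ids that point outside sources_db count as missing citations
            valid = [i for i in claim.get("source_ids", []) if 0 <= i < len(sources_db)]
            if not valid:
                uncited.append((theme, claim["text"]))
    return uncited
```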