Citation Management: Research Agent Sources
Citation management ensures that every claim in a research report is traceable to its source. Unlike casual writing, where citations are optional, academic and professional research requires rigorous attribution. A production research agent must track source metadata (author, publication date, URL, access date), generate citations in multiple formats (APA, Chicago, BibTeX), and maintain bidirectional links between claims and their sources.
Poor citation practices, such as missing URLs, incomplete dates, or claims attributed to authors who never made them, undermine credibility and make fact-checking impossible. This article shows how to build a citation subsystem that automatically generates academic-quality bibliographies and ensures every claim in the final report has a verifiable source trail.
Structuring Source Metadata for Citation Generation
The foundation of citation management is a standardized source record that captures all the metadata needed for any citation format. Here's the schema:
```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Source:
    """Complete source metadata for citation generation."""
    url: str
    title: str
    author: Optional[str] = None             # "Last, First" (join several with " and ") or an organization name
    publication_date: Optional[date] = None  # stored as ISO 8601 by to_dict(), e.g. "2026-06-02"
    access_date: Optional[date] = None       # when the agent fetched the page
    domain: str = ""                         # "arxiv.org", "github.com", etc.
    source_type: str = "webpage"             # webpage, journal, conference, book, preprint, software, ...
    publisher: Optional[str] = None
    doi: Optional[str] = None                # Digital Object Identifier

    def to_dict(self) -> dict:
        """Convert to a JSON-serializable dict for storage."""
        return {
            "url": self.url,
            "title": self.title,
            "author": self.author,
            "publication_date": self.publication_date.isoformat() if self.publication_date else None,
            "access_date": self.access_date.isoformat() if self.access_date else None,
            "domain": self.domain,
            "source_type": self.source_type,
            "publisher": self.publisher,
            "doi": self.doi,
        }


# Example sources
sources_db = [
    Source(
        url="https://arxiv.org/abs/2406.14283",
        title="Query Planning for Autonomous Research",
        author="Smith, John and Lee, Sarah",
        publication_date=date(2026, 4, 15),
        access_date=date(2026, 6, 2),
        domain="arxiv.org",
        source_type="preprint",
        doi="10.48550/arXiv.2406.14283",
    ),
    Source(
        url="https://github.com/awesome/research-agent",
        title="AwesomeResearchAgent: Open-Source Implementation",
        author="GitHub Contributors",
        publication_date=date(2026, 1, 1),
        access_date=date(2026, 6, 2),
        domain="github.com",
        source_type="software",
    ),
]
```
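Because `to_dict` writes dates as ISO 8601 strings, loading stored records needs the inverse conversion. A minimal sketch, assuming sources are persisted as a JSON list of `to_dict()` outputs; the helper name `parse_source_record` is hypothetical:

```python
import json
from datetime import date
from typing import Any, Optional

# Fields that Source.to_dict() stores as ISO 8601 strings
DATE_FIELDS = ("publication_date", "access_date")


def parse_source_record(record: dict[str, Any]) -> dict[str, Any]:
    """Convert one stored record back into keyword arguments for Source(**kwargs)."""
    parsed = dict(record)
    for name in DATE_FIELDS:
        value: Optional[str] = parsed.get(name)
        # date.fromisoformat() reads the "YYYY-MM-DD" strings that date.isoformat() writes
        parsed[name] = date.fromisoformat(value) if value else None
    return parsed


# A stored list shaped like json.dumps([s.to_dict() for s in sources_db])
stored = json.dumps([{
    "url": "https://arxiv.org/abs/2406.14283",
    "title": "Query Planning for Autonomous Research",
    "author": "Smith, John and Lee, Sarah",
    "publication_date": "2026-04-15",
    "access_date": "2026-06-02",
    "domain": "arxiv.org",
    "source_type": "preprint",
    "publisher": None,
    "doi": "10.48550/arXiv.2406.14283",
}])
records = [parse_source_record(r) for r in json.loads(stored)]
print(records[0]["publication_date"])  # 2026-04-15
```

Each parsed record can then be passed straight to `Source(**record)`.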
Generating Citations in Multiple Formats
Academic and professional contexts require different citation styles. Implement functions for APA, Chicago, and BibTeX:
```python
class CitationFormatter:
    """Simplified formatters; the full style guides add many type-specific rules."""

    # Standard BibTeX entry types; unmapped source types fall back to @misc
    BIBTEX_TYPES = {"journal": "article", "conference": "inproceedings", "book": "book"}

    @staticmethod
    def _authors(source: Source) -> list[str]:
        """Split "Last, First and Last, First" into individual names."""
        return [a.strip() for a in source.author.split(" and ")] if source.author else []

    @staticmethod
    def _long_date(d: date) -> str:
        """Format a date as "June 2, 2026" (no zero-padded day)."""
        return f"{d:%B} {d.day}, {d.year}"

    @staticmethod
    def apa(source: Source) -> str:
        """Generate an APA 7th edition reference entry: Author. (Year). Title. Publisher. URL"""
        names = []
        for name in CitationFormatter._authors(source):
            if "," in name:  # "Last, First Middle" -> "Last, F. M."
                last, first = (p.strip() for p in name.split(",", 1))
                names.append(f"{last}, " + " ".join(f"{p[0]}." for p in first.split()))
            else:  # organization author: keep the full name
                names.append(name)
        # Two or more authors: "A, B, & C" (APA keeps the comma before "&")
        author = ", ".join(names[:-1]) + ", & " + names[-1] if len(names) > 1 else "".join(names)
        year = f"({source.publication_date.year})." if source.publication_date else "(n.d.)."
        title = source.title if source.title.endswith((".", "?", "!")) else f"{source.title}."
        if author:
            parts = [author if author.endswith(".") else f"{author}.", year, title]
        else:  # no author: APA moves the title into the author position
            parts = [title, year]
        if source.publisher:
            parts.append(f"{source.publisher}.")
        # APA 7 prefers a DOI link over the page URL and drops "Retrieved from"
        parts.append(f"https://doi.org/{source.doi}" if source.doi else source.url)
        return " ".join(parts)

    @staticmethod
    def chicago(source: Source) -> str:
        """Generate a Chicago Manual of Style (notes-bibliography) entry for a web source."""
        names = CitationFormatter._authors(source)
        # Chicago inverts only the first author; later authors read "First Last"
        others = [" ".join(reversed([p.strip() for p in n.split(",", 1)])) for n in names[1:]]
        citation = []
        if names:
            author = names[0] if not others else ", ".join([names[0]] + others[:-1]) + ", and " + others[-1]
            citation.append(author if author.endswith(".") else f"{author}.")
        citation.append(f'"{source.title}."')
        if source.publisher:
            citation.append(f"{source.publisher}.")
        if source.publication_date:
            citation.append(f"{CitationFormatter._long_date(source.publication_date)}.")
        if source.access_date:
            citation.append(f"Accessed {CitationFormatter._long_date(source.access_date)}.")
        citation.append(f"{source.url}.")
        return " ".join(citation)

    @staticmethod
    def bibtex(source: Source) -> str:
        """Generate a BibTeX entry."""
        year = source.publication_date.year if source.publication_date else None
        # Citation key: first label of the domain + year (simple, but not guaranteed unique)
        key = f"{source.domain.split('.')[0] or 'source'}_{year or 'nd'}"
        entry_type = CitationFormatter.BIBTEX_TYPES.get(source.source_type, "misc")
        fields = {
            "author": source.author,
            "title": source.title,
            "year": year,
            "url": source.url,
            "doi": source.doi,
        }
        lines = [f"@{entry_type}{{{key},"]
        # Brace-delimit every value and omit fields with no data
        lines += [f"  {name} = {{{value}}}," for name, value in fields.items() if value is not None]
        lines.append("}")
        return "\n".join(lines)


# Example
source = sources_db[0]
print("APA:", CitationFormatter.apa(source))
print("\nChicago:", CitationFormatter.chicago(source))
print("\nBibTeX:", CitationFormatter.bibtex(source))
```
Output:
```
APA: Smith, J., & Lee, S. (2026). Query Planning for Autonomous Research. https://doi.org/10.48550/arXiv.2406.14283

Chicago: Smith, John, and Sarah Lee. "Query Planning for Autonomous Research." April 15, 2026. Accessed June 2, 2026. https://arxiv.org/abs/2406.14283.

BibTeX: @misc{arxiv_2026,
  author = {Smith, John and Lee, Sarah},
  title = {Query Planning for Autonomous Research},
  year = {2026},
  url = {https://arxiv.org/abs/2406.14283},
  doi = {10.48550/arXiv.2406.14283},
}
```
Linking Claims to Sources in the Report
Every claim must maintain a backward reference to its source. Use a claim structure that embeds source IDs:
```python
@dataclass
class Claim:
    """A factual claim with source provenance."""
    text: str
    source_ids: list[int]  # indices into sources_db
    certainty: str         # "high", "medium", or "low"
    verified: bool = False

    def render_with_citations(self, sources_db: list[Source]) -> str:
        """Render the claim with APA-style (Author, Year) inline citations."""
        cited_sources = [sources_db[i] for i in self.source_ids if 0 <= i < len(sources_db)]
        if not cited_sources:
            return self.text
        citations = []
        for s in cited_sources:
            # Surnames from "Last, First and Last, First"; organizations stay whole;
            # with no author, fall back to the domain's first label
            if s.author:
                names = [a.split(",")[0].strip() for a in s.author.split(" and ")]
            else:
                names = [s.domain.split(".")[0]]
            name = f"{names[0]} et al." if len(names) > 2 else " & ".join(names)
            year = s.publication_date.year if s.publication_date else "n.d."
            citations.append(f"{name}, {year}")
        # APA separates multiple works inside one parenthetical with semicolons
        return f"{self.text.rstrip('.')} ({'; '.join(citations)})."


# Example
claims = [
    Claim(
        text="TSMC achieved 0.65% yield on 1.4nm in May 2026",
        source_ids=[0],  # references sources_db[0]
        certainty="high",
        verified=True,
    )
]
for claim in claims:
    print(claim.render_with_citations(sources_db))
# Output: "TSMC achieved 0.65% yield on 1.4nm in May 2026 (Smith & Lee, 2026)."
```
Building a Bibliography from Used Sources
At the end of the report, generate a complete bibliography of all sources cited:
```python
class Bibliography:
    def __init__(self, sources_db: list[Source], used_source_ids: set[int]):
        self.sources_db = sources_db
        self.used_source_ids = used_source_ids

    def generate_bibliography(self, style: str = "apa") -> str:
        """
        Generate a formatted bibliography.
        style: "apa", "chicago", or "bibtex"
        """
        formatters = {
            "apa": CitationFormatter.apa,
            "chicago": CitationFormatter.chicago,
            "bibtex": CitationFormatter.bibtex,
        }
        if style not in formatters:
            raise ValueError(f"Unknown citation style: {style!r}")
        used_sources = [
            self.sources_db[i] for i in sorted(self.used_source_ids)
            if 0 <= i < len(self.sources_db)
        ]
        citations = [formatters[style](s) for s in used_sources]
        if style == "bibtex":
            return "\n\n".join(citations)
        # APA and Chicago reference lists are alphabetized, not numbered
        return "\n\n".join(sorted(citations, key=str.lower))

    def to_markdown(self, style: str = "apa") -> str:
        """Generate the bibliography as a Markdown section."""
        return f"## Bibliography\n\n{self.generate_bibliography(style)}"


# Example
used_ids = {0, 1}
bib = Bibliography(sources_db, used_ids)
print(bib.to_markdown())
```
Output:
```
## Bibliography

GitHub Contributors. (2026). AwesomeResearchAgent: Open-Source Implementation. https://github.com/awesome/research-agent

Smith, J., & Lee, S. (2026). Query Planning for Autonomous Research. https://doi.org/10.48550/arXiv.2406.14283
```
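Exporting the bibliography with `style="bibtex"` exposes a weakness of the `domain_year` keys that `bibtex` builds: two arXiv papers from the same year both become `arxiv_2026`, and BibTeX keys must be unique within a file. A sketch of one common alternative, first-author surname plus year with letter suffixes on clashes; the convention and the helper names `base_key` and `make_citation_keys` are assumptions, not something BibTeX prescribes:

```python
import re
import string
from typing import Optional


def base_key(author: Optional[str], domain: str, year: Optional[int]) -> str:
    """First author's surname (or the domain's first label) plus year."""
    if author:
        stem = author.split(" and ")[0].split(",")[0]
    else:
        stem = domain.split(".")[0]
    # Keep only ASCII letters and digits so the key is safe in BibTeX and LaTeX
    stem = re.sub(r"[^A-Za-z0-9]", "", stem).lower() or "source"
    return f"{stem}{year if year is not None else 'nd'}"


def make_citation_keys(entries: list[tuple[Optional[str], str, Optional[int]]]) -> list[str]:
    """One unique key per (author, domain, year) entry; clashing keys get a, b, c, ... suffixes."""
    bases = [base_key(*entry) for entry in entries]
    totals: dict[str, int] = {}
    for b in bases:
        totals[b] = totals.get(b, 0) + 1
    used: dict[str, int] = {}
    keys = []
    for b in bases:
        if totals[b] == 1:
            keys.append(b)
        else:
            # Suffix every member of a clashing group (assumes at most 26 clashes per key)
            n = used.get(b, 0)
            used[b] = n + 1
            keys.append(b + string.ascii_lowercase[n])
    return keys


print(make_citation_keys([
    ("Smith, John and Lee, Sarah", "arxiv.org", 2026),
    ("Smith, Anna", "arxiv.org", 2026),
    ("GitHub Contributors", "github.com", 2026),
]))  # ['smith2026a', 'smith2026b', 'githubcontributors2026']
```

With the article's `Source` objects, each entry would be `(s.author, s.domain, s.publication_date.year if s.publication_date else None)`. Keys are computed over the whole set of cited sources at once, because a suffix only makes sense relative to the other entries.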
Key Takeaways
- Store source metadata in a standardized format (URL, author, date, DOI) to enable citation generation in any style.
- Implement formatters for APA, Chicago, and BibTeX to support diverse academic and professional contexts.
- Link every claim to its source(s) using stable source IDs, enabling automatic bibliography generation.
- Generate a final bibliography from only the sources actually cited in the report, maintaining clean provenance.
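The last two takeaways can be checked mechanically before a report ships. A minimal sketch that builds the reverse index from sources to claims (the "bidirectional links" from the introduction), flags claims with no valid source, and yields the set of cited IDs a bibliography needs. It takes plain lists of source IDs so it runs on its own; `build_source_index` is a hypothetical name:

```python
from collections import defaultdict


def build_source_index(
    claim_source_ids: list[list[int]], num_sources: int
) -> tuple[dict[int, list[int]], list[int]]:
    """Map each source ID to the claims citing it, and list claims with no valid source."""
    index: dict[int, list[int]] = defaultdict(list)
    unsupported: list[int] = []
    for claim_idx, ids in enumerate(claim_source_ids):
        # Out-of-range IDs are ignored here; a stricter check could report them too
        valid = [i for i in ids if 0 <= i < num_sources]
        if not valid:
            unsupported.append(claim_idx)  # no traceable source: flag for review
        for source_id in valid:
            index[source_id].append(claim_idx)
    return dict(index), unsupported


# Claim 0 cites source 0; claim 1 cites sources 0 and 1; claim 2 cites a nonexistent source 7
index, unsupported = build_source_index([[0], [0, 1], [7]], num_sources=2)
used_ids = set(index)  # every source cited by at least one claim
print(index)        # {0: [0, 1], 1: [1]}
print(unsupported)  # [2]
```

With the article's objects, the call would be `build_source_index([c.source_ids for c in claims], len(sources_db))`; the keys of the returned index are exactly the `used_source_ids` that `Bibliography` expects.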
Frequently Asked Questions
What if a source has no author (e.g., a company website)?
Use the organization name or domain as the author. Example: "Apple Inc. (2026)" or "TechCrunch (2026)". Clearly indicate in your schema that the author is an organization.
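One way to mark organizational authors is an explicit flag, because the "Last, First" convention otherwise misreads corporate names. In BibTeX in particular, an unbraced `Apple Inc.` is parsed as a personal name (first name "Apple", last name "Inc."), so corporate authors are conventionally wrapped in an extra pair of braces. A sketch, assuming a hypothetical `author_is_org` field on a minimal stand-in for `Source`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthoredSource:
    """Minimal stand-in for Source with a hypothetical author_is_org flag."""
    title: str
    author: Optional[str] = None
    author_is_org: bool = False  # True for "Apple Inc.", "TechCrunch", etc.
    year: Optional[int] = None


def bibtex_author(src: AuthoredSource) -> str:
    """BibTeX author field; an extra brace pair stops BibTeX from splitting a corporate name."""
    if src.author is None:
        return ""
    value = f"{{{src.author}}}" if src.author_is_org else src.author
    return f"author = {{{value}}},"


def apa_inline(src: AuthoredSource) -> str:
    """APA parenthetical: organizations in full, a single person by surname."""
    year = src.year if src.year is not None else "n.d."
    if src.author is None:
        name = src.title  # APA falls back to the title when there is no author
    elif src.author_is_org:
        name = src.author
    else:
        name = src.author.split(",")[0].strip()
    return f"({name}, {year})"


apple = AuthoredSource(title="Example page", author="Apple Inc.", author_is_org=True, year=2026)
print(bibtex_author(apple))  # author = {{Apple Inc.}},
print(apa_inline(apple))     # (Apple Inc., 2026)
```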
How do I handle sources with no publication date?
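Set the publication date to None in the schema and use "n.d." (no date) in citations. Keep the access date as well: Chicago asks for an access date when no publication or revision date can be found, and APA adds a retrieval date for pages that change over time. Avoid substituting the access date for the publication date, since that misstates when the content was published.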
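The no-date rule takes only a few lines. A sketch that returns the year label and, for undated sources, a Chicago-style access note; `date_elements` and `long_date` are hypothetical helpers, and the exact wording and placement of the access note vary by style:

```python
from datetime import date
from typing import Optional


def long_date(d: date) -> str:
    """Format a date as "June 2, 2026" (no zero-padded day)."""
    return f"{d:%B} {d.day}, {d.year}"


def date_elements(
    publication_date: Optional[date], access_date: Optional[date]
) -> tuple[str, Optional[str]]:
    """Return (year label, access note) for a reference entry.

    Undated sources get "n.d." plus an access note when an access date is known;
    dated sources get no access note in this sketch.
    """
    if publication_date is not None:
        return str(publication_date.year), None
    note = f"Accessed {long_date(access_date)}." if access_date else None
    return "n.d.", note


print(date_elements(None, date(2026, 6, 2)))               # ('n.d.', 'Accessed June 2, 2026.')
print(date_elements(date(2026, 4, 15), date(2026, 6, 2)))  # ('2026', None)
```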
Can I generate citations automatically from URLs?
Partially. Services like Zotero, Crossref, and DOI resolvers can auto-populate metadata if the URL is a DOI or has embedded schema.org markup. For generic pages, you'll need the reading/extraction step to confirm author and date.
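For DOIs registered with Crossref, the public REST API at `https://api.crossref.org/works/{doi}` returns JSON whose `message` object holds the work's metadata. The sketch below only maps such an object onto the `Source` fields; the HTTP request is left out so it runs offline (`urllib.request.urlopen` plus `json.load` would fetch one). The type mapping and the sample record are illustrative assumptions, not real Crossref data, and not every DOI is registered with Crossref (arXiv DOIs, for instance, are registered with DataCite), so check the field handling against real responses:

```python
from datetime import date
from typing import Any, Optional

# Assumed mapping from Crossref "type" values to this article's source_type strings
CROSSREF_TYPES = {
    "journal-article": "journal",
    "proceedings-article": "conference",
    "book": "book",
    "posted-content": "preprint",
}


def source_fields_from_crossref(message: dict[str, Any]) -> dict[str, Any]:
    """Map a Crossref work record (the JSON "message" object) onto Source keyword arguments."""
    authors = []
    for person in message.get("author", []):
        if "family" in person:  # personal name -> "Last, First"
            given = person.get("given")
            authors.append(f"{person['family']}, {given}" if given else person["family"])
        elif "name" in person:  # organizational author
            authors.append(person["name"])
    # "date-parts" may be [[year]], [[year, month]], or [[year, month, day]];
    # only a full date fits a `date` without inventing a month or day
    date_parts = (message.get("issued") or {}).get("date-parts") or [[]]
    parts = date_parts[0] or []
    pub: Optional[date] = date(*parts) if len(parts) == 3 and None not in parts else None
    return {
        "url": message.get("URL", ""),
        "title": (message.get("title") or [""])[0],
        "author": " and ".join(authors) or None,
        "publication_date": pub,
        "source_type": CROSSREF_TYPES.get(message.get("type", ""), "webpage"),
        "publisher": message.get("publisher"),
        "doi": message.get("DOI"),
    }


# Illustrative record shaped like a Crossref "message"; not real Crossref data
sample_message = {
    "DOI": "10.5555/12345678",
    "URL": "https://doi.org/10.5555/12345678",
    "type": "journal-article",
    "title": ["An Example Article"],
    "author": [{"given": "Jane", "family": "Doe"}, {"name": "Example Consortium"}],
    "issued": {"date-parts": [[2025, 3, 9]]},
    "publisher": "Example Press",
}
fields = source_fields_from_crossref(sample_message)
print(fields["author"])            # Doe, Jane and Example Consortium
print(fields["publication_date"])  # 2025-03-09
```

The extraction step still has to confirm what the API returns, and for pages without a DOI it remains the only source of author and date.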
Should I cite the same source multiple times or just in the bibliography?
Both. Cite inline for each claim so readers can trace it, and list the full entry once in the bibliography for completeness. Avoid repeating the same citation in adjacent sentences; group those sentences under a single citation instead.
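Grouping can be applied at render time: consecutive claims that share exactly the same citation get one parenthetical at the end of the run. A sketch, assuming each claim arrives as a `(text, citation)` pair with the inline citation already rendered (for example `"Smith & Lee, 2026"`); `render_paragraph` is a hypothetical helper and the claim texts are placeholders:

```python
from itertools import groupby


def render_paragraph(claims: list[tuple[str, str]]) -> str:
    """Join (text, citation) pairs into prose with one citation per run of identical citations."""
    sentences = []
    # groupby merges only *consecutive* items, so a source cited again later is cited again
    for citation, run in groupby(claims, key=lambda claim: claim[1]):
        body = ". ".join(text.rstrip(".") for text, _ in run)
        sentences.append(f"{body} ({citation}).")
    return " ".join(sentences)


print(render_paragraph([
    ("First claim", "Smith & Lee, 2026"),
    ("Second claim", "Smith & Lee, 2026"),
    ("Third claim", "GitHub Contributors, 2026"),
]))
# First claim. Second claim (Smith & Lee, 2026). Third claim (GitHub Contributors, 2026).
```

Because only adjacent claims are merged, a source that reappears later in the paragraph is cited again, which keeps each citation close to the claims it supports.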