
Contextual Chunk Retrieval: Metadata and Proximity

Contextual chunk retrieval enriches retrieved text snippets with semantic metadata, document structure, and neighboring content to improve relevance and coherence. Instead of returning isolated chunks, contextual retrieval returns chunks with their parent context (chapter title, section heading), related metadata (author, date, source type), and nearby chunks that provide continuity. This approach improves retrieval relevance by 25–30% and makes generated responses more coherent because the LLM understands the context from which chunks were extracted (Wei et al., 2023).

The Problem: Decontextualized Chunks

Traditional RAG splits documents into fixed-size chunks (e.g., 256 tokens) and embeds each independently. When a query matches a chunk, you lose the surrounding context:

Original document:

Chapter 3: Neural Network Architectures
3.1 Convolutional Neural Networks

CNNs are specialized networks for image processing. [chunk starts here]
They use convolutional filters to detect local features. Filters scan
across the image, creating feature maps. This mechanism is more efficient
than fully connected layers for image tasks. [chunk ends here]
The first CNN was LeNet, developed in 1998...

Retrieved chunk:

"They use convolutional filters to detect local features. Filters scan 
across the image, creating feature maps. This mechanism is more efficient
than fully connected layers for image tasks."

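The chunk loses its parent context (Chapter 3, Section 3.1 on CNNs) and its neighbors: the preceding sentence that names the subject, so "They" has no referent, and the following sentence about LeNet (1998). A response generated from this chunk alone tends to feel disconnected.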
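The effect is easy to reproduce. The sketch below is a minimal illustration, not a production chunker: it assumes whitespace-separated words stand in for tokens, and `fixed_size_chunks` is a hypothetical helper name.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into chunks of at most `size` whitespace-delimited words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

document = (
    "Chapter 3: Neural Network Architectures "
    "3.1 Convolutional Neural Networks "
    "CNNs are specialized networks for image processing. "
    "They use convolutional filters to detect local features."
)

# Words are used as a rough stand-in for tokens (an assumption for brevity)
chunks = fixed_size_chunks(document, size=16)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

The second chunk begins with "They" and carries neither the heading nor the word "CNNs", which is the gap the rest of this article addresses.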

Enriching Chunks with Metadata

Attach semantic metadata to each chunk:

from anthropic import Anthropic
from dataclasses import dataclass
from typing import Optional
import json

client = Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

@dataclass
class ChunkWithContext:
    """A chunk with semantic context and metadata."""
    chunk_id: str
    text: str
    tokens: int
    # Hierarchy
    doc_title: str
    chapter: Optional[str]
    section: Optional[str]
    subsection: Optional[str]
    # Metadata
    source_url: Optional[str]
    author: Optional[str]
    publish_date: Optional[str]  # ISO date string, e.g. "2025-06-01"
    content_type: str  # article, documentation, research, news, etc.
    # Related chunks
    prev_chunk_id: Optional[str]
    next_chunk_id: Optional[str]

def extract_chunk_metadata(chunk_text: str, parent_context: str) -> dict:
    """Extract semantic metadata for a chunk using an LLM."""
    # Only the first 300 characters of the chunk are sent, to keep the prompt small
    metadata_prompt = """Analyze this text chunk and extract metadata.

Document/Page context: {parent_context}

Chunk text: {chunk}

Extract and return JSON:
{{
  "key_entities": ["entity1", "entity2"],
  "topics": ["topic1", "topic2"],
  "summary": "One sentence summary of this chunk",
  "is_definition": true/false,
  "is_example": true/false,
  "technical_level": "beginner|intermediate|advanced"
}}""".format(parent_context=parent_context, chunk=chunk_text[:300])

    response = client.messages.create(
        # Example Haiku alias for fast metadata extraction; substitute a
        # model ID that is available to your account
        model="claude-3-5-haiku-latest",
        max_tokens=200,
        messages=[{"role": "user", "content": metadata_prompt}]
    )

    # Take the outermost {...} span in case the model wraps the JSON in prose
    text = response.content[0].text
    start = text.find('{')
    end = text.rfind('}') + 1
    if start == -1 or end == 0:
        raise ValueError("No JSON object found in model response")
    return json.loads(text[start:end])

# Example
chunk = "CNNs use convolutional filters to detect local features..."
parent = "Chapter 3: Neural Network Architectures, Section 3.1: CNNs"
metadata = extract_chunk_metadata(chunk, parent)
print(f"Topics: {metadata['topics']}")
print(f"Technical level: {metadata['technical_level']}")

Store metadata alongside chunks in your vector database. Modern vector databases (Weaviate, Qdrant, Pinecone) support filtering by metadata, enabling precise retrieval.
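Each vector database has its own client API for attaching and filtering metadata, so the sketch below avoids any specific one. It uses an in-memory dictionary as a stand-in and shows only the general shape: flatten a chunk record into a metadata payload, then apply an exact-match filter. `ChunkRecord`, `to_payload`, and `matches` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ChunkRecord:
    """Hypothetical, trimmed-down chunk record used only for this sketch."""
    chunk_id: str
    text: str
    content_type: str
    publish_date: Optional[str] = None
    author: Optional[str] = None

def to_payload(record: ChunkRecord) -> dict:
    """Flatten a record into a metadata payload, dropping the text and empty fields."""
    return {k: v for k, v in asdict(record).items() if v is not None and k != "text"}

def matches(payload: dict, conditions: dict) -> bool:
    """Exact-match filter: every condition must equal the payload value."""
    return all(payload.get(k) == v for k, v in conditions.items())

records = [
    ChunkRecord("chunk_1", "CNN details...", "article", "2025-06-01", "Dr. Jane Doe"),
    ChunkRecord("chunk_2", "Release notes...", "documentation"),
]

# In a real deployment the payload would be stored next to the embedding
payloads = {r.chunk_id: to_payload(r) for r in records}
hits = [cid for cid, p in payloads.items() if matches(p, {"content_type": "article"})]
print(hits)
```

Dropping empty fields keeps payloads small and avoids filters accidentally matching on missing values.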

Proximity-Based Chunk Expansion

Enhance retrieved chunks by including neighboring context:

def retrieve_with_context(
    query: str,
    retriever_fn,  # Returns list of chunk_ids
    chunk_store: dict,  # chunk_id -> ChunkWithContext
    context_window: int = 2  # Include up to 2 chunks before and after
) -> list[ChunkWithContext]:
    """Retrieve chunks and expand with neighboring context."""

    # Step 1: Retrieve initial chunks
    matched_chunk_ids = retriever_fn(query)

    # Step 2: Expand with neighboring chunks
    contextual_chunks = []
    seen = set()  # IDs already emitted, so overlapping windows do not repeat chunks

    for chunk_id in matched_chunk_ids[:5]:  # Top 5 matches
        chunk = chunk_store[chunk_id]
        neighbors = []

        # Gather previous chunks by walking the prev links
        prev_id = chunk.prev_chunk_id
        for _ in range(context_window):
            if prev_id and prev_id in chunk_store:
                neighbors.insert(0, chunk_store[prev_id])
                prev_id = chunk_store[prev_id].prev_chunk_id
            else:
                break

        neighbors.append(chunk)

        # Gather next chunks by walking the next links
        next_id = chunk.next_chunk_id
        for _ in range(context_window):
            if next_id and next_id in chunk_store:
                neighbors.append(chunk_store[next_id])
                next_id = chunk_store[next_id].next_chunk_id
            else:
                break

        # Keep each chunk once, even when two windows overlap
        for neighbor in neighbors:
            if neighbor.chunk_id not in seen:
                seen.add(neighbor.chunk_id)
                contextual_chunks.append(neighbor)

    return contextual_chunks

# Example usage
class MockChunkStore:
    """Stand-in store that fabricates a chunk for any ID of the form chunk_<n>."""
    def __contains__(self, key):
        # Needed for `in`: without it Python falls back to calling
        # __getitem__ with integer keys, which fails on key.split()
        return isinstance(key, str) and key.startswith("chunk_")

    def __getitem__(self, key):
        index = int(key.split('_')[1])
        return ChunkWithContext(
            chunk_id=key,
            text=f"Content of {key}",
            tokens=200,
            doc_title="My Document",
            chapter="Chapter 1",
            section="1.1",
            subsection=None,
            source_url="https://example.com",
            author="Dr. Jane Doe",
            publish_date="2025-06-01",
            content_type="article",
            prev_chunk_id=f"chunk_{index - 1}" if index > 0 else None,
            next_chunk_id=f"chunk_{index + 1}"
        )

def mock_retriever(query: str):
    return ["chunk_5", "chunk_12"]  # Matched chunks

chunk_store = MockChunkStore()
results = retrieve_with_context(query="neural networks", retriever_fn=mock_retriever, chunk_store=chunk_store)
print(f"Retrieved {len(results)} chunks (with context)")

Proximity expansion typically adds 1–3 neighboring chunks, increasing context size by 30–50% while maintaining retrieval precision.
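When two matches sit close together in a document, their neighbor windows overlap. One way to handle this, sketched below, is to work with chunk positions instead of linked IDs and merge overlapping or adjacent windows into contiguous spans. This assumes chunks are numbered 0 to `total - 1` in document order; `expand_and_merge` is a hypothetical helper, not part of the code above.

```python
def expand_and_merge(matched: list[int], window: int, total: int) -> list[range]:
    """Expand each matched position by `window` on both sides, then merge
    overlapping or adjacent spans into contiguous ranges."""
    # Clamp each window to the valid position range [0, total - 1]
    spans = sorted((max(0, i - window), min(total - 1, i + window)) for i in matched)
    merged: list[list[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous span: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [range(s, e + 1) for s, e in merged]

spans = expand_and_merge(matched=[5, 6, 12], window=2, total=20)
print([list(r) for r in spans])
```

Merged spans let the prompt present each passage once and in reading order, which avoids repeating the same neighbor text for nearby matches.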

Metadata Filtering and Ranking

Filter and rank chunks by metadata:

def retrieve_with_filtering(
    query: str,
    retriever_fn,
    chunk_store: dict,
    filters: Optional[dict] = None  # e.g., {"content_type": "article", "min_year": 2024}
) -> list[ChunkWithContext]:
    """Retrieve chunks with metadata filtering and ranking."""

    if filters is None:
        filters = {}

    # Step 1: Retrieve all candidates
    candidates = retriever_fn(query)

    # Step 2: Filter by metadata
    filtered = []
    for chunk_id in candidates:
        chunk = chunk_store[chunk_id]

        # Check filters
        if filters.get("content_type") and chunk.content_type != filters["content_type"]:
            continue
        if filters.get("min_year"):
            # Chunks without a publish date are treated as year 0 and filtered out
            year = int(chunk.publish_date[:4]) if chunk.publish_date else 0
            if year < filters["min_year"]:
                continue

        filtered.append(chunk)

    # Step 3: Rank by relevance and recency
    def rank_score(chunk):
        relevance = 1.0  # In production, use embedding similarity
        recency = 0.1 if chunk.publish_date else 0.0
        authority = 0.2 if chunk.author else 0.0
        return relevance + recency + authority

    filtered.sort(key=rank_score, reverse=True)
    return filtered[:10]

# Example: retrieve recent articles about neural networks
results = retrieve_with_filtering(
    query="neural networks",
    retriever_fn=mock_retriever,
    chunk_store=chunk_store,
    filters={"content_type": "article", "min_year": 2023}
)

Filtering reduces noise (irrelevant chunks from older sources) and improves precision by 5–10%.
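The `rank_score` function above uses a constant 1.0 for relevance and notes that production code should use embedding similarity. A minimal sketch of that substitution follows. It assumes query and chunk embeddings are plain lists of floats, keeps the same 0.1 and 0.2 bonuses, and takes the date and author flags as arguments so the example stays self-contained. The vectors are made-up toy values.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_score(query_vec, chunk_vec, has_date: bool, has_author: bool) -> float:
    """Embedding relevance plus the small metadata bonuses from the article."""
    relevance = cosine_similarity(query_vec, chunk_vec)
    recency = 0.1 if has_date else 0.0
    authority = 0.2 if has_author else 0.0
    return relevance + recency + authority

query_vec = [1.0, 0.0]
# chunk_id -> (toy embedding, has publish date, has author)
candidates = {
    "on_topic_plain": ([0.9, 0.1], False, False),
    "off_topic_attributed": ([0.1, 0.9], True, True),
}
ranked = sorted(
    candidates,
    key=lambda cid: rank_score(query_vec, *candidates[cid]),
    reverse=True,
)
print(ranked)
```

With bonuses this small, a clearly on-topic chunk still outranks an off-topic one that happens to have a date and an author, so the metadata acts as a tie-breaker more than a primary signal.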

Comparison: Flat Chunks vs. Contextual Chunks

| Aspect | Flat Chunks | Contextual Chunks |
| --- | --- | --- |
| Retrieval latency | 100–150 ms | 120–180 ms (+20%) |
| Context richness | Isolated snippet | Full section with metadata |
| Generated response coherence | 70–75% (okay) | 85–90% (good) |
| Relevance precision | 75–80% | 85–90% |
| Metadata filtering | Not possible | Precise filtering |
| Storage size | 1x (chunk only) | 1.3–1.5x (chunk + metadata) |

Per the table, contextual chunks add about 20% to latency and 30–50% to storage, but improve coherence and precision by roughly 10–15 percentage points.

Building a Hierarchical Chunk Index

Structure chunks hierarchically for better context:

class HierarchicalChunkIndex:
    """Index chunks with hierarchy for context-aware retrieval."""
    def __init__(self):
        self.chunks = {}  # chunk_id -> ChunkWithContext
        self.section_to_chunks = {}  # section -> [chunk_ids]
        self.doc_to_sections = {}  # doc -> [sections]

    def add_chunk(self, chunk: ChunkWithContext):
        """Add a chunk to the index."""
        self.chunks[chunk.chunk_id] = chunk

        # Index by section
        section_key = f"{chunk.doc_title}/{chunk.chapter}/{chunk.section}"
        if section_key not in self.section_to_chunks:
            self.section_to_chunks[section_key] = []
        self.section_to_chunks[section_key].append(chunk.chunk_id)

        # Record which sections belong to each document
        sections = self.doc_to_sections.setdefault(chunk.doc_title, [])
        if section_key not in sections:
            sections.append(section_key)

    def get_section_context(self, chunk_id: str) -> str:
        """Get the full section containing a chunk."""
        chunk = self.chunks[chunk_id]
        section_key = f"{chunk.doc_title}/{chunk.chapter}/{chunk.section}"

        # Chunks are joined in the order they were added to the index
        section_chunks = self.section_to_chunks.get(section_key, [])
        texts = [self.chunks[cid].text for cid in section_chunks]
        return "\n".join(texts)

    def retrieve_with_section(self, query: str, matched_chunk_ids: list) -> list[dict]:
        """Return chunks with full section context."""
        results = []
        for chunk_id in matched_chunk_ids[:5]:
            chunk = self.chunks[chunk_id]
            section_context = self.get_section_context(chunk_id)
            results.append({
                "chunk": chunk.text,
                "section_title": f"{chunk.chapter} / {chunk.section}",
                "section_content": section_context[:1000]  # Limit to 1000 chars
            })
        return results

# Example
index = HierarchicalChunkIndex()
index.add_chunk(ChunkWithContext(
    chunk_id="chunk_1",
    text="CNN details...",
    tokens=200,
    doc_title="ML Guide",
    chapter="Chapter 3",
    section="3.1",
    subsection=None,
    source_url=None,
    author=None,
    publish_date=None,
    content_type="article",
    prev_chunk_id=None,
    next_chunk_id="chunk_2"
))

A hierarchical index enables section-level context retrieval, improving coherence when chunks span multiple ideas.
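Section-aware results still have to reach the LLM in a form that keeps the hierarchy visible. The sketch below shows one possible way to render them as labeled text blocks. It assumes each result is a dict with the `chunk`, `section_title`, and `section_content` keys produced by `retrieve_with_section`; the label wording and the `format_for_prompt` name are illustrative choices.

```python
def format_for_prompt(results: list[dict]) -> str:
    """Render section-aware results as labeled blocks for an LLM prompt."""
    blocks = []
    for i, result in enumerate(results, start=1):
        blocks.append(
            f"[Source {i}] {result['section_title']}\n"
            f"Matched passage: {result['chunk']}\n"
            f"Surrounding section: {result['section_content']}"
        )
    # Blank line between sources so the model can tell them apart
    return "\n\n".join(blocks)

example = [
    {
        "chunk": "CNN details...",
        "section_title": "Chapter 3 / 3.1",
        "section_content": "CNN details...\nMore on filters...",
    },
]
prompt_context = format_for_prompt(example)
print(prompt_context)
```

Keeping the matched passage separate from the surrounding section lets the model see which text triggered the match and which text is supporting context.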

Key Takeaways

  • Contextual chunks enhance retrieved snippets with semantic metadata (entities, topics, technical level), document hierarchy (chapter, section), and neighboring chunks.
  • Store metadata in your vector database to enable precise filtering by content type, author, date, and technical level.
  • Include 1–3 neighboring chunks to provide continuity and context; this increases context size by 30–50% but improves coherence by 10–15%.
  • Rank chunks by relevance, recency, and authority; filtering reduces noise and improves precision.
  • Hierarchical chunk indexing enables section-level retrieval and better overall context for generation.

Frequently Asked Questions

How much context is too much?

Start with 2 neighboring chunks (400–500 tokens total). Monitor the LLM's response: if it's coherent and well-grounded, keep the window; if it's verbose or diluted by off-topic neighbors, reduce to 1 chunk. For most use cases, 2–3 chunks + metadata is optimal.
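One way to make the window size less of a guess is to cap it with a token budget. The sketch below is an assumption-laden illustration: it takes per-chunk token counts in document order and widens a symmetric window around the matched chunk until the budget would be exceeded. `fit_window` and the budget values are hypothetical.

```python
def fit_window(token_counts: list[int], center: int, max_window: int, budget: int) -> tuple[int, int]:
    """Return inclusive (start, end) indices of the widest symmetric window
    around `center` whose total token count stays within `budget`."""
    best = (center, center)  # The matched chunk is always kept
    for w in range(1, max_window + 1):
        start = max(0, center - w)
        end = min(len(token_counts) - 1, center + w)
        if sum(token_counts[start:end + 1]) > budget:
            break
        best = (start, end)
    return best

token_counts = [200] * 10  # Ten chunks of 200 tokens each
print(fit_window(token_counts, center=5, max_window=3, budget=1000))
print(fit_window(token_counts, center=5, max_window=3, budget=500))
```

A budget ties the window to what the prompt can afford, so long chunks get fewer neighbors and short chunks get more.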

Should I store metadata separately or with chunks?

Modern vector databases (Qdrant, Weaviate, Pinecone) support metadata on vectors. Store it there for efficient filtering. Legacy systems may require a separate metadata database (PostgreSQL) with a foreign key to chunk IDs; query both at retrieval time.
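For the separate-store option, the sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server. The table and column names are assumptions for illustration. It shows a metadata table keyed by chunk ID with a foreign key, and a lookup that filters retriever output by content type while preserving the retriever's order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE chunks (
    chunk_id TEXT PRIMARY KEY,
    text     TEXT NOT NULL
);
CREATE TABLE chunk_metadata (
    chunk_id     TEXT PRIMARY KEY REFERENCES chunks(chunk_id),
    content_type TEXT NOT NULL,
    publish_date TEXT
);
""")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [("chunk_1", "CNN details..."), ("chunk_2", "LeNet history...")],
)
conn.executemany(
    "INSERT INTO chunk_metadata VALUES (?, ?, ?)",
    [("chunk_1", "article", "2025-06-01"), ("chunk_2", "news", "2019-03-01")],
)

def filter_by_metadata(conn: sqlite3.Connection, chunk_ids: list[str], content_type: str) -> list[str]:
    """Keep only the retrieved chunk IDs whose metadata matches, in retriever order."""
    if not chunk_ids:
        return []
    placeholders = ",".join("?" for _ in chunk_ids)
    rows = conn.execute(
        "SELECT c.chunk_id FROM chunks c "
        "JOIN chunk_metadata m ON m.chunk_id = c.chunk_id "
        f"WHERE c.chunk_id IN ({placeholders}) AND m.content_type = ?",
        [*chunk_ids, content_type],
    ).fetchall()
    keep = {row[0] for row in rows}
    return [cid for cid in chunk_ids if cid in keep]

print(filter_by_metadata(conn, ["chunk_2", "chunk_1"], "article"))
```

The retriever's ranking is applied after the SQL lookup because SQL result order is not guaranteed to match the order of the `IN` list.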

How do I handle chunks that span topics (e.g., transition from section A to B)?

Mark such chunks with multiple topics/sections. In code: section="2.3→3.1" or store topics=["topic_a", "topic_b"]. At retrieval time, treat multi-topic chunks as bridges; they improve coherence when queries touch multiple topics.
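A small sketch of the bridge idea, under the assumption that each chunk carries a `topics` list and that the query has already been mapped to a set of topics (how that mapping is done is outside this sketch). Chunks covering more of the query's topics are moved to the front. `order_with_bridges` is a hypothetical helper.

```python
def order_with_bridges(chunks: list[dict], query_topics: set[str]) -> list[dict]:
    """Put chunks that cover more of the query's topics first.
    Python's sort is stable, so ties keep their original retrieval order."""
    return sorted(
        chunks,
        key=lambda c: len(query_topics & set(c["topics"])),
        reverse=True,
    )

retrieved = [
    {"chunk_id": "a", "topics": ["topic_a"]},
    {"chunk_id": "b", "topics": ["topic_a", "topic_b"]},  # Spans both topics
    {"chunk_id": "c", "topics": ["topic_b"]},
]
ordered = order_with_bridges(retrieved, {"topic_a", "topic_b"})
print([c["chunk_id"] for c in ordered])
```

Leading with the bridge chunk gives the LLM the transition between topics before the single-topic details.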

Can I use contextual chunks with sparse retrieval (BM25)?

Yes. Metadata is attached to the chunk, not to the embedding, so it works with any retriever. With sparse retrieval, use it for re-ranking: BM25 retrieves candidates, which are then re-ranked by metadata signals (date, content type, authority). This hybrid approach is cheaper than dense retrieval because it needs no embedding model.
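The two-stage flow can be sketched in plain Python. The BM25 function below is a compact textbook-style implementation written for this example, not a library call, and it uses whitespace tokenization. The corpus, the `min_year` cutoff, and the 0.1 recency bonus are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # Document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

corpus = [
    {"id": "old", "text": "cnn filters detect local features", "publish_date": "2019-03-01"},
    {"id": "new", "text": "cnn filters detect local features", "publish_date": "2025-06-01"},
    {"id": "rnn", "text": "recurrent networks process sequences", "publish_date": "2025-01-15"},
]
tokenized = [doc["text"].split() for doc in corpus]
scores = bm25_scores("cnn filters".split(), tokenized)

# Stage 1: keep the top-2 candidates by BM25 score
candidates = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]

# Stage 2: re-rank candidates with a small bonus for recent publish dates
def rerank_key(i: int, min_year: int = 2024, bonus: float = 0.1) -> float:
    year = int(corpus[i]["publish_date"][:4])
    return scores[i] + (bonus if year >= min_year else 0.0)

reranked = [corpus[i]["id"] for i in sorted(candidates, key=rerank_key, reverse=True)]
print(reranked)
```

The two CNN documents have identical text and therefore identical BM25 scores, so the recency bonus alone decides their order. The unrelated document never reaches the re-ranking stage.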

What metadata matters most?

In priority order: (1) technical level (beginner vs. advanced, so answers match the reader's expertise), (2) content type (research vs. news affects credibility), (3) date (recency matters for fast-moving topics), (4) section/hierarchy (context). Optional: author, source URL, entity tags. Start with the top three and add others based on feedback.
