
Supervisor-Worker Pattern: Building Agent Hierarchies

The supervisor-worker pattern is the foundational topology for multi-agent systems: one agent (the supervisor) receives requests, decides which worker agents should handle them, distributes the work, collects results, and synthesizes a final answer. It mirrors the structure of traditional team hierarchies and is intuitive to reason about, debug, and extend. The supervisor acts as a router, load balancer, and mediator; workers are specialized agents optimized for specific domains.

This pattern excels when you have well-defined task categories and workers that rarely interact directly. A customer-support chatbot might route technical questions to a tech-support worker and billing questions to a billing worker. A content-generation system might route outline requests to a planner and detailed sections to a writer. The supervisor keeps orchestration logic centralized, making it easier to audit decisions and modify routing rules.

How the Supervisor-Worker Pattern Works

Supervisor responsibilities

The supervisor agent is responsible for:

  1. Routing (task classification): Parse the incoming request and classify it into one or more worker categories.
  2. Delegation: Invoke the appropriate worker(s) with the necessary context and parameters.
  3. Aggregation: Collect results from workers and synthesize them into a cohesive response.
  4. Error handling: Detect worker failures and either retry, escalate, or degrade gracefully.

A typical supervisor prompt includes a list of available workers and their specializations, enabling the LLM to make informed routing decisions.
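As a minimal sketch of that idea, the routing prompt can be generated from a worker registry so the prompt and the set of valid worker names never drift apart. The registry contents and the helper names `build_supervisor_prompt` and `parse_routing` are illustrative assumptions, not part of any library.

```python
import json

# Hypothetical worker registry: names and descriptions are illustrative.
WORKERS = {
    "billing": "Handles invoices, refunds, and payment failures",
    "tech_support": "Diagnoses errors and configuration problems",
}

def build_supervisor_prompt(workers: dict[str, str]) -> str:
    """Render the worker list into a routing prompt that asks for JSON."""
    worker_lines = "\n".join(f"- {name}: {desc}" for name, desc in workers.items())
    names = "|".join(workers)
    return (
        "You are a supervisor. Route each request to one worker.\n\n"
        f"Available workers:\n{worker_lines}\n\n"
        f'Output JSON: {{"worker": "{names}", "reasoning": "brief explanation"}}'
    )

def parse_routing(raw: str, workers: dict[str, str], default: str) -> str:
    """Validate the supervisor's reply; fall back to a default worker."""
    try:
        choice = json.loads(raw).get("worker")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return default
    return choice if choice in workers else default
```

Validating the reply against the same registry means an unknown or misspelled worker name is caught before any delegation happens.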

Worker responsibilities

Each worker agent has a narrow, well-defined role:

  1. Accept a task: Receive a routed request from the supervisor.
  2. Perform specialized work: Execute its domain-specific logic (research, coding, validation, etc.).
  3. Return structured output: Return results in a format the supervisor expects (JSON, markdown, or plain text).
  4. Fail gracefully: Raise or log errors so the supervisor can detect them and respond.

Workers are designed to be stateless; they should not depend on context from other workers or maintain long-lived connections.

Building Your First Supervisor-Worker System

Here is a complete example implementing a technical-support orchestrator:

import anthropic
import json
from dataclasses import dataclass

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

@dataclass
class WorkerResult:
    worker_name: str
    success: bool
    content: str

class TechSupportOrchestrator:
    def __init__(self):
        self.supervisor_model = "claude-3-5-sonnet-20241022"
        self.worker_model = "claude-3-5-sonnet-20241022"

    def supervisor_agent(self, user_query: str) -> dict:
        """Routes the user query to appropriate worker(s)."""
        supervisor_prompt = """You are a technical support supervisor. Your job is to analyze customer queries
and route them to specialists.

Available workers:
- troubleshooter: Handles error diagnosis and system troubleshooting
- documenter: Provides documentation and how-to guidance
- escalator: Handles feature requests or complex issues requiring human review

Output JSON: {"primary_worker": "troubleshooter|documenter|escalator",
"secondary_worker": null or "worker_name", "reasoning": "brief explanation"}"""

        response = client.messages.create(
            model=self.supervisor_model,
            max_tokens=500,
            system=supervisor_prompt,
            messages=[{"role": "user", "content": user_query}]
        )

        fallback = {"primary_worker": "documenter", "secondary_worker": None, "reasoning": "fallback routing"}
        try:
            routing = json.loads(response.content[0].text)
        except json.JSONDecodeError:
            return fallback
        # Valid JSON that is not an object (e.g. a list) is also unusable.
        return routing if isinstance(routing, dict) else fallback

    def troubleshooter_agent(self, query: str) -> str:
        """Diagnoses errors and provides troubleshooting steps."""
        response = client.messages.create(
            model=self.worker_model,
            max_tokens=800,
            system="""You are a technical troubleshooter. Given a user's problem, provide:
1. Root cause analysis (2-3 sentences)
2. Step-by-step troubleshooting (numbered steps)
3. Common pitfalls to avoid
Be specific and actionable.""",
            messages=[{"role": "user", "content": query}]
        )
        return response.content[0].text

    def documenter_agent(self, query: str) -> str:
        """Provides documentation and how-to guidance."""
        response = client.messages.create(
            model=self.worker_model,
            max_tokens=800,
            system="""You are a technical writer. Given a user's question, provide:
1. Clear, concise explanation of the concept
2. Relevant code examples (if applicable)
3. Links to official documentation (real URLs only)
Use markdown formatting.""",
            messages=[{"role": "user", "content": query}]
        )
        return response.content[0].text

    def escalator_agent(self, query: str) -> str:
        """Handles feature requests and complex issues."""
        response = client.messages.create(
            model=self.worker_model,
            max_tokens=600,
            system="""You are a technical escalation specialist. For feature requests or complex issues:
1. Summarize the user's need clearly
2. Explain why it may require human review
3. Suggest what information an engineer would need
Be professional and solutions-oriented.""",
            messages=[{"role": "user", "content": query}]
        )
        return response.content[0].text

    def invoke_worker(self, worker_name: str, query: str) -> WorkerResult:
        """Invoke a specific worker and return a structured result."""
        try:
            if worker_name == "troubleshooter":
                content = self.troubleshooter_agent(query)
            elif worker_name == "documenter":
                content = self.documenter_agent(query)
            elif worker_name == "escalator":
                content = self.escalator_agent(query)
            else:
                return WorkerResult(worker_name, False, "Unknown worker")

            return WorkerResult(worker_name, True, content)
        except Exception as e:
            # Report the failure as data so one worker cannot crash the orchestrator.
            return WorkerResult(worker_name, False, f"Error: {str(e)}")

    def synthesize_results(self, routing: dict, results: list[WorkerResult]) -> str:
        """Combine worker results into a final response."""
        # routing.get avoids a KeyError if the supervisor omitted "reasoning".
        synthesis_prompt = f"""You are synthesizing results from multiple specialist agents.

Routing decision: {routing.get('reasoning', 'not provided')}

Results from workers:
{json.dumps([{"worker": r.worker_name, "content": r.content} for r in results if r.success], indent=2)}

Synthesize these into a clear, helpful customer response. Use all relevant information from the workers."""

        response = client.messages.create(
            model=self.supervisor_model,
            max_tokens=1000,
            messages=[{"role": "user", "content": synthesis_prompt}]
        )
        return response.content[0].text

    def handle_query(self, user_query: str) -> str:
        """End-to-end orchestration: route, execute, synthesize."""
        print(f"User query: {user_query}\n")

        # Step 1: Route
        routing = self.supervisor_agent(user_query)
        print(f"Routing: {json.dumps(routing, indent=2)}\n")

        # Step 2: Invoke workers (secondary_worker may be None, so skip empty entries)
        results = []
        for worker in [routing.get("primary_worker"), routing.get("secondary_worker")]:
            if worker:
                result = self.invoke_worker(worker, user_query)
                results.append(result)
                print(f"{worker} result (success={result.success}):\n{result.content[:300]}...\n")

        # Step 3: Synthesize
        final_response = self.synthesize_results(routing, results)
        return final_response

# Example usage
if __name__ == "__main__":
    orchestrator = TechSupportOrchestrator()

    queries = [
        "My Python script keeps timing out. What could be causing this?",
        "How do I set up a virtual environment in Python?",
        "I need a feature to export data to Excel. Can you help?"
    ]

    for query in queries:
        answer = orchestrator.handle_query(query)
        print(f"Final response:\n{answer}\n" + "="*80 + "\n")

This implementation demonstrates:

  • Stateless workers: Each worker receives the complete query and needs no context from other workers.
  • Structured routing: The supervisor outputs JSON that the orchestrator can parse and act on.
  • Error handling: Worker failures are caught and reported without cascading.
  • Synthesis: Results are combined into a single response.
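The example above calls the primary and secondary workers one after the other. Because the workers are stateless, they could instead run concurrently. The following is a rough sketch using the standard library's thread pool; `invoke_parallel`, the 30-second timeout, and the stub workers are assumptions made for illustration, with plain callables standing in for the LLM-backed agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def invoke_parallel(
    workers: dict[str, Callable[[str], str]],
    names: list[str],
    query: str,
) -> dict[str, str]:
    """Run the selected workers concurrently and collect outputs by name."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(workers[name], query) for name in names}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=30)
            except Exception as exc:
                # One worker failing does not prevent the others from returning.
                results[name] = f"Error: {exc}"
    return results

# Stub workers standing in for real agents.
stub_workers = {
    "troubleshooter": lambda q: f"Diagnosis for: {q}",
    "documenter": lambda q: f"Docs for: {q}",
}
outputs = invoke_parallel(stub_workers, ["troubleshooter", "documenter"], "timeout error")
```

Threads suit this case because each worker call spends most of its time waiting on the network rather than computing.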

Load Balancing and Scaling

Static vs. dynamic routing

Static routing assigns certain query patterns to fixed workers (e.g., all billing questions always go to the billing worker). This is simple and predictable but can cause bottlenecks if one worker is slower or receives more queries.

Dynamic routing uses real-time metrics (queue depth, response latency) to adjust routing. The supervisor might send a billing query to a backup worker if the primary worker's queue is too long. This requires additional infrastructure (metrics collection, decision logic) but prevents bottlenecks.
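A minimal sketch of that decision logic, assuming the metrics are already collected elsewhere: the `WorkerStats` fields, the `pick_worker` helper, and the queue threshold of 10 are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class WorkerStats:
    name: str
    queue_depth: int       # requests currently waiting
    avg_latency_s: float   # recent average response time in seconds

def pick_worker(candidates: list[WorkerStats], max_queue: int = 10) -> str:
    """Prefer the first candidate under the queue limit; else the least loaded.

    Candidates are listed in priority order (primary first, then backups).
    """
    for stats in candidates:
        if stats.queue_depth < max_queue:
            return stats.name
    # Every queue is over the limit: choose the shortest queue, then lowest latency.
    return min(candidates, key=lambda s: (s.queue_depth, s.avg_latency_s)).name

primary = WorkerStats("billing_primary", queue_depth=12, avg_latency_s=1.5)
backup = WorkerStats("billing_backup", queue_depth=3, avg_latency_s=2.0)
chosen = pick_worker([primary, backup])
```

Here `chosen` is the backup worker because the primary's queue exceeds the limit. With an empty primary queue, the same call would return the primary.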

Cascading and hierarchical workers

In some systems, a supervisor delegates to intermediate supervisors, which in turn delegate to workers. A financial-services orchestrator might have a top-level supervisor, a "compliance supervisor," and "validation workers" under the compliance supervisor. This hierarchical structure makes the system modular but adds latency.
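The nesting can be sketched with a small recursive structure in which a supervisor's children are either workers or other supervisors. This is an illustrative sketch only: the `Supervisor` class is hypothetical, and keyword-based lambdas stand in for LLM classification and worker calls.

```python
from typing import Callable, Union

Handler = Callable[[str], str]

class Supervisor:
    """Routes a task to a child, which may be a worker or another supervisor."""

    def __init__(
        self,
        name: str,
        classify: Callable[[str], str],
        children: dict[str, Union["Supervisor", Handler]],
    ):
        self.name = name
        self.classify = classify
        self.children = children

    def handle(self, task: str, path: tuple[str, ...] = ()) -> tuple[str, tuple[str, ...]]:
        """Return the worker's answer plus the chain of supervisors traversed."""
        path = path + (self.name,)
        child = self.children[self.classify(task)]
        if isinstance(child, Supervisor):
            return child.handle(task, path)  # each tier adds one routing hop
        return child(task), path

compliance = Supervisor(
    "compliance",
    classify=lambda t: "kyc" if "identity" in t else "sanctions",
    children={
        "kyc": lambda t: "KYC check queued",
        "sanctions": lambda t: "Sanctions screen queued",
    },
)
top = Supervisor(
    "top",
    classify=lambda t: "compliance" if "verify" in t else "general",
    children={"compliance": compliance, "general": lambda t: "General answer"},
)

answer, path = top.handle("verify customer identity")
```

The returned `path` makes the latency cost visible: every supervisor in it corresponds to one additional routing decision before any worker runs.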

Handling Worker Failures

Retry logic

If a worker fails, the supervisor can retry with exponential backoff. After two or three retries, escalate to a fallback worker or degrade gracefully (return a partial answer).

import time  # required for time.sleep; place with the other module-level imports

# Add this method to TechSupportOrchestrator.
def invoke_worker_with_retry(self, worker_name: str, query: str, max_retries: int = 2) -> WorkerResult:
    """Invoke a worker with exponential backoff retry."""
    for attempt in range(max_retries + 1):
        result = self.invoke_worker(worker_name, query)
        if result.success:
            return result

        if attempt < max_retries:
            wait_time = 2 ** attempt  # 1s, 2s, 4s...
            print(f"Worker {worker_name} failed. Retrying in {wait_time}s...")
            time.sleep(wait_time)

    return result  # Return the final failure

Fallback workers

Define a fallback agent for each worker so that if the primary fails, work routes to the backup. This requires careful prompt tuning to ensure fallbacks can handle the same queries.
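One way to express this is a fallback map that is followed until a worker succeeds. This is a sketch under assumptions: the `FALLBACKS` mapping and `invoke_with_fallback` are hypothetical, and the stub callables stand in for real agents.

```python
from typing import Callable

# Hypothetical fallback chain: each worker names its backup.
FALLBACKS = {"troubleshooter": "documenter", "documenter": "escalator"}

def invoke_with_fallback(
    workers: dict[str, Callable[[str], str]],
    name: str,
    query: str,
) -> tuple[str, str]:
    """Return (worker_used, answer), walking the fallback chain on failure."""
    tried: list[str] = []
    current = name
    # The "not in tried" check guards against cycles in the fallback map.
    while current is not None and current not in tried:
        tried.append(current)
        try:
            return current, workers[current](query)
        except Exception:
            current = FALLBACKS.get(current)
    raise RuntimeError(f"All workers failed: {tried}")

def broken_worker(query: str) -> str:
    raise TimeoutError("worker unavailable")

stub_workers = {
    "troubleshooter": broken_worker,
    "documenter": lambda q: "docs answer",
    "escalator": lambda q: "ticket opened",
}
used, reply = invoke_with_fallback(stub_workers, "troubleshooter", "script times out")
```

Returning the name of the worker that actually answered lets the supervisor log when a fallback was used, which helps with the prompt tuning mentioned above.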

Graceful degradation

If the supervisor cannot reach any worker for a query type, return a partial answer or a "contact support" message rather than failing entirely.
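A minimal sketch of this behavior, reusing the shape of the `WorkerResult` dataclass from the orchestrator example; the `degrade_gracefully` helper and the wording of `SUPPORT_MESSAGE` are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class WorkerResult:
    worker_name: str
    success: bool
    content: str

SUPPORT_MESSAGE = "We could not process your request right now. Please contact support."

def degrade_gracefully(results: list[WorkerResult]) -> str:
    """Return what succeeded, flag what did not, or fall back to a support message."""
    succeeded = [r for r in results if r.success]
    if not succeeded:
        return SUPPORT_MESSAGE
    body = "\n\n".join(r.content for r in succeeded)
    failed = [r.worker_name for r in results if not r.success]
    if failed:
        body += f"\n\n(Partial answer: no response from {', '.join(failed)}.)"
    return body

partial = degrade_gracefully([
    WorkerResult("troubleshooter", True, "Check your network timeout setting."),
    WorkerResult("documenter", False, "Error: timed out"),
])
```

Telling the user that the answer is partial keeps the degradation honest rather than silently dropping a worker's contribution.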

Supervisor-Worker vs. Other Patterns

| Aspect | Supervisor-Worker | Pipeline | Hierarchical |
|---|---|---|---|
| Latency | Medium (parallel workers) | Low (sequential) | High (multiple levels) |
| Failure isolation | Good (worker failure doesn't block others) | Poor (one failure stops the line) | Medium (depends on structure) |
| Complexity | Medium (routing logic) | Low (simple linear flow) | High (multi-level coordination) |
| Scalability | Good (add workers) | Limited (inherently sequential) | Moderate (add tiers) |
| Best for | Well-defined task categories | Transformative pipelines | Complex hierarchical problems |

Key Takeaways

  • The supervisor-worker pattern centralizes routing logic in one agent, keeping workers specialized and independent.
  • Supervisors classify requests, delegate to workers, collect results, and synthesize final answers.
  • Workers are stateless, narrowly focused, and return results in agreed formats.
  • Implement retry logic and fallback workers to handle failures gracefully.
  • Use dynamic routing and intermediate supervisors to scale beyond simple two-tier hierarchies.

Frequently Asked Questions

How many workers should a supervisor manage?

Start with 3-5 workers per supervisor. Beyond 7-10, routing logic becomes fragile (the supervisor struggles to classify ambiguous queries). Use hierarchical supervisors or pipeline stages to manage more workers. Each additional worker lengthens the routing prompt and adds more pairs of categories that the supervisor can confuse with one another.

Should workers communicate directly or always through the supervisor?

In the pure supervisor-worker pattern, all communication flows through the supervisor. Direct worker-to-worker communication introduces hidden dependencies and complicates debugging. For high-performance systems, you may introduce direct communication between workers with explicit contracts, but this sacrifices some auditability.

What if a query requires two workers but they produce conflicting results?

Build a validator or judge agent into the supervisor. After receiving conflicting results, the supervisor routes them to the judge, who applies domain knowledge to pick the correct answer or synthesize a consensus. Document why the conflict occurred to improve routing.
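A rough sketch of that flow follows. The `resolve` helper is hypothetical, the judge is any callable that takes a prompt and returns text (an LLM call in practice), and the conflict check is a deliberately naive string comparison; a real system would likely need a semantic comparison.

```python
from typing import Callable

def resolve(results: dict[str, str], judge: Callable[[str], str]) -> str:
    """Return the shared answer, or ask the judge when workers disagree."""
    # Naive conflict detection: answers conflict if they differ after normalization.
    distinct = {answer.strip().lower() for answer in results.values()}
    if len(distinct) == 1:
        return next(iter(results.values()))
    prompt = "Workers disagree. Pick the correct answer and explain why.\n" + "\n".join(
        f"[{name}] {answer}" for name, answer in results.items()
    )
    return judge(prompt)

judge_calls: list[str] = []

def fake_judge(prompt: str) -> str:
    """Stand-in for an LLM judge; records the prompt it received."""
    judge_calls.append(prompt)
    return "judged answer"

agreed = resolve({"troubleshooter": "Yes", "documenter": "yes "}, fake_judge)
disputed = resolve({"troubleshooter": "Yes", "documenter": "No"}, fake_judge)
```

Recording the judge prompts, as `judge_calls` does here, gives a log of when and why conflicts occurred, which supports the routing improvements described above.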

How do I test a supervisor-worker system?

Create a test set of queries labeled with the expected worker routing. Run the queries through the supervisor, log its routing decisions, and compare predicted vs. expected routing. Use golden answers (correct outputs) to evaluate worker quality in isolation. Finally, test end-to-end queries.
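The routing check can be sketched as a small harness. The `routing_accuracy` function and the keyword router are illustrative assumptions; in practice the router argument would wrap the supervisor's routing call.

```python
from typing import Callable

def routing_accuracy(
    router: Callable[[str], str],
    labeled: list[tuple[str, str]],
) -> tuple[float, list[tuple[str, str, str]]]:
    """Return accuracy and the (query, expected, predicted) triples that missed."""
    misses = []
    for query, expected in labeled:
        predicted = router(query)
        if predicted != expected:
            misses.append((query, expected, predicted))
    accuracy = 1 - len(misses) / len(labeled)
    return accuracy, misses

def keyword_router(query: str) -> str:
    """Stand-in router: a real test would call the supervisor instead."""
    return "troubleshooter" if "error" in query.lower() else "documenter"

cases = [
    ("I get an error on startup", "troubleshooter"),
    ("How do I install it?", "documenter"),
    ("App crashes", "troubleshooter"),
]
accuracy, misses = routing_accuracy(keyword_router, cases)
```

The list of misses is usually more useful than the accuracy figure, since it shows which query phrasings the routing prompt fails to cover.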

Can I use different LLM models for the supervisor and workers?

Yes. The supervisor (which must classify queries accurately) might use Claude 3.5 Sonnet; workers might use Claude 3.5 Haiku for cost savings. The trade-off is that if workers are too weak, the supervisor cannot compensate. Evaluate each worker on its delegated tasks with the cheaper model, and re-check routing accuracy whenever you change the supervisor model.

Further Reading