Skip to main content

Customer support AI: Agent architecture design

The difference between a chatbot and a production support agent lies entirely in architecture. A chatbot responds to one turn; an agent maintains state, detects when to use tools, escalates to humans, and learns from each conversation. I've seen teams build chatbots in weeks that fail at hour one when scaled because they lack proper state management, tool coordination, and fallback logic. This article teaches you the architecture patterns that power tier-1 support systems: agentic loops, context windows, tool routing, and graceful degradation.

Core agent loop: The agentic pattern

A support agent runs a loop: read message, think, choose action, execute, update state, repeat. Here's the canonical pattern used by 90% of production systems in 2026:

1. Receive customer message
2. Append to conversation history
3. Call LLM with system prompt + history
4. LLM returns: response text + [tool calls]
5. If tool calls: execute them in parallel, capture results
6. Append tool results to history
7. If more tools needed: loop to step 3 (continue)
8. If no tools: return response to customer
9. Save conversation to database

This loop handles multi-step problems. A customer asks "Can you refund my purchase and cancel my account?" The agent might:

  • Call a tool to look up the order (step 4–5)
  • Call a tool to initiate refund (step 4–5)
  • Call a tool to check cancellation policy (step 4–5)
  • Synthesize results into a single response (step 8)

All within a single customer-facing turn. This is why agents feel more natural than chatbots.

State management and context windows

Your agent must remember the entire conversation and key facts about the customer. This is harder than it sounds. A typical support conversation spans 5–12 turns, generating 2,000–5,000 tokens of history. Modern models like Claude Opus (200K context) can retain this, but you must be deliberate about what you store:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ConversationState:
"""Minimal state for a support conversation."""
customer_id: str
conversation_id: str
customer_name: str
detected_intents: list[str] # from intent detection
account_info: dict # customer tier, subscription status, etc.
conversation_history: list[dict] # [{role, content, timestamp}]
tools_executed: list[dict] # for audit trail
escalation_pending: bool
escalation_reason: str = None
created_at: datetime = None
last_updated: datetime = None

class SupportAgent:
"""Single-turn agent that uses conversation state."""

def __init__(self, model: str = "claude-3-5-sonnet-20241022"):
self.model = model
self.client = Anthropic()

def run_turn(
self,
customer_message: str,
state: ConversationState,
available_tools: list[dict]
) -> tuple[str, ConversationState]:
"""
Execute one agent turn: read message, think, call tools, respond.
Returns: (response_text, updated_state)
"""

# Append customer message to history
state.conversation_history.append({
"role": "user",
"content": customer_message,
"timestamp": datetime.now().isoformat()
})

# Build context-aware system prompt
system_prompt = self._build_system_prompt(state)

# Convert history to messages format (role, content)
messages = [
{"role": h["role"], "content": h["content"]}
for h in state.conversation_history
]

# Call model with tool definitions
response = self.client.messages.create(
model=self.model,
max_tokens=2048,
system=system_prompt,
tools=available_tools,
messages=messages
)

# Process response content
assistant_text = ""
tool_calls = []

for block in response.content:
if block.type == "text":
assistant_text = block.text
elif block.type == "tool_use":
tool_calls.append({
"id": block.id,
"name": block.name,
"input": block.input
})

# Append assistant response to history
state.conversation_history.append({
"role": "assistant",
"content": assistant_text if assistant_text else "(tool use only)",
"timestamp": datetime.now().isoformat(),
"tool_calls": tool_calls
})

# Execute tools if any were called
tool_results = []
if tool_calls:
tool_results = self._execute_tools(tool_calls, state)

# Append tool results to history for agentic loop
for result in tool_results:
state.conversation_history.append({
"role": "user", # Tools return as "user" role in Anthropic API
"content": [{
"type": "tool_result",
"tool_use_id": result["tool_use_id"],
"content": result["content"]
}]
})

# Recursive: if tools were called, ask for final response
if tool_results:
final_response = self.client.messages.create(
model=self.model,
max_tokens=2048,
system=system_prompt,
messages=messages + [
{"role": "assistant", "content": [
{"type": "text", "text": assistant_text} if assistant_text else None,
*[{"type": "tool_use", **tc} for tc in tool_calls]
]},
{"role": "user", "content": [
{"type": "tool_result", "tool_use_id": tr["tool_use_id"], "content": tr["content"]}
for tr in tool_results
]}
]
)

for block in final_response.content:
if block.type == "text":
assistant_text = block.text

state.last_updated = datetime.now()
return assistant_text, state

def _build_system_prompt(self, state: ConversationState) -> str:
"""Build context-aware system prompt."""
intent_context = f"Detected intents: {', '.join(state.detected_intents)}"
account_context = f"Customer tier: {state.account_info.get('tier', 'unknown')}, Subscription: {state.account_info.get('subscription_status', 'unknown')}"

return f"""You are a helpful customer support agent. Your goal is to resolve the customer's issue efficiently.

CUSTOMER CONTEXT:
- Name: {state.customer_name}
- {intent_context}
- {account_context}

CONVERSATION GUIDELINES:
1. Be empathetic and acknowledge their frustration.
2. Answer directly; avoid hedging.
3. If you need information, use available tools (lookup account, check refund policy, etc.).
4. If the issue requires human judgment, use the escalate_to_human tool.
5. Never make promises you can't keep (e.g., refund guarantees without authorization).

When tools are available, use them to gather facts before responding."""

def _execute_tools(self, tool_calls: list[dict], state: ConversationState) -> list[dict]:
"""Execute tool calls in parallel and return results."""
results = []
for call in tool_calls:
# In production, dispatch to actual tool implementations
result_content = self._dispatch_tool(call["name"], call["input"], state)
results.append({
"tool_use_id": call["id"],
"content": result_content
})

# Log execution for audit
state.tools_executed.append({
"tool": call["name"],
"input": call["input"],
"timestamp": datetime.now().isoformat()
})

return results

def _dispatch_tool(self, tool_name: str, tool_input: dict, state: ConversationState) -> str:
"""Dispatch to actual tool implementation."""
if tool_name == "lookup_account":
return self._tool_lookup_account(tool_input, state)
elif tool_name == "check_refund_policy":
return self._tool_check_refund_policy(tool_input, state)
elif tool_name == "escalate_to_human":
state.escalation_pending = True
state.escalation_reason = tool_input.get("reason", "No reason provided")
return "Escalating to human agent. A specialist will contact you shortly."
else:
return f"Tool {tool_name} not implemented."

def _tool_lookup_account(self, tool_input: dict, state: ConversationState) -> str:
"""Example: Look up customer account details."""
# In production, query your customer database
return json.dumps({
"account_id": state.customer_id,
"name": state.customer_name,
"tier": state.account_info.get("tier"),
"subscription_status": state.account_info.get("subscription_status"),
"open_tickets": 1,
"last_payment": "2026-05-15"
})

def _tool_check_refund_policy(self, tool_input: dict, state: ConversationState) -> str:
"""Example: Check refund eligibility."""
# In production, query your refund policy engine
tier = state.account_info.get("tier", "free")
if tier == "premium":
return "Premium customers: 30-day money-back guarantee. You are eligible."
else:
return "Standard customers: 14-day money-back guarantee. Check transaction date."

This architecture achieves:

  • Stateful conversations — the agent remembers context across turns.
  • Multi-step problem solving — tools are called, results used in next LLM call.
  • Auditability — every tool call is logged for compliance.
  • Graceful degradation — if a tool fails, the agent responds with the failure message.

Tool coordination and parallel execution

Real support requires calling multiple tools in sequence or parallel. A customer writes "I want a refund and to cancel my account." Your agent should:

  1. Parallel: look up order, check refund policy, check cancellation policy
  2. Synthesize: "Your refund is eligible (policy X), cancellation is effective immediately"
def execute_tools_parallel(
tool_calls: list[dict],
state: ConversationState
) -> list[dict]:
"""Execute multiple tools in parallel; return results in same order."""
from concurrent.futures import ThreadPoolExecutor, as_completed

results = {}
with ThreadPoolExecutor(max_workers=3) as executor:
futures = {
executor.submit(dispatch_tool, call["name"], call["input"], state): call["id"]
for call in tool_calls
}

for future in as_completed(futures):
tool_use_id = futures[future]
try:
result_content = future.result()
results[tool_use_id] = result_content
except Exception as e:
results[tool_use_id] = f"Tool error: {str(e)}"

# Return in original order
return [
{"tool_use_id": call["id"], "content": results[call["id"]]}
for call in tool_calls
]

Parallel execution cuts latency from 3 seconds (sequential) to 1 second (parallel) for three tools.

Escalation detection and human handoff

Every agent needs an escalation path. Detect when human intervention is required:

  • Explicit escalation — customer says "I want to talk to a human" or the agent detects a tool says "escalate_to_human"
  • Confidence-based — agent's response confidence is <0.6
  • Policy-based — refunds above $500, complaints mentioning legal action, etc.
  • Timeout — conversation exceeds 10 turns without resolution
def should_escalate(state: ConversationState, response_confidence: float) -> bool:
"""Determine if conversation should escalate to human."""

# High-priority intents always escalate to human review
high_priority = {"cancellation_intent", "security_report", "billing_problem"}
if any(i in high_priority for i in state.detected_intents):
return True

# Low confidence
if response_confidence < 0.6:
return True

# Too many turns without resolution
if len(state.conversation_history) > 20:
return True

# Already marked for escalation
if state.escalation_pending:
return True

return False

Key Takeaways

  • The agentic loop is the core pattern: receive, think, call tools, execute, update state, loop. This single loop handles complex multi-step problems better than any static chatbot.
  • State management is critical — store customer ID, detected intents, account info, conversation history, and tool audit trail; use this context in every system prompt.
  • Tool coordination multiplies capability — call multiple tools in parallel, use results in the next LLM call, and handle failures gracefully.
  • Escalation logic is non-negotiable — detect high-priority intents, low confidence, policy violations, and long conversations; route to human when appropriate.
  • Recursion within a turn enables tool chaining — after tools execute, call the model again to synthesize results into a final response, all within a single customer-facing turn.

Frequently Asked Questions

How many turns should I keep in the context window?

For most conversations, keep all turns (typically 5–12). If a conversation runs 20+ turns without resolution, summarize the first 10 turns into a bullet-point recap and insert it at the top of the history. This preserves key context while saving tokens. Claude's 200K context makes this rarely necessary for support conversations.

What happens if a tool execution fails?

Return the error message to the conversation history as a tool result. The model will see "Tool lookup_account failed: connection timeout" and decide whether to retry, use cached data, or tell the customer. Transparency here prevents silent failures.

How do I prevent the agent from making promises it can't keep?

Add explicit constraints in the system prompt: "Never guarantee a refund without using the check_refund_policy tool." "Never promise a feature before checking with the product team." Train the model on real refusals: "I don't have the authority to process refunds above $1000; I'll escalate to my manager."

Should I use Claude Opus, Sonnet, or Haiku for support agents?

Sonnet (3.5) is the sweet spot: 200K context, 50x cheaper than Opus, and handles 95% of support tasks. Use Haiku for simple routing/intent detection. Use Opus only if you need edge-case reasoning or 10+ parallel tool calls. In production, you'll spend 95% of inference budget on Sonnet.

How do I measure agent performance?

Track: conversation length (fewer turns = better), escalation rate (should be <15%), customer satisfaction (CSAT post-conversation), and resolution rate (did the agent solve the issue?). Measure per intent to find weak spots (e.g., refund handling might need improvement).

Further Reading