Agent Tool Calling Basics: Complete Guide
Agent tool calling is the mechanism by which large language models invoke external functions instead of just generating text. When a model recognizes that a user's request requires real-world action—looking up information, performing a calculation, or updating a database—it outputs a structured request to call a specific tool with specific arguments. Your agent runtime then executes that function and returns the result back to the model, which reasons over the outcome and either completes the task, calls another tool, or reports an error. This cycle is the core of every working agent system.
What Is Agent Tool Calling?
Agent tool calling is a structured protocol where a language model stops generating natural language and instead produces a formatted tool invocation containing a function name and typed arguments. The agent framework intercepts this invocation, executes the underlying function in a sandboxed environment, and feeds the result back into the model's context so it can continue reasoning.
A tool call has three parts: (1) the tool name (e.g., search_database), (2) the arguments as a JSON object (e.g., { "query": "Python async", "limit": 5 }), and (3) the result, which the agent framework captures and appends to the conversation history. The model then reads the result and decides what to do next—call another tool, summarize findings, or respond to the user. This loop repeats until the model signals that it is done.
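To make the three parts concrete, the sketch below models a tool call and its result as plain Python dictionaries. The field names follow the content-block shape used in this guide's main example (`tool_use`, `tool_result`, `tool_use_id`); the ID value and the `make_tool_result` helper are hypothetical and exist only for illustration.

```python
import json

# A tool call as the model emits it: a tool name plus JSON arguments.
# The id value is made up for illustration.
tool_call = {
    "type": "tool_use",
    "id": "toolu_example_001",
    "name": "search_database",
    "input": {"query": "Python async", "limit": 5},
}


def make_tool_result(call: dict, output) -> dict:
    """Wrap a tool's output so it can be appended to the conversation history.

    Hypothetical helper: it links the result back to the call through the
    call's id and serializes the output as a JSON string.
    """
    return {
        "type": "tool_result",
        "tool_use_id": call["id"],
        "content": json.dumps(output),
    }


result_block = make_tool_result(tool_call, [{"title": "asyncio basics"}])
print(result_block["tool_use_id"])  # toolu_example_001
```

The shared ID is what lets the model match each result to the call that produced it.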
Unlike traditional chatbots, which only generate text, agents can act on the outside world: they fetch real data, perform computations, trigger business processes, and report back on outcomes. According to research by Anthropic (2025), agents with access to correct tool definitions reduce factual errors by 87% compared to models relying on their training data alone.
How the Tool Call Loop Works
The tool calling loop has five stages:
- User request: The agent receives a user message (e.g., "Find the top 3 Python libraries for async programming").
- Model reasoning: The model reads the request and a list of available tools. It decides whether to call a tool and which one.
- Tool invocation: If the model needs external information, it outputs a structured tool call with the function name and arguments.
- Execution & result: Your runtime executes the function, captures the output, and appends it to the conversation.
- Continuation: The model reads the result, decides whether to call another tool or respond to the user, and repeats.
Here is a simplified code example showing the loop:
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Define a tool
tools = [
    {
        "name": "search_docs",
        "description": "Search technical documentation for a keyword",
        "input_schema": {
            "type": "object",
            "properties": {
                "keyword": {"type": "string"},
                "max_results": {"type": "integer"},
            },
            "required": ["keyword"],
        },
    }
]


# Simulate a tool implementation (the canned data ignores the keyword)
def search_docs(keyword, max_results=5):
    results = {
        "asyncio": "Built-in async I/O library (CPython stdlib)",
        "trio": "Third-party async framework with advanced features",
        "anyio": "Compatibility layer supporting asyncio and trio",
    }
    return [{"name": k, "description": v} for k, v in list(results.items())[:max_results]]


# Main agent loop
messages = [
    {"role": "user", "content": "What are the top Python async libraries?"}
]

while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )

    # Stop as soon as the model did not request a tool (e.g., stop_reason == "end_turn")
    if response.stop_reason != "tool_use":
        break

    # Keep the assistant turn, including its tool_use blocks, in the history
    messages.append({"role": "assistant", "content": response.content})

    # Answer every tool_use block; each one needs a matching tool_result
    tool_results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        if block.name == "search_docs":
            content = json.dumps(search_docs(**block.input))
        else:
            content = json.dumps({"error": f"Unknown tool: {block.name}"})
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": content,
        })
    messages.append({"role": "user", "content": tool_results})

# Extract final response
for block in response.content:
    if block.type == "text":
        print(block.text)
This loop continues until the model's stop_reason is something other than "tool_use" (normally "end_turn"), meaning it has finished responding to the user. The key insight: every tool result becomes part of the conversation history, so the model can reason over what it learned and make informed decisions about what to do next.
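The same control flow can be exercised without an API key. The sketch below replaces the model with a scripted stand-in, so `fake_model`, `Block`, `Response`, and `run_agent` are all hypothetical names, and the turn cap is an added safeguard rather than something the loop requires. It is meant only to show how the history grows from request to tool call to final answer.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    """Minimal stand-in for a response content block (hypothetical)."""
    type: str
    text: str = ""
    id: str = ""
    name: str = ""
    input: dict = field(default_factory=dict)


@dataclass
class Response:
    stop_reason: str
    content: list


def fake_model(messages: list) -> Response:
    """Scripted stand-in for the model: request a tool once, then answer."""
    last = messages[-1]
    if isinstance(last["content"], str):
        # Plain user text: ask for the search tool.
        call = Block(type="tool_use", id="call_1", name="search_docs",
                     input={"keyword": "async"})
        return Response(stop_reason="tool_use", content=[call])
    # The last message carries tool results, so produce a final answer.
    found = json.loads(last["content"][0]["content"])
    names = ", ".join(item["name"] for item in found)
    return Response(stop_reason="end_turn",
                    content=[Block(type="text", text=f"Libraries: {names}")])


def search_docs(keyword: str, max_results: int = 5) -> list:
    data = {"asyncio": "stdlib async I/O", "trio": "third-party async framework"}
    return [{"name": k, "description": v} for k, v in list(data.items())[:max_results]]


def run_agent(user_text: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_turns):  # cap the loop so it cannot run forever
        response = fake_model(messages)
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": json.dumps(search_docs(**b.input))}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("agent did not finish within max_turns")


answer = run_agent("What are the top Python async libraries?")
print(answer)  # Libraries: asyncio, trio
```

Swapping `fake_model` for a real API call leaves the rest of the loop unchanged, which is one reason to keep the model call behind a single function.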
Tool Definition Structure
Each tool in your agent needs four pieces of information:
| Element | Purpose | Example |
|---|---|---|
| Name | Identifier for the model to invoke | fetch_weather |
| Description | Natural language explanation of what it does | Fetch current weather for a city and country |
| Input Schema | JSON Schema defining required/optional arguments | {"type": "object", "properties": {...}} |
| Implementation | The actual function or API that runs | def fetch_weather(city, country): ... |
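The input schema is critical. It tells the model which arguments are required, what their types are, and which ranges are valid, and each property's description can carry an example. A well-written schema reduces hallucinated arguments and runtime errors, but it does not guarantee that every call conforms, so your runtime should still validate arguments before executing a tool. The Anthropic API enforces JSON Schema Draft 2020-12, so your tool definitions must be valid JSON Schema.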
Here is a realistic example:
tools = [
    {
        "name": "calculate_roi",
        "description": "Calculate return on investment for a given initial amount and annual yield",
        "input_schema": {
            "type": "object",
            "properties": {
                "principal": {
                    "type": "number",
                    "description": "Initial investment amount in USD"
                },
                "annual_yield": {
                    "type": "number",
                    "description": "Annual percentage return (e.g., 5.2 for 5.2%)"
                },
                "years": {
                    "type": "integer",
                    "description": "Investment period in years",
                    "minimum": 1,
                    "maximum": 50
                }
            },
            "required": ["principal", "annual_yield", "years"]
        }
    }
]


def calculate_roi(principal, annual_yield, years):
    # Guard against division by zero in the percentage calculation below
    if principal <= 0:
        raise ValueError("principal must be greater than 0")
    # Compound interest formula: amount = principal * (1 + rate) ** years
    amount = principal * ((1 + annual_yield / 100) ** years)
    roi = amount - principal
    return {
        "principal": principal,
        "final_amount": round(amount, 2),
        "total_return": round(roi, 2),
        "percentage_gain": round((roi / principal) * 100, 2)
    }
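Because a model's arguments may not always satisfy the schema, a runtime can check them before calling the function. The sketch below is a hand-rolled validator that covers only a small subset of JSON Schema (required keys, three primitive types, and numeric bounds). `validate_args` and `ROI_SCHEMA` are hypothetical names, and rejecting unknown arguments is a stricter choice than JSON Schema's default. A production system would more likely use a full JSON Schema library.

```python
ROI_SCHEMA = {
    "type": "object",
    "properties": {
        "principal": {"type": "number"},
        "annual_yield": {"type": "number"},
        "years": {"type": "integer", "minimum": 1, "maximum": 50},
    },
    "required": ["principal", "annual_yield", "years"],
}


def validate_args(schema: dict, args: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    type_map = {"number": (int, float), "integer": (int,), "string": (str,)}
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            # Stricter than JSON Schema's default, which allows extra keys.
            problems.append(f"unexpected argument: {key}")
            continue
        expected = type_map.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"{key} should be of type {spec['type']}")
            continue
        if isinstance(value, (int, float)):
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{key} is below minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{key} is above maximum {spec['maximum']}")
    return problems


good = validate_args(ROI_SCHEMA, {"principal": 1000, "annual_yield": 5.2, "years": 10})
bad = validate_args(ROI_SCHEMA, {"principal": "1000", "years": 80})
print(good)  # []
print(bad)
```

Returning the list of problems to the model as a tool result gives it a chance to correct the arguments and try again.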
Common Tool Calling Patterns
Pattern 1: Single lookup. User asks a factual question; agent calls one tool to fetch data, returns the answer.
Pattern 2: Multi-step verification. Agent calls one tool to get an answer, then calls a second tool to verify or enrich the result.
Pattern 3: Aggregation. Agent calls multiple tools in parallel, combines results, and summarizes.
Pattern 4: Fallback. Primary tool fails; agent catches the error and calls an alternative tool.
These patterns are covered in detail in later articles, but the core mechanic is always the same: model outputs tool name + arguments, runtime executes, result is appended to history.
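As an illustration of Pattern 3, the sketch below assumes the model requested two tools in a single response and runs them concurrently with a thread pool. The tool functions return canned data, and `fetch_time`, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names; the call dictionaries mirror the name, arguments, and ID of a tool call described earlier.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fetch_weather(city: str, country: str) -> dict:
    # Canned data standing in for a real weather API.
    return {"city": city, "country": country, "temp_c": 21}


def fetch_time(city: str) -> dict:
    # Canned data standing in for a real time-zone lookup.
    return {"city": city, "local_time": "09:00"}


# Map tool names to their implementations.
TOOL_REGISTRY = {"fetch_weather": fetch_weather, "fetch_time": fetch_time}


def run_tool_calls(tool_calls: list) -> list:
    """Execute every requested call concurrently, keeping the request order."""

    def run_one(call: dict) -> dict:
        func = TOOL_REGISTRY[call["name"]]
        output = func(**call["input"])
        return {
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": json.dumps(output),
        }

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(run_one, tool_calls))


calls = [
    {"id": "call_a", "name": "fetch_weather", "input": {"city": "Oslo", "country": "NO"}},
    {"id": "call_b", "name": "fetch_time", "input": {"city": "Oslo"}},
]
results = run_tool_calls(calls)
print([r["tool_use_id"] for r in results])  # ['call_a', 'call_b']
```

All results go back to the model in one message, so it can combine and summarize them in its next turn.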
Key Takeaways
- Agent tool calling is a structured loop: model reasons → calls tool → reads result → decides next step.
- Each tool needs a name, description, JSON Schema input definition, and a runtime implementation.
- The agent framework intercepts tool calls, executes them safely, and returns results to the model's context.
- Well-designed tool schemas prevent invalid arguments and reduce hallucination.
- The conversation history accumulates tool results, enabling multi-step reasoning over real data.
Frequently Asked Questions
Does the model always call a tool?
No. If the user's request can be answered from the model's training knowledge, or if all tools are irrelevant, the model will simply generate a text response with stop_reason == "end_turn". Tool calling is optional; agents only invoke tools when they determine it is necessary.
What happens if a tool call fails?
If execution raises an exception, catch it, format an error message (e.g., { "error": "Database unreachable" }), and return it as a "tool_result" block that carries the original tool_use_id. With the Anthropic API you can also set "is_error": true on that block to flag the failure explicitly. The model reads the error and may retry, call a different tool, or explain the limitation to the user.
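One way to apply this is to wrap every tool execution so that an exception becomes a result rather than a crash. In the sketch below, `execute_safely` and `flaky_lookup` are hypothetical names; the `is_error` flag is the optional field that the Anthropic API accepts on a `tool_result` block.

```python
import json


def flaky_lookup(record_id: int) -> dict:
    # Stand-in for a tool whose backend is down.
    raise ConnectionError("Database unreachable")


def execute_safely(tool_use_id: str, func, arguments: dict) -> dict:
    """Run a tool and always return a tool_result block, even on failure."""
    try:
        output = func(**arguments)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps(output),
        }
    except Exception as exc:  # report any failure to the model instead of crashing
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps({"error": str(exc)}),
            "is_error": True,
        }


error_block = execute_safely("call_1", flaky_lookup, {"record_id": 7})
print(error_block["content"])  # {"error": "Database unreachable"}
```

Catching the broad `Exception` is deliberate here, since the goal is to keep the loop alive; a real system may want to log the traceback and avoid leaking internal details into the message.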
Can a tool call another tool?
Not through the model. The agent framework invokes tools, and a tool simply runs and returns data; it has no access to the model or to the list of available tools. A tool's implementation can call other functions internally, but the model sees only the single result. If a workflow requires chaining (call A, then B, then C), the model orchestrates the chain by calling A, reading the result, then calling B, and so on.
How many tools can I provide?
There is no hard limit, but every tool definition consumes context, and a longer list increases the model's decision latency. The model's ability to choose among many options tends to degrade once the list grows past roughly 20–30 tools. For larger tool sets, use hierarchical selection: provide categories or tags that help the model narrow down before choosing a specific tool.
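Hierarchical selection can be sketched as a two-stage lookup: something first picks one or more categories, and only the tools in those categories are sent with the request. The catalog contents, the category names, and `select_tools` are hypothetical; how the categories get chosen (a cheap routing call to the model, keyword rules, or user context) is left open.

```python
# Hypothetical catalog: tool definitions grouped by category.
TOOL_CATALOG = {
    "finance": [
        {"name": "calculate_roi", "description": "Calculate return on investment",
         "input_schema": {"type": "object", "properties": {}}},
    ],
    "docs": [
        {"name": "search_docs", "description": "Search technical documentation",
         "input_schema": {"type": "object", "properties": {}}},
    ],
    "weather": [
        {"name": "fetch_weather", "description": "Fetch current weather",
         "input_schema": {"type": "object", "properties": {}}},
    ],
}


def select_tools(categories: list, catalog: dict = TOOL_CATALOG) -> list:
    """Return only the tool definitions that belong to the chosen categories."""
    selected = []
    for category in categories:
        # Unknown categories contribute nothing instead of raising.
        selected.extend(catalog.get(category, []))
    return selected


narrowed = select_tools(["finance", "docs"])
print([tool["name"] for tool in narrowed])  # ['calculate_roi', 'search_docs']
```

The narrowed list is what would be passed as the `tools` argument, so the model only reasons over the definitions relevant to the current request.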
What is the cost of tool calling?
Tool calling is billed in tokens like everything else. Tool definitions are sent with each request, so a large schema or a long tool list increases prompt tokens. Tool results are appended to the conversation and count as input tokens on every later request, so long results or many sequential calls can inflate context usage quickly. Monitor token usage in production and consider truncating or summarizing tool results for cost control.
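A simple form of result compression is to cap how much of a tool's output reaches the conversation. The sketch below keeps the first few items of a list result, records how many were dropped, and applies a character limit as a last resort. `compact_result` and its default limits are hypothetical, and characters are only a rough proxy for tokens.

```python
import json


def compact_result(items: list, max_items: int = 3, max_chars: int = 200) -> str:
    """Serialize a list result, keeping only the first max_items entries."""
    kept = items[:max_items]
    payload = {"results": kept, "omitted": len(items) - len(kept)}
    text = json.dumps(payload)
    if len(text) > max_chars:
        # Hard cap as a last resort; the truncated text is no longer valid JSON.
        text = text[:max_chars] + "...[truncated]"
    return text


rows = [{"id": i} for i in range(10)]
compact = compact_result(rows)
print(compact)  # {"results": [{"id": 0}, {"id": 1}, {"id": 2}], "omitted": 7}
```

Reporting the omitted count tells the model that more data exists, so it can ask for a narrower query instead of assuming the list is complete.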