
Tool Selection: Teaching Agents to Choose Wisely

With 10 tools available, how does an agent choose the right one? With 50 tools? With 100? The model must read tool descriptions, reason about the user's request, and select the most relevant tool. Poor tool selection wastes tokens, introduces errors, and slows workflows. Smart selection uses clear descriptions, hierarchical routing, and example-driven prompts to guide the model toward correct choices. Studies show that well-designed tool selection reduces wrong-tool calls by 80% and cuts token usage by 25%.

The Tool Selection Problem

When a model has access to multiple tools, it must decide which one(s) to call. Roughly speaking, a model given 5 tools makes the right choice about 95% of the time; given 50 tools, about 70%; given 100 tools, about 40%. Each irrelevant tool in the list adds cognitive load and increases the chance of a wrong choice. Three factors dominate:

  1. Description clarity: Vague descriptions ("process data") lead to guesses. Specific descriptions ("fetch real-time exchange rates from Forex API") guide the model.
  2. Semantic overlap: Similar tools (get_user, fetch_user, retrieve_user) confuse the model. Consolidate or differentiate clearly.
  3. Number of tools: More tools make selection harder. Above 20–30 tools, accuracy degrades roughly linearly with each tool added.
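Semantic overlap of the kind described in factor 2 can often be caught before the model ever sees the tool list. The following is a minimal sketch, not a definitive check: it assumes tools are named `verb_noun` and treats a hypothetical set of verbs (`SYNONYM_VERBS`) as interchangeable, then reports names that collide once the verb is ignored.

```python
from collections import defaultdict

# Hypothetical set of verbs treated as interchangeable; extend it for your own tools.
SYNONYM_VERBS = {"get", "fetch", "retrieve", "load", "read"}


def find_overlapping_tools(tool_names):
    """Group tool names that differ only by a synonymous leading verb."""
    groups = defaultdict(list)
    for name in tool_names:
        verb, _, rest = name.partition("_")
        # Collapse synonymous verbs onto one key so get_user and fetch_user collide.
        if verb in SYNONYM_VERBS and rest:
            key = ("<read>", rest)
        else:
            key = (verb, rest)
        groups[key].append(name)
    # Only groups with more than one member are candidates for consolidation.
    return [names for names in groups.values() if len(names) > 1]


overlaps = find_overlapping_tools(
    ["get_user", "fetch_user", "retrieve_user", "send_email", "get_order"]
)
print(overlaps)
```

Each reported group is a candidate for merging into a single tool or for sharper differentiation in the descriptions.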

Strategy 1: Write Precise Tool Descriptions

A tool description is your first line of defense. It should be specific, example-heavy, and unambiguous.

Bad description:

{
  "name": "search",
  "description": "Search for information"
}

Good description:

{
  "name": "search_web",
  "description": "Search the public web for current news and information. Use this to find recent events, breaking news, or facts updated within the last 24 hours. Example: 'search_web(\"Tesla earnings Q2 2026\")' returns recent articles. Do NOT use for historical data or proprietary databases."
}

The good version is more than ten times longer, but it answers the key questions: When do I use it? What will I get? What will I NOT get?

Effective tool descriptions include:

  • What: A one-line definition.
  • When: When to use it (e.g., "for real-time data only").
  • Input example: A concrete example with sample arguments.
  • Output type: What the result looks like.
  • Gotchas: Common misuses ("do NOT use for…").
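One way to make sure no part of this checklist is forgotten is to build descriptions from named fields instead of free text. The sketch below is one possible layout, not a required format. The `ToolDescription` class and the `get_exchange_rate` tool are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolDescription:
    what: str      # one-line definition
    when: str      # when to use it
    example: str   # concrete call with sample arguments
    output: str    # what the result looks like
    gotchas: str   # common misuses

    def render(self) -> str:
        """Join the five parts into a single description string."""
        return (
            f"{self.what} {self.when} Example: {self.example}. "
            f"Returns {self.output}. Do NOT use for {self.gotchas}."
        )


# Hypothetical tool used only to show the rendered result.
forex = ToolDescription(
    what="Fetch real-time exchange rates from Forex API.",
    when="Use for current currency conversion only.",
    example="get_exchange_rate('USD', 'EUR')",
    output="a single decimal rate",
    gotchas="historical rates or stock prices",
)
description = forex.render()
print(description)
```

Because every field is required by the dataclass, a description cannot be created without stating its example and its misuse case.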

Here is a realistic example for a multi-tool agent:

tools = [
    {
        "name": "search_web",
        "description": "Search the public internet for current information. Returns news articles, web pages, and real-time data updated in the last 24 hours. Example: search_web('latest AI breakthroughs'). Do NOT use for proprietary data or internal documents.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query (e.g., 'Bitcoin price today')"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "search_internal_docs",
        "description": "Search company internal documentation and archives. Returns internal policies, technical specs, and proprietary knowledge. Example: search_internal_docs('authentication protocol'). For internal use only; do NOT use for public information.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query for internal docs"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform arithmetic and algebraic calculations. Handles +, -, *, /, **, sqrt, sin, cos, etc. Example: calculate('(100 * 1.05) / 2'). For math only; do NOT use for web data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression (e.g., '2 + 2 * 3')"
                }
            },
            "required": ["expression"]
        }
    }
]

Notice: Each description explicitly states when NOT to use it. This negative guidance is as important as positive guidance.
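Checks like these can be automated. The following is a rough lint pass, offered as a sketch: the 80-character minimum and the substring checks for "example" and "do not" are assumptions chosen for illustration, and `lint_tool` is a hypothetical helper, not part of any SDK.

```python
def lint_tool(tool, min_length=80):
    """Return a list of human-readable problems found in one tool definition."""
    problems = []
    description = tool.get("description", "")
    lowered = description.lower()
    if len(description) < min_length:
        problems.append("description is too short to be specific")
    if "example" not in lowered:
        problems.append("description has no example call")
    if "do not" not in lowered:
        problems.append("description has no negative guidance")
    # Every required argument should also be declared under properties.
    schema = tool.get("input_schema", {})
    properties = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in properties:
            problems.append(f"required field '{field}' is not defined in properties")
    return problems


vague_tool = {"name": "search", "description": "Search for information"}
for problem in lint_tool(vague_tool):
    print(f"{vague_tool['name']}: {problem}")
```

Running a pass like this in CI catches vague descriptions when a tool is added, before they show up as wrong-tool calls.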

Strategy 2: Hierarchical Tool Organization

Instead of a flat list of 50 tools, organize them into categories. Provide the model with a category selector first, then specific tools within that category:

# Two-level hierarchy
tool_categories = {
    "Data Retrieval": {
        "tools": ["search_web", "search_internal_docs", "fetch_database"],
        "description": "Tools for fetching information from various sources"
    },
    "Computation": {
        "tools": ["calculate", "analyze_statistics", "run_simulation"],
        "description": "Tools for mathematical and statistical processing"
    },
    "Business Actions": {
        "tools": ["send_email", "create_ticket", "update_record"],
        "description": "Tools for taking action (modifying state)"
    }
}

# System prompt
system_prompt = """You have access to tools in 3 categories:
1. Data Retrieval: search_web, search_internal_docs, fetch_database
2. Computation: calculate, analyze_statistics, run_simulation
3. Business Actions: send_email, create_ticket, update_record

First, determine which category is relevant. Then, choose the specific tool within that category."""

This two-step selection ("what category?" then "what tool in that category?") breaks one nine-way choice into two three-way choices, which keeps each decision in the range where selection accuracy is high.
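The second step can also be enforced in code, so the model only receives the tools of the chosen category. The sketch below assumes the category name has already been obtained from a first model call (not shown). `tools_for_category` is a hypothetical helper, and the category map is simplified to plain lists of names.

```python
# Simplified category map: category name -> tool names.
tool_categories = {
    "Data Retrieval": ["search_web", "search_internal_docs", "fetch_database"],
    "Computation": ["calculate", "analyze_statistics", "run_simulation"],
    "Business Actions": ["send_email", "create_ticket", "update_record"],
}


def tools_for_category(category, all_tools):
    """Return only the tool definitions that belong to the chosen category."""
    allowed = set(tool_categories.get(category, []))
    if not allowed:
        # Unknown category: fall back to the full list rather than sending no tools.
        return list(all_tools)
    return [tool for tool in all_tools if tool["name"] in allowed]


# Placeholder definitions; real entries would carry full descriptions and schemas.
all_tools = [{"name": name} for names in tool_categories.values() for name in names]

stage_two = tools_for_category("Computation", all_tools)
print([tool["name"] for tool in stage_two])
```

Falling back to the full list on an unrecognized category is a design choice: it trades some accuracy for never leaving the model without tools.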

Strategy 3: Example-Driven Prompts

Show the model examples of correct tool selection for similar requests:

system_prompt = """You have access to these tools:
- search_web: Search the internet for current info
- calculate: Do math
- fetch_database: Query internal systems
- send_email: Send emails

Examples:
1. User: "What is the current Bitcoin price?"
→ Use search_web (needs current data)

2. User: "What is 15% of $200?"
→ Use calculate (math, no web search needed)

3. User: "How many users signed up yesterday?"
→ Use fetch_database (internal data)

4. User: "Send a reminder email to Alice"
→ Use send_email (action, not data retrieval)

Now, given a user request, pick the most relevant tool."""

Research shows that 3–5 examples reduce tool selection errors by 40–60% compared to descriptions alone.
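Hand-written prompts drift out of date as tools change, so it can help to generate the examples section from data. The following is a minimal sketch. `build_selection_prompt` and the shape of its inputs (a name-to-summary dict and a list of request, tool, reason triples) are assumptions made for illustration.

```python
def build_selection_prompt(tools, examples):
    """Assemble a system prompt from tool summaries and worked examples."""
    lines = ["You have access to these tools:"]
    lines += [f"- {name}: {summary}" for name, summary in tools.items()]
    lines += ["", "Examples:"]
    for number, (request, tool, reason) in enumerate(examples, start=1):
        lines.append(f'{number}. User: "{request}"')
        lines.append(f"→ Use {tool} ({reason})")
    lines += ["", "Now, given a user request, pick the most relevant tool."]
    return "\n".join(lines)


prompt = build_selection_prompt(
    tools={"calculate": "Do math", "send_email": "Send emails"},
    examples=[
        ("What is 15% of $200?", "calculate", "math, no web search needed"),
        ("Send a reminder email to Alice", "send_email", "action, not data retrieval"),
    ],
)
print(prompt)
```

Keeping the examples in a list also makes it easy to hold the count within the 3–5 range and to swap in examples taken from logged mistakes.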

Strategy 4: Semantic Similarity with Embeddings

For large tool sets, use embeddings to find similar tools and filter the candidate set. This is advanced but highly effective:

import anthropic
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

# Embed all tool descriptions once, up front
embedder = SentenceTransformer('all-MiniLM-L6-v2')

tools = [...]  # Your tools list
tool_embeddings = embedder.encode([tool["description"] for tool in tools])

def select_relevant_tools(user_query, top_k=5):
    """Find the top-k most relevant tools for a query."""
    query_embedding = embedder.encode(user_query)
    similarities = cosine_similarity([query_embedding], tool_embeddings)[0]
    # argsort is ascending, so take the last top_k indices and reverse them
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [tools[i] for i in top_indices]

# Usage: filter tool list before sending to model
user_query = "What is the current weather?"
relevant_tools = select_relevant_tools(user_query, top_k=3)
messages = [{"role": "user", "content": user_query}]

# Only send top 3 tools to the model, not all 50
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # Required by the Messages API
    tools=relevant_tools,  # Reduced set
    messages=messages
)

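By filtering the tool list to only the top K most relevant tools, you reduce cognitive load and improve accuracy to near the 5-tool baseline. The trade-off is recall: if the correct tool is not among the top K, the model cannot select it, so set K conservatively.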
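A filter only helps if the correct tool survives it, so it is worth measuring how often the expected tool lands in the top K. The sketch below assumes a small hand-labeled set of (query, expected tool) pairs. `recall_at_k` is a hypothetical helper that accepts any selector with the same call shape as `select_relevant_tools`, and `keyword_select` is a toy stand-in that ranks by shared words, not embeddings, so that the example runs without model downloads.

```python
def recall_at_k(select_fn, labeled_queries, k):
    """Fraction of queries whose expected tool appears in the top-k candidates."""
    if not labeled_queries:
        raise ValueError("labeled_queries must not be empty")
    hits = 0
    for query, expected_tool in labeled_queries:
        candidate_names = [tool["name"] for tool in select_fn(query, top_k=k)]
        if expected_tool in candidate_names:
            hits += 1
    return hits / len(labeled_queries)


# Toy tool list and selector used only to exercise the metric.
demo_tools = [
    {"name": "search_web", "description": "search the public web for current news"},
    {"name": "calculate", "description": "perform arithmetic calculations"},
    {"name": "send_email", "description": "send an email message"},
]


def keyword_select(query, top_k=5):
    """Rank demo tools by the number of words shared with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        demo_tools,
        key=lambda tool: len(words & set(tool["description"].split())),
        reverse=True,
    )
    return ranked[:top_k]


labeled = [
    ("current news on Tesla", "search_web"),
    ("send an email to Alice", "send_email"),
]
print(recall_at_k(keyword_select, labeled, k=1))
```

If recall at the chosen K is noticeably below 1.0 on your own labeled queries, raise K or improve the descriptions being embedded before relying on the filter.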

Strategy 5: Tool Router (Dedicated Model Call)

For critical applications, use a separate model call to choose the tool before calling the main agent:

# Assumes `client` is an anthropic.Anthropic() instance
def select_tool_with_router(user_query, tools):
    """Use a dedicated router model call to select the right tool."""

    tool_list = "\n".join([f"- {t['name']}: {t['description']}" for t in tools])

    router_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Given this user query, which tool should I use?

Tools:
{tool_list}

Query: {user_query}

Respond with ONLY the tool name (e.g., 'search_web')."""
        }]
    )

    # Match the router's answer against the known tool names
    tool_name = router_response.content[0].text.strip()
    selected_tool = next((t for t in tools if t["name"] == tool_name), None)

    if not selected_tool:
        # Router made a mistake; fall back to full agent
        return None

    return selected_tool

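A dedicated router adds one API call, and the latency that comes with it, but isolates the selection decision in a prompt that does nothing else. It does not guarantee the right tool: the router can still return a wrong or unknown name, which is why the code above falls back when no match is found. For workflows where tool correctness is critical (financial, legal), this investment pays off.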
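In practice a router reply is not always a bare tool name; it may arrive quoted or wrapped in a sentence. The following sketch shows one tolerant way to parse such replies. `parse_router_choice` is a hypothetical helper, and the rule of accepting only a single unambiguous match is an assumption chosen to avoid guessing.

```python
import re


def parse_router_choice(raw_text, tools):
    """Map free-form router output onto a known tool name, or None."""
    known = {tool["name"] for tool in tools}
    # Drop surrounding whitespace, quotes, backticks, and trailing periods.
    cleaned = raw_text.strip().strip("'\"`.").strip()
    if cleaned in known:
        return cleaned
    # Otherwise look for known names mentioned anywhere in the reply.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", raw_text)
    mentioned = {token for token in tokens if token in known}
    # Accept only an unambiguous match; zero or several mentions mean "unsure".
    return mentioned.pop() if len(mentioned) == 1 else None


demo_tools = [{"name": "search_web"}, {"name": "calculate"}]
print(parse_router_choice("'search_web'\n", demo_tools))
print(parse_router_choice("I would use calculate.", demo_tools))
print(parse_router_choice("search_web or calculate", demo_tools))
```

Returning `None` for ambiguous replies keeps the fallback path in charge whenever the router is unsure.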

Comparison: Tool Selection Strategies

| Strategy | Overhead | Accuracy | Scalability | Complexity |
|---|---|---|---|---|
| Descriptions only | None | 70–80% | Up to 20 tools | Low |
| Descriptions + Examples | Prompt tokens | 80–90% | Up to 30 tools | Low |
| Hierarchical categories | Prompt tokens | 85–95% | Up to 100 tools | Medium |
| Embedding filtering | Compute | 90–95% | Unlimited | High |
| Dedicated router | 1 API call | 95%+ | Unlimited | High |

Key Takeaways

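  • Clear, specific tool descriptions, including when NOT to use a tool, are the first line of defense against wrong-tool calls.
  • Organize tools hierarchically to guide selection and reduce cognitive load.
  • Include 3–5 examples of correct tool selection in system prompts; they reduce selection errors by 40–60% compared to descriptions alone.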
  • For large tool sets, use embedding-based filtering to reduce candidates before sending to the model.
  • For critical applications, use a dedicated tool router model call.

Frequently Asked Questions

What if no tool is relevant?

The model should say so rather than force a poor match. Include an explicit "ask the user for clarification" option in your system prompt. Failing gracefully is better than calling the wrong tool or hallucinating an answer.
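Besides the system prompt, the clarification option can be exposed as a tool of its own, so that "nothing fits" becomes a valid selection. This is a sketch of that idea, not an established pattern from any SDK. The `ask_clarification` tool and the `handle_tool_call` dispatcher are hypothetical names.

```python
# Hypothetical escape-hatch tool, written in the same shape as the other definitions.
ask_clarification_tool = {
    "name": "ask_clarification",
    "description": (
        "Ask the user a follow-up question when no other tool clearly fits the "
        "request or the request is ambiguous. Example: "
        "ask_clarification('Which account do you mean?'). Do NOT use when "
        "another tool can answer the request directly."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {
                "type": "string",
                "description": "The question to show to the user",
            }
        },
        "required": ["question"],
    },
}


def handle_tool_call(name, arguments):
    """Route a clarification request back to the user; pass other calls through."""
    if name == "ask_clarification":
        return {"type": "question_for_user", "text": arguments["question"]}
    return {"type": "tool_call", "name": name, "arguments": arguments}


result = handle_tool_call("ask_clarification", {"question": "Which account do you mean?"})
print(result)
```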

How do I know if tool selection is the bottleneck?

Log every tool call and compare the model's choice to the "ideal" choice based on your own judgment. If the model picks a wrong tool >10% of the time, improve tool descriptions or use hierarchical categories.
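The comparison described here can be computed directly from the log. The sketch below assumes each log entry is a dict with `chosen` and `ideal` keys, where `ideal` is filled in by a human reviewer. The entry layout and the `selection_report` helper are assumptions made for illustration.

```python
from collections import Counter


def selection_report(call_log):
    """Return the wrong-tool rate and the most frequent (ideal, chosen) confusions."""
    confusions = Counter(
        (entry["ideal"], entry["chosen"])
        for entry in call_log
        if entry["chosen"] != entry["ideal"]
    )
    error_rate = sum(confusions.values()) / len(call_log)
    return error_rate, confusions.most_common()


call_log = [
    {"query": "Bitcoin price today", "chosen": "search_web", "ideal": "search_web"},
    {"query": "Our auth protocol", "chosen": "search_web", "ideal": "search_internal_docs"},
    {"query": "15% of 200", "chosen": "calculate", "ideal": "calculate"},
    {"query": "Signups yesterday", "chosen": "fetch_database", "ideal": "fetch_database"},
]

error_rate, confusions = selection_report(call_log)
print(f"wrong-tool rate: {error_rate:.0%}")
for (ideal, chosen), count in confusions:
    print(f"{count}x chose {chosen} when {ideal} was ideal")
```

The confusion pairs show which descriptions to differentiate first: a pair that recurs usually means two tools with overlapping wording.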

Can I merge similar tools?

Yes. If you have get_user, fetch_user, and retrieve_user, consolidate to one get_user tool. Similarly named tools create confusion. One clear tool is better than three ambiguous ones.

What if tools have overlapping domains?

Differentiate them in descriptions. Example: "search_web (for public information, up to 24 hours old)" vs. "search_internal_docs (for company knowledge, any age)". Make the distinction obvious.

Should I use a model with longer context?

Tool selection improves slightly with larger context windows, but descriptions and examples matter more. A smaller model with excellent descriptions can outperform a larger model with vague ones, so fix the descriptions before paying for more context.
