
Tool Use During Realtime Conversations: Async Actions

A voice agent that only converses is useful but limited. A voice agent that can act—looking up information, placing orders, scheduling appointments—becomes genuinely powerful. Tool use in realtime voice agents means the LLM can decide to call a function (like a database query or API call) mid-conversation, wait for the result, and incorporate it into the response.

However, tools introduce latency. A 500 ms database query, added on top of speech recognition, LLM generation, and speech synthesis, can push a turn past its latency budget. This article covers tool use patterns, how to keep responses streaming despite tool latency, and how to handle tool failures gracefully.

Understanding Tool Use in Voice Agents

Tool use (also called function calling) is when an LLM decides to invoke a function and observe the result. For example:

User: "What's the status of my order?"
LLM: "I'll check that for you." [decides to call: get_order_status(order_id=12345)]
Tool: Returns {"status": "shipped", "tracking": "1Z999AA"}
LLM: "Your order has shipped. Tracking number: 1-Z-999-AA."

The key challenge is latency: the LLM pauses, waits for the tool, then generates a response. This can easily push your turn latency beyond 2 seconds.
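To make the cost of that pause concrete, the round trip for a single tool call can be sketched as below. This is a minimal illustration, not a provider integration: `TOOL_REGISTRY`, `run_tool_call`, and the simulated `get_order_status` are hypothetical names, and the sketch assumes the model delivers its arguments as a JSON string, which is common but not universal.

```python
import asyncio
import json
import time


async def get_order_status(order_id: str) -> dict:
    """Hypothetical tool; the sleep stands in for a database query."""
    await asyncio.sleep(0.05)
    return {"status": "shipped", "tracking": "1Z999AA"}


# Hypothetical registry mapping tool names to async callables
TOOL_REGISTRY = {"get_order_status": get_order_status}


async def run_tool_call(name: str, arguments_json: str):
    """Execute one model-requested tool call; return (result, elapsed_ms)."""
    args = json.loads(arguments_json)  # assumed: arguments arrive as JSON text
    start = time.perf_counter()
    result = await TOOL_REGISTRY[name](**args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


if __name__ == "__main__":
    result, ms = asyncio.run(
        run_tool_call("get_order_status", '{"order_id": "12345"}')
    )
    print(f"{result} in {ms:.0f} ms")
```

The elapsed time measured here is added to the turn before the model can even begin generating the final spoken answer, which is why the patterns that follow try to overlap it with speech.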

Patterns for Tool Use in Realtime

Pattern 1: Prefix Response + Parallel Tool

Stream the LLM's opening words (the prefix) to TTS while each tool call runs in parallel as a background task:

import asyncio


class RealtimeToolUsingAgent:
    """
    Invokes tools asynchronously while continuing TTS streaming.
    """

    def __init__(self, llm_client, tools, tts_queue=None):
        # llm_client is a placeholder for your own streaming LLM wrapper
        self.llm = llm_client
        self.tools = {tool["name"]: tool["fn"] for tool in tools}
        # Text chunks placed here are consumed by a separate TTS worker
        self.tts_queue = tts_queue if tts_queue is not None else asyncio.Queue()

    async def process_with_tool_use(self, transcript):
        """
        Stream LLM response, detect tool calls, and run tools asynchronously.
        """
        pending_tools = []
        response_text = ""

        # Start streaming LLM response
        async for chunk in self.llm.stream_completion_with_tools(
            prompt=transcript,
            tools=self.tools
        ):
            if chunk.type == "text":
                response_text += chunk.text
                # Stream to TTS immediately
                await self.tts_queue.put(chunk.text)

            elif chunk.type == "tool_call":
                # LLM wants to call a tool
                tool_name = chunk.tool_name
                tool_args = chunk.tool_arguments

                print(f"[Tool] Invoking {tool_name}({tool_args})")

                # Launch tool in background (don't wait)
                task = asyncio.create_task(
                    self._invoke_tool_async(tool_name, tool_args)
                )
                pending_tools.append((tool_name, task))

            elif chunk.type == "tool_result_reference":
                # LLM wants to reference a tool result
                # Wait for the corresponding tool to finish
                tool_name = chunk.tool_name
                tool_task = next((t for n, t in pending_tools if n == tool_name), None)

                if tool_task:
                    result = await tool_task
                    print(f"[Tool Result] {tool_name}: {result}")

                    # Continue LLM generation with the tool result.
                    # This sketch assumes the streaming client feeds the result
                    # back to the model itself; with most provider APIs you must
                    # send the tool result back explicitly.

        # Wait for any remaining tools
        for tool_name, task in pending_tools:
            try:
                await asyncio.wait_for(task, timeout=5.0)
            except asyncio.TimeoutError:
                print(f"[Tool] Timeout for {tool_name}")

        return response_text

    async def _invoke_tool_async(self, tool_name, tool_args):
        """Execute a tool asynchronously."""
        try:
            tool_fn = self.tools[tool_name]
            result = await tool_fn(**tool_args)
            return result
        except Exception as e:
            # Return the error as data so the caller can decide what to say
            return {"error": str(e)}

Pattern 2: Streaming Filler While Tool Runs

If tool latency is significant, speak a filler phrase while waiting:

# Method of the agent class (assumes self.tts_client and self._invoke_tool_async)
async def handle_tool_with_filler(self, tool_name, tool_args):
    """
    Say a filler phrase while a tool runs in the background.
    Reduces perceived latency.
    """
    # Filler phrases to buy time
    fillers = {
        "get_order_status": "Let me check your order details.",
        "search_database": "Searching the database for that information.",
        "check_calendar": "Looking at the calendar for available slots.",
    }

    # Start TTS for filler immediately
    filler = fillers.get(tool_name, "Looking that up for you.")
    tts_task = asyncio.create_task(self.tts_client.stream_audio(filler))

    # Run tool in parallel
    tool_task = asyncio.create_task(
        self._invoke_tool_async(tool_name, tool_args)
    )

    # Wait for both to complete
    await tts_task
    tool_result = await tool_task

    return tool_result
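A variation on this pattern, which is not in the code above, is to speak the filler only when the tool turns out to be slow, so fast lookups are not padded with unnecessary speech. The sketch below is one way to do that under stated assumptions: `call_with_conditional_filler`, the `speak` callback, and the 300 ms grace period are all illustrative choices rather than fixed recommendations.

```python
import asyncio


async def call_with_conditional_filler(tool_coro, speak, filler, grace_sec=0.3):
    """Run a tool; speak `filler` only if it is still running after grace_sec.

    tool_coro: coroutine for the tool call
    speak:     async callable that sends text to TTS (hypothetical interface)
    """
    task = asyncio.create_task(tool_coro)
    # Give the tool a short head start before deciding to speak
    done, _pending = await asyncio.wait({task}, timeout=grace_sec)
    if not done:
        # The tool keeps running in the background while the filler is spoken
        await speak(filler)
    return await task


async def _demo():
    spoken = []

    async def speak(text):
        spoken.append(text)

    async def slow_tool():
        await asyncio.sleep(0.2)
        return {"status": "shipped"}

    result = await call_with_conditional_filler(
        slow_tool(), speak, "Let me check that for you.", grace_sec=0.05
    )
    print(result, spoken)


if __name__ == "__main__":
    asyncio.run(_demo())
```

The trade-off is that a slow tool now incurs the grace period as silence before the filler starts, so the grace period should stay well below the point where a pause feels awkward.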

Handling Tool Errors Gracefully

Tools fail: databases go offline, APIs return errors, and network requests time out. Your agent should recover:

# Methods of the agent class (shown without the class header)
async def _invoke_tool_with_fallback(self, tool_name, tool_args, timeout_sec=3):
    """
    Invoke tool with timeout and fallback if it fails.
    """
    # Look up the tool outside the try block so that a KeyError raised
    # inside the tool itself is not misreported as "tool not found"
    tool_fn = self.tools.get(tool_name)
    if tool_fn is None:
        return {
            "success": False,
            "error": f"Tool '{tool_name}' not found."
        }

    try:
        result = await asyncio.wait_for(
            tool_fn(**tool_args),
            timeout=timeout_sec
        )
        return {"success": True, "data": result}

    except asyncio.TimeoutError:
        print(f"[Tool] Timeout for {tool_name}")
        return {
            "success": False,
            "error": f"Timeout after {timeout_sec} s calling {tool_name}."
        }

    except Exception as e:
        return {
            "success": False,
            "error": f"Error calling {tool_name}: {str(e)}"
        }

async def process_with_error_recovery(self, transcript):
    """
    Stream LLM response, handle tool errors gracefully.
    """
    response_text = ""

    async for chunk in self.llm.stream_completion_with_tools(
        prompt=transcript,
        tools=self.tools
    ):
        if chunk.type == "text":
            response_text += chunk.text
            await self.tts_queue.put(chunk.text)

        elif chunk.type == "tool_call":
            tool_result = await self._invoke_tool_with_fallback(
                chunk.tool_name,
                chunk.tool_arguments,
                timeout_sec=2  # Aggressive timeout for voice
            )

            if not tool_result["success"]:
                # Tool failed; log the technical detail for debugging
                print(f"[Error] {tool_result['error']}")

                # Speak a user-friendly message rather than the raw error
                await self.tts_queue.put(
                    " I couldn't retrieve that information in time. Please try again."
                )

Tool Definition and OpenAI Integration

Here's how to define tools and integrate with OpenAI's function calling:

# Define your tools as JSON schemas
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Retrieves the status of a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The unique order identifier"
                    }
                },
                "required": ["order_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Searches a database for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    },
                    "table": {
                        "type": "string",
                        "description": "Database table to search"
                    }
                },
                "required": ["query", "table"]
            }
        }
    }
]

# Implement the actual tool functions
import asyncio

async def get_order_status(order_id: str) -> dict:
    """Query order database (simulated)."""
    await asyncio.sleep(0.5)  # Simulate database latency
    return {"status": "shipped", "tracking": "1Z999AA"}

async def search_database(query: str, table: str) -> list:
    """Search database (simulated)."""
    await asyncio.sleep(0.3)  # Simulate search latency
    return [{"match": "result1"}, {"match": "result2"}]

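# OpenAI Realtime API integration
async def voice_agent_with_tools():
    """Voice agent using tools via OpenAI Realtime API."""
    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    # The Realtime API expects a flat tool format (name, description, and
    # parameters at the top level), unlike the nested format in TOOLS above
    realtime_tools = [{"type": "function", **t["function"]} for t in TOOLS]

    # Connect, then configure the session. Written against the beta Realtime
    # client in the openai Python SDK; names may differ in your SDK version.
    async with client.beta.realtime.connect(
        model="gpt-4o-realtime-preview"
    ) as connection:
        await connection.session.update(session={
            "modalities": ["text", "audio"],
            "instructions": "You are a helpful assistant. Use tools when needed to answer questions.",
            "tools": realtime_tools,  # Pass tools to the realtime API
            "voice": "alloy",
        })
        print("Voice agent with tools started")

        # Handle events from the session
        async for event in connection:
            if event.type == "response.function_call_arguments.done":
                # The model finished emitting arguments for a function call
                print(f"[Function Call] {event.call_id}: {event.arguments}")

            elif event.type == "response.audio.delta":
                # Agent audio response (base64-encoded chunk);
                # play_audio is a placeholder for your own playback code
                await play_audio(event.delta)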

Latency-Conscious Tool Integration

Design your tools to fit your latency budget:

import asyncio
import time


class LatencyAwareTool:
    """
    Tools that are aware of time constraints.
    """

    def __init__(self, max_latency_ms=800, tts_queue=None):
        self.max_latency_ms = max_latency_ms
        # Optional asyncio.Queue consumed by the TTS worker
        self.tts_queue = tts_queue

    async def call_with_budget(self, func, args, filler_text=None):
        """
        Call a tool with a strict latency budget.
        If it times out, return a quick fallback.
        """
        # Monotonic clock: unaffected by system clock adjustments
        start = time.monotonic()
        try:
            result = await asyncio.wait_for(
                func(**args),
                timeout=self.max_latency_ms / 1000
            )
            elapsed = (time.monotonic() - start) * 1000
            print(f"[Tool] Completed in {elapsed:.0f} ms")
            return result

        except asyncio.TimeoutError:
            print(f"[Tool] Timeout (budget: {self.max_latency_ms} ms)")
            if filler_text and self.tts_queue is not None:
                await self.tts_queue.put(filler_text)
            return {"status": "timeout", "fallback": True}

Key Takeaways

  • Tool use enables voice agents to invoke functions (APIs, databases) during conversation, but adds latency that must be managed.
  • Use asynchronous tool invocation: start the tool call in the background while continuing to stream TTS, rather than blocking.
  • Implement filler responses (e.g., "Let me check that for you") while tools run, reducing perceived latency by 300–500 ms.
  • Set strict tool timeouts (2–3 seconds) and provide graceful fallbacks if tools fail or time out.
  • Define tools as JSON schemas compatible with your LLM (OpenAI function calling, Claude tool use, etc.) to enable LLM-driven tool selection.

Frequently Asked Questions

Can I use multiple tools simultaneously?

Yes. Launch them as concurrent asyncio tasks and wait for all of them to complete. Watch the cumulative latency: three tools that each take 500 ms add up to 1.5 seconds when run one after another, but run in parallel they finish in roughly the time of the slowest one (about 500 ms).
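A minimal sketch of concurrent invocation using `asyncio.gather`, with a per-tool timeout so one slow tool cannot hold up the rest. The helper `run_tools_concurrently` and the simulated `fake_tool` are hypothetical names used only for illustration.

```python
import asyncio
import time


async def fake_tool(name: str, delay_sec: float) -> dict:
    """Simulated tool; the sleep stands in for an API or database call."""
    await asyncio.sleep(delay_sec)
    return {"tool": name}


async def run_tools_concurrently(calls, timeout_sec=2.0):
    """Run several tool coroutines at once.

    calls: list of (name, coroutine) pairs
    Returns {name: result}, with an error dict for any tool that failed.
    """
    names = [name for name, _ in calls]
    guarded = [asyncio.wait_for(coro, timeout=timeout_sec) for _, coro in calls]
    # return_exceptions=True keeps one failure from discarding the other results
    results = await asyncio.gather(*guarded, return_exceptions=True)
    return {
        name: {"error": type(res).__name__} if isinstance(res, Exception) else res
        for name, res in zip(names, results)
    }


async def _demo():
    start = time.perf_counter()
    out = await run_tools_concurrently([
        ("orders", fake_tool("orders", 0.2)),
        ("calendar", fake_tool("calendar", 0.2)),
        ("search", fake_tool("search", 0.2)),
    ])
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(out, f"{elapsed_ms:.0f} ms")  # close to one tool's latency, not three


if __name__ == "__main__":
    asyncio.run(_demo())
```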

What if a tool needs data from another tool (dependency)?

Chain them: tool A completes, and its output becomes the input to tool B. For example, get_order_id (from the user) runs first, then get_order_status uses that ID. With several tools, model the dependencies as a directed acyclic graph (DAG) so that independent branches run concurrently and only true dependencies wait.
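For a small number of tools, a full graph scheduler is usually unnecessary: awaiting in dependency order and starting independent work early gives the same effect. The sketch below assumes three hypothetical simulated tools, where `get_order_status` depends on `get_order_id` and `get_store_hours` depends on nothing.

```python
import asyncio


async def get_order_id(customer_phone: str) -> str:
    """Hypothetical lookup of the caller's most recent order."""
    await asyncio.sleep(0.05)
    return "12345"


async def get_order_status(order_id: str) -> dict:
    """Hypothetical status query; needs the ID from get_order_id."""
    await asyncio.sleep(0.05)
    return {"order_id": order_id, "status": "shipped"}


async def get_store_hours() -> str:
    """Hypothetical independent tool with no dependencies."""
    await asyncio.sleep(0.05)
    return "9am-5pm"


async def answer_order_question(customer_phone: str) -> dict:
    # Independent branch: start it now so it overlaps with the chain below
    hours_task = asyncio.create_task(get_store_hours())

    # Dependent chain: the second call cannot start until the first returns
    order_id = await get_order_id(customer_phone)
    status = await get_order_status(order_id)

    return {"status": status, "hours": await hours_task}


if __name__ == "__main__":
    print(asyncio.run(answer_order_question("555-0100")))
```

The chain still costs the sum of its two steps, so the latency budget should be checked against the longest dependent path rather than the number of tools.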

How do I avoid tool hallucination (LLM inventing tools)?

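Validate tool calls before invoking: check that the tool name exists and that the arguments match the schema. If a call is invalid, return an error to the LLM and let it retry or apologize. Providers reduce the risk (for example, OpenAI's strict mode for function calling constrains arguments to the declared schema), but a model can still emit an unknown tool name or malformed arguments, so keep the check on your side.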
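A minimal hand-rolled validator might look like the sketch below. It checks only the tool name, required keys, unexpected keys, and basic JSON types; `validate_tool_call`, `TOOL_SCHEMAS`, and `JSON_TYPES` are illustrative names, and a full JSON Schema validator library would cover more cases.

```python
import json

# Parameter schemas keyed by tool name (same shape as "parameters" in a tool definition)
TOOL_SCHEMAS = {
    "get_order_status": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    }
}

# Subset of JSON Schema types mapped to Python types
JSON_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}


def validate_tool_call(name, arguments_json, schemas=TOOL_SCHEMAS):
    """Return (args, None) if the call is valid, else (None, error_for_llm)."""
    schema = schemas.get(name)
    if schema is None:
        return None, f"Unknown tool '{name}'. Available tools: {sorted(schemas)}"
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return None, f"Arguments are not valid JSON: {exc.msg}"
    if not isinstance(args, dict):
        return None, "Arguments must be a JSON object."

    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        return None, f"Missing required arguments: {missing}"

    props = schema.get("properties", {})
    for key, value in args.items():
        if key not in props:
            return None, f"Unexpected argument '{key}'."
        expected = JSON_TYPES.get(props[key].get("type"))
        if expected and not isinstance(value, expected):
            return None, f"Argument '{key}' should be of type {props[key]['type']}."
    return args, None


if __name__ == "__main__":
    print(validate_tool_call("get_order_status", '{"order_id": "12345"}'))
    print(validate_tool_call("cancel_everything", "{}"))
```

Returning the error text to the model, rather than raising, gives it a chance to correct the call within the same turn.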

Can I use synchronous (non-async) tools?

Yes, but wrap them with asyncio.to_thread() to avoid blocking the event loop:

result = await asyncio.to_thread(sync_tool_function, arg1, arg2)

However, prefer async tools for voice applications where latency is critical.

What's the maximum tool latency I can tolerate?

2–3 seconds in the absolute worst case, but aim for 500–1000 ms. If a tool is slower than that, cache its results, optimize the underlying query, or fall back to a faster approximate answer.
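As one illustration of the caching option, the sketch below wraps an async tool in a small in-memory cache with a time-to-live. `TTLToolCache` and the 30-second default are assumptions made for the example; the right lifetime depends on how quickly the underlying data goes stale.

```python
import asyncio
import time


class TTLToolCache:
    """In-memory cache for async tool results with a time-to-live (sketch)."""

    def __init__(self, ttl_sec: float = 30.0):
        self.ttl_sec = ttl_sec
        self._store = {}  # key -> (stored_at, value)

    async def call(self, fn, **kwargs):
        # Key on the function name and its (hashable) keyword arguments
        key = (fn.__name__, tuple(sorted(kwargs.items())))
        hit = self._store.get(key)
        if hit is not None and time.monotonic() - hit[0] < self.ttl_sec:
            return hit[1]  # fresh cached value: no tool latency at all
        value = await fn(**kwargs)
        self._store[key] = (time.monotonic(), value)
        return value


async def get_order_status(order_id: str) -> dict:
    """Simulated slow tool."""
    await asyncio.sleep(0.1)
    return {"status": "shipped", "order_id": order_id}


async def _demo():
    cache = TTLToolCache(ttl_sec=30.0)
    for _ in range(2):
        start = time.perf_counter()
        result = await cache.call(get_order_status, order_id="12345")
        print(result, f"{(time.perf_counter() - start) * 1000:.0f} ms")


if __name__ == "__main__":
    asyncio.run(_demo())
```

This sketch does not deduplicate concurrent misses for the same key, and cached answers can be stale for up to the TTL, so it suits data that changes slowly relative to a conversation.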

Further Reading