Request Timeouts and Deadlines: Prevent Hangs
A timeout is a deadline imposed on a network request: if the server does not respond within N seconds, give up and treat the request as failed. Without timeouts, a single slow or hung API call can block your application indefinitely, exhausting connection pools and tying up the threads that serve users. In production, hanging requests are as dangerous as crashes: they consume resources and prevent recovery. Timeouts act as the circuit breaker for individual requests.
The key point is that there are two distinct timeouts: the connection timeout (how long to wait to establish a TCP connection) and the read timeout (how long to wait for the server to send data once connected). Most production systems need both, with different values. A connection that has not been established within 5 seconds is unlikely to succeed at all, but a healthy API might legitimately take 60 seconds to process a complex request.
Connection vs Read Timeouts
A network request has multiple phases:
1. DNS resolution: look up the hostname. Usually fast (under 1s), but can stall.
2. TCP connect: establish the connection. Usually 10-100ms, but can time out because of firewall rules or network issues.
3. TLS handshake (if HTTPS): negotiate encryption. Usually 50-200ms.
4. Request send: upload the request body. Usually 10-100ms, except for large bodies.
5. Server processing: the server runs your request. Can be 100ms to 60+ seconds.
6. Response download: download the response body. Usually 10-100ms for small responses.
A connection timeout covers the early phases (1-3), although exact coverage depends on the library: some HTTP clients do not apply it to DNS resolution. If the server is unreachable, the timeout fires early and saves you from waiting. A read timeout covers the waiting in phases 5 and 6: in most clients it limits the gap between received bytes rather than the total response time, so it fires when the server goes silent. Because a server that trickles data slowly can stay under the read timeout forever, a separate end-to-end deadline is what guarantees that no single request hangs indefinitely.
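The difference between the two timeouts can be observed without any HTTP library. The following is a minimal sketch using only the standard library; the helper names `start_silent_server` and `probe` are illustrative and not part of any API. It starts a local server that accepts connections but never replies, then reports which phase failed.

```python
import socket
import threading
from typing import Tuple


def start_silent_server() -> Tuple[socket.socket, int]:
    """Listen on a free local port, accept connections, but never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    accepted = []  # keep accepted sockets alive so the client is not reset

    def accept_forever() -> None:
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # server socket was closed
            accepted.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return server, server.getsockname()[1]


def probe(port: int, connect_timeout: float, read_timeout: float) -> str:
    """Return which phase failed: 'connect', 'read', or 'ok'."""
    try:
        # The connect timeout bounds only the TCP handshake.
        sock = socket.create_connection(("127.0.0.1", port), timeout=connect_timeout)
    except OSError:  # refused, unreachable, or timed out
        return "connect"
    with sock:
        # The read timeout bounds each wait for data after connecting.
        sock.settimeout(read_timeout)
        sock.sendall(b"ping")
        try:
            sock.recv(1024)
        except socket.timeout:
            return "read"
    return "ok"
```

Calling `probe` against the silent server returns `"read"` because the connection succeeds but no bytes ever arrive, while a port with no listener returns `"connect"`.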
Here are recommended values for LLM APIs:
| Phase | Timeout | Why |
|---|---|---|
| Connection + TLS | 5 seconds | If you cannot connect in 5 seconds, the API is likely unreachable |
| Server processing (read) | 60-120 seconds | LLM inference is compute-heavy and long generations take time; be patient |
| Total (end-to-end) | 180 seconds | Absolute cap; fail rather than hang longer |
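
One way to keep these three values together is a small budget object that hands out per-attempt timeouts clipped to whatever remains of the total. This is a sketch only: the class name `TimeoutBudget` and its method are hypothetical, and the defaults simply mirror the table above. Because a read timeout usually limits the gap between bytes, clipping it narrows the window but does not by itself enforce the total cap.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TimeoutBudget:
    """Timeout values in seconds; defaults mirror the table above."""
    connect: float = 5.0
    read: float = 120.0
    total: float = 180.0

    def for_attempt(self, elapsed: float) -> Tuple[float, float]:
        """Return (connect, read) timeouts clipped to the remaining total budget."""
        remaining = self.total - elapsed
        if remaining <= 0:
            raise TimeoutError("total time budget exhausted")
        return (min(self.connect, remaining), min(self.read, remaining))
```

The returned tuple has the same `(connect, read)` shape that the requests library accepts for its `timeout` parameter, so a retry loop could pass `budget.for_attempt(elapsed)` on each attempt.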
Implementing Timeouts in Python
The requests library accepts a timeout parameter:
import requests

# A single value is used for both the connection timeout and the read timeout
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]},
    timeout=60  # 60s to connect and 60s between received bytes; not a total cap
)

# Separate connection and read timeouts
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]},
    timeout=(5, 60)  # (connection_timeout, read_timeout)
)
For streaming responses (where the server sends results over time), use a read timeout that accounts for slow streaming:
# Streaming LLM response: the read timeout covers the wait for the first byte
# and every later gap between chunks
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    json={"model": "gpt-4", "messages": [...]},
    stream=True,
    timeout=(5, 120)  # 5s to connect, up to 120s of silence at any point
)

# requests has no separate per-chunk timeout; the read timeout applies to each read
for chunk in response.iter_content(chunk_size=1024):
    # If the server goes silent for more than 120 seconds, iter_content raises
    # requests.exceptions.ConnectionError (wrapping the underlying read timeout)
    if not chunk:
        break
    process_chunk(chunk)  # process_chunk is your own handler
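Because the read timeout only limits the gap between chunks, a stream that keeps trickling data can run far longer than intended. A minimal sketch of a total cap for any chunk iterator follows; the function name `iter_with_deadline` and the injectable `clock` parameter are assumptions made for illustration. The check runs only when a chunk arrives, so the read timeout is still needed to bound silence.

```python
import time
from typing import Callable, Iterable, Iterator


def iter_with_deadline(
    chunks: Iterable[bytes],
    total_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[bytes]:
    """Yield chunks until the total time budget is spent, then raise TimeoutError."""
    deadline = clock() + total_seconds
    for chunk in chunks:
        # Checked once per received chunk; silence is bounded by the read timeout.
        if clock() > deadline:
            raise TimeoutError("stream exceeded its total time budget")
        yield chunk
```

With requests, this could wrap the streaming loop as `iter_with_deadline(response.iter_content(chunk_size=1024), 180)`, closing the response if `TimeoutError` is raised.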
The aiohttp library for async Python is more explicit:
import aiohttp
import asyncio

async def call_with_timeout():
    timeout = aiohttp.ClientTimeout(
        total=180,       # Whole operation: connect, send, and read the response
        connect=5,       # Acquire a connection (from the pool or a new one)
        sock_connect=5,  # Establish a new connection to the peer
        sock_read=120    # Maximum wait for each portion of data from the peer
    )
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(
            "https://api.openai.com/v1/chat/completions",
            json={"model": "gpt-4", "messages": [...]}
        ) as response:
            return await response.json()
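When an async client offers no total cap, or when several awaits must share one, `asyncio.wait_for` can impose an end-to-end limit on any awaitable. The wrapper below is a small sketch; the name `with_total_timeout` is hypothetical. It converts `asyncio.TimeoutError` into the built-in `TimeoutError`, which are distinct classes before Python 3.11.

```python
import asyncio
from typing import Any, Awaitable


async def with_total_timeout(awaitable: Awaitable[Any], seconds: float) -> Any:
    """Await something, cancelling it if it takes longer than `seconds`."""
    try:
        # wait_for cancels the wrapped task when the timeout expires.
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"operation exceeded {seconds}s") from None
```

A caller would write something like `await with_total_timeout(call_with_timeout(), 180)`; the wrapped coroutine is cancelled on expiry, so its `async with` blocks still clean up.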
Implementing Timeouts in JavaScript/TypeScript
The fetch API accepts an AbortSignal for cancellation:
// apiKey is assumed to be defined elsewhere (for example, loaded from configuration)
async function callWithTimeout(url: string, timeoutMs: number = 60000): Promise<any> {
  const controller = new AbortController();
  // Abort the request, including the body download, after the timeout
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(url, {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json"
      },
      body: JSON.stringify({
        model: "claude-3-sonnet-20240229",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Hello" }]
      }),
      signal: controller.signal
    });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}`);
    }
    // Await here so the timer is still armed while the body is read
    return await response.json();
  } finally {
    clearTimeout(timeoutId);
  }
}

// Call with a 60-second timeout
const result = await callWithTimeout("https://api.anthropic.com/v1/messages", 60000);
For streaming responses, you can timeout individual chunks:
async function streamWithTimeout(url: string, chunkTimeoutMs: number = 30000): Promise<void> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 180000); // 3-minute total
  try {
    const response = await fetch(url, {
      method: "POST",
      signal: controller.signal,
      // ... other options
    });
    const reader = response.body!.getReader();
    while (true) {
      let chunkTimer: ReturnType<typeof setTimeout> | undefined;
      const timeoutPromise = new Promise<never>((_, reject) => {
        chunkTimer = setTimeout(() => reject(new Error("Chunk timeout")), chunkTimeoutMs);
      });
      try {
        const { done, value } = await Promise.race([reader.read(), timeoutPromise]);
        if (done) break;
        processChunk(value); // processChunk is your own handler
      } finally {
        clearTimeout(chunkTimer); // Do not leave one timer running per chunk
      }
    }
  } catch (err) {
    controller.abort(); // Tear down the connection on any failure, including a chunk timeout
    throw err;
  } finally {
    clearTimeout(timeoutId);
  }
}
Propagating Deadlines Through Service Layers
A timeout on a single API call is good, but real systems have multiple layers: user request → your API → LLM API → database. Each layer should know the overall deadline so it can fail fast instead of timing out individually.
Python's contextvars makes this elegant:
import asyncio
from contextlib import asynccontextmanager
from contextvars import ContextVar
from datetime import datetime, timedelta, timezone

import aiohttp

deadline_var: ContextVar[float] = ContextVar("deadline")

# Without this decorator, the generator below cannot be used with "async with"
@asynccontextmanager
async def with_deadline(timeout_seconds: float):
    """Context manager that sets a deadline for all nested calls."""
    deadline = (datetime.now(timezone.utc) + timedelta(seconds=timeout_seconds)).timestamp()
    token = deadline_var.set(deadline)
    try:
        yield
    finally:
        deadline_var.reset(token)

def remaining_time() -> float:
    """Get remaining time until deadline, or raise if past deadline."""
    try:
        deadline = deadline_var.get()
    except LookupError:
        return 300  # Default to 5 minutes if no deadline set
    remaining = deadline - datetime.now(timezone.utc).timestamp()
    if remaining <= 0:
        raise TimeoutError("Request deadline exceeded")
    return remaining

async def call_llm_with_deadline():
    """Make LLM call respecting the deadline."""
    remaining = remaining_time()
    # aiohttp_session, url, and payload are assumed to be defined by the application
    async with aiohttp_session.post(
        url,
        json=payload,
        timeout=aiohttp.ClientTimeout(total=remaining)
    ) as response:
        return await response.json()

# Usage: all nested calls see the deadline
async def handle_user_request():
    async with with_deadline(30.0):  # 30 second deadline for entire request
        result = await call_llm_with_deadline()
        # save_to_database is your own function; it can call remaining_time() too
        await save_to_database(result)
        return result
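One possible variation, sketched here under assumptions, measures the deadline with `time.monotonic()` (which is unaffected by wall-clock adjustments) and lets nested scopes tighten the deadline but never extend it. The names `deadline_scope` and `time_left` are illustrative, and the 300-second fallback is just a placeholder default.

```python
import time
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

_deadline: ContextVar[Optional[float]] = ContextVar("_deadline", default=None)


@contextmanager
def deadline_scope(seconds: float) -> Iterator[None]:
    """Set a deadline for nested calls; an inner scope can only shorten it."""
    proposed = time.monotonic() + seconds
    current = _deadline.get()
    effective = proposed if current is None else min(current, proposed)
    token = _deadline.set(effective)
    try:
        yield
    finally:
        _deadline.reset(token)  # restore the enclosing scope's deadline


def time_left(default: float = 300.0) -> float:
    """Seconds until the active deadline; raise TimeoutError if it has passed."""
    current = _deadline.get()
    if current is None:
        return default  # no deadline is active
    left = current - time.monotonic()
    if left <= 0:
        raise TimeoutError("deadline exceeded")
    return left
```

Taking the minimum means a library function that asks for 60 seconds inside a 30-second request still sees at most 30 seconds, which is the fail-fast behavior described above.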
Handling Timeout Errors
When a timeout fires, your code sees an exception. Treat it as a transient error and retry with backoff:
import time

import requests

def call_with_timeout_and_retry(url: str, max_retries: int = 3) -> dict:
    """Make request with timeout and retry on timeout errors."""
    for attempt in range(max_retries + 1):
        try:
            response = requests.post(url, timeout=(5, 60))
            response.raise_for_status()
            return response.json()
        except requests.exceptions.Timeout as e:
            if attempt < max_retries:
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
                print(f"Request timed out. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise TimeoutError(f"Request failed after {max_retries} retries") from e
Rather than hand-rolling this loop, you can use a retry library such as tenacity (Python) or async-retry (JavaScript). Where no such library is available, implement the retry explicitly.
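If you do implement retries by hand, adding random jitter to the backoff helps avoid many clients retrying in lockstep. The helper below is a sketch, not a library API: `retry_on_timeout`, its parameters, and the injectable `sleep` are all assumptions for illustration. It retries only the exception types listed in `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_timeout(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # "Full jitter": sleep a random time up to the exponential cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

With requests, the call might look like `retry_on_timeout(lambda: requests.post(url, timeout=(5, 60)), retry_on=(requests.exceptions.Timeout,))`, since `requests.exceptions.Timeout` is not a subclass of the built-in `TimeoutError`.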
Key Takeaways
- Timeouts prevent hanging requests from consuming resources indefinitely; use both connection and read timeouts.
- Connection timeout (5 seconds) for TCP + TLS; read timeout (60-120 seconds) for API processing.
- Propagate deadlines through service layers so that all nested calls respect the user's deadline.
- Treat timeout errors as transient and retry with exponential backoff.
- Avoid extremely long timeouts (>10 minutes) unless your use case specifically requires it; prefer failing fast and retrying.
Frequently Asked Questions
What if my LLM API legitimately takes 10 minutes to process a request?
Set a read timeout longer than 10 minutes, but implement a separate user-facing timeout. For example, your API might accept requests with a 30-second response deadline, but internally spawn a background job for long-running requests and return a job ID. The user polls for completion asynchronously.
Does setting a timeout on the client side prevent the server from processing?
No. When you time out on the client, you stop waiting, but the server usually continues processing, which wastes resources. To prevent the server from wasting time, use a separate cancellation mechanism (e.g., an abort endpoint or a job cancellation API) if your use case requires it.
Should I retry on every timeout?
Retry on timeouts, but use exponential backoff and a reasonable max retry count (3-5). If a request times out multiple times, it signals a real problem (network instability, server overload) and further retries are unlikely to help.
How do I know if a timeout is connection timeout or read timeout?
Different libraries report them differently. requests raises requests.exceptions.ConnectTimeout for connection phase timeouts and requests.exceptions.ReadTimeout for read phase timeouts. Check the exception type to decide whether to retry immediately or back off first.
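As a small sketch of that check with requests (the function name `classify_timeout` is illustrative), note that both exception classes derive from `requests.exceptions.Timeout`, and `ConnectTimeout` also derives from `requests.exceptions.ConnectionError`, so test the specific types first.

```python
import requests


def classify_timeout(exc: Exception) -> str:
    """Return 'connect', 'read', or 'other' for an exception raised by requests."""
    # ConnectTimeout: the connection was never established, so the
    # server did not receive the request.
    if isinstance(exc, requests.exceptions.ConnectTimeout):
        return "connect"
    # ReadTimeout: the request was sent, but the server did not
    # send data within the read timeout.
    if isinstance(exc, requests.exceptions.ReadTimeout):
        return "read"
    return "other"
```

A connect timeout means the request never reached the server, so retrying cannot duplicate work; after a read timeout the server may already have processed the request.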