Idempotency and Replay: Fault-Tolerant Requests
Idempotency is a property of operations that produce the same result no matter how many times they are executed. In API terms, an idempotent request produces the same effect whether called once or 100 times. This property is essential for fault-tolerant systems because it allows you to retry requests without fear of creating duplicate side effects.
Without idempotency, a retry after a network timeout or 500 error might execute the same operation twice, resulting in duplicate charges, double-counted tokens, or duplicate log entries. With idempotency, a retry is safe: the server deduplicates the request and returns the same result as the first call. Several widely used APIs support this through client-supplied request IDs: Stripe accepts an Idempotency-Key header, and a number of AWS operations accept a client token. Support varies by provider and endpoint, so check the documentation for the API you are calling.
How Idempotency Works
The mechanism is simple: you include a unique identifier (idempotency key) with each request. If the same request (same idempotency key) arrives multiple times, the server returns the cached result from the first request instead of processing it again.
Here is the flow:
First request: POST /api/messages
Idempotency-Key: request-abc-123
↓
Server stores key and result
↓
Returns: {"id": "msg-999", "status": "created"}
Retry request: POST /api/messages
Idempotency-Key: request-abc-123 (same key)
↓
Server checks key in cache
↓
Returns: {"id": "msg-999", "status": "created"} (cached)
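This is often described as exactly-once semantics. More precisely, the client may deliver the request several times (at-least-once delivery), but the server applies the operation only once per idempotency key, so the observable effect is the same as a single call.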
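The flow above can be simulated in a few lines. This is a minimal in-memory sketch, not any real API's implementation; `FakeServer` and `handle_post` are hypothetical names used only for illustration. The counter makes the point visible: two requests with the same key produce one side effect.

```python
import uuid


class FakeServer:
    """Toy stand-in for an API server (hypothetical, for illustration only)."""

    def __init__(self):
        self.results_by_key = {}   # idempotency key -> stored response
        self.messages_created = 0  # counts real side effects

    def handle_post(self, idempotency_key: str, body: dict) -> dict:
        # Replay: the key was seen before, so return the stored response.
        if idempotency_key in self.results_by_key:
            return self.results_by_key[idempotency_key]
        # First time: perform the side effect and remember the response.
        self.messages_created += 1
        response = {"id": f"msg-{uuid.uuid4()}", "status": "created"}
        self.results_by_key[idempotency_key] = response
        return response


server = FakeServer()
first = server.handle_post("request-abc-123", {"text": "hello"})
retry = server.handle_post("request-abc-123", {"text": "hello"})  # same key
other = server.handle_post("request-xyz-789", {"text": "hello"})  # new key
```

Here `first` and `retry` are the same response, while `other` is a new message even though its body is identical, because deduplication is by key, not by content.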
Generating Idempotency Keys
An idempotency key must be unique per logical request. In practice, you generate a UUID or use a deterministic hash of the request content:
    import hashlib
    import json
    import uuid


    def generate_idempotency_key(method: str = "uuid") -> str:
        """Generate an idempotency key."""
        if method == "uuid":
            return str(uuid.uuid4())
        else:
            raise ValueError(f"Unknown method: {method}")


    def generate_deterministic_key(user_id: str, prompt: str, model: str) -> str:
        """
        Generate deterministic idempotency key from request content.

        Useful if you want retries of the same logical request to use the same key.
        """
        # Serialize as a JSON list so field boundaries are unambiguous
        # (a plain "a:b:c" join collides when a field itself contains ":").
        content = json.dumps([user_id, prompt, model]).encode()
        return hashlib.sha256(content).hexdigest()


    # Usage
    key = generate_idempotency_key()
    print(f"Idempotency key: {key}")

    # Deterministic (useful for form resubmissions)
    key = generate_deterministic_key("user-42", "What is AI?", "gpt-4")
    print(f"Deterministic key: {key}")
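The UUID approach is more common for stateless API clients: each logical request gets a fresh UUID, generated once and reused unchanged for every retry of that request, and the server deduplicates by key. The deterministic approach is useful for form resubmissions: if a user resubmits a form, you want the same idempotency key so the server returns the cached result. The trade-off is that two intentionally separate requests with identical content also share a key, so the second one receives the first one's cached result.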
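The detail that is easiest to get wrong with UUID keys is where the key is generated. The sketch below generates it once, outside the retry loop, so every attempt carries the same key. `send_with_retries` and `flaky_send` are hypothetical helpers, and `send` stands in for whatever HTTP call your client actually makes.

```python
import uuid


def send_with_retries(send, payload: dict, max_attempts: int = 3) -> dict:
    """Call `send(payload, key)` until it succeeds, reusing one key.

    `send` is a hypothetical callable standing in for the real HTTP call.
    """
    key = str(uuid.uuid4())  # generated once, outside the retry loop
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload, key)
        except ConnectionError as exc:
            last_error = exc  # transient failure: retry with the SAME key
    raise last_error


# Fake transport that fails once, then succeeds, recording the keys it saw.
seen_keys = []


def flaky_send(payload: dict, key: str) -> dict:
    seen_keys.append(key)
    if len(seen_keys) == 1:
        raise ConnectionError("simulated timeout")
    return {"status": "created", "key": key}


result = send_with_retries(flaky_send, {"prompt": "What is AI?"})
```

After this runs, `seen_keys` holds two entries with the same value. Generating the UUID inside the loop instead would give each attempt a new key and defeat deduplication.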
Implementing Idempotent Requests in Python
Here is a resilient client that includes idempotency:
    import time
    import uuid
    from typing import Any, Dict, Optional

    import requests


    class IdempotentLLMClient:
        """LLM client with built-in idempotency support."""

        def __init__(self, api_key: str, api_url: str):
            self.api_key = api_key
            self.api_url = api_url

        def call_with_idempotency(
            self,
            prompt: str,
            model: str = "gpt-4",
            idempotency_key: Optional[str] = None,
            max_retries: int = 3,
        ) -> Dict[str, Any]:
            """
            Call LLM API with idempotency key for fault tolerance.

            Args:
                prompt: The user prompt
                model: Model name (e.g., "gpt-4")
                idempotency_key: Unique request ID; generated if None
                max_retries: Number of retries on transient failures

            Returns:
                API response
            """
            if idempotency_key is None:
                idempotency_key = str(uuid.uuid4())

            headers = {
                "Authorization": f"Bearer {self.api_key}",
                "Idempotency-Key": idempotency_key,  # Include key in headers
            }
            payload = {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            }

            for attempt in range(max_retries + 1):
                try:
                    response = requests.post(
                        f"{self.api_url}/chat/completions",
                        headers=headers,
                        json=payload,
                        timeout=60,
                    )
                    response.raise_for_status()
                    return response.json()
                except requests.exceptions.RequestException as e:
                    # Client errors (4xx) will fail the same way on retry,
                    # except 408 and 429, which are treated as transient here.
                    status = getattr(e.response, "status_code", None)
                    if status is not None and 400 <= status < 500 and status not in (408, 429):
                        raise
                    if attempt == max_retries:
                        raise
                    # Retry with the same idempotency key.
                    # Server will return cached result if available.
                    wait_time = 2 ** attempt
                    print(f"Attempt {attempt + 1} failed. Retrying in {wait_time}s...")
                    time.sleep(wait_time)


    # Usage
    client = IdempotentLLMClient(
        api_key="sk-...",
        api_url="https://api.openai.com/v1"
    )

    # First call
    result1 = client.call_with_idempotency("What is AI?", idempotency_key="request-1")

    # Retry with same key - server returns cached result
    result2 = client.call_with_idempotency("What is AI?", idempotency_key="request-1")

    # Holds only if the server honors the Idempotency-Key header;
    # check your provider's documentation before relying on it.
    assert result1["id"] == result2["id"]
Server-Side Idempotency Implementation
If you are building your own LLM wrapper API, you must implement idempotency on the server. Here is a minimal example:
    import uuid
    from typing import Optional

    from fastapi import FastAPI, Header, HTTPException

    app = FastAPI()

    # In-memory cache of idempotency keys and their results.
    # In production, use Redis or a database so the cache survives restarts
    # and is shared across worker processes.
    idempotency_cache = {}


    @app.post("/api/messages")
    async def create_message(
        request: dict,
        idempotency_key: Optional[str] = Header(None),  # reads the Idempotency-Key header
    ) -> dict:
        """Create a message with idempotency support."""
        if idempotency_key is None:
            # Report a client error (400) rather than an unhandled 500
            raise HTTPException(status_code=400, detail="Idempotency-Key header is required")

        # Check if this key was already processed
        if idempotency_key in idempotency_cache:
            print(f"Returning cached result for key {idempotency_key}")
            return idempotency_cache[idempotency_key]

        # Process the request (call LLM, etc.)
        # Note: once real awaited work happens here, two concurrent requests
        # with the same key can both pass the check above; a production
        # version needs a lock or an atomic "insert if absent" in the store.
        try:
            # Simulate LLM call
            result = {
                "id": f"msg-{uuid.uuid4()}",
                "status": "created",
                "content": f"Response to: {request['prompt']}",
            }
            # Cache the result before returning
            idempotency_cache[idempotency_key] = result
            return result
        except Exception as e:
            # Do NOT cache errors - allow retry on transient failure
            print(f"Request failed: {e}")
            raise
Important: cache successful results only, not errors. If a request fails transiently (timeout, 500), the retry should attempt the operation again, not return the cached error.
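The rule can be shown without a web framework. The following is a minimal sketch under simplifying assumptions (single process, plain dict as the store); `process_once` and `flaky_operation` are hypothetical names. The first attempt fails and leaves nothing in the cache, the second attempt runs the operation again and succeeds, and the third is served from the cache.

```python
from typing import Callable


def process_once(key: str, operation: Callable[[], dict], cache: dict) -> dict:
    """Run `operation` at most once per key, caching only successful results."""
    if key in cache:
        return cache[key]
    result = operation()  # if this raises, nothing is stored for `key`
    cache[key] = result
    return result


calls = {"count": 0}


def flaky_operation() -> dict:
    """Fails on the first call, succeeds afterwards (simulated outage)."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("simulated upstream timeout")
    return {"status": "created", "attempt": calls["count"]}


cache: dict = {}
try:
    process_once("request-1", flaky_operation, cache)
except TimeoutError:
    pass  # the failure was not cached

second = process_once("request-1", flaky_operation, cache)  # re-runs the operation
third = process_once("request-1", flaky_operation, cache)   # served from cache
```

The operation ran twice in total: once for the failed attempt and once for the successful one. Had the error been cached, every retry with `request-1` would have returned the timeout.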
Idempotency in Batch Operations
For batch requests (processing many prompts), include an idempotency key per item. The batch ID must stay the same when a batch is re-run; otherwise the items get new keys and are processed again:
    def batch_call_with_idempotency(
        prompts: list[str],
        api_key: str,
        batch_id: str,
    ) -> list[dict]:
        """Process multiple prompts with per-item idempotency.

        batch_id must be stable across re-runs of the same batch.
        """
        client = IdempotentLLMClient(api_key, "https://api.openai.com/v1")
        results = []
        for i, prompt in enumerate(prompts):
            # Generate a key unique to this item in the batch
            idempotency_key = f"batch-{batch_id}-item-{i}"
            try:
                result = client.call_with_idempotency(
                    prompt,
                    idempotency_key=idempotency_key,
                )
                results.append(result)
            except Exception as e:
                # Record the failure and continue with the remaining items
                print(f"Item {i} failed: {e}")
                results.append({"error": str(e)})
        return results
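The `batch_id` used in the keys above has to come from somewhere that survives a crash and restart. One option, shown here as an assumed scheme rather than a standard one, is to derive it from the batch contents; `derive_batch_id` and `item_keys` are hypothetical helpers.

```python
import hashlib
import json


def derive_batch_id(prompts: list[str]) -> str:
    """Derive a stable batch ID from the batch contents (an assumed scheme)."""
    canonical = json.dumps(prompts, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:16]


def item_keys(prompts: list[str]) -> list[str]:
    """Build one idempotency key per item, stable across re-runs of the batch."""
    batch_id = derive_batch_id(prompts)
    return [f"batch-{batch_id}-item-{i}" for i in range(len(prompts))]


prompts = ["What is AI?", "What is ML?"]
first_run = item_keys(prompts)
second_run = item_keys(prompts)  # e.g. after a crash and restart
```

Both runs yield identical keys, so items that completed before the crash are served from the server's cache. The same caveat as for any deterministic key applies: submitting the same prompts as a deliberately separate batch would reuse these keys, in which case a stored job ID is the safer source for `batch_id`.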
Cleanup and Expiration
Idempotency caches can grow unbounded if not managed. In production, implement expiration:
    from datetime import datetime
    from typing import Optional


    class ExpiringIdempotencyCache:
        """Idempotency cache with automatic expiration."""

        def __init__(self, ttl_seconds: int = 3600):
            self.ttl_seconds = ttl_seconds
            self.cache = {}  # {key: (result, timestamp)}

        def get(self, key: str) -> Optional[dict]:
            """Get cached result if it exists and has not expired."""
            if key not in self.cache:
                return None
            result, timestamp = self.cache[key]
            elapsed = (datetime.now() - timestamp).total_seconds()
            if elapsed > self.ttl_seconds:
                # Expired entries are removed lazily, only when looked up
                del self.cache[key]
                return None
            return result

        def set(self, key: str, result: dict) -> None:
            """Cache a result."""
            self.cache[key] = (result, datetime.now())
Retention windows vary by provider; Stripe, for example, may remove keys once they are at least 24 hours old. A window of 24 to 48 hours is a reasonable starting point for your own cache.
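The cache above removes an expired entry only when that key is looked up again, so keys that are never retried stay in memory. A periodic sweep closes that gap. The sketch below is one possible variant, not production code: `SweepingIdempotencyCache` and its `clock` parameter are assumptions introduced here, and it uses `time.monotonic` so that wall-clock adjustments do not affect expiry.

```python
import time
from typing import Callable, Optional


class SweepingIdempotencyCache:
    """TTL cache with an explicit sweep (a sketch, not production code)."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.cache = {}     # key -> (result, stored_at)

    def set(self, key: str, result: dict) -> None:
        self.cache[key] = (result, self.clock())

    def get(self, key: str) -> Optional[dict]:
        entry = self.cache.get(key)
        if entry is None:
            return None
        result, stored_at = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self.cache[key]
            return None
        return result

    def sweep(self) -> int:
        """Delete every expired entry and return how many were removed."""
        now = self.clock()
        expired = [k for k, (_, stored_at) in self.cache.items()
                   if now - stored_at > self.ttl_seconds]
        for k in expired:
            del self.cache[k]
        return len(expired)


# Usage with a fake clock so the example does not need to sleep.
fake_now = [0.0]
cache = SweepingIdempotencyCache(ttl_seconds=10, clock=lambda: fake_now[0])
cache.set("old", {"id": "msg-1"})
fake_now[0] = 8.0
cache.set("new", {"id": "msg-2"})
fake_now[0] = 15.0
removed = cache.sweep()
```

In a real service, `sweep()` would run on a timer or background task. With Redis or a similar store, per-key expiry typically makes a manual sweep unnecessary.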
Key Takeaways
- Idempotency allows safe retries: the same request can be called multiple times without duplicate side effects.
- Every request with side effects should include an idempotency key (UUID or deterministic hash), reused unchanged across retries.
- Servers must cache results by idempotency key and return cached results on duplicate requests.
- Cache successful results only; allow retries to re-attempt on transient errors.
- Implement cache expiration (24-48 hours) to limit memory usage.
Frequently Asked Questions
What is the difference between idempotency and deduplication?
Idempotency is a property of the operation itself (calling it multiple times has the same effect as calling it once). Deduplication is the mechanism that provides it: the server detects a repeated idempotency key and returns the stored result instead of processing the request again.
Should I use UUID or deterministic keys?
Use UUID for most cases. It is simpler and avoids key collisions. Use deterministic keys only if you need retries of the same logical request to reuse the cache (e.g., form resubmissions).
What if my request fails with a 400 Bad Request?
Do not retry 400 errors. The request is malformed and retries will fail the same way. Do not cache the error. Return the error immediately to the client.
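One way to encode this rule in a client is a small status check combined with a capped exponential backoff. The set of retryable status codes below is an assumption for illustration, not something a particular API guarantees; `should_retry` and `backoff_seconds` are hypothetical helpers.

```python
# Assumed set of transient statuses; adjust to your API's documentation.
RETRYABLE_STATUS_CODES = frozenset({408, 429, 500, 502, 503, 504})


def should_retry(status_code: int) -> bool:
    """Return True for statuses this sketch treats as transient."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff (base * 2**attempt), capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


decisions = {code: should_retry(code) for code in (400, 401, 429, 500, 503)}
delays = [backoff_seconds(n) for n in range(6)]
```

With these defaults, 400 and 401 are returned to the caller immediately, while 429, 500, and 503 are retried after delays of 1, 2, 4, 8, 16, and then at most 30 seconds.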
How long should I keep idempotency keys?
API providers typically keep keys for 24-48 hours. If you are implementing your own, use a similar window. Longer windows consume more memory; shorter windows risk duplicate operations if clients retry after expiration.
Does every API support idempotency?
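No. Stripe and a number of AWS operations support it, but many APIs do not, and support can differ between endpoints of the same API. Check your API documentation. If it does not support idempotency natively, you can approximate it on the client by deduplicating in your own cache. This prevents your own code from sending the same logical request twice, but it cannot help when the server completed a request and the response was lost in transit.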