
Structured Logging for LLM Apps: JSON Logs

Structured logging is the practice of emitting application events as machine-readable JSON objects instead of free-form text strings. For LLM applications, structured logs create a queryable record of every API call, including its token counts, cost, latency, and errors. Where a plain-text log reads "LLM call took 1.2 seconds," a structured log is a JSON object: {"timestamp": "2026-06-02T14:30:00Z", "trace_id": "abc123", "event": "llm_call", "model": "gpt-4", "input_tokens": 280, "output_tokens": 150, "latency_ms": 1200, "cost_usd": 0.0126}. Because every value is an explicit field, log aggregation platforms (ELK, Datadog, Splunk) can index, query, and alert on any of them without custom parsing.

Why Structured Logs Over Plain Text

Unstructured logs (plain text strings) are human-readable but machine-hostile. If you log "Processing user query for customer 42, embedding took 145ms, vector search found 8 results", a log aggregation tool must parse your sentence grammar to extract the latency (145ms) or result count (8), and that parsing breaks whenever the wording changes. Structured logs flip this: you emit a JSON object with explicit fields: {"user_id": 42, "embedding_latency_ms": 145, "vector_results": 8}. Aggregation tools parse JSON natively and index each field, so a query like "show me all requests where embedding latency exceeded 200ms" becomes an indexed numeric filter rather than a full-text scan.
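To make the contrast concrete, here is a minimal sketch that extracts the same three values from both forms of the log line above. The regular expression and the function names parse_plain and parse_structured are illustrative assumptions, not part of any logging tool.

```python
import json
import re

plain_line = ("Processing user query for customer 42, embedding took 145ms, "
              "vector search found 8 results")
json_line = '{"user_id": 42, "embedding_latency_ms": 145, "vector_results": 8}'

# Plain text needs a hand-written pattern that is tied to the exact wording.
PATTERN = re.compile(
    r"customer (?P<user_id>\d+), embedding took (?P<latency>\d+)ms, "
    r"vector search found (?P<results>\d+) results"
)


def parse_plain(line: str) -> dict:
    """Recover fields from a sentence; fails if the sentence is reworded."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("log line does not match the expected wording")
    return {
        "user_id": int(match["user_id"]),
        "embedding_latency_ms": int(match["latency"]),
        "vector_results": int(match["results"]),
    }


def parse_structured(line: str) -> dict:
    """Structured logs need no custom grammar: the line is already JSON."""
    return json.loads(line)


print(parse_plain(plain_line))
print(parse_structured(json_line))
```

Both calls yield the same dictionary, but only the regex version has to change when someone edits the log message.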

LLM apps amplify this need. Each inference call generates a dozen related events: prompt preparation, API call, token counting, embedding lookup, cache hit/miss, post-processing, user response formatting. Without structure, correlating these events across logs requires manual grep-and-eyeball work. With structured logs and a common trace_id, you can query: "Show me all events for trace_id=xyz789 in order," and a dashboard reconstructs the entire request timeline in seconds.
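The timeline reconstruction described above can be approximated locally in a few lines. This is a sketch under assumptions: the sample log lines and the helper name timelines are made up for illustration, and it relies on all timestamps sharing one ISO 8601 format and timezone so that they sort correctly as strings.

```python
import json
from collections import defaultdict

# Hypothetical log lines, deliberately out of order and from two requests
raw_lines = [
    '{"timestamp": "2026-06-02T14:30:00.450Z", "trace_id": "xyz789", "event": "llm_call"}',
    '{"timestamp": "2026-06-02T14:30:00.120Z", "trace_id": "abc123", "event": "cache_lookup"}',
    '{"timestamp": "2026-06-02T14:30:00.010Z", "trace_id": "xyz789", "event": "prompt_prepared"}',
    '{"timestamp": "2026-06-02T14:30:01.700Z", "trace_id": "xyz789", "event": "response_formatted"}',
]


def timelines(lines):
    """Group entries by trace_id and order each group by timestamp."""
    by_trace = defaultdict(list)
    for line in lines:
        entry = json.loads(line)
        by_trace[entry["trace_id"]].append(entry)
    for events in by_trace.values():
        # Same-format UTC ISO 8601 strings sort chronologically
        events.sort(key=lambda e: e["timestamp"])
    return dict(by_trace)


timeline = [e["event"] for e in timelines(raw_lines)["xyz789"]]
print(timeline)
```

An aggregation platform does the same grouping and ordering for you; the point is that it only works because trace_id and timestamp are explicit fields.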

Core Fields for LLM Structured Logs

A production LLM log entry should include:

| Field | Type | Purpose | Example |
| --- | --- | --- | --- |
| timestamp | ISO 8601 | Event time; critical for correlating across systems | 2026-06-02T14:30:00Z |
| trace_id | UUID | Request identifier; links all events in a single request | 550e8400-e29b-41d4-a716-446655440000 |
| span_id | UUID | Identifies one operation (span) within a trace | f0ca7b1a-51e2-4d37-9476-b5e9a3d5c9f2 |
| parent_span_id | UUID (optional) | Points to the parent span, so spans form a parent-child tree | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d |
| level | String | Severity: debug, info, warn, error | info |
| message | String | Human-readable event summary | LLM API call completed successfully |
| event | String | Structured event type for filtering | llm_call, embedding_api, vector_search |
| model | String | Model identifier | gpt-4-turbo, claude-3-opus, llama-2-70b |
| input_tokens | Integer | Tokens in the prompt | 280 |
| output_tokens | Integer | Tokens in the completion | 150 |
| total_tokens | Integer | Sum of input and output tokens | 430 |
| cost_usd | Float | Calculated cost of this call, in US dollars | 0.0126 |
| latency_ms | Integer | Wall-clock duration of the operation, in milliseconds | 1200 |
| user_id | String | End user identifier (for multi-tenant apps) | user_42 |
| user_tier | String | User's subscription level (for cost attribution) | premium |
| session_id | String | User session identifier | sess_abcdef123 |
| error_code | String | Error category, if applicable | rate_limit_exceeded, invalid_api_key |
| error_message | String | Error details | API returned 429 Too Many Requests |
| metadata | Object | Custom app-specific fields | {"feature": "customer_support", "version": "2.1"} |

Not every log must include all fields. A database lookup log might omit model and cost_usd. An LLM call log should always include trace_id, model, input_tokens, output_tokens, and cost_usd.
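One way to enforce that rule is a small check run in tests or at log time. The sketch below is an assumption about how you might structure it: the llm_call set is the list given above, while BASE_REQUIRED and the function name missing_fields are hypothetical choices of this example.

```python
# Fields assumed to be present on every entry (an assumption of this sketch)
BASE_REQUIRED = {"timestamp", "event", "trace_id"}

# Extra fields required per event type; the llm_call list follows the text above
REQUIRED_BY_EVENT = {
    "llm_call": {"model", "input_tokens", "output_tokens", "cost_usd"},
}


def missing_fields(entry: dict) -> set:
    """Return the required field names that are absent from a log entry."""
    required = BASE_REQUIRED | REQUIRED_BY_EVENT.get(entry.get("event"), set())
    return required - set(entry)


db_entry = {"timestamp": "2026-06-02T14:30:00Z", "event": "db_lookup",
            "trace_id": "abc123", "latency_ms": 12}
llm_entry = {"timestamp": "2026-06-02T14:30:01Z", "event": "llm_call",
             "trace_id": "abc123", "model": "gpt-4",
             "input_tokens": 280, "output_tokens": 150}

print(missing_fields(db_entry))   # nothing missing: model and cost are optional here
print(missing_fields(llm_entry))  # cost_usd was left out
```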

Example: Instrumented LLM Call in Python

Here is a Python example emitting structured logs for an LLM-based chat endpoint:

import json
import logging
import time
import uuid
from datetime import datetime, timezone
from typing import Optional

from anthropic import Anthropic

# Emit only the message so that each log line is exactly one JSON object
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger(__name__)

MODEL = "claude-3-opus-20240229"

# Illustrative per-token prices in USD; verify against the provider's
# current rates for your model
INPUT_COST_PER_TOKEN = 0.000015
OUTPUT_COST_PER_TOKEN = 0.000075


def log_structured(event: str, **kwargs):
    """Emit a structured JSON log entry."""
    log_entry = {
        # Timezone-aware UTC timestamp in ISO 8601 with a trailing "Z"
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "event": event,
        **kwargs,
    }
    logger.info(json.dumps(log_entry))


def chat_with_observability(user_message: str, trace_id: Optional[str] = None):
    """LLM chat function with structured logging."""
    trace_id = trace_id or str(uuid.uuid4())

    # Rough pre-call estimate (about 4 characters per token); the exact
    # count comes back in response.usage after the call
    estimated_input_tokens = len(user_message) // 4

    log_structured(
        "chat_request_start",
        trace_id=trace_id,
        user_message_length=len(user_message),
        estimated_input_tokens=estimated_input_tokens,
    )

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # A monotonic clock is not affected by system clock adjustments
    start = time.perf_counter()
    try:
        response = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": user_message}],
        )
        latency_ms = int((time.perf_counter() - start) * 1000)

        input_tokens = response.usage.input_tokens
        output_tokens = response.usage.output_tokens
        total_tokens = input_tokens + output_tokens

        total_cost = (input_tokens * INPUT_COST_PER_TOKEN
                      + output_tokens * OUTPUT_COST_PER_TOKEN)

        log_structured(
            "llm_call",
            trace_id=trace_id,
            model=MODEL,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            total_tokens=total_tokens,
            latency_ms=latency_ms,
            cost_usd=round(total_cost, 6),
            stop_reason=response.stop_reason,
        )

        return {
            "response": response.content[0].text,
            "trace_id": trace_id,
            "tokens_used": total_tokens,
            "cost_usd": total_cost,
        }

    except Exception as e:
        latency_ms = int((time.perf_counter() - start) * 1000)

        # Log the failure with the same trace_id, then let the caller handle it
        log_structured(
            "llm_call_error",
            trace_id=trace_id,
            model=MODEL,
            latency_ms=latency_ms,
            error_type=type(e).__name__,
            error_message=str(e),
        )
        raise


# Example usage
if __name__ == "__main__":
    result = chat_with_observability("What is the capital of France?")
    print(f"Response: {result['response']}")
    print(f"Cost: ${result['cost_usd']:.6f}")

This code emits structured logs at the start and end of the chat request. Each log entry carries a trace_id that ties together the request and response. The llm_call log includes token counts and cost, enabling cost tracking and quota enforcement.
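The cost tracking and quota enforcement mentioned above could be sketched roughly as follows. It assumes each llm_call entry also carries a user_id, as in the field table; cost_by_user, over_budget, the sample entries, and the budget figure are hypothetical names and values for illustration.

```python
from collections import defaultdict

# Hypothetical parsed log entries for one day
entries = [
    {"event": "llm_call", "user_id": "user_42", "cost_usd": 0.0126},
    {"event": "llm_call", "user_id": "user_42", "cost_usd": 0.0200},
    {"event": "llm_call", "user_id": "user_7", "cost_usd": 0.0040},
    {"event": "vector_search", "user_id": "user_7", "latency_ms": 35},
]


def cost_by_user(log_entries):
    """Sum cost_usd per user over llm_call entries only."""
    totals = defaultdict(float)
    for entry in log_entries:
        if entry.get("event") == "llm_call":
            totals[entry.get("user_id", "unknown")] += entry["cost_usd"]
    return dict(totals)


def over_budget(totals, budget_usd):
    """Return the users whose accumulated spend exceeds the budget."""
    return sorted(user for user, spent in totals.items() if spent > budget_usd)


totals = cost_by_user(entries)
print(over_budget(totals, budget_usd=0.03))
```

In production the same aggregation would normally run inside the log platform; the sketch only shows that the logged fields are sufficient to compute it.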

Log Aggregation and Querying

Once logs are structured and centralized in a platform like Datadog or Splunk, you can run sophisticated queries:

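# Lucene-style filter: LLM calls that took over 2 seconds and cost more than $0.01
# Field syntax varies by platform (Datadog, for example, prefixes log attributes with @)
event:llm_call AND latency_ms:>2000 AND cost_usd:>0.01

# Splunk-style SPL: average tokens per user per day
event=llm_call
| bin _time span=1d
| stats avg(total_tokens) by user_id, _time

# Splunk-style SPL: flag 5-minute windows where the error rate exceeds 1%
event=llm_call*
| bin _time span=5m
| stats count as total, count(error_code) as errors by _time
| eval error_rate = (errors / total) * 100
| where error_rate > 1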

Because each field is indexed at ingest time, these queries run as field lookups and numeric comparisons instead of full-text scans, which keeps them fast even as log volume grows.
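For readers without access to an aggregation platform, the logic behind the first and last queries can be expressed in plain Python over parsed entries. This is only a sketch of the semantics: the sample entries and the function names slow_and_expensive and error_rate_percent are assumptions of this example.

```python
# Hypothetical parsed log entries from one 5-minute window
window = [
    {"event": "llm_call", "latency_ms": 2400, "cost_usd": 0.0150},
    {"event": "llm_call", "latency_ms": 900, "cost_usd": 0.0200},
    {"event": "llm_call", "latency_ms": 3100, "cost_usd": 0.0040},
    {"event": "llm_call_error", "latency_ms": 150,
     "error_code": "rate_limit_exceeded"},
]


def slow_and_expensive(entries, min_latency_ms=2000, min_cost_usd=0.01):
    """LLM calls that exceed both the latency and the cost threshold."""
    return [
        e for e in entries
        if e.get("event") == "llm_call"
        and e.get("latency_ms", 0) > min_latency_ms
        and e.get("cost_usd", 0.0) > min_cost_usd
    ]


def error_rate_percent(entries):
    """Share of entries that carry an error_code, as a percentage."""
    if not entries:
        return 0.0
    errors = sum(1 for e in entries if e.get("error_code"))
    return errors / len(entries) * 100


print(len(slow_and_expensive(window)))
print(error_rate_percent(window))
```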

Context Propagation: Passing Trace IDs

In a distributed system (microservices, async workers), you must propagate the trace_id across every boundary a request crosses:

  1. HTTP headers: Include a custom X-Trace-ID header or the W3C Trace Context traceparent header in every HTTP request.
  2. Message queues: Embed the trace ID in message metadata (RabbitMQ headers, SQS message attributes).
  3. Function arguments: Pass the trace ID explicitly to functions that spawn background jobs.
  4. Context variables: Use Python's contextvars to carry the trace ID across async tasks without threading it through every call. Prefer these over thread-local storage, which cannot distinguish asyncio tasks that share a thread.
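For the HTTP header case, a rough sketch of reading and writing a W3C-style traceparent value is shown below. The header layout (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags) follows the W3C Trace Context format; the function names extract_trace_id and outgoing_headers are hypothetical, and the sketch assumes header names have already been lowercased by the web framework.

```python
import re
import secrets

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def extract_trace_id(headers: dict) -> str:
    """Return the caller's trace ID, or start a new trace if none is valid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match and match.group(1) != "0" * 32:  # an all-zero trace ID is invalid
        return match.group(1)
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


def outgoing_headers(trace_id: str) -> dict:
    """Build headers for a downstream call that continue the same trace."""
    span_id = secrets.token_hex(8)  # new span ID for this outgoing hop
    return {"traceparent": f"00-{trace_id}-{span_id}-01"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
trace_id = extract_trace_id(incoming)
print(trace_id)
print(outgoing_headers(trace_id))
```

Tracing libraries such as OpenTelemetry handle this propagation for you; writing it by hand is mainly useful for understanding what travels between services.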

Here is a Python example using context variables:

from contextvars import ContextVar
import asyncio

# log_structured is the helper defined in the previous example

trace_id_var: ContextVar[str] = ContextVar('trace_id', default='unknown')


async def background_task(data):
    """Background task automatically inherits trace_id from context."""
    trace_id = trace_id_var.get()
    log_structured("background_task_start", trace_id=trace_id, data=data)
    await asyncio.sleep(1)
    log_structured("background_task_end", trace_id=trace_id)


async def main(user_request_trace_id):
    """Set trace_id in context for this request."""
    token = trace_id_var.set(user_request_trace_id)
    try:
        await background_task({"user_id": 42})
    finally:
        # Restore the previous value so the ID does not leak into later work
        trace_id_var.reset(token)


# Simulate request
asyncio.run(main("req-12345"))

Both the background_task_start and background_task_end logs now carry the same trace_id, correlating them to the original request.
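The same mechanism keeps concurrent requests from seeing each other's trace IDs. The sketch below, with hypothetical names handle_request and worker, runs two requests at once; it relies on asyncio giving each task a copy of the context that was current when the task was created.

```python
import asyncio
from contextvars import ContextVar

trace_id_var: ContextVar[str] = ContextVar("trace_id", default="unknown")
seen = []  # (worker label, trace ID observed by that worker)


async def worker(label: str):
    await asyncio.sleep(0)  # yield so the two requests interleave
    seen.append((label, trace_id_var.get()))


async def handle_request(trace_id: str):
    trace_id_var.set(trace_id)
    # create_task copies the current context, including this trace ID
    await asyncio.create_task(worker(f"worker-for-{trace_id}"))


async def main():
    # gather wraps each coroutine in its own task with its own context copy
    await asyncio.gather(handle_request("req-1"), handle_request("req-2"))


asyncio.run(main())
print(sorted(seen))
print(trace_id_var.get())  # the values set inside the tasks did not leak out
```

Each worker reports the ID of the request that spawned it, and the value outside the event loop is still the default.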

Key Takeaways

  • Structured JSON logs are queryable; unstructured text logs require manual parsing and are slow to aggregate.
  • Every log entry must carry a trace_id to correlate events across a single request lifecycle.
  • Core LLM fields include timestamp, model, input_tokens, output_tokens, cost_usd, and latency_ms.
  • Use context variables to propagate trace IDs through async tasks, and HTTP headers or message metadata to carry them across services.
  • Structured logs enable high-speed queries for debugging, alerting, and cost attribution.

Frequently Asked Questions

Should I log every LLM call or sample them?

Log 100% of LLM calls in production. The incremental cost of logging is negligible compared to the LLM API cost. Sampling might miss errors or cost anomalies. If storage is a concern, use log retention policies (e.g., keep all logs for 7 days, then archive) rather than sampling.

What is the performance overhead of structured logging?

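Negligible if you log asynchronously: the calling thread appends the record to an in-memory queue, and a background thread writes it out. Synchronous logging, which waits for the write to finish before returning, can add 1 to 5 ms per call depending on the destination. In Python, the standard library's QueueHandler and QueueListener provide the non-blocking path, and a formatter such as python-json-logger handles the JSON encoding; together they can keep overhead under roughly 0.5 ms per call.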
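A minimal sketch of buffered, non-blocking logging using only the Python standard library follows. The JsonFormatter class, the "fields" key passed through extra, and the in-memory sink are assumptions of this example; in a real service the sink would be a file, socket, or log shipper.

```python
import io
import json
import logging
import logging.handlers
import queue


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object, merging any extra 'fields' dict."""

    def format(self, record):
        payload = {"level": record.levelname.lower(),
                   "message": record.getMessage()}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


sink = io.StringIO()  # stands in for a slow destination such as a file
slow_handler = logging.StreamHandler(sink)
slow_handler.setFormatter(JsonFormatter())

log_queue = queue.Queue(-1)  # unbounded in-memory buffer
listener = logging.handlers.QueueListener(log_queue, slow_handler)

logger = logging.getLogger("llm_app")
logger.setLevel(logging.INFO)
logger.propagate = False
# The application thread only enqueues; the listener thread does the writing
logger.addHandler(logging.handlers.QueueHandler(log_queue))

listener.start()
logger.info("LLM API call completed",
            extra={"fields": {"event": "llm_call", "latency_ms": 1200}})
listener.stop()  # drains the queue before returning

print(sink.getvalue(), end="")
```

Calling listener.stop() at shutdown matters: without it, records still sitting in the queue may never be written.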

How do I prevent trace IDs from leaking into user-facing responses?

Keep trace IDs in logs and HTTP response headers (X-Trace-ID), not in JSON response bodies. If you want to expose a trace ID to the user for support inquiries, use a shorter, separate field (e.g., public_error_id) unrelated to the internal trace UUID.

Can I use structured logging with async/concurrent code?

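Yes. Use your runtime's context-propagation mechanism to bind the trace ID to each async task: contextvars in Python, AsyncLocalStorage in Node.js, or an explicitly passed context.Context in Go. In Python's asyncio, each task runs in a copy of the context that was current when the task was created, so child tasks inherit the trace ID and any changes they make do not leak back into the parent.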

Further Reading