OpenTelemetry for LLMs: Instrument AI Chains

OpenTelemetry (OTel) is a vendor-neutral standard for instrumenting applications to emit logs, traces, and metrics. Unlike proprietary tracing platforms that lock you into their API, OpenTelemetry provides a unified interface: you instrument your code once, then route telemetry to any backend (Jaeger, Datadog, New Relic, Grafana Loki) by changing configuration. For LLM applications, OpenTelemetry enables auto-instrumentation of popular libraries (LangChain, LlamaIndex, httpx) and allows you to add custom spans for LLM-specific operations like token counting or embedding model calls.

OpenTelemetry Architecture

OpenTelemetry consists of three layers:

  1. API and SDK packages (language-specific): Python's opentelemetry-api defines the tracer, meter, and logger interfaces, and opentelemetry-sdk implements them.
  2. Instrumentation libraries: Auto-instrumentation for popular frameworks (e.g., opentelemetry-instrumentation-requests for the requests HTTP library).
  3. Exporters: Send telemetry to backends (Jaeger, OTLP, Datadog, etc.).

The flow is: your code emits spans via the OTel API, the SDK's span processor buffers finished spans, and an exporter sends each batch to a backend.

Setting Up OpenTelemetry for Python

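Install dependencies. The OTLP and Prometheus exporters are used in a later section. Note that the Jaeger Thrift exporter is deprecated in recent OpenTelemetry Python releases; Jaeger 1.35 and later accepts OTLP directly, so new projects can rely on the OTLP exporter alone:

pip install opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-jaeger-thrift \
    opentelemetry-exporter-otlp-proto-grpc \
    opentelemetry-exporter-prometheus \
    opentelemetry-instrumentation \
    opentelemetry-instrumentation-requests \
    opentelemetry-instrumentation-anthropic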

Initialize OpenTelemetry in your application:

from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.instrumentation.anthropic import AnthropicInstrumentor

# Create a resource to identify your service
resource = Resource(attributes={
    SERVICE_NAME: "llm-chatbot",
    "service.version": "1.0.0",
    "deployment.environment": "production"
})

# Create Jaeger exporter (sends spans to localhost:6831)
jaeger_exporter = JaegerExporter(
    agent_host_name="localhost",
    agent_port=6831
)

# Configure trace provider with exporter
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(
    BatchSpanProcessor(jaeger_exporter)
)
trace.set_tracer_provider(tracer_provider)

# Auto-instrument libraries
RequestsInstrumentor().instrument()  # HTTP calls via requests library
AnthropicInstrumentor().instrument()  # Anthropic API calls

# Get tracer
tracer = trace.get_tracer(__name__)

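Every HTTP request made through the requests library now produces a span automatically, as does every call to Anthropic's API through the instrumented client. Note that opentelemetry-instrumentation-anthropic is maintained by the OpenLLMetry project rather than in the core OpenTelemetry repositories.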

Adding Custom Spans for LLM Operations

For LLM-specific operations not covered by auto-instrumentation, create custom spans:

from anthropic import Anthropic

def query_with_context(user_message: str, context_docs: list[str]) -> str:
    """LLM query with context retrieval, instrumented with custom spans."""

    # `tracer` is the module-level tracer created during setup
    with tracer.start_as_current_span("query_with_context") as root_span:
        root_span.set_attribute("input_length", len(user_message))

        # Custom span for context assembly
        with tracer.start_as_current_span("assemble_context") as span:
            context_text = "\n".join(context_docs)
            span.set_attribute("context_length", len(context_text))
            span.set_attribute("doc_count", len(context_docs))

        # Custom span for prompt construction
        with tracer.start_as_current_span("construct_prompt") as span:
            system_prompt = f"You are a helpful assistant. Use the following context:\n{context_text}"
            # The combined prompt is built only to record its total length
            prompt = f"{system_prompt}\n\nUser: {user_message}"
            span.set_attribute("prompt_length", len(prompt))

        # Auto-instrumented Anthropic call (creates a span automatically)
        client = Anthropic()
        response = client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": user_message}]
        )

        # Custom span for post-processing
        with tracer.start_as_current_span("parse_response") as span:
            result_text = response.content[0].text
            span.set_attribute("output_length", len(result_text))
            span.set_attribute("stop_reason", response.stop_reason)

        return result_text

The root span encompasses the entire operation. The spans for context assembly, prompt construction, and response parsing become its children automatically, because start_as_current_span parents each new span to whichever span is currently active. When you view this trace in Jaeger, you see the full breakdown (timings are illustrative):

query_with_context (500 ms)
├─ assemble_context (10 ms)
├─ construct_prompt (5 ms)
├─ anthropic.messages.create (450 ms) [auto-instrumented]
└─ parse_response (20 ms)
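Token counts are a natural addition to these spans. The sketch below shows one way to turn a response's usage numbers into span attributes. The helper name usage_attributes and the app.llm.total_tokens key are hypothetical; the gen_ai.* keys follow OpenTelemetry's generative AI semantic conventions, which are still marked as in development and may change.

```python
def usage_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build span attributes describing token usage for one LLM call."""
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Hypothetical convenience attribute, not part of any convention
        "app.llm.total_tokens": input_tokens + output_tokens,
    }


# Example values; in practice these come from the API response
attrs = usage_attributes("example-model", input_tokens=812, output_tokens=143)
```

Inside the parse_response span, this could be applied with span.set_attributes(usage_attributes(response.model, response.usage.input_tokens, response.usage.output_tokens)), which makes per-request token consumption searchable alongside latency.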

Exporting to Multiple Backends

To send traces to multiple backends simultaneously, attach one span processor per exporter to the same tracer provider:

from prometheus_client import start_http_server
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider

# The Jaeger processor from the setup code is already attached to tracer_provider;
# attaching it a second time would export every span to Jaeger twice.

# OTLP exporter (sends traces to a central collector)
otlp_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",  # OpenTelemetry Collector gRPC endpoint
    insecure=True
)

# Add the OTLP exporter alongside the existing Jaeger one
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Prometheus reader (for metrics); Prometheus scrapes the HTTP endpoint started here
start_http_server(port=9464)  # example port; match your Prometheus scrape config
prometheus_reader = PrometheusMetricReader()
meter_provider = MeterProvider(
    metric_readers=[prometheus_reader],
    resource=resource
)
metrics.set_meter_provider(meter_provider)

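Now traces go to both Jaeger (for visualization) and the OpenTelemetry Collector (which can forward to Datadog, Splunk, or other backends). Metrics recorded through the meter provider are exposed in Prometheus format; Prometheus collects them by scraping an HTTP endpoint that the application serves (for example via prometheus_client.start_http_server).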

Sampling Strategies

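Sampling reduces storage and processing costs by recording only a percentage of traces. The Python SDK ships head-based samplers such as ALWAYS_ON, ALWAYS_OFF, TraceIdRatioBased, and ParentBased, and lets you write your own by subclassing Sampler: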

from opentelemetry.sdk.trace.sampling import (
    Decision,
    ParentBased,
    Sampler,
    SamplingResult,
    TraceIdRatioBased,
)

class AdaptiveSampler(Sampler):
    """Keep 100% of spans flagged as errors at creation; 5% of everything else."""

    def __init__(self, ratio: float = 0.05):
        self._fallback = TraceIdRatioBased(ratio)

    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        # A head sampler runs when the span starts, so it sees only the
        # attributes passed at creation. Latency and final status are not
        # known yet; sampling on those requires tail-based sampling.
        if attributes and attributes.get("is_error"):
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes, trace_state)
        return self._fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self) -> str:
        return "AdaptiveSampler"

# Use the sampler: child spans follow the parent's decision (head-based
# propagation); root spans go through AdaptiveSampler
sampler = ParentBased(root=AdaptiveSampler(0.05))
tracer_provider = TracerProvider(sampler=sampler, resource=resource)

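A typical strategy: keep 100% of errors, 100% of traces slower than the p99 latency, and 1–5% of successful fast traces. Because errors and latency are known only after a trace finishes, this policy requires tail-based sampling, typically performed in the OpenTelemetry Collector (its tail sampling processor) rather than by the SDK's head sampler. This ensures you see failures while keeping storage reasonable.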
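The policy itself is simple to state in code. The following is a sketch of the decision only, written as a plain function; keep_trace and its parameters are hypothetical names, and in production the equivalent logic would normally live in a collector rather than in application code. The ratio check compares the low 64 bits of the trace ID against a bound, which mirrors the general idea behind trace-ID ratio sampling: the same trace ID always yields the same decision.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def keep_trace(trace_id: int, has_error: bool, duration_ms: float,
               slow_threshold_ms: float, baseline_ratio: float) -> bool:
    """Decide, after a trace has finished, whether to keep it."""
    if has_error:
        return True  # keep every failed trace
    if duration_ms >= slow_threshold_ms:
        return True  # keep every slow trace (e.g. at or above p99)
    # Deterministic baseline: keep a fixed fraction of the remaining traces
    bound = round(baseline_ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


error_kept = keep_trace(2**63, True, 120.0, 2000.0, 0.05)
slow_kept = keep_trace(2**63, False, 2500.0, 2000.0, 0.05)
fast_low_id_kept = keep_trace(1, False, 120.0, 2000.0, 0.05)
fast_high_id_kept = keep_trace(2**63, False, 120.0, 2000.0, 0.05)
```

Here slow_threshold_ms stands in for a p99 value computed elsewhere, for example from a latency histogram refreshed periodically.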

Baggage: Passing Context Through Spans

Baggage allows you to attach key-value context that propagates across span boundaries and services. For example, pass user_id or session_id through all spans:

from opentelemetry import context
from opentelemetry.baggage import set_baggage, get_baggage

def process_user_request(user_id: str, request_body: dict):
    """Process a user request with baggage context."""

    # set_baggage returns a new Context; it takes effect only once attached
    ctx = set_baggage("user_id", user_id)
    ctx = set_baggage("request_id", request_body.get("id"), context=ctx)
    token = context.attach(ctx)
    try:
        with tracer.start_as_current_span("process_request"):
            # Code called from here can read the baggage from the current context
            llm_result = query_with_context(
                user_message=request_body["message"],
                context_docs=request_body["docs"]
            )
            return llm_result
    finally:
        # Restore the previous context so baggage does not leak to other requests
        context.detach(token)

# In a child function, retrieve baggage:
def child_operation():
    user_id = get_baggage("user_id")
    request_id = get_baggage("request_id")
    # Use user_id and request_id in this operation
    pass

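Baggage is not attached to spans automatically: it travels in the context (and in the baggage HTTP header between services), but exporters send only span attributes. To query "All spans for user_id=42", copy the baggage entries you need onto each span as attributes, either by hand or with a span processor that does so when a span starts. Because baggage is forwarded to downstream services, avoid putting sensitive values in it.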

Key Takeaways

  • OpenTelemetry is a vendor-neutral standard for instrumentation; you write code once and route telemetry to any backend by changing configuration.
  • Auto-instrumentation libraries emit spans automatically for popular frameworks (HTTP libraries, LangChain, LlamaIndex).
  • Custom spans instrument LLM-specific operations (context assembly, prompt construction, response parsing).
  • Multiple exporters allow you to send traces to Jaeger and an OTLP collector (and from there to Datadog or other backends) simultaneously, while metrics are exposed to Prometheus.
  • Sampling strategies (1–5% of successful traces, 100% of errors) reduce storage costs while ensuring visibility into failures.
  • Baggage propagates user ID, request ID, and other context across spans and services; copy it into span attributes to make it queryable in your backend.

Frequently Asked Questions

Do I need to run Jaeger locally to use OpenTelemetry?

No. You can export to a managed service (Datadog, New Relic, Honeycomb) by changing the exporter configuration. For local development, run Jaeger in Docker: docker run -d -p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one. Then navigate to http://localhost:16686 to view traces.

What is the performance overhead of OpenTelemetry instrumentation?

Negligible if you use asynchronous exporters (batch processing, background threads). Overhead is typically under 1% of request latency for well-tuned configurations. Sampling helps further: if you sample 5% of traces, the other 95% of requests create lightweight non-recording spans that skip attribute storage and export, though context propagation still runs for them.

Can I instrument third-party LLM libraries like LangChain?

Yes. The OpenLLMetry project publishes opentelemetry-instrumentation-langchain, which builds on OpenTelemetry's instrumentation API. Call LangchainInstrumentor().instrument() once at startup and LangChain operations emit spans automatically.

What if I want to export to multiple backends with different sampling rates?

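A sampler is configured per TracerProvider, not per exporter, so every exporter attached to a provider sees the same sampled spans. To apply different rates per backend, send everything to an OpenTelemetry Collector and give each backend its own pipeline with its own sampling processor, or wrap each exporter in a custom span processor that filters spans by attribute before exporting (e.g., route errors to an alerting integration such as PagerDuty, success traces to Jaeger).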

Further Reading