LLM Cost Optimization: Per-Feature Attribution Guide
Per-feature cost attribution answers the question: how much did it cost to run feature X last month? Without attribution, you see total spend ($50,000) but cannot determine whether that came from high-volume chatbots, expensive reasoning tasks, or inefficient data pipelines. Attribution is the practice of tagging each LLM request with metadata (feature name, user segment, priority level) so that costs can be rolled up and analyzed by those dimensions. A mature cost accounting system lets you query spend by feature, user cohort, time window, and model, surfacing which features are "profit centers" (high ROI, low cost) and which are "cost drains" (high cost, low usage). Then you can make informed trade-offs: invest optimization effort in expensive features, sunset low-ROI features, or negotiate lower unit costs for high-volume features.
Designing an Attribution Schema
An attribution schema is the set of metadata dimensions you log alongside each LLM request. The minimal schema has: feature_name, user_segment, model, environment (production/staging), and request_type. A richer schema adds: priority_level, batch_vs_realtime, cached_vs_fresh, and custom fields relevant to your business (e.g., customer_tier, region, experiment_id). When designing your schema, think about the business questions you want to answer:
- "Which feature consumes 50% of our LLM budget?" → you need feature_name.
- "Do enterprise customers have higher costs per session?" → you need customer_tier.
- "Is staging costing more than production?" → you need environment.
- "Does using cached context reduce cost per request by 30%?" → you need cached_vs_fresh or a cache_hit boolean.
Once you define the schema, add each dimension as a field in your cost-logging pipeline (from Article 2). Here is a TypeScript example that extends the logging function:
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment
const client = new Anthropic();

interface CostEvent {
  timestamp: string;
  feature_name: string;
  user_segment: string;
  user_id: string;
  model: string;
  environment: "production" | "staging";
  priority_level: "high" | "standard" | "batch";
  cached: boolean;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
  request_type: string; // e.g., "summarize", "classify", "generate_code"
  latency_ms: number;
}

function logCostEvent(event: CostEvent): void {
  // Write to database, log aggregator, or streaming pipeline
  console.log(JSON.stringify(event));
}

// Example: classify a support ticket
async function classifyTicket(
  ticketId: string,
  content: string,
  customerTier: "free" | "pro" | "enterprise"
): Promise<string> {
  const model = "claude-3-5-haiku-20241022"; // Cheap model for classification
  const start = Date.now();

  // Make request and track cost
  const response = await client.messages.create({
    model,
    max_tokens: 50,
    messages: [
      {
        role: "user",
        content: `Classify this support ticket into one category: bug, feature-request, or billing. Ticket: ${content}`,
      },
    ],
  });
  const latency = Date.now() - start;

  // Prices are USD per million tokens: $0.80 input, $4.00 output for this model
  const cost =
    (response.usage.input_tokens * 0.8 +
      response.usage.output_tokens * 4.0) /
    1_000_000;

  logCostEvent({
    timestamp: new Date().toISOString(),
    feature_name: "support_ticket_classification",
    user_segment: customerTier,
    user_id: ticketId, // the ticket ID stands in for a user ID in this example
    model,
    environment: "production",
    priority_level: "standard",
    cached: false,
    input_tokens: response.usage.input_tokens,
    output_tokens: response.usage.output_tokens,
    cost_usd: cost,
    request_type: "classify",
    latency_ms: latency,
  });

  // Content blocks are a union type, so narrow to a text block before reading .text
  const block = response.content[0];
  return block?.type === "text" ? block.text : "";
}
By logging this attribution data, you enable powerful analytics: "aggregate cost by feature_name and user_segment, group by month" surfaces cost per feature per customer cohort, revealing which feature-segment pairs are expensive and worth optimizing.
Implementing Attribution in Production Systems
In a real system, you'll log attribution across multiple LLM calls and aggregate them. Here is a pattern for a support chatbot with three features (FAQ detection, FAQ response generation, and complex support reasoning):
import json
import logging
from datetime import datetime, timezone
from typing import TypedDict

import anthropic

client = anthropic.Anthropic()


class AttributionContext(TypedDict):
    feature_name: str
    user_id: str
    user_segment: str  # e.g., "free", "pro", "enterprise"
    environment: str
    priority_level: str


# Dedicated logger that writes one JSON object per line.
# Using a named logger (not the root logger) keeps library log lines
# out of the cost file.
cost_logger = logging.getLogger("cost_events")
cost_logger.setLevel(logging.INFO)
cost_logger.propagate = False
_handler = logging.FileHandler("cost_events.jsonl")  # Append-only JSON lines
_handler.setFormatter(logging.Formatter("%(message)s"))
cost_logger.addHandler(_handler)


def log_cost_attribution(
    context: AttributionContext,
    input_tokens: int,
    output_tokens: int,
    model: str,
) -> float:
    """Log cost event with attribution metadata and return the cost in USD."""
    # Prices are USD per million tokens
    input_price = 3.0 if "sonnet" in model else 0.8  # Sonnet vs Haiku
    output_price = 15.0 if "sonnet" in model else 4.0
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "feature_name": context["feature_name"],
        "user_id": context["user_id"],
        "user_segment": context["user_segment"],
        "environment": context["environment"],
        "priority_level": context["priority_level"],
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "cost_usd": round(cost, 6),
    }
    # Serialize explicitly: logging a dict directly writes its Python repr, not JSON
    cost_logger.info(json.dumps(event))
    return cost
HAIKU = "claude-3-5-haiku-20241022"
SONNET = "claude-3-5-sonnet-20241022"


def support_chatbot_handler(
    user_message: str,
    user_id: str,
    user_segment: str,
) -> str:
    """Multi-feature chatbot with per-feature attribution."""
    context = AttributionContext(
        feature_name="",  # Set before each feature's LLM call
        user_id=user_id,
        user_segment=user_segment,
        environment="production",
        priority_level="standard",
    )

    # Feature 1: Check if this is an FAQ (cheap, Haiku)
    context["feature_name"] = "faq_check"
    is_faq_response = client.messages.create(
        model=HAIKU,
        max_tokens=10,
        messages=[
            {
                "role": "user",
                "content": f"Is this a FAQ question? Respond yes/no: {user_message}",
            }
        ],
    )
    log_cost_attribution(
        context,
        is_faq_response.usage.input_tokens,
        is_faq_response.usage.output_tokens,
        HAIKU,
    )

    # Feature 2: If FAQ, generate FAQ response (Haiku)
    if is_faq_response.content[0].text.strip().lower().startswith("yes"):
        context["feature_name"] = "faq_response_generation"
        faq_response = client.messages.create(
            model=HAIKU,
            max_tokens=300,
            messages=[
                {
                    "role": "user",
                    "content": f"Answer this FAQ: {user_message}",
                }
            ],
        )
        log_cost_attribution(
            context,
            faq_response.usage.input_tokens,
            faq_response.usage.output_tokens,
            HAIKU,
        )
        return faq_response.content[0].text

    # Feature 3: If not FAQ, route to Sonnet for complex support (expensive)
    context["feature_name"] = "complex_support_reasoning"
    sonnet_response = client.messages.create(
        model=SONNET,
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"Help with this support issue: {user_message}",
            }
        ],
    )
    log_cost_attribution(
        context,
        sonnet_response.usage.input_tokens,
        sonnet_response.usage.output_tokens,
        SONNET,
    )
    return sonnet_response.content[0].text
By logging each feature separately, you can query the logs and discover patterns: "What is the cost distribution across features?" reveals that FAQ responses are cheap but high-volume, while complex support reasoning is expensive but low-volume. You can then decide: invest optimization effort in the complex support pathway (high cost), or batch FAQ generation (high volume, but less ROI).
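The "cost distribution across features" question can be answered directly from the log file. The following is a minimal sketch, assuming each line of cost_events.jsonl is a valid JSON object with at least feature_name and cost_usd fields; the function name cost_by_feature and the sample events are hypothetical and used only for illustration.

```python
import json
from collections import defaultdict


def cost_by_feature(jsonl_lines):
    """Return (feature, requests, cost_usd, share) rows, most expensive first."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0})
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log file
        event = json.loads(line)
        bucket = totals[event["feature_name"]]
        bucket["requests"] += 1
        bucket["cost_usd"] += event["cost_usd"]

    grand_total = sum(b["cost_usd"] for b in totals.values())
    rows = []
    for feature, b in totals.items():
        share = b["cost_usd"] / grand_total if grand_total else 0.0
        rows.append((feature, b["requests"], round(b["cost_usd"], 6), round(share, 4)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Hypothetical sample events; in practice, pass open("cost_events.jsonl")
sample = [
    '{"feature_name": "faq_check", "cost_usd": 0.0001}',
    '{"feature_name": "faq_check", "cost_usd": 0.0001}',
    '{"feature_name": "faq_response_generation", "cost_usd": 0.003}',
    '{"feature_name": "complex_support_reasoning", "cost_usd": 0.012}',
]
for row in cost_by_feature(sample):
    print(row)
```

Reporting the request count next to the cost share makes it easy to tell a feature that is expensive per call from one that is merely called often.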
Aggregating Attribution Data for Analysis
Raw logs are valuable but require aggregation to surface insights. A monthly cost aggregation query might look like:
SELECT
  DATE_TRUNC('month', timestamp) AS month,
  feature_name,
  user_segment,
  COUNT(*) AS request_count,
  SUM(input_tokens) AS total_input_tokens,
  SUM(output_tokens) AS total_output_tokens,
  SUM(cost_usd) AS total_cost,
  ROUND(SUM(cost_usd) / COUNT(*), 6) AS avg_cost_per_request
FROM cost_events
GROUP BY month, feature_name, user_segment
ORDER BY total_cost DESC;
This query reveals cost per feature per customer segment per month. A typical result might show:
| Month | Feature | Segment | Requests | Cost | Avg/Request |
|---|---|---|---|---|---|
| 2026-05 | faq_response_generation | free | 45000 | $157.50 | $0.0035 |
| 2026-05 | complex_support_reasoning | enterprise | 800 | $96.00 | $0.120 |
| 2026-05 | complex_support_reasoning | free | 150 | $18.00 | $0.120 |
| 2026-05 | faq_response_generation | enterprise | 5000 | $17.50 | $0.0035 |
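This reveals that FAQ responses are cheap per request but volume-driven ($157.50 on 45,000 free-tier requests), while complex reasoning is expensive per request but low-volume ($96.00 on 800 enterprise requests). Against a $300 monthly budget, free-tier FAQ generation alone consumes about 53% and enterprise complex reasoning 32%; the four rows together total $289, so there is little headroom left and the free-tier FAQ path is the first candidate for optimization.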
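The same arithmetic can be rolled up per feature across segments. This sketch copies the four rows from the example table and assumes the $300 monthly budget mentioned above; the helper name budget_share_by_feature is hypothetical.

```python
from collections import defaultdict

MONTHLY_BUDGET_USD = 300.00

# (feature, segment, cost_usd) rows copied from the example table
rows = [
    ("faq_response_generation", "free", 157.50),
    ("complex_support_reasoning", "enterprise", 96.00),
    ("complex_support_reasoning", "free", 18.00),
    ("faq_response_generation", "enterprise", 17.50),
]


def budget_share_by_feature(rows, budget_usd):
    """Sum cost per feature across segments and express it as a % of budget."""
    totals = defaultdict(float)
    for feature, _segment, cost in rows:
        totals[feature] += cost
    return {f: (c, round(100 * c / budget_usd, 1)) for f, c in totals.items()}


shares = budget_share_by_feature(rows, MONTHLY_BUDGET_USD)
for feature, (cost, pct) in shares.items():
    print(f"{feature}: ${cost:.2f} ({pct}% of budget)")
```

Summed across segments, FAQ generation comes to $175.00 and complex reasoning to $114.00, which is a slightly different picture from looking at the largest individual rows.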
Using Attribution for Cost Control and Alerting
With attribution data, you can set up alerts: "If support_qa costs more than $50/day this month, page an engineer." You can also implement cost budgets per feature: "Allocate $100/month to FAQ generation; if we hit $80, start deduplicating similar questions or switching to a cheaper model." Some teams set hard stops: "If a feature exceeds its monthly budget, disable new requests and page the on-call engineer." Others use soft limits: "Alert at 80% of budget, page at 100%." The key is making cost visible and actionable, not abstract. Attribution enables that.
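The soft-limit and hard-stop policies described above can be expressed as a small status check. This is a sketch under stated assumptions: the FeatureBudget class, the status strings, and the idea that month-to-date spend is supplied by your aggregation job are all hypothetical, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class FeatureBudget:
    feature_name: str
    monthly_budget_usd: float
    alert_fraction: float = 0.8  # soft limit: alert at 80% of budget
    hard_stop: bool = False      # if True, block new requests at 100%


def budget_status(budget: FeatureBudget, spend_usd: float) -> str:
    """Return 'ok', 'alert', 'page', or 'blocked' for month-to-date spend."""
    if spend_usd >= budget.monthly_budget_usd:
        # Over budget: hard-stop features are disabled, others page on-call
        return "blocked" if budget.hard_stop else "page"
    if spend_usd >= budget.alert_fraction * budget.monthly_budget_usd:
        return "alert"
    return "ok"


faq_budget = FeatureBudget("faq_response_generation", monthly_budget_usd=100.0)
print(budget_status(faq_budget, 50.0))   # ok
print(budget_status(faq_budget, 85.0))   # alert
print(budget_status(faq_budget, 100.0))  # page
```

Keeping the policy in data (one FeatureBudget per feature) rather than in code makes it easier to adjust thresholds as historical cost data accumulates.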
Key Takeaways
- Tag each LLM request with feature name, user segment, model, environment, and priority level for cost attribution.
- Log attribution metadata on at least 80% of production requests; untagged requests show up as unattributed spend and skew per-feature totals.
- Aggregate cost logs monthly by feature and user segment to identify profit centers and cost drains.
- Set per-feature budgets and alerts based on historical cost data to control spend and trigger optimization.
- Use attribution analysis to decide where to invest optimization effort (high-cost features) and where to seek efficiency (high-volume features).
Frequently Asked Questions
How much overhead does logging attribution add?
Writing a log event is cheap: serializing a small JSON record typically takes microseconds, and most systems flush logs asynchronously, so the added latency is negligible next to the LLM call itself. Storage is also modest: one year of 1 million daily requests (365 million events at ~200 bytes each) is roughly 73 GB uncompressed, cheap to store in any cloud database. The real cost is query time: aggregating a year of logs is slow unless you index on timestamp and feature_name.
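The storage estimate is simple enough to recompute for your own volume. A minimal sketch, assuming decimal gigabytes (10^9 bytes) and no compression; the function name is hypothetical.

```python
def yearly_log_storage_gb(requests_per_day: int, bytes_per_event: int, days: int = 365) -> float:
    """Estimate uncompressed log storage in decimal gigabytes."""
    return requests_per_day * days * bytes_per_event / 1e9


# 1 million requests per day at ~200 bytes per event
print(yearly_log_storage_gb(1_000_000, 200))  # 73.0
```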
Should I attribute costs to individual users or just segments?
For privacy and simplicity, attribute to user segments (free, pro, enterprise) rather than individual user IDs. If you need finer granularity, also log customer cohorts (e.g., "early_adopter", "churned_risk") or regions. The examples in this guide include a user_id field for illustration; if you keep it, use an opaque internal ID, and never log sensitive identifiers (SSN, email) in cost logs.
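If per-user granularity is occasionally needed, one option (an assumption of this sketch, not something the guide prescribes) is to log a keyed hash of the user ID instead of the raw value. The function name pseudonymize_user_id and the example key are hypothetical; in practice the key would come from a secrets manager.

```python
import hashlib
import hmac


def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Return a stable, non-reversible token for a user ID using HMAC-SHA256."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for compact log lines


key = b"example-key-do-not-use-in-production"  # hypothetical key
token = pseudonymize_user_id("user-12345", key)
print(token)
```

The same user always maps to the same token, so per-user cost roll-ups still work, but the log file alone does not reveal who the user is.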
What if a single request spans multiple features (e.g., classify, then generate)?
Log each feature call separately with its own attribution. If a request chains multiple LLM calls, you get multiple log entries; that's correct. It reveals the cost breakdown of the chain. Some teams also add a request_chain_id to link related calls.
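A request_chain_id can be as simple as a random identifier generated once per user request and attached to every cost event in the chain. The sketch below assumes events are plain dictionaries with request_chain_id and cost_usd fields; the helper names and sample costs are hypothetical.

```python
import uuid
from collections import defaultdict


def new_chain_id() -> str:
    """Generate one ID per incoming user request; reuse it for every LLM call."""
    return uuid.uuid4().hex


def cost_per_chain(events):
    """Sum cost_usd across all events that share a request_chain_id."""
    totals = defaultdict(float)
    for event in events:
        totals[event["request_chain_id"]] += event["cost_usd"]
    return dict(totals)


chain = new_chain_id()
events = [
    {"request_chain_id": chain, "feature_name": "faq_check", "cost_usd": 0.0001},
    {"request_chain_id": chain, "feature_name": "complex_support_reasoning", "cost_usd": 0.012},
]
print(cost_per_chain(events))
```

Per-feature entries still show where the money goes inside a chain, while the chain total shows what one end-user request costs overall.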
Can I retrofit attribution to existing logs?
If you didn't log attribution initially, you can backfill it if you have the raw API response logs (which include timestamps and model). Write a batch job to cross-reference your application logs with API response logs and infer feature names based on request patterns. It's tedious but doable.
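One way to sketch the inference step is a small rule table keyed on model and prompt prefix. This assumes your application logs retain the prompt text for each call, which may not be true in your system; the rule table and the infer_feature function are hypothetical, with prefixes taken from the chatbot example earlier in this guide.

```python
# Hypothetical rules: (model substring, prompt prefix, feature_name)
RULES = [
    ("haiku", "Is this a FAQ question?", "faq_check"),
    ("haiku", "Answer this FAQ:", "faq_response_generation"),
    ("sonnet", "Help with this support issue:", "complex_support_reasoning"),
]


def infer_feature(model: str, prompt: str) -> str:
    """Return the first matching feature name, or 'unattributed' if none match."""
    for model_substr, prefix, feature in RULES:
        if model_substr in model and prompt.startswith(prefix):
            return feature
    return "unattributed"


print(infer_feature("claude-3-5-haiku-20241022", "Answer this FAQ: How do I reset my password?"))
print(infer_feature("claude-3-5-sonnet-20241022", "Write a poem"))
```

Tracking how many historical requests fall into the "unattributed" bucket gives a rough measure of how trustworthy the backfilled totals are.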
How often should I aggregate and review cost attribution?
Review weekly during active optimization, then monthly once costs stabilize. Set up a weekly email report (top 5 features by cost, cost trends, anomalies) and escalate anything unusual immediately.
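The body of such a weekly report can be built from two per-feature totals. This is a minimal sketch, assuming you already have week-level cost sums per feature; the weekly_report function, the 1.5x spike threshold, and the sample figures are hypothetical choices for illustration.

```python
def weekly_report(this_week: dict, last_week: dict, top_n: int = 5, spike_ratio: float = 1.5):
    """Rank features by this week's cost and flag week-over-week spikes.

    Both inputs map feature_name -> total cost_usd for the week.
    """
    ranked = sorted(this_week.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    report = []
    for feature, cost in ranked:
        prev = last_week.get(feature, 0.0)
        # No percentage change is reported for features with no prior spend
        change_pct = None if prev == 0 else round(100 * (cost - prev) / prev, 1)
        report.append({
            "feature": feature,
            "cost_usd": cost,
            "change_pct": change_pct,
            "anomaly": prev > 0 and cost > spike_ratio * prev,
        })
    return report


this_week = {"faq_response_generation": 40.0, "complex_support_reasoning": 25.0}
last_week = {"faq_response_generation": 20.0, "complex_support_reasoning": 25.0}
for line in weekly_report(this_week, last_week):
    print(line)
```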