Usage Metering for AI: Track & Bill Usage
You cannot build a sustainable AI SaaS without metering usage and calculating costs. When a user generates a 1,000-token response, you incur a cost (about $0.015 at $15 per 1M output tokens, plus the cost of the prompt tokens). You must track this, attribute it to the correct user or organization, and bill them fairly. This article covers token counting, cost calculation, metering events, and integration with billing platforms like Stripe.
What is Usage Metering?
Usage metering is the automated process of measuring customer activity (in this case, LLM token consumption) in real time and accumulating costs. Instead of a flat monthly subscription, usage-based pricing charges customers based on actual consumption: generate 100K tokens this month, get billed for 100K tokens. Usage metering requires an accurate cost model (knowing the price per token), the token counts reported by the LLM provider, and database tracking of each billable request. Done right, customers feel they pay for what they use; done wrong, they face surprise bills and you face chargeback disputes.
Token Counting and Cost Calculation
Counting Input and Output Tokens
Most LLM providers charge separately for input (prompt) and output (completion) tokens. For example, Anthropic charges approximately $3 per 1M input tokens and $15 per 1M output tokens for Claude 3.5 Sonnet (as of June 2026). Prices vary by model, so keep a per-model pricing table.
```python
# Tracking tokens and calculating costs
import os
from datetime import datetime

import anthropic

# Pricing in USD per 1M tokens (update as providers change rates)
PRICING = {
    "claude-3-5-sonnet-20241022": {
        "input_cost_per_mtok": 3.0,    # $3 per 1M tokens
        "output_cost_per_mtok": 15.0,  # $15 per 1M tokens
    },
    "claude-3-opus-20240229": {
        "input_cost_per_mtok": 15.0,   # $15 per 1M tokens
        "output_cost_per_mtok": 75.0,  # $75 per 1M tokens
    },
}

# `db` (a SQLAlchemy-style session) and `UsageEvent` (an ORM model) are
# assumed to be defined elsewhere in the application.

# The async client avoids blocking the event loop inside an async function.
client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])


async def generate_with_metering(
    prompt: str,
    model: str,
    organization_id: str,
) -> dict:
    """Generate a completion and record its token usage and cost."""
    # Fail loudly on unknown models instead of silently metering $0.
    if model not in PRICING:
        raise ValueError(f"No pricing configured for model {model!r}")
    pricing = PRICING[model]

    # Generate completion
    message = await client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )

    # Extract token counts from response
    input_tokens = message.usage.input_tokens
    output_tokens = message.usage.output_tokens
    total_tokens = input_tokens + output_tokens

    # Calculate cost
    input_cost = (input_tokens / 1_000_000) * pricing["input_cost_per_mtok"]
    output_cost = (output_tokens / 1_000_000) * pricing["output_cost_per_mtok"]
    total_cost = input_cost + output_cost

    # Record metering event
    metering_event = UsageEvent(
        organization_id=organization_id,
        model=model,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_tokens=total_tokens,
        input_cost_usd=input_cost,
        output_cost_usd=output_cost,
        total_cost_usd=total_cost,
        created_at=datetime.utcnow(),
    )
    db.add(metering_event)
    db.commit()

    return {
        "completion": message.content[0].text,
        "tokens": {
            "input": input_tokens,
            "output": output_tokens,
            "total": total_tokens,
        },
        "cost_usd": total_cost,
    }
```
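Floating-point arithmetic can accumulate small rounding errors when millions of tiny per-request costs are summed. As a minimal sketch, the same cost formula can be written with `decimal.Decimal`. The model name `example-model`, its prices, and the function name `request_cost_usd` are illustrative assumptions, not part of any provider SDK.

```python
from decimal import Decimal

# Hypothetical per-million-token prices in USD; replace with current provider rates.
PRICES_PER_MTOK = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}

MTOK = Decimal(1_000_000)


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost of one request in USD using exact decimal arithmetic."""
    prices = PRICES_PER_MTOK[model]  # a KeyError on unknown models is deliberate
    input_cost = Decimal(input_tokens) / MTOK * prices["input"]
    output_cost = Decimal(output_tokens) / MTOK * prices["output"]
    return input_cost + output_cost


# 1,200 prompt tokens and 1,000 completion tokens on the hypothetical model
cost = request_cost_usd("example-model", input_tokens=1_200, output_tokens=1_000)
print(f"Request cost: ${cost}")
```

If the database column uses a fixed-precision numeric type, values like this can be stored without converting through `float`.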
Batch Event Aggregation
Rather than updating billing immediately after each request, batch metering events (for example, aggregate them hourly) and update the organization's usage from the aggregated totals.
```python
# Hourly batch aggregation
from datetime import datetime, timedelta


async def aggregate_usage_events():
    """Aggregate metering events into hourly summaries (runs hourly)."""
    # Capture "now" once so the query window and the summary period agree
    now = datetime.utcnow()
    one_hour_ago = now - timedelta(hours=1)

    # Get all unaggregated events from the last hour
    events = db.query(UsageEvent).filter(
        UsageEvent.created_at >= one_hour_ago,
        UsageEvent.created_at < now,
        UsageEvent.aggregated.is_(False),
    ).all()

    # Group by organization and model
    by_org = {}
    for event in events:
        key = (event.organization_id, event.model)
        if key not in by_org:
            by_org[key] = {
                "input_tokens": 0,
                "output_tokens": 0,
                "cost": 0.0,
                "count": 0,
            }
        by_org[key]["input_tokens"] += event.input_tokens
        by_org[key]["output_tokens"] += event.output_tokens
        by_org[key]["cost"] += event.total_cost_usd
        by_org[key]["count"] += 1

    # Create aggregated records
    for (org_id, model), totals in by_org.items():
        summary = UsageSummary(
            organization_id=org_id,
            model=model,
            period_start=one_hour_ago,
            period_end=now,
            input_tokens=totals["input_tokens"],
            output_tokens=totals["output_tokens"],
            total_cost_usd=totals["cost"],
            event_count=totals["count"],
        )
        db.add(summary)

    # Mark every processed event so the next run skips it
    for event in events:
        event.aggregated = True

    # Commit summaries and flags together so a failure leaves neither behind
    db.commit()
```
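The job above uses a sliding "last hour" window, so a late or skipped run can leave events outside any window. One alternative, sketched here with plain in-memory data, is to assign each event to a fixed clock-hour bucket derived from its own timestamp. The `Event` dataclass and the function names are hypothetical stand-ins for the stored records.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory stand-in for a stored metering event."""
    organization_id: str
    model: str
    input_tokens: int
    output_tokens: int
    created_at: datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its clock hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def summarize_by_hour(events):
    """Sum tokens per (organization, model, hour bucket)."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "count": 0})
    for event in events:
        key = (event.organization_id, event.model, hour_bucket(event.created_at))
        bucket = totals[key]
        bucket["input_tokens"] += event.input_tokens
        bucket["output_tokens"] += event.output_tokens
        bucket["count"] += 1
    return dict(totals)


events = [
    Event("org_1", "example-model", 100, 50, datetime(2025, 1, 1, 10, 5)),
    Event("org_1", "example-model", 200, 80, datetime(2025, 1, 1, 10, 55)),
    Event("org_1", "example-model", 300, 90, datetime(2025, 1, 1, 11, 1)),
]
summary = summarize_by_hour(events)
for (org_id, model, start), totals in sorted(summary.items()):
    print(org_id, model, start.isoformat(), totals)
```

Because the bucket start depends only on the event, a re-run produces the same keys, which makes it possible to upsert summaries instead of inserting duplicates.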
Integration with Stripe Billing
Creating Metered Billing in Stripe
Stripe's usage-based billing lets you send meter events as usage occurs; Stripe then totals the usage and calculates the invoice automatically.
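```python
# Integration with Stripe metered billing
import os
import time
import uuid

import stripe

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]


async def emit_stripe_metering_event(
    organization_id: str,
    model: str,
    tokens: int,
    cost_usd: float,
):
    """Report usage to Stripe for metered billing."""
    # Look up the organization's Stripe customer ID
    org = db.query(Organization).filter(
        Organization.id == organization_id
    ).first()
    if not org or not org.stripe_customer_id:
        return  # Not connected to Stripe yet

    # Report token usage to Stripe.
    # Assumes a billing meter with event name "tokens_processed" that uses
    # Stripe's default payload keys ("stripe_customer_id" and "value").
    stripe.billing.MeterEvent.create(
        event_name="tokens_processed",
        # `identifier` deduplicates events, so it must be unique per event,
        # not per customer.
        identifier=str(uuid.uuid4()),
        timestamp=int(time.time()),  # Unix time in seconds
        payload={
            "stripe_customer_id": org.stripe_customer_id,
            "value": str(tokens),
        },
    )

    # Alternative: track cost directly. Attach a price to only one of the two
    # meters, otherwise the same usage is billed twice. Costs below half a
    # cent round to 0 here, so reporting aggregated totals is safer.
    stripe.billing.MeterEvent.create(
        event_name="llm_cost",
        identifier=str(uuid.uuid4()),
        timestamp=int(time.time()),
        payload={
            "stripe_customer_id": org.stripe_customer_id,
            "value": str(round(cost_usd * 100)),  # Convert dollars to cents
        },
    )
```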
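Stripe meter events accept an `identifier` that Stripe uses to deduplicate repeated submissions. A random value is unique but changes on every retry, so a retried report could be counted twice. One way to make retries safe, shown here as a sketch, is to derive the identifier deterministically from what the event represents. The function name and the choice of fields are assumptions for illustration; check Stripe's documentation for the identifier length limit and deduplication window.

```python
import hashlib


def meter_event_identifier(event_name: str, organization_id: str, period_start_iso: str) -> str:
    """Build a stable identifier for one (meter, organization, period) report.

    The same inputs always yield the same identifier, so re-sending the
    report after a timeout reuses the identifier instead of creating a new one.
    """
    raw = f"{event_name}:{organization_id}:{period_start_iso}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


first_try = meter_event_identifier("tokens_processed", "org_1", "2025-01-01T10:00:00Z")
retry = meter_event_identifier("tokens_processed", "org_1", "2025-01-01T10:00:00Z")
print(first_try == retry)
```

This pairs naturally with reporting hourly summaries rather than individual requests, since each summary has a well-defined period start.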
Setting Up the Stripe Pricing Model
In Stripe Dashboard, create a product with usage-based pricing:
- Create a product "AI LLM Usage"
- Add a price with billing type "Metered" and dimension "tokens_processed"
- Set the price: $0.000003 per token (for example)
- Enable monthly reporting period
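The example price of $0.000003 per token is simply $3 per 1M tokens divided by one million. A small sketch of that arithmetic, with an optional markup factor, is shown below. The function name `per_token_price` and the markup parameter are illustrative assumptions, not Stripe settings.

```python
from decimal import Decimal


def per_token_price(cost_per_mtok_usd: Decimal, markup: Decimal = Decimal("1")) -> Decimal:
    """Convert a per-million-token cost into a per-token price, applying a markup."""
    return cost_per_mtok_usd / Decimal(1_000_000) * markup


at_cost = per_token_price(Decimal("3"))
with_margin = per_token_price(Decimal("3"), markup=Decimal("1.5"))

# A customer who used 100K tokens in the month, billed at cost
monthly_charge = Decimal(100_000) * at_cost

print(f"Per-token price at cost: ${at_cost}")
print(f"Per-token price with 50% markup: ${with_margin}")
print(f"Charge for 100K tokens at cost: ${monthly_charge}")
```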
```python
# Querying Stripe billing invoices
def get_customer_billing_summary(stripe_customer_id: str) -> dict:
    """Summarize the customer's most recent invoice."""
    # Invoices are listed newest first, so limit=1 fetches the latest one
    invoices = stripe.Invoice.list(customer=stripe_customer_id, limit=1)
    if invoices.data:
        latest_invoice = invoices.data[0]
        return {
            "current_period_start": latest_invoice.period_start,
            "current_period_end": latest_invoice.period_end,
            "amount_due": latest_invoice.amount_due / 100,  # Convert cents to dollars
            "status": latest_invoice.status,
        }
    return {}  # No invoices yet for this customer
```
Preventing Bill Shock with Quotas and Alerts
Users appreciate warnings before they hit unexpected charges. Implement soft limits (warnings) and hard limits (blocking).
```python
# Quota enforcement
from datetime import datetime

from sqlalchemy import func  # assumes `db` is a SQLAlchemy session


async def check_and_enforce_quota(
    organization_id: str,
    tokens: int,
    model: str,
) -> dict:
    """Check if the organization is within quota before generating."""
    org = db.query(Organization).filter(
        Organization.id == organization_id
    ).first()
    if not org:
        raise ValueError("Organization not found")

    # Calculate projected cost. `tokens` is treated as the expected number of
    # output tokens and priced at the output rate.
    pricing = PRICING.get(model, {})
    projected_cost = (tokens / 1_000_000) * pricing.get("output_cost_per_mtok", 0)

    # Get current month usage, starting at midnight on the 1st
    month_start = datetime.utcnow().replace(
        day=1, hour=0, minute=0, second=0, microsecond=0
    )
    month_usage = float(
        db.query(func.sum(UsageEvent.total_cost_usd)).filter(
            UsageEvent.organization_id == organization_id,
            UsageEvent.created_at >= month_start,
        ).scalar() or 0.0
    )
    new_total = month_usage + projected_cost

    # Define limits (an explicit budget of 0 must not fall back to the default)
    hard_limit = (
        org.monthly_budget_usd if org.monthly_budget_usd is not None else 1000.0
    )
    soft_limit = hard_limit * 0.8

    if new_total > hard_limit:
        # Block request
        return {
            "allowed": False,
            "reason": "Monthly budget exceeded",
            "current_usage": month_usage,
            "hard_limit": hard_limit,
        }
    elif new_total > soft_limit:
        # Allow but warn
        return {
            "allowed": True,
            "warning": f"Approaching budget: ${new_total:.2f} / ${hard_limit:.2f}",
            "current_usage": month_usage,
            "hard_limit": hard_limit,
        }
    else:
        return {"allowed": True}
```
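To show users a projected cost before they submit a request, one option is to present a range: the prompt cost alone as the minimum, and the prompt cost plus the full `max_tokens` budget as the maximum. The sketch below assumes the prompt's token count is already known (for example, from a tokenizer or a provider token-counting endpoint). The model name, prices, and function name are hypothetical.

```python
from decimal import Decimal

# Hypothetical per-million-token prices in USD; replace with current provider rates.
PRICES_PER_MTOK = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}

MTOK = Decimal(1_000_000)


def estimate_cost_range(model: str, input_tokens: int, max_output_tokens: int):
    """Return (minimum, maximum) projected USD cost for a request.

    The minimum assumes no output at all; the maximum assumes the model
    uses every token allowed by max_output_tokens.
    """
    prices = PRICES_PER_MTOK[model]
    low = Decimal(input_tokens) / MTOK * prices["input"]
    high = low + Decimal(max_output_tokens) / MTOK * prices["output"]
    return low, high


low, high = estimate_cost_range("example-model", input_tokens=2_000, max_output_tokens=1_024)
print(f"Estimated cost: ${low:.4f} to ${high:.4f}")
```

Using the maximum as the projected cost in a quota check is conservative: it may block a request that would have fit, but it never lets one through that exceeds the budget.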
Billing Models Comparison
| Model | Pros | Cons | Best For |
|---|---|---|---|
| Per-request | Simple, predictable for users | Ignores how much each request actually costs you | Low-cost endpoints |
| Per-token | Accurate, fair | Complex pricing tiers | AI SaaS (most common) |
| Per-minute of compute | Captures latency costs | Hard to explain to users | Long-running processes |
| Flat monthly | Predictable, easy to sell | No cost incentive to optimize | Freemium tier |
| Hybrid (monthly + overage) | Base revenue + upsell | Complex to implement | Enterprise plans |
Key Takeaways
- Track input and output tokens separately; LLM costs differ significantly.
- Aggregate metering events hourly to reduce billing-system updates and make usage reporting fast and consistent.
- Integrate with Stripe or another billing platform to automate invoice generation.
- Enforce soft limits (warnings) at 80% of budget and hard limits (blocking) at 100%.
- Display projected costs to users before they submit expensive requests.
Frequently Asked Questions
How do I handle refunds or adjustments?
Maintain an AdjustmentEvent table that records manual or automatic adjustments (e.g., credited tokens for service issues). When generating invoices, include adjustments. Track the reason for each adjustment for auditing.
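As a rough sketch of how such records could feed into an invoice total, the example below models an adjustment as a signed amount with a required reason. The field names and the `net_amount_due` helper are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class AdjustmentEvent:
    """Hypothetical adjustment record; a negative amount is a credit."""
    organization_id: str
    amount_usd: Decimal
    reason: str  # kept for auditing
    created_at: datetime


def net_amount_due(metered_usd: Decimal, adjustments: list) -> Decimal:
    """Apply adjustments to the metered total, never going below zero."""
    total = metered_usd + sum((a.amount_usd for a in adjustments), Decimal("0"))
    return max(total, Decimal("0"))


adjustments = [
    AdjustmentEvent("org_1", Decimal("-5.00"), "Service outage credit", datetime(2025, 1, 10)),
    AdjustmentEvent("org_1", Decimal("-1.25"), "Duplicate request refund", datetime(2025, 1, 18)),
]
due = net_amount_due(Decimal("42.50"), adjustments)
print(f"Net amount due: ${due}")
```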
What if the user was over-charged due to a bug?
Detect the overcharge by comparing metering records with the LLM provider's own reports (most providers offer monthly cost reports). Issue a credit via Stripe (stripe.CreditNote.create()), email the user explaining the error, and log the incident for investigation.
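The comparison step can be sketched as a simple reconciliation over matching keys (for example, one total per month or per model). This assumes both sides are expressed as provider cost rather than customer price, and that the provider figures were loaded from its report by some other means. The function name and the tolerance value are illustrative.

```python
from decimal import Decimal


def find_discrepancies(metered: dict, provider: dict, tolerance: Decimal = Decimal("0.01")) -> dict:
    """Return key -> (metered - provider) wherever the gap exceeds the tolerance.

    A positive difference means your records show more cost than the
    provider reported, which may indicate over-metering.
    """
    differences = {}
    for key in sorted(set(metered) | set(provider)):
        diff = metered.get(key, Decimal("0")) - provider.get(key, Decimal("0"))
        if abs(diff) > tolerance:
            differences[key] = diff
    return differences


metered_totals = {"2025-01": Decimal("120.00"), "2025-02": Decimal("80.00")}
provider_totals = {"2025-01": Decimal("120.004"), "2025-02": Decimal("65.00")}
gaps = find_discrepancies(metered_totals, provider_totals)
print(gaps)
```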
Should I charge customers while they are in a free trial?
No. Implement a free_trial_until timestamp on the organization. Skip billing events until this date. Log metering events anyway for future analytics.
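A minimal sketch of the trial check follows. It assumes `free_trial_until` is stored as a nullable timestamp and that the caller passes in the current time in the same convention; the function name `is_billable` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def is_billable(free_trial_until: Optional[datetime], now: datetime) -> bool:
    """Return True once the trial has ended or if no trial was ever set."""
    return free_trial_until is None or now >= free_trial_until


trial_end = datetime(2025, 2, 1)
print(is_billable(trial_end, datetime(2025, 1, 15)))
print(is_billable(trial_end, datetime(2025, 2, 1)))
```

The metering event is still written in both cases; only the step that reports usage to the billing platform consults this check.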
How do I handle disputes or refund requests?
Provide a refund request form with the date range and reason. Review the metering logs for that period. If valid, issue a credit note in Stripe and explain the policy in your terms.