Outlines Library: Build Grammar-Constrained Generators
Outlines is a Python library that makes constrained generation accessible to developers without deep knowledge of finite-state machines or logit masking. It provides a simple API to enforce JSON schemas, regular expressions, and context-free grammars (written in Lark's EBNF syntax) on locally run models such as HuggingFace Transformers or vLLM, with more limited support for API models such as OpenAI's. Under the hood, Outlines handles grammar compilation, FSM construction, token masking, and sampling, freeing you to focus on your application logic.
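To make "FSM construction and token masking" concrete, here is a toy sketch, not Outlines' implementation: a hand-built DFA for the pattern `-?[0-9]+`, a six-token vocabulary, and an index that records, for each DFA state, which tokens keep the output on a valid path and which state each one leads to. The DFA, the vocabulary, and the helper names (`step`, `walk`, `build_index`) are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional

# Hand-built DFA for the regex -?[0-9]+ (illustrative; Outlines derives its FSMs automatically)
# States: 0 = start, 1 = saw "-", 2 = saw at least one digit (accepting)
ACCEPTING = {2}
VOCAB = ["-", "1", "23", "-4", "a", "5b"]  # Toy vocabulary with multi-character tokens
EOS = "<eos>"


def step(state: int, char: str) -> Optional[int]:
    """Return the next DFA state, or None if `char` is not allowed in `state`."""
    if state == 0 and char == "-":
        return 1
    if char in "0123456789":
        return 2
    return None


def walk(state: int, token: str) -> Optional[int]:
    """Feed every character of a (possibly multi-character) token through the DFA."""
    for char in token:
        next_state = step(state, char)
        if next_state is None:
            return None
        state = next_state
    return state


def build_index(states: Iterable[int]) -> Dict[int, Dict[str, int]]:
    """Map each DFA state to {allowed token: state reached after emitting it}."""
    index: Dict[int, Dict[str, int]] = {}
    for state in states:
        allowed: Dict[str, int] = {}
        for token in VOCAB:
            target = walk(state, token)
            if target is not None:
                allowed[token] = target
        if state in ACCEPTING:
            allowed[EOS] = state  # Stopping is only legal in an accepting state
        index[state] = allowed
    return index


index = build_index([0, 1, 2])
print(sorted(index[0]))  # ['-', '-4', '1', '23']
print(sorted(index[1]))  # ['1', '23']
print(sorted(index[2]))  # ['1', '23', '<eos>']
```

At decoding time, tokens missing from `index[state]` would have their logits set to -inf, and the chosen token's entry gives the next state, so each step is a dictionary lookup rather than a regex search. Outlines builds a comparable state-to-token index over the real tokenizer vocabulary.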
Outlines is ideal for building reliable AI systems: form-filling agents, API integrators, code generators, and data extraction pipelines. With Outlines, you specify the output format once and get guaranteed valid, structured generation without retry loops or post-processing hacks.
Installation and Setup
# Install Outlines
pip install outlines
# Optional: Install transformers for HuggingFace models
pip install transformers torch
# For OpenAI API support
pip install openai
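The examples in this article use the models/generate interface of Outlines 0.x; newer releases reorganized the API, so if an import or call below fails, check which version you have installed.
Verify installation:
from importlib.metadata import version
import outlines  # Fails here if the installation is broken
print(f"Outlines version: {version('outlines')}")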
Basic Usage: JSON Schema
The simplest use case is JSON generation with a schema:
from outlines import models, generate
from pydantic import BaseModel
# Define output schema
class Invoice(BaseModel):
    customer_name: str
    total_amount: float
    item_count: int
    paid: bool
# Load a model
model = models.transformers("mistralai/Mistral-7B-v0.1")
# Create a JSON generator
generator = generate.json(model, Invoice)
# Generate JSON; with a Pydantic schema the result is parsed into an Invoice instance
prompt = "Generate an invoice for 5 items totaling $250 from Alice Smith."
invoice = generator(prompt, max_tokens=200)
print(invoice)
# Example output: customer_name='Alice Smith' total_amount=250.0 item_count=5 paid=False
print(invoice.model_dump_json())
# Example output: {"customer_name":"Alice Smith","total_amount":250.0,"item_count":5,"paid":false}
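The output is guaranteed to be:
- Valid JSON (syntactically correct).
- Schema-compliant (matches the Invoice structure).
Both guarantees assume generation reaches the end of the object within max_tokens; if the budget runs out first, the text is a valid prefix but not complete JSON, and parsing it fails.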
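In Outlines 0.x, a JSON schema is enforced by first converting it into a regular expression, which is then compiled into an FSM. The toy converter below shows the idea for a flat schema like Invoice. It is a simplification: it emits compact JSON with no optional whitespace, supports only four field types, and `schema_to_regex` is a hypothetical helper, not an Outlines function.

```python
import json
import re

# Regex fragments for each supported field type (simplified, illustrative)
TYPE_PATTERNS = {
    "string": r'"[^"\\]*"',
    "integer": r"-?(0|[1-9][0-9]*)",
    "number": r"-?(0|[1-9][0-9]*)(\.[0-9]+)?",
    "boolean": r"(true|false)",
}


def schema_to_regex(schema: dict) -> str:
    """Turn a flat JSON schema into a regex for compact JSON, fields in declared order."""
    parts = []
    for name, spec in schema["properties"].items():
        key = re.escape(json.dumps(name))  # The quoted field name, matched literally
        parts.append(key + ":" + TYPE_PATTERNS[spec["type"]])
    return r"\{" + ",".join(parts) + r"\}"


invoice_schema = {
    "properties": {
        "customer_name": {"type": "string"},
        "total_amount": {"type": "number"},
        "item_count": {"type": "integer"},
        "paid": {"type": "boolean"},
    }
}

pattern = schema_to_regex(invoice_schema)
good = '{"customer_name":"Alice Smith","total_amount":250.0,"item_count":5,"paid":false}'
bad = '{"customer_name":"Alice Smith","total_amount":"250","item_count":5,"paid":false}'
print(bool(re.fullmatch(pattern, good)))  # True
print(bool(re.fullmatch(pattern, bad)))   # False: total_amount is a string
```

A truncated output such as `good[:-1]` also fails the full match, which is the max_tokens caveat in regex form.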
Regex Constraints
For pattern-based output (emails, codes, dates):
from outlines import models, generate
model = models.transformers("mistralai/Mistral-7B-v0.1")
# Constraint: output must be a valid email. Outlines matches the whole output
# against the pattern, so ^/$ anchors are not needed.
email_regex = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
email_generator = generate.regex(model, email_regex)
prompt = "Extract the email from: Contact John at john@example.com for inquiries."
email = email_generator(prompt, max_tokens=50)
print(email)
# Example output: john@example.com (guaranteed to match the regex, not guaranteed to be the right address)
More regex examples:
# Phone number (XXX-XXX-XXXX)
phone_regex = r"\d{3}-\d{3}-\d{4}"
phone_gen = generate.regex(model, phone_regex)
# ISO date
date_regex = r"\d{4}-\d{2}-\d{2}"
date_gen = generate.regex(model, date_regex)
# Hex color code
color_regex = r"#[0-9a-fA-F]{6}"
color_gen = generate.regex(model, color_regex)
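Before using a pattern as a constraint, it is worth checking it against strings you expect it to accept and reject. A quick sketch with Python's standard `re` module, using `fullmatch` because a constrained output has to match the pattern as a whole; the sample strings are made up for illustration.

```python
import re

# The patterns from the examples above, written without ^/$ anchors
PATTERNS = {
    "phone": r"\d{3}-\d{3}-\d{4}",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
    "hex_color": r"#[0-9a-fA-F]{6}",
}

# (text, should it match?) pairs for each pattern; illustrative cases
CASES = {
    "phone": [("555-123-4567", True), ("5551234567", False)],
    "iso_date": [("2024-03-15", True), ("2024-3-15", False), ("2024-13-45", True)],
    "hex_color": [("#1a2B3c", True), ("#12345", False)],
}

for name, cases in CASES.items():
    for text, expected in cases:
        matched = re.fullmatch(PATTERNS[name], text) is not None
        status = "ok" if matched == expected else "UNEXPECTED"
        print(f"{name:9} {text!r:16} matched={matched} {status}")
```

Note that the date pattern accepts "2024-13-45": a regex constrains the shape of the output, not its calendar validity, so semantic checks still belong in post-processing.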
Context-Free Grammar Constraints
For complex structures, use a context-free grammar with generate.cfg. Outlines grammars are written in Lark's EBNF syntax, not llama.cpp's GBNF:
from outlines import models, generate
model = models.transformers("mistralai/Mistral-7B-v0.1")
# SQL SELECT statement (safe subset), written in Lark syntax
sql_grammar = r"""
start: "SELECT " columns " FROM " table_name (" WHERE " condition)?
columns: column (", " column)*
column: IDENTIFIER
table_name: IDENTIFIER
condition: IDENTIFIER " = " literal
literal: STRING | NUMBER
IDENTIFIER: /[a-zA-Z_][a-zA-Z0-9_]*/
STRING: /'[^']*'/
NUMBER: /[0-9]+/
"""
sql_generator = generate.cfg(model, sql_grammar)
prompt = "Generate a SQL query to find users from the 'customers' table."
sql = sql_generator(prompt, max_tokens=100)
print(sql)
# Example output: SELECT name FROM customers (guaranteed to match the grammar)
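A grammar can also be checked after the fact, for example in tests. The sketch below is a small hand-written recursive-descent parser for the same SQL subset that returns the parsed pieces or raises `ValueError`. It is illustrative only (the `Parser` and `Query` names are hypothetical) and is not how Outlines applies grammars during decoding.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
LITERAL = re.compile(r"'[^']*'|[0-9]+")


class Query(NamedTuple):
    columns: List[str]
    table: str
    condition: Optional[Tuple[str, str]]


class Parser:
    """Recursive-descent parser for: SELECT cols FROM table [WHERE ident = literal]."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def expect(self, keyword: str) -> None:
        """Consume an exact keyword (spaces included) or fail."""
        if not self.text.startswith(keyword, self.pos):
            raise ValueError(f"expected {keyword!r} at position {self.pos}")
        self.pos += len(keyword)

    def match(self, pattern: "re.Pattern[str]", what: str) -> str:
        """Consume one token matching `pattern` or fail."""
        m = pattern.match(self.text, self.pos)
        if m is None:
            raise ValueError(f"expected {what} at position {self.pos}")
        self.pos = m.end()
        return m.group()

    def parse(self) -> Query:
        self.expect("SELECT ")
        columns = [self.match(IDENT, "column")]
        while self.text.startswith(", ", self.pos):  # columns: column (", " column)*
            self.pos += 2
            columns.append(self.match(IDENT, "column"))
        self.expect(" FROM ")
        table = self.match(IDENT, "table name")
        condition = None
        if self.text.startswith(" WHERE ", self.pos):  # Optional WHERE clause
            self.pos += len(" WHERE ")
            column = self.match(IDENT, "column")
            self.expect(" = ")
            condition = (column, self.match(LITERAL, "literal"))
        if self.pos != len(self.text):
            raise ValueError(f"unexpected trailing text at position {self.pos}")
        return Query(columns, table, condition)


print(Parser("SELECT name, email FROM customers WHERE country = 'US'").parse())
```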
Sampling and Temperature
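Constrained generation supports temperature and sampling strategies. In Outlines the strategy is chosen when you build the generator, by passing a sampler:
from outlines import models, generate, samplers
from pydantic import BaseModel
from typing import Literal
model = models.transformers("mistralai/Mistral-7B-v0.1")
class Decision(BaseModel):
    choice: Literal["red", "blue"]  # Constrained to exactly these two values
# Greedy (deterministic): always takes the most likely valid token
greedy_generator = generate.json(model, Decision, sampler=samplers.greedy())
result_greedy = greedy_generator("Pick a color: red or blue", max_tokens=50)
# Multinomial sampling (probabilistic); temperature is a sampler setting
sampling_generator = generate.json(
    model, Decision, sampler=samplers.multinomial(temperature=0.7)
)
result_sample = sampling_generator(
    "Pick a color: red or blue",
    max_tokens=50,
    seed=42,  # Fix the random seed for reproducible samples
)
print(f"Greedy: {result_greedy}")
print(f"Sampled: {result_sample}")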
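Temperature and constraints compose cleanly: the mask removes invalid tokens, and temperature only reshapes the probabilities of the tokens that remain, so no temperature setting can make an invalid token reachable.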
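To see why temperature cannot resurrect an invalid token, here is a small masked-softmax sketch in plain Python. The logits and the `masked_softmax` helper are made up for illustration; the sketch mirrors the idea rather than Outlines' internals.

```python
import math
from typing import Dict, Set


def masked_softmax(logits: Dict[str, float], allowed: Set[str], temperature: float) -> Dict[str, float]:
    """Probabilities after masking invalid tokens to -inf and dividing by temperature."""
    if not logits.keys() & allowed:
        raise ValueError("no valid token to sample")
    scaled = {
        tok: (logit / temperature if tok in allowed else -math.inf)
        for tok, logit in logits.items()
    }
    peak = max(scaled.values())  # Subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # exp(-inf) == 0.0
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


# Toy logits: the model prefers "green", but only "red" and "blue" are valid
logits = {"green": 3.0, "red": 1.0, "blue": 0.5}
allowed = {"red", "blue"}
for t in (0.5, 1.0, 2.0):
    probs = masked_softmax(logits, allowed, t)
    # "green" stays at 0.0 for every temperature; only red/blue shift
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

Lower temperatures push probability toward the best valid token (greedy is the limit as temperature approaches 0); higher temperatures flatten the distribution, but only across valid tokens.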
Batch Generation
For efficiency with multiple prompts:
from outlines import models, generate
from pydantic import BaseModel
class Response(BaseModel):
    answer: str
    confidence: float
model = models.transformers("mistralai/Mistral-7B-v0.1")
generator = generate.json(model, Response)
prompts = [
    "Is the sky blue? Answer with confidence 0-1.",
    "Is water wet? Answer with confidence 0-1.",
    "Is fire hot? Answer with confidence 0-1.",
]
# Passing a list of prompts runs them as one batch and returns one Response per prompt
results = generator(prompts, max_tokens=100)
for prompt, result in zip(prompts, results):
    print(f"{prompt} -> {result}")
For large-scale generation, use the vLLM backend (models.vllm(...)) for better throughput.
Custom Grammar from Files
Store complex grammars in files for reusability:
File: my_grammar.lark
?start: value
?value: object | array | STRING | NUMBER | "true" | "false" | "null"
object: "{" [member ("," member)*] "}"
member: STRING ":" value
array: "[" [value ("," value)*] "]"
STRING: /"[^"]*"/
NUMBER: /-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
%import common.WS
%ignore WS
Python code:
from outlines import models, generate
model = models.transformers("mistralai/Mistral-7B-v0.1")
with open("my_grammar.lark", "r") as f:
    grammar = f.read()
generator = generate.cfg(model, grammar)
result = generator("Output JSON data:", max_tokens=200)
print(result)
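Unlike the regex examples, this grammar is recursive: arrays and objects can nest to any depth, which no finite-state machine can track. That is why grammar constraints need parser state (effectively a stack) at every step. The toy check below uses hypothetical helpers, not Outlines code, and tracks only bracket nesting: it decides whether a partial output can still be extended into balanced JSON brackets, skipping brackets inside strings.

```python
from typing import List, Optional, Tuple

CLOSERS = {"}": "{", "]": "["}


def scan(text: str) -> Optional[Tuple[List[str], bool]]:
    """Track open brackets in a partial JSON string.

    Returns (stack of unclosed brackets, inside-a-string flag), or None if the
    text closes a bracket that was never opened, so no continuation can fix it.
    Escaped quotes are not handled, matching the grammar's escape-free strings.
    """
    stack: List[str] = []
    in_string = False
    for char in text:
        if in_string:
            in_string = char != '"'  # A quote ends the string; anything else stays inside
        elif char == '"':
            in_string = True
        elif char in "{[":
            stack.append(char)
        elif char in CLOSERS:
            if not stack or stack.pop() != CLOSERS[char]:
                return None
    return stack, in_string


def can_continue(text: str) -> bool:
    """A prefix is viable if no bracket has been closed incorrectly."""
    return scan(text) is not None


def brackets_closed(text: str) -> bool:
    """True when every bracket is closed and no string is left open."""
    return scan(text) == ([], False)


for sample in ['{"a": [1, [2]', '{"a": [1, [2]]}', '{"a": ]', '{"b": "]"}']:
    print(f"{sample!r:20} can_continue={can_continue(sample)} closed={brackets_closed(sample)}")
```

The stack can grow without bound, which is exactly the memory a regex-derived FSM does not have.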
Integration with OpenAI API
Outlines also works with OpenAI's API:
from outlines import models, generate
from pydantic import BaseModel
class Answer(BaseModel):
    reasoning: str
    conclusion: str
# Use OpenAI API
model = models.openai("gpt-4")
generator = generate.json(model, Answer)
response = generator(
    "Explain why water boils.",
    max_tokens=200,
    temperature=0.7
)
print(response)
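Note: Outlines cannot mask the logits of a model running on OpenAI's servers. For API models it forwards the schema to OpenAI's own structured-output (JSON schema) feature, so the guarantee comes from OpenAI, it requires a model that supports that feature, and the exact call options for API models vary between Outlines versions.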
Real-World Example: Multi-Turn Data Extraction
Suppose you're building a chatbot that extracts structured data across multiple turns:
from outlines import models, generate
from pydantic import BaseModel
from typing import List
class PersonInfo(BaseModel):
    name: str
    age: int
    email: str
    phone: str
    tags: List[str] = []
model = models.transformers("mistralai/Mistral-7B-v0.1")
generator = generate.json(model, PersonInfo)
# Simulate multi-turn conversation
conversation = [
    "Extract person info: Name is John.",
    "Age is 30.",
    "Email: john@example.com.",
    "Phone: 555-123-4567.",
    "Tags: engineer, python.",
]
# Accumulate info across turns
full_context = "\n".join(conversation)
result = generator(full_context, max_tokens=300)
print(result)
# Example output: name='John' age=30 email='john@example.com' phone='555-123-4567' tags=['engineer', 'python']
Error Handling and Validation
Even with constraints, validate generated output:
from outlines import models, generate
from pydantic import BaseModel, ValidationError
import json
class Config(BaseModel):
    host: str
    port: int
    debug: bool
model = models.transformers("mistralai/Mistral-7B-v0.1")
generator = generate.json(model, Config)
try:
    # generate.json parses the text into a Config instance itself, so parse
    # errors (e.g. output truncated at max_tokens) surface from this call
    config = generator("Generate config: host=localhost, port=8080, debug=true.", max_tokens=150)
    print(f"Valid config: {config}")
except (json.JSONDecodeError, ValidationError) as e:
    print(f"Error: {e}")
With constraints, the JSON itself should always be well-formed, but output cut off at max_tokens still fails to parse, so keep the try/except and set max_tokens generously.
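The main way a constrained JSON call can still fail is running out of tokens. Below is a minimal sketch of a retry-with-a-bigger-budget wrapper, assuming a hypothetical `generate_text(prompt, max_tokens)` callable that returns raw text; it is stubbed with a fake that cuts its output off at `max_tokens` characters, and `json.loads` stands in for the parsing step.

```python
import json
from typing import Callable

FULL_OUTPUT = '{"host": "localhost", "port": 8080, "debug": true}'


def fake_generate_text(prompt: str, max_tokens: int) -> str:
    """Stand-in for a constrained generator: one character per 'token', truncated at max_tokens."""
    return FULL_OUTPUT[:max_tokens]


def generate_json_with_retry(
    generate_text: Callable[[str, int], str],
    prompt: str,
    max_tokens: int = 16,
    max_attempts: int = 4,
) -> dict:
    """Retry with a doubled token budget whenever the output is cut off mid-object."""
    for _ in range(max_attempts):
        raw = generate_text(prompt, max_tokens)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            max_tokens *= 2  # The prefix was valid but incomplete; allow more tokens
    raise RuntimeError(f"output still incomplete after {max_attempts} attempts")


config = generate_json_with_retry(fake_generate_text, "Generate config: ...")
print(config)  # {'host': 'localhost', 'port': 8080, 'debug': True}
```

Capping the attempts keeps a badly sized budget from looping forever; in practice a single generous max_tokens usually avoids the retry entirely.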
Performance Tips
1. Use GPU acceleration:
import torch
from outlines import models
model = models.transformers(
    "mistralai/Mistral-7B-v0.1",
    device="cuda" if torch.cuda.is_available() else "cpu",  # Use GPU if available
)
2. Smaller models are faster:
# Faster (about 3.8B parameters)
model_fast = models.transformers("microsoft/Phi-3-mini-4k-instruct")
# Slower but more capable (70B parameters)
model_capable = models.transformers("meta-llama/Llama-2-70b-hf")
3. Batch processing:
# Process multiple examples in one call by passing a list of prompts
prompts = ["Example 1", "Example 2", "Example 3"]
results = generator(prompts, max_tokens=100)
4. Simpler constraints are faster:
# Fast: a simple JSON schema compiles to a regex-based FSM (SimpleSchema is a placeholder for your Pydantic model)
generator_fast = generate.json(model, SimpleSchema)
# Slower: a context-free grammar needs parser state at every step (complex_grammar is a placeholder Lark grammar string)
generator_slow = generate.cfg(model, complex_grammar)
Debugging Constraints
If constraint generation fails or is too slow:
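from outlines import models, generate
from pydantic import BaseModel
import logging
# Enable debug logging for every library that uses Python's logging module
logging.basicConfig(level=logging.DEBUG)
class MySchema(BaseModel):  # Placeholder schema for illustration
    answer: str
model = models.transformers("mistralai/Mistral-7B-v0.1")
generator = generate.json(model, MySchema)
# What gets logged (model loading, constraint compilation, etc.) depends on the library versions
result = generator("Prompt", max_tokens=100)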
Key Takeaways
- Outlines is a Python library that abstracts away logit masking, FSM compilation, and grammar handling.
- Use generate.json() for schemas, generate.regex() for patterns, and generate.cfg() for grammar-defined structures.
- Temperature and sampling work with constraints; only valid tokens are reachable.
- Outlines applies full token masking to local models (HuggingFace Transformers, vLLM); for the OpenAI API it relies on OpenAI's own structured-output support.
- Typical performance overhead is 10–25%; optimize by using smaller models, simpler grammars, and batch processing.
Frequently Asked Questions
Does Outlines modify the model weights?
No. Constraints are applied during decoding (inference time) only, via logit masking. The model itself remains unchanged. You can switch between constrained and unconstrained generation with the same model instance.
Can I use Outlines with fine-tuned models?
Yes. Any HuggingFace-compatible model (base, fine-tuned, or instruction-tuned) works with Outlines. Models fine-tuned on structured examples often produce better content inside the constraint, because the mask rarely has to override the tokens they would have chosen anyway.
What if my constraint is too restrictive and the model can't satisfy it?
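Generation does not halt at a partial match. At every step the mask leaves only tokens that keep the output on a valid path, so the decoder is forced into a valid string even when the model would prefer something else: if a regex allows only "red" or "blue" but the model wants "green", the output is still "red" or "blue", just possibly a poor choice. The output can only end incomplete if max_tokens runs out first. Design constraints carefully by testing with unconstrained generation first, so you know what the model is trying to say.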
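The sketch below simulates that situation with a made-up scoring function standing in for the model: it "wants" to spell green, but the mask only ever offers continuations of red or blue, so greedy decoding ends on a valid word. Character-level tokens, the scores, and the helper names are illustrative assumptions.

```python
from typing import Dict, List

ALLOWED_WORDS = ["red", "blue"]
EOS = "<eos>"


def valid_next_tokens(prefix: str) -> List[str]:
    """Characters (or EOS) that keep `prefix` on the way to an allowed word."""
    options = set()
    for word in ALLOWED_WORDS:
        if word.startswith(prefix):
            if len(word) > len(prefix):
                options.add(word[len(prefix)])
            else:
                options.add(EOS)  # The prefix is already a complete allowed word
    return sorted(options)


def model_scores(prefix: str) -> Dict[str, float]:
    """Toy 'model' that strongly prefers spelling g-r-e-e-n and then stopping."""
    target = "green"
    scores = {ch: 0.0 for ch in "abcdefghijklmnopqrstuvwxyz"}
    scores[EOS] = 0.0
    if len(prefix) < len(target):
        scores[target[len(prefix)]] = 5.0
    else:
        scores[EOS] = 5.0
    scores["r"] += 1.0  # Slight preference for "r" over "b"
    return scores


def constrained_greedy() -> str:
    """Greedy decoding where every token outside the mask is treated as -inf."""
    prefix = ""
    while True:
        scores = model_scores(prefix)
        allowed = valid_next_tokens(prefix)
        best = max(allowed, key=lambda tok: scores[tok])
        if best == EOS:
            return prefix
        prefix += best


print(constrained_greedy())  # red
```

At the first step "g" scores highest but is masked, so the best allowed token "r" wins, and from then on only the letters of "red" are offered.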
Does Outlines support streaming output?
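Yes, via generator.stream():
# Schema is a placeholder for your Pydantic model
generator = generate.json(model, Schema)
for chunk in generator.stream(prompt, max_tokens=100):
    print(chunk, end="", flush=True)
Streaming works with constraints; tokens are masked in real time. For JSON generators the stream yields raw text chunks rather than a parsed object, so parse the joined text once generation finishes.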
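Because streamed chunks are fragments of text, intermediate states are not parseable. One pattern is to display chunks as they arrive and parse once the stream ends, sketched below with a hypothetical `fake_stream` generator standing in for `generator.stream(...)`.

```python
import json
from typing import Iterator, List


def fake_stream() -> Iterator[str]:
    """Stand-in for generator.stream(...): yields text chunks of one JSON object."""
    yield from ['{"name": "Jo', 'hn", "ag', 'e": 30}']


chunks: List[str] = []
for chunk in fake_stream():
    chunks.append(chunk)
    print(chunk, end="", flush=True)  # Show progress as text arrives
print()

try:
    json.loads(chunks[0])
except json.JSONDecodeError:
    print("first chunk alone is not valid JSON")

result = json.loads("".join(chunks))  # Parse only after the stream is finished
print(result["name"], result["age"])  # John 30
```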
Can I combine multiple constraints (e.g., regex AND JSON)?
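Not as separate generators, but you can usually express both in one constraint. Outlines supports the JSON Schema pattern keyword, so a Pydantic string field declared with Field(pattern=...) is regex-constrained inside the JSON. For combinations a schema cannot express, write a single context-free grammar that covers both, or generate JSON first and validate fields against a regex in post-processing.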
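For the post-processing route, a short sketch with the standard library: parse the JSON, then check individual fields against their patterns with `re.fullmatch`. The `FIELD_PATTERNS` mapping, the field names, and the `field_errors` helper are illustrative assumptions.

```python
import json
import re
from typing import Dict, List

# Per-field regex rules applied after JSON parsing (illustrative)
FIELD_PATTERNS: Dict[str, str] = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "phone": r"\d{3}-\d{3}-\d{4}",
}


def field_errors(raw_json: str) -> List[str]:
    """Return a list of problems; an empty list means every checked field matched."""
    data = json.loads(raw_json)
    errors = []
    for field, pattern in FIELD_PATTERNS.items():
        value = data.get(field)
        # Missing or non-string values count as mismatches
        if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
            errors.append(f"{field}={value!r} does not match {pattern}")
    return errors


print(field_errors('{"email": "john@example.com", "phone": "555-123-4567"}'))  # []
print(field_errors('{"email": "john@example", "phone": "555-123-4567"}'))
```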