Finite-State Machines for Controlled LLM Decoding
Finite-state machines (FSMs) are a powerful formalism for enforcing sequential state constraints in LLM generation. Where regex and JSON schema work best for single, acyclic outputs, FSMs excel at multi-step processes: filling out a form field by field, navigating dialogue branches, executing a workflow with predefined states, or generating code that respects control-flow structure. An FSM defines states (e.g., "wait_for_name", "wait_for_age", "confirm_submission") and transitions (e.g., "name_provided" takes you from state 1 to state 2), and the decoder ensures output never violates these transitions.
FSM-based constraints are particularly useful for AI agents that must follow predictable interaction patterns, where the model cannot "jump" between unrelated states or skip mandatory steps. By enforcing state transitions at the token level (via logit masking), the model is guided through the intended workflow even if it tries to deviate.
FSM Fundamentals: States and Transitions
A finite-state machine consists of:
- States: A finite set of discrete conditions (e.g., "awaiting_input", "processing", "done").
- Transitions: Rules that move from one state to another, triggered by input events or conditions.
- Start state: Where the process begins.
- Accept states: Where the process can end (optional; not all FSMs require a unique end state).
Example: A login workflow FSM
States: [prompt_username, prompt_password, verify_credentials, authenticate_success, authenticate_failure]
Transitions:
prompt_username -> prompt_password (on: username_provided)
prompt_password -> verify_credentials (on: password_provided)
verify_credentials -> authenticate_success (on: credentials_valid)
verify_credentials -> authenticate_failure (on: credentials_invalid)
authenticate_failure -> prompt_username (on: retry)
authenticate_success -> [ACCEPT] (terminal)
In decoding, the FSM guides the model's token generation. At each position, only tokens that trigger valid transitions are allowed.
Building FSMs for LLM Generation
When designing FSMs for LLM output, think of states as "what kind of content should come next?" and transitions as "tokens that move to the next state."
Example 1: Form-filling workflow
States:
S0: start
S1: awaiting_name (model generates name)
S2: awaiting_age (model generates age)
S3: awaiting_confirmation (model outputs yes/no)
S4: complete
Transitions (in code or diagram):
S0 --["name is: ]--> S1
S1 --[any text ending with newline]--> S2
S2 --["age is: "]--> S2
S2 --[any digits ending with newline]--> S3
S3 --["Confirm? yes/no: "]--> S3
S3 --["yes" or "no"]--> S4
S4 --> [ACCEPT]
The decoder maintains the current state. At each token, it masks logits for tokens that don't lead to valid next states. If the model tries to output "age is..." while in S1 (awaiting name), those tokens are masked out.
Example 2: API response dialogue FSM
States:
S_start: begin
S_query: user sent query (model should generate acknowledgement)
S_think: model is thinking (generate reasoning)
S_answer: model provides answer
S_done: finished
Transitions:
S_start --[user_message]--> S_query
S_query --["<think>"]--> S_think
S_think --["</think>"]--> S_answer
S_answer --[response_text]--> S_answer (can keep answering)
S_answer --["<end>"]--> S_done
This enforces a pattern: acknowledge → think → answer → close. The model can't skip thinking or answer before acknowledging.
Representing FSMs in Code
Text representation (for documentation):
FORMAT: CSV or JSON
state_id, state_name, transitions (JSON)
0, start, {"event": "begin", "next_state": 1}
1, input_mode, {"event": "text_received", "next_state": 2}
2, process_mode, {"event": "done", "next_state": 3}
3, output_mode, {"event": "end", "next_state": 4}
4, end, {}
Python representation:
from dataclasses import dataclass
from typing import Dict, List
@dataclass
class State:
id: int
name: str
transitions: Dict[str, int] # event -> next_state_id
class FSM:
def __init__(self, states: List[State], start_state_id: int):
self.states = {s.id: s for s in states}
self.current_state = start_state_id
def can_transition(self, event: str) -> bool:
"""Check if event is a valid transition from current state."""
current = self.states[self.current_state]
return event in current.transitions
def transition(self, event: str):
"""Move to next state if event is valid."""
if self.can_transition(event):
current = self.states[self.current_state]
self.current_state = current.transitions[event]
else:
raise ValueError(
f"Invalid event '{event}' from state {self.current_state}"
)
# Define states
states = [
State(0, "start", {"begin": 1}),
State(1, "await_name", {"name_provided": 2}),
State(2, "await_age", {"age_provided": 3}),
State(3, "await_confirm", {"confirmed": 4}),
State(4, "done", {}),
]
fsm = FSM(states, start_state_id=0)
fsm.transition("begin") # Now in state 1
fsm.transition("name_provided") # Now in state 2
FSMs in Constrained Decoding
The decoder integrates FSM state tracking:
def generate_with_fsm(model, tokenizer, prompt, fsm):
"""
Generate tokens while respecting FSM state transitions.
Args:
model: LLM model
tokenizer: Tokenizer for vocab lookups
prompt: Initial prompt
fsm: FSM instance defining valid transitions
Returns:
Generated text respecting FSM structure
"""
token_ids = tokenizer.encode(prompt)
current_state = fsm.current_state
while True:
# Forward pass
with torch.no_grad():
outputs = model(torch.tensor([token_ids]))
logits = outputs.logits[0, -1, :]
# Compute valid transitions from current state
valid_transitions = fsm.states[current_state].transitions.keys()
# Mask invalid tokens (those not in valid transitions)
valid_token_ids = set()
for event in valid_transitions:
# Assume events map to token sequences
event_tokens = tokenizer.encode(event)
valid_token_ids.update(event_tokens)
mask = torch.zeros(len(tokenizer), dtype=torch.bool)
mask[list(valid_token_ids)] = True
# Apply mask and sample
masked_logits = logits.clone()
masked_logits[~mask] = float('-inf')
next_token = torch.multinomial(
torch.softmax(masked_logits, dim=-1),
num_samples=1
).item()
# Update state if transition is complete
next_token_text = tokenizer.decode([next_token])
# Check if next_token_text completes any event
for event, next_state in fsm.states[current_state].transitions.items():
if is_event_completed(token_ids, event):
current_state = next_state
fsm.current_state = current_state
token_ids.append(next_token)
# Stop at terminal state or max length
if current_state == len(fsm.states) - 1 or len(token_ids) > max_tokens:
break
return tokenizer.decode(token_ids)
FSM Examples: Real-World Use Cases
Use case 1: Customer support ticket classification
# States: question received → classify category → suggest solution → rate satisfaction
fsm_states = [
State(0, "start", {"question_received": 1}),
State(1, "classify", {"billing": 2, "technical": 2, "account": 2}),
State(2, "suggest", {"solution_provided": 3}),
State(3, "rate", {"satisfied": 4, "unsatisfied": 5}),
State(4, "end_satisfied", {}),
State(5, "escalate_to_human", {}),
]
# Constrains output: must classify before suggesting,
# must suggest before rating, must end with satisfaction level
Use case 2: Structured data extraction
# States: field1 → field2 → field3 → validation → done
extraction_fsm = [
State(0, "extract_field_1", {"field1_done": 1}),
State(1, "extract_field_2", {"field2_done": 2}),
State(2, "extract_field_3", {"field3_done": 3}),
State(3, "validate", {"valid": 4, "invalid": 1}), # Can loop back
State(4, "complete", {}),
]
# Ensures fields extracted in order, with optional validation retry
Use case 3: Dialogue with required handshake
# States: wait_for_user → acknowledge → process → respond → wait_for_next
dialogue_fsm = [
State(0, "idle", {"user_input": 1}),
State(1, "received", {"send_ack": 2}),
State(2, "acknowledged", {"process": 3}),
State(3, "processing", {"send_response": 4}),
State(4, "responded", {"await_input": 0}),
]
# Ensures model acknowledges before answering, responds before waiting
FSM Visualization and Debugging
A well-designed FSM is easier to debug if visualized:
digraph finite_state_machine {
rankdir=LR;
node [shape=circle];
start [shape=point];
start -> Q0;
Q0 [label="await_name"];
Q1 [label="await_age"];
Q2 [label="await_confirm"];
Q3 [shape=doublecircle, label="done"];
Q0 -> Q1 [label="name_provided"];
Q1 -> Q2 [label="age_provided"];
Q2 -> Q3 [label="confirmed"];
Q2 -> Q1 [label="invalid_input", style=dashed]; # Retry
}
Tools like Graphviz can render these diagrams. Testing FSMs:
def test_fsm_path(fsm, event_sequence):
"""Verify a sequence of events follows FSM rules."""
for event in event_sequence:
if not fsm.can_transition(event):
print(f"Invalid event {event} from state {fsm.current_state}")
return False
fsm.transition(event)
return True
# Test valid paths
assert test_fsm_path(fsm, ["begin", "name_provided", "age_provided", "confirmed"])
# Test invalid paths
assert not test_fsm_path(fsm, ["begin", "age_provided"]) # Skip name
FSM Limitations and Alternatives
Limitations:
- State explosion: A complex workflow can require hundreds of states, making the FSM hard to manage.
- Limited context: FSMs are Markovian (depend only on current state, not history). If you need to remember earlier choices, you need more states.
- Token-level mismatch: FSM events map to tokens, but token boundaries might not align with semantic events.
Alternatives:
- Hierarchical FSMs: Break a large FSM into nested smaller FSMs.
- Pushdown automata: Add a stack to FSMs for recursive structures.
- Hybrid approach: Use FSM + grammar (e.g., states define which grammar rule applies).
Key Takeaways
- FSMs define states and transitions to enforce sequential workflows in LLM generation.
- The decoder tracks the current state and masks tokens that don't correspond to valid transitions.
- FSMs are ideal for multi-step processes (forms, dialogues, workflows) where the model must follow a specific order.
- Typical FSM constraints add 5–15% generation overhead, far less than complex grammars.
- FSMs can be visualized (Graphviz), tested (event sequences), and composed (nested) for complex systems.
Frequently Asked Questions
How do I map tokens to FSM events?
Tokens are the LLM's output; events are semantic meanings. A transition might be triggered by a sequence of tokens (e.g., the phrase "name is John" triggers the "name_provided" event). In code, check if the generated sequence matches an event pattern, then update FSM state. This is often handled via a small language model or rule-based classifier.
Can FSMs handle loops (e.g., retry invalid input)?
Yes. A transition can point back to an earlier state. For example, state_2 -> state_1 (on "retry") allows re-attempting step 1. Keep loop counts bounded to avoid infinite generation.
What's the difference between an FSM and a GBNF grammar?
FSMs are state-centric; grammars are production-centric. FSMs are better for high-level workflows; grammars are better for syntax. FSMs can't express nested structures without helper states; grammars can (recursively). In practice, they're often combined: FSM defines workflow states, grammar defines the syntax within each state.
Can I add probabilities to FSM transitions (e.g., "prefer this path")?
Yes, but standard FSMs are unweighted. You can annotate transitions with preference scores and use them to guide sampling (higher-probability transitions are less masked). This is a hybrid approach between unconstrained (free sampling) and fully constrained (binary masks).
How do I test an FSM before deploying it with an LLM?
Unit-test the FSM independently: verify valid and invalid event sequences, check state transitions, visualize the graph. Then test with a few generated examples using your LLM, verifying the model respects the state transitions. Use logs to trace state changes during generation.
Further Reading
- Introduction to Finite Automata (Theory) — Formal background.
- Graphviz: FSM Visualization — Tools for drawing state diagrams.
- Dialogue State Tracking in Conversational AI — How FSMs apply to multi-turn dialogue.
- Outlines FSM Documentation — Integration with Outlines library.