Skip to main content

GBNF Grammars: Formal Grammar for LLMs Tutorial

GBNF (GGML BNF) is a formal grammar notation designed for LLM constrained decoding, widely used in llama.cpp, Outlines, and similar inference engines. Instead of writing ad-hoc regex or schema definitions, GBNF lets you describe syntax rules using Backus-Naur Form (BNF)—a standard for formal language specification. With GBNF, you can define complex nested structures: JSON with specific field types, SQL queries following a safe subset, code in a domain-specific language, or any context-free grammar.

GBNF is more expressive than regex (can handle nested structures) yet more readable than hand-coded state machines. A typical GBNF rule looks like rule := "keyword" ( ":" value | "," value )*, describing repetition, alternation, and sequences. The inference engine compiles these rules into finite-state machines, then uses the machine to enforce valid next tokens at each generation step.

BNF Basics: The Foundation

BNF (Backus-Naur Form) is a notation for context-free grammars. Each rule defines a non-terminal symbol in terms of terminal symbols (literal tokens) and references to other rules:

rule_name := production_1 | production_2 | ...

A | means "or" (alternation); sequences are space-separated; literal strings are in quotes.

Example: A simple expression grammar

expr := number | "(" expr ")"
number := [0-9]+

This says: an expression is either a number or an expression in parentheses; a number is one or more digits.

GBNF extends BNF with character classes ([a-z]), repetition operators (*, +, ?), and whitespace handling. Here's how they work:

NotationMeaningExample
"literal"Exact string"hello" matches the word hello
rule_refReference another ruleexpr references the expr rule
a | bAlternation (or)"yes" | "no" matches either
a bSequence (and)"hello" " " "world" for "hello world"
[a-z]Character class[a-z] matches any lowercase letter
[a-zA-Z]Ranges in classesLowercase or uppercase
a*Zero or more repeats[0-9]* matches "", "5", "123"
a+One or more repeats[0-9]+ matches "5", "123" (not "")
a?Zero or one (optional)"-"?[0-9]+ matches "5" or "-5"
(a b)Grouping("a" "b")* repeats "ab" zero+ times

Writing GBNF Rules: Practical Examples

JSON Schema in GBNF:

json_value := object | array | string | number | "true" | "false" | "null"

object := "{" ws "}" | "{" ws member ("," ws member)* ws "}"
member := string ws ":" ws json_value
array := "[" ws "]" | "[" ws json_value ("," ws json_value)* ws "]"

string := "\"" [^"]* "\""
number := "-"? [0-9] ("." [0-9]+)? ([eE] [+-]? [0-9]+)?
ws := ([ \t\n])*

This defines JSON recursively: an object is {key: value, ...}, an array is [value, ...], and so on. The [^"]* means "any character except quote" (character class negation).

SQL SELECT statement (safe subset):

sql_select := "SELECT" ws column_list ws "FROM" ws table_name ws (where_clause)?

column_list := column ("," ws column)*
column := [a-zA-Z_] [a-zA-Z0-9_]*

table_name := [a-zA-Z_] [a-zA-Z0-9_]*

where_clause := "WHERE" ws condition
condition := identifier ws compare_op ws value
identifier := [a-zA-Z_] [a-zA-Z0-9_]*
compare_op := "=" | ">" | "<" | ">=" | "<=" | "!="
value := string | number

string := "'" [^']* "'"
number := [0-9]+ ("." [0-9]+)?

ws := [ \t\n]*

This grammar restricts output to a safe SQL subset: single SELECT, no JOINs, no subqueries, no DROP/DELETE. The decoder enforces these limits by only allowing tokens that follow the rules.

Function call in a domain-specific language:

function_call := function_name "(" arguments ")"
function_name := [a-zA-Z_] [a-zA-Z0-9_]*
arguments := (expression ("," expression)*)?
expression := number | string | identifier | function_call
identifier := [a-zA-Z_] [a-zA-Z0-9_]*
number := [0-9]+
string := "\"" [^"]* "\""

Implementing GBNF: Compilation to State Machines

When you pass a GBNF grammar to an inference engine, it:

  1. Parses the grammar into an abstract syntax tree (AST).
  2. Constructs a finite-state machine (FSM) where each state represents a position in the grammar, and transitions are labeled with tokens.
  3. Tracks the current state during generation: after emitting a token, advance the FSM to the next state.
  4. Masks invalid transitions: At each position, only tokens that lead to valid FSM states are allowed.

Example: For the JSON number rule number := "-"? [0-9]+, the FSM has states like:

START -> ("-", OPT_MINUS_DONE) | ([0-9], IN_DIGITS)
OPT_MINUS_DONE -> ([0-9], IN_DIGITS)
IN_DIGITS -> ([0-9], IN_DIGITS) | (ACCEPT, END)

Token masking at START allows only "-" or digits. Once "-" is seen, the next token must be a digit. Once a digit is seen, the next token can be another digit or end-of-number. This is how formal rules become token constraints.

GBNF in Outlines and llama.cpp

Using GBNF in Outlines (Python):

from outlines import models, generate

# Define a JSON schema as GBNF
json_schema = '''
json_value := object | array | string | number | "true" | "false" | "null"
object := "{" ws "}" | "{" ws member ("," ws member)* ws "}"
member := string ws ":" ws json_value
array := "[" ws "]" | "[" ws json_value ("," ws json_value)* ws "]"
string := "\"" [^"]* "\""
number := "-"? [0-9] ("." [0-9]+)? ([eE] [+-]? [0-9]+)?
ws := ([ \t\n])*
'''

model = models.transformers("mistralai/Mistral-7B-v0.1")
generator = generate.constrained(model, regex=json_schema)

result = generator(
"Extract the person's info as JSON: "
"John Smith, age 30, from New York.",
max_tokens=100
)
print(result)
# Output: guaranteed valid JSON matching the grammar

Using GBNF in llama.cpp (C++):

./main -m model.gguf \
-p "Output a JSON object with name and age: " \
--grammar-file my_grammar.gbnf \
-n 100

Where my_grammar.gbnf contains the grammar rules. The C++ decoder enforces the grammar at token level, guaranteeing valid output.

Common GBNF Patterns

Enums (restricted choice):

color := "red" | "green" | "blue"
decision := "approve" | "reject" | "pending"

Lists with separators:

csv_list := item ("," item)*
item := [a-zA-Z0-9_]+

Optional fields:

config := "name" ":" string ("," "value" ":" number)?
string := "\"" [^"]* "\""
number := [0-9]+

Nested structures:

tree := "node" "(" value "," tree "," tree ")" | "leaf"
value := [0-9]+

Debugging GBNF: Common Pitfalls

Empty rules: A rule that matches empty string (e.g., optional := production? with zero productions) can cause the FSM to accept termination unexpectedly.

Ambiguity: A grammar like rule := "a" | "a" "b" is ambiguous at the token level—"a" could mean the first alternative or the start of the second. Most implementations use the first match, but this can be confusing.

Excessive backtracking: Deep nesting (e.g., nested parentheses or JSON arrays) requires states for each level. Very deep structures slow generation.

Character class limitations: Some inference engines don't support all regex features in character classes. Test with your engine.

Key Takeaways

  • GBNF is a formal grammar notation (BNF extended with character classes and repetition) for defining output syntax.
  • Rules use alternation (|), sequence (whitespace), and operators (*, +, ?) to describe valid token sequences.
  • The inference engine compiles GBNF into finite-state machines, tracking state during generation and masking invalid next tokens.
  • GBNF is more expressive than regex (handles nested structures) and more readable than state machines, making it ideal for complex formats (JSON, SQL, code).
  • Outlines and llama.cpp natively support GBNF; you provide the grammar file and the engine enforces it automatically.

Frequently Asked Questions

Can GBNF represent any grammar or language?

GBNF represents context-free grammars (CFGs), which are powerful but not Turing-complete. Most practical formats (JSON, SQL, code) are context-free. Context-sensitive languages (e.g., "n copies of 'a' followed by n copies of 'b'") require more powerful formalisms. For LLMs, CFGs are usually sufficient.

How large can a GBNF grammar be?

Typical grammars are 50–500 lines. Very large grammars (5,000+ lines) are possible but compile slowly and can increase per-token latency. Modularize by breaking into separate concerns (expression grammar, statement grammar, etc.) and nest rules.

What's the difference between GBNF and regular expressions?

Regex is flat and sequential; it can't handle nested structures (e.g., balanced parentheses). GBNF is recursive; you define rules that reference each other, enabling nesting. For simple patterns (phone numbers, email), regex is faster; for complex structures (JSON, SQL), GBNF is necessary.

Can I mix GBNF with unconstrained generation?

Some engines allow hybrid approaches: mark parts of the grammar as "free-form" (unconstrained). For example, string := "\"" free_text "\"" lets the model generate any characters inside quotes. This preserves structural guarantees while allowing flexibility where needed. Check your engine's documentation.

Is GBNF the same across all inference engines?

The core syntax is standard (based on GGML), but extensions vary. Outlines has slight differences from llama.cpp. Before deploying, test your grammar against your engine to ensure compatibility.

Further Reading