
JSON Schema Validation: Complete Tutorial

JSON Schema is the industry standard for describing and validating JSON data. Unlike Pydantic (which is Python-only), JSON Schema works with any language and is natively supported by LLM APIs such as OpenAI's and Anthropic's. This article covers JSON Schema fundamentals, shows how to validate LLM outputs, and builds a reusable validation pipeline.

I've used JSON Schema across TypeScript, Python, and Go backends. It's the lingua franca for talking about structure across teams and APIs. Mastering it is essential for reliability engineering.

JSON Schema Basics

A JSON Schema is a JSON object that describes valid JSON. The simplest schema is {} (any JSON is valid). A more useful schema specifies type and constraints:

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer", "minimum": 0},
    "email": {"type": "string", "format": "email"}
  },
  "required": ["name", "age"]
}

This schema says: "An object with a required string name, a non-negative integer age, and an optional email string formatted as an email address."

Core Keywords

JSON Schema uses keywords to describe constraints:

  • type: string, integer, number, boolean, array, object, null
  • properties: An object mapping field names to schemas
  • required: An array of required field names
  • minimum/maximum: Numeric bounds
  • minLength/maxLength: String length bounds
  • pattern: Regex pattern (e.g., ^[A-Z]+$)
  • enum: Allowed values (e.g., ["red", "green", "blue"])
  • items: Schema for array elements
  • additionalProperties: Whether extra fields are allowed

Validating LLM Outputs with jsonschema

Use the Python jsonschema library:

import json
import jsonschema

# Define schema for an extracted customer record
customer_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string", "minLength": 1},
        "email": {"type": "string", "format": "email"},
        "account_type": {"type": "string", "enum": ["free", "pro", "enterprise"]},
        "created_at": {"type": "string", "format": "date-time"}
    },
    "required": ["id", "name", "email", "account_type"],
    "additionalProperties": False
}

# LLM output (raw text as returned by the model)
llm_output = """
{
  "id": 12345,
  "name": "Alice Chen",
  "email": "alice@example.com",
  "account_type": "pro",
  "created_at": "2026-01-15T10:30:00Z"
}
"""

# Parse the text, then validate the parsed object against the schema
output = json.loads(llm_output)
try:
    jsonschema.validate(output, customer_schema)
    print("Valid!")
except jsonschema.ValidationError as e:
    print(f"Invalid: {e.message}")

If the LLM returns "account_type": "premium" (not in enum), validation fails with a clear error message. If it omits the email field, validation fails because email is required.
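One caveat is worth a sketch: by default the jsonschema package treats "format" as an annotation and does not enforce it, so a malformed email passes unless you supply a format checker. The example below assumes the package's built-in FormatChecker. Its email check is deliberately loose, and some formats (date-time, for example) are only checked when optional dependencies are installed.

```python
import jsonschema

schema = {
    "type": "object",
    "properties": {"email": {"type": "string", "format": "email"}},
    "required": ["email"],
}
record = {"email": "not-an-email"}

# Default behaviour: "format" is not asserted, so this call does not raise.
jsonschema.validate(record, schema)
passed_without_checker = True

# Opt in to format assertions by passing a FormatChecker.
try:
    jsonschema.validate(record, schema, format_checker=jsonschema.FormatChecker())
    rejected_with_checker = False
except jsonschema.ValidationError as e:
    rejected_with_checker = True
    print(f"Invalid: {e.message}")
```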

Arrays and Nested Objects

JSON Schema shines with complex nested structures. For an array of products:

product_list_schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number", "minimum": 0},
                    "in_stock": {"type": "boolean"}
                },
                "required": ["name", "price"],
                "additionalProperties": False
            },
            "minItems": 1,
            "maxItems": 100
        }
    },
    "required": ["products"]
}

# LLM output
output = {
    "products": [
        {"name": "Laptop", "price": 999.99, "in_stock": True},
        {"name": "Mouse", "price": 29.99, "in_stock": False}
    ]
}

jsonschema.validate(output, product_list_schema)  # Valid

Advanced Patterns: anyOf, oneOf, allOf

JSON Schema supports logical operators for complex constraints:

oneOf: Exactly one of the schemas must match (useful for discriminated unions):

payment_schema = {
    "type": "object",
    "oneOf": [
        {
            "type": "object",
            "properties": {"type": {"const": "credit_card"}, "last_four": {"type": "string"}},
            "required": ["type", "last_four"]
        },
        {
            "type": "object",
            "properties": {"type": {"const": "paypal"}, "email": {"type": "string"}},
            "required": ["type", "email"]
        }
    ]
}

# Valid: either credit card or PayPal, but not both
output1 = {"type": "credit_card", "last_four": "4242"}
output2 = {"type": "paypal", "email": "alice@example.com"}

allOf: All schemas must match (composition):

tagged_person_schema = {
    "allOf": [
        {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"]
        },
        {
            "type": "object",
            "properties": {"tags": {"type": "array", "items": {"type": "string"}}},
            "required": ["tags"]
        }
    ]
}

# Valid: must have both name and tags
output = {"name": "Alice", "tags": ["ai", "ml"]}

Practical Validation Pipeline

Build a reusable validation wrapper:

import json
from typing import Optional, Tuple

import jsonschema


def validate_llm_output(
    output: str,
    schema: dict
) -> Tuple[bool, Optional[dict], Optional[str]]:
    """
    Validate LLM JSON output against a schema.
    Returns: (is_valid, parsed_output, error_message)
    """
    # Step 1: the text must be well-formed JSON
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as e:
        return False, None, f"Invalid JSON: {e}"

    # Step 2: the parsed value must satisfy the schema
    try:
        jsonschema.validate(parsed, schema)
        return True, parsed, None
    except jsonschema.ValidationError as e:
        return False, None, f"Schema validation failed: {e.message}"


# Usage (llm_response is the raw model text; process is your downstream handler)
schema = {...}
is_valid, output, error = validate_llm_output(llm_response, schema)
if is_valid:
    process(output)
else:
    print(f"Validation error: {error}")
    # Retry with corrective feedback or fall back
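The "retry with corrective feedback" step can be sketched as a small loop. This is one possible shape, not a definitive implementation: call_model is a hypothetical callable standing in for your provider's client, and the wording of the feedback prompt is an assumption.

```python
import json
from typing import Callable, Optional

import jsonschema


def generate_with_retries(
    call_model: Callable[[str], str],  # hypothetical: takes a prompt, returns raw text
    prompt: str,
    schema: dict,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call the model until its output validates, feeding errors back each time."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            parsed = json.loads(raw)
            jsonschema.validate(parsed, schema)
            return parsed
        except json.JSONDecodeError as e:
            problem = f"Invalid JSON: {e}"
        except jsonschema.ValidationError as e:
            problem = f"Schema validation failed: {e.message}"
        # Append the concrete error so the next attempt can correct it.
        current_prompt = (
            f"{prompt}\n\nYour previous reply was rejected. {problem}\n"
            "Return only corrected JSON."
        )
    return None  # caller decides on the fallback


# Demonstration with a scripted stand-in for the model
replies = iter(["not json", '{"sentiment": "happy"}', '{"sentiment": "positive"}'])
seen_prompts = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    return next(replies)


sentiment_schema = {
    "type": "object",
    "properties": {"sentiment": {"enum": ["positive", "negative", "neutral"]}},
    "required": ["sentiment"],
}
result = generate_with_retries(fake_model, "Classify: great product", sentiment_schema)
```

Capping the number of attempts keeps cost and latency bounded. Returning None rather than raising leaves the fallback policy to the caller.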

JSON Schema in LLM APIs

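Modern LLM providers accept JSON Schema to guide generation. OpenAI's Structured Outputs takes a schema in the request, and Anthropic's tool use takes one as a tool's input_schema. The simplest approach, which works with any provider, is to embed the schema in the prompt. This increases compliance but does not guarantee it, so still validate the response: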

import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Sample input text (illustrative)
text = "The checkout flow was fast, but delivery took two weeks."

schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1}
    },
    "required": ["sentiment", "confidence"]
}

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": f"Analyze sentiment. Return JSON matching this schema: {json.dumps(schema)}\n\nText: {text}"
    }]
)

# The reply text is in response.content[0].text; parse and validate it before use.
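An alternative to embedding the schema in the prompt is to pass it as a tool definition. The sketch below assumes the Anthropic Messages API tool-use parameters (tools with input_schema, and tool_choice to force the tool). The tool name record_sentiment and both helper functions are hypothetical. It builds the request arguments without making a network call, then validates the tool input from a response.

```python
import jsonschema

sentiment_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["sentiment", "confidence"],
}


def build_tool_request(text: str) -> dict:
    """Keyword arguments intended for client.messages.create(**request)."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 256,
        "tools": [{
            "name": "record_sentiment",  # hypothetical tool name
            "description": "Record the sentiment of the given text.",
            "input_schema": sentiment_schema,
        }],
        # Force the model to answer through the tool
        "tool_choice": {"type": "tool", "name": "record_sentiment"},
        "messages": [{"role": "user", "content": f"Analyze sentiment.\n\nText: {text}"}],
    }


def read_tool_input(content_blocks) -> dict:
    """Find the tool_use block and validate its input against the schema."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use":
            jsonschema.validate(block.input, sentiment_schema)
            return block.input
    raise ValueError("no tool_use block in response")
```

In use, this would look like response = client.messages.create(**build_tool_request(text)) followed by read_tool_input(response.content). Validating the tool input is still worthwhile, since the schema guides generation rather than acting as a hard guarantee.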

Comparison: jsonschema vs Pydantic for JSON Schema

Tool | Strengths | Weaknesses | Best For
jsonschema | Spec-compliant, language-agnostic, works with dicts | More verbose, less Pythonic | Cross-language teams, LLM APIs
Pydantic | Pythonic, concise, excellent error messages | Python-only, different spec | Python backends, rapid development

Use both: define your schema in JSON Schema for API contracts, then use Pydantic for your Python backend code.
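The two tools also interoperate in the other direction: Pydantic can export a JSON Schema from a model. The sketch below assumes Pydantic v2 (model_json_schema and model_validate); the Customer model is illustrative.

```python
from typing import Literal, Optional

import jsonschema
from pydantic import BaseModel


class Customer(BaseModel):
    id: int
    name: str
    account_type: Literal["free", "pro", "enterprise"]
    nickname: Optional[str] = None  # optional, so not listed under "required"


# Export a JSON Schema dict that other languages and LLM APIs can consume.
customer_schema = Customer.model_json_schema()

payload = {"id": 1, "name": "Alice", "account_type": "pro"}

# The same payload can be checked by either tool.
jsonschema.validate(payload, customer_schema)
customer = Customer.model_validate(payload)
```

This keeps a single source of truth in Python while still giving other teams a language-neutral contract.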

Key Takeaways

  • JSON Schema is the standard for describing valid JSON across all languages.
  • Core keywords like type, properties, required, and enum cover most validation needs.
  • Use oneOf, allOf, and anyOf for complex discriminated unions and composition.
  • Always wrap JSON parsing and schema validation in error handling.
  • Feed your schema to the LLM in prompts for better compliance.

Frequently Asked Questions

How do I handle conditional schemas (if X, then Y)?

Use if/then/else keywords in JSON Schema. Example: if account_type is "enterprise", require a purchase_order field.

Can JSON Schema validate business logic (e.g., price > 0)?

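Yes, for single-field rules: minimum, maximum, and pattern cover most cases, and a strict bound such as price > 0 is expressed with exclusiveMinimum. For logic that compares fields or depends on external state, parse the data and validate in code.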

Does the jsonschema library support all JSON Schema features?

Check the library version. As of 2026, most libraries support Draft 2020-12. For the latest features, use jsonschema >= 4.18.

How do I handle LLM outputs with extra fields?

Use "additionalProperties": false to reject them, or "additionalProperties": true to allow them. The default (if omitted) is to allow.

Further Reading