Advanced JSON Schema: Enums, Conditionals, and Composition
Beyond basic types and constraints, JSON Schema supports powerful composition and conditional patterns. Enums enforce fixed vocabularies, discriminators route complex types, conditionals enable field dependencies, and definitions eliminate duplication. These advanced patterns let you express subtle domain constraints and make your schemas self-documenting.
Enums: Controlled Vocabularies
Enums restrict a field to a fixed set of values. Any out-of-vocabulary string fails validation, so downstream code receives only expected values.
Simple enum:
{
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": ["pending", "approved", "rejected", "archived"]
}
}
}
Numeric enum (for codes):
{
"type": "object",
"properties": {
"http_status": {
"type": "integer",
"enum": [200, 201, 400, 404, 500]
}
}
}
Pydantic enum pattern (Python):
from enum import Enum

from pydantic import BaseModel


class DocumentStatus(str, Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PUBLISHED = "published"
    ARCHIVED = "archived"


class Document(BaseModel):
    title: str
    status: DocumentStatus


# LLM output (client is assumed to be a chat-completions client configured elsewhere)
response = client.chat.completions.create(...)
doc = Document.model_validate_json(response.choices[0].message.content)
# doc.status is one of the four values; anything else raises ValidationError
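To see the rejection behavior without calling a model, here is a minimal stdlib-only sketch. The helper name parse_status and the sample payloads are illustrative assumptions, not part of the Pydantic example above; the sketch only shows that an enum lookup by value fails for strings outside the vocabulary.

```python
import json
from enum import Enum


class DocumentStatus(str, Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PUBLISHED = "published"
    ARCHIVED = "archived"


def parse_status(raw_json: str) -> DocumentStatus:
    """Parse a JSON payload and coerce its "status" field into the enum."""
    payload = json.loads(raw_json)
    # Enum lookup by value raises ValueError for anything outside the vocabulary.
    return DocumentStatus(payload["status"])


ok = parse_status('{"title": "Spec", "status": "review"}')

try:
    parse_status('{"title": "Spec", "status": "deleted"}')
    rejected = False
except ValueError:
    rejected = True
```

Pydantic performs the same kind of lookup during validation and wraps the failure in a ValidationError.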
Zod enum pattern (TypeScript):
import { z } from "zod";
const DocumentSchema = z.object({
title: z.string(),
status: z.enum(["draft", "review", "published", "archived"])
});
type Document = z.infer<typeof DocumentSchema>;
// status is "draft" | "review" | "published" | "archived"
Discriminated Unions: Type-Based Routing
A discriminated union uses one field (the discriminator) to determine the schema of the rest of the object. The pattern is common in workflows where the shape of the output depends on the outcome.
Example: Different response types based on query result
{
"type": "object",
"oneOf": [
{
"type": "object",
"properties": {
"result_type": {"type": "string", "const": "success"},
"data": {"type": "string"},
"confidence": {"type": "number"}
},
"required": ["result_type", "data", "confidence"]
},
{
"type": "object",
"properties": {
"result_type": {"type": "string", "const": "error"},
"error_code": {"type": "integer"},
"message": {"type": "string"}
},
"required": ["result_type", "error_code", "message"]
},
{
"type": "object",
"properties": {
"result_type": {"type": "string", "const": "unknown"},
"reason": {"type": "string"}
},
"required": ["result_type", "reason"]
}
]
}
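The routing idea behind this schema can be sketched in plain Python: read the tag first, then check only the matching branch. The table REQUIRED_BY_TYPE and the function missing_fields are hypothetical names for illustration; a real validator also checks value types, which this sketch skips.

```python
# Required fields per branch, keyed by the discriminator value.
REQUIRED_BY_TYPE = {
    "success": {"result_type", "data", "confidence"},
    "error": {"result_type", "error_code", "message"},
    "unknown": {"result_type", "reason"},
}


def missing_fields(payload: dict) -> set[str]:
    """Return required fields absent from the branch selected by result_type."""
    tag = payload.get("result_type")
    if tag not in REQUIRED_BY_TYPE:
        raise ValueError(f"unrecognized result_type: {tag!r}")
    # Only the selected branch is consulted; the other branches are never tried.
    return REQUIRED_BY_TYPE[tag] - set(payload)


complete = missing_fields(
    {"result_type": "success", "data": "Found 42 results", "confidence": 0.95}
)
incomplete = missing_fields({"result_type": "error", "message": "Not found"})
```

Because the tag picks the branch directly, an error message can name the one branch that failed rather than listing failures for all three.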
Pydantic discriminated union:
from typing import Union, Literal

from pydantic import BaseModel, Field


class SuccessResult(BaseModel):
    result_type: Literal["success"]
    data: str
    confidence: float


class ErrorResult(BaseModel):
    result_type: Literal["error"]
    error_code: int
    message: str


class UnknownResult(BaseModel):
    result_type: Literal["unknown"]
    reason: str


QueryResult = Union[SuccessResult, ErrorResult, UnknownResult]


class QueryResponse(BaseModel):
    response: QueryResult = Field(..., discriminator="result_type")


# Usage
response_data = {
    "response": {
        "result_type": "success",
        "data": "Found 42 results",
        "confidence": 0.95
    }
}
response = QueryResponse.model_validate(response_data)
if isinstance(response.response, SuccessResult):
    print(f"Success: {response.response.data}")
Zod discriminated union:
import { z } from "zod";
const QueryResultSchema = z.discriminatedUnion("result_type", [
z.object({
result_type: z.literal("success"),
data: z.string(),
confidence: z.number().min(0).max(1)
}),
z.object({
result_type: z.literal("error"),
error_code: z.number().int(),
message: z.string()
}),
z.object({
result_type: z.literal("unknown"),
reason: z.string()
})
]);
type QueryResult = z.infer<typeof QueryResultSchema>;
const result: QueryResult = {
result_type: "success",
data: "Found results",
confidence: 0.9
};
Conditionals: Dependent Fields
Use if/then/else to express field dependencies. If one field has a value, other fields become required or change type.
Example: If escalation_required is true, escalation_reason becomes required
{
"type": "object",
"properties": {
"action": {"type": "string"},
"escalation_required": {"type": "boolean"},
"escalation_reason": {"type": "string"}
},
"required": ["action", "escalation_required"],
"if": {
"properties": {
"escalation_required": {"const": true}
}
},
"then": {
"required": ["escalation_reason"]
}
}
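The semantics of this if/then pair can be hand-evaluated in a few lines. This is a sketch for this one schema only, with hypothetical function names; it is not a general JSON Schema evaluator. One subtlety it makes visible: "properties" constrains only keys that are present, so a payload that omits escalation_required also satisfies the "if" subschema (the top-level required list is what rejects such a payload).

```python
def if_matches(payload: dict) -> bool:
    """Evaluate the "if" subschema: escalation_required, when present, is true."""
    # A missing key passes vacuously, mirroring how "properties" works.
    return payload.get("escalation_required", True) is True


def conditional_ok(payload: dict) -> bool:
    """Apply "then" (escalation_reason is required) only when "if" matches."""
    if if_matches(payload):
        return "escalation_reason" in payload
    return True


no_escalation = conditional_ok({"action": "resolve", "escalation_required": False})
missing_reason = conditional_ok({"action": "escalate", "escalation_required": True})
with_reason = conditional_ok(
    {"action": "escalate", "escalation_required": True, "escalation_reason": "Needs approval"}
)
flag_absent = conditional_ok({"action": "resolve"})
```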
Pydantic model_validator for conditional logic:
from pydantic import BaseModel, model_validator


class ActionPlan(BaseModel):
    action: str
    escalation_required: bool
    escalation_reason: str = ""

    # A model-level validator runs even when escalation_reason is omitted.
    # A field_validator would be skipped for a field left at its default.
    @model_validator(mode="after")
    def check_escalation_reason(self):
        if self.escalation_required and not self.escalation_reason:
            raise ValueError("escalation_reason required when escalation_required is true")
        return self
Schema Definitions and Reuse
Use $defs to define reusable schemas and reference them with $ref, eliminating duplication:
{
"$defs": {
"Address": {
"type": "object",
"properties": {
"street": {"type": "string"},
"city": {"type": "string"},
"country": {"type": "string"},
"zip": {"type": "string"}
},
"required": ["street", "city", "country"]
},
"Contact": {
"type": "object",
"properties": {
"email": {"type": "string", "format": "email"},
"phone": {"type": "string"}
},
"required": ["email"]
}
},
"type": "object",
"properties": {
"billing_address": {"$ref": "#/$defs/Address"},
"shipping_address": {"$ref": "#/$defs/Address"},
"contact": {"$ref": "#/$defs/Contact"}
},
"required": ["billing_address", "contact"]
}
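A local reference such as "#/$defs/Address" is a JSON Pointer into the same document. The sketch below shows how such a pointer can be followed; resolve_ref is a hypothetical helper that handles only same-document references, and the schema is a trimmed copy of the one above.

```python
def resolve_ref(schema: dict, ref: str) -> dict:
    """Follow a local JSON Pointer reference such as "#/$defs/Address"."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled: {ref!r}")
    node = schema
    for token in ref[2:].split("/"):
        # JSON Pointer escapes: "~1" stands for "/", "~0" stands for "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


schema = {
    "$defs": {
        "Address": {"type": "object", "required": ["street", "city", "country"]}
    },
    "properties": {
        "billing_address": {"$ref": "#/$defs/Address"},
        "shipping_address": {"$ref": "#/$defs/Address"},
    },
}

billing = resolve_ref(schema, schema["properties"]["billing_address"]["$ref"])
shipping = resolve_ref(schema, schema["properties"]["shipping_address"]["$ref"])
```

Both properties resolve to the same definition object, which is why a change to Address applies everywhere it is referenced.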
Pydantic nested models for composition:
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    country: str
    zip: str = ""


class Contact(BaseModel):
    email: str
    phone: str = ""


class Order(BaseModel):
    billing_address: Address
    # Optional, matching the JSON Schema's required list above
    shipping_address: Optional[Address] = None
    contact: Contact


# Reuse schemas across multiple models
class Vendor(BaseModel):
    name: str
    contact: Contact
    address: Address
Recursive Schemas: Trees and Graphs
Define schemas that reference themselves to model hierarchical data:
{
"$defs": {
"Document": {
"type": "object",
"properties": {
"title": {"type": "string"},
"content": {"type": "string"},
"sections": {
"type": "array",
"items": {"$ref": "#/$defs/Document"}
}
}
}
},
"$ref": "#/$defs/Document"
}
Pydantic recursive model:
from pydantic import BaseModel


class Document(BaseModel):
    title: str
    content: str
    sections: list["Document"] = []


# Resolve the forward reference (usually automatic in Pydantic v2; harmless to call explicitly)
Document.model_rebuild()

# Usage: Document can contain nested Documents
doc = Document(
    title="Main",
    content="...",
    sections=[
        Document(title="Section 1", content="..."),
        Document(title="Section 2", content="...", sections=[
            Document(title="Subsection", content="...")
        ])
    ]
)
Constrained Arrays with Item Dependencies
Constrain an array's length and the shape of every item it contains:
{
"type": "object",
"properties": {
"ordered_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"product_id": {"type": "string"},
"quantity": {"type": "integer", "minimum": 1},
"price_per_unit": {"type": "number", "minimum": 0}
},
"required": ["product_id", "quantity", "price_per_unit"]
},
"minItems": 1,
"maxItems": 100,
"uniqueItems": false
}
}
}
Pydantic with array constraints:
from pydantic import BaseModel, Field


class OrderItem(BaseModel):
    product_id: str
    quantity: int = Field(..., ge=1)
    price_per_unit: float = Field(..., ge=0)


class Order(BaseModel):
    # Field name matches the "ordered_items" property in the JSON Schema above
    ordered_items: list[OrderItem] = Field(..., min_length=1, max_length=100)

    @property
    def total(self) -> float:
        return sum(item.quantity * item.price_per_unit for item in self.ordered_items)
Pattern Validation: Format and Regex
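Use format for standard patterns (email, URI, date) or pattern for custom regex. Many validators treat format as an annotation and enforce it only when format assertion is enabled, so check your validator's settings: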
{
"type": "object",
"properties": {
"email": {
"type": "string",
"format": "email"
},
"website": {
"type": "string",
"format": "uri"
},
"phone": {
"type": "string",
"pattern": "^\\+?[1-9]\\d{1,14}$"
},
"sku": {
"type": "string",
"pattern": "^[A-Z]{3}-\\d{6}$"
}
}
}
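The two custom patterns can be tried out with Python's re module before they go into a schema. This is a quick sketch with made-up sample values. JSON Schema patterns follow ECMA-262 and are unanchored unless written with ^ and $, so re.search is the closest stdlib match; the two dialects agree for these particular expressions on ASCII input.

```python
import re

PHONE = re.compile(r"^\+?[1-9]\d{1,14}$")
SKU = re.compile(r"^[A-Z]{3}-\d{6}$")


def matches(pattern: re.Pattern, value: str) -> bool:
    """Mirror JSON Schema "pattern": succeed if the regex matches anywhere."""
    return pattern.search(value) is not None


sku_ok = matches(SKU, "ABC-123456")
sku_lowercase = matches(SKU, "abc-123456")
sku_short = matches(SKU, "ABC-12345")
phone_ok = matches(PHONE, "+14155552671")
phone_leading_zero = matches(PHONE, "0123456")
```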
Pydantic patterns:
from typing import Annotated

# EmailStr requires the optional email-validator package
from pydantic import BaseModel, EmailStr, Field


class Product(BaseModel):
    sku: Annotated[str, Field(pattern=r"^[A-Z]{3}-\d{6}$")]
    email: EmailStr
    phone: Annotated[str, Field(pattern=r"^\+?[1-9]\d{1,14}$")]
Composition Example: Complex Pricing Model
Combine multiple patterns into a sophisticated schema:
from typing import Union, Literal

from pydantic import BaseModel, Field


class FixedPrice(BaseModel):
    price_type: Literal["fixed"]
    amount: float = Field(..., gt=0)


class PercentageDiscount(BaseModel):
    price_type: Literal["percentage"]
    base_amount: float = Field(..., gt=0)
    discount_percent: int = Field(..., ge=1, le=100)


class TieredPrice(BaseModel):
    price_type: Literal["tiered"]
    tiers: list[dict] = Field(
        ...,
        description="List of {quantity_min, quantity_max, price_per_unit}"
    )


PricingStrategy = Union[FixedPrice, PercentageDiscount, TieredPrice]


class Product(BaseModel):
    name: str
    pricing: PricingStrategy = Field(..., discriminator="price_type")


# The LLM now chooses a pricing strategy type and fills in the corresponding fields
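Once a pricing payload has been validated, downstream code can branch on price_type. The sketch below works on plain dicts and rests on two assumptions the model does not state: that a percentage discount is taken off base_amount, and that tier bounds are inclusive. The function name unit_price is hypothetical.

```python
def unit_price(pricing: dict, quantity: int) -> float:
    """Compute a per-unit price from a validated pricing payload."""
    kind = pricing["price_type"]
    if kind == "fixed":
        return pricing["amount"]
    if kind == "percentage":
        # Assumption: the discount is subtracted from base_amount.
        return pricing["base_amount"] * (1 - pricing["discount_percent"] / 100)
    if kind == "tiered":
        for tier in pricing["tiers"]:
            # Assumption: quantity_min and quantity_max are inclusive bounds.
            if tier["quantity_min"] <= quantity <= tier["quantity_max"]:
                return tier["price_per_unit"]
        raise ValueError(f"no tier covers quantity {quantity}")
    raise ValueError(f"unrecognized price_type: {kind!r}")


fixed = unit_price({"price_type": "fixed", "amount": 10.0}, quantity=1)
tiered = unit_price(
    {
        "price_type": "tiered",
        "tiers": [
            {"quantity_min": 1, "quantity_max": 9, "price_per_unit": 5.0},
            {"quantity_min": 10, "quantity_max": 99, "price_per_unit": 4.0},
        ],
    },
    quantity=12,
)
```

Typing tiers as list[dict] leaves the tier keys unvalidated, which is why the sketch has to trust them; a dedicated tier model would move that check into the schema.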
Testing Advanced Schemas
Test discriminated unions and conditional fields:
import pytest
from pydantic import ValidationError

# Assumes QueryResponse, SuccessResult, ErrorResult, and ActionPlan
# from the earlier sections are in scope.


def test_discriminated_union():
    """Test that discriminator selects the correct schema."""
    # Valid success response
    success = QueryResponse.model_validate({
        "response": {
            "result_type": "success",
            "data": "Results found",
            "confidence": 0.9
        }
    })
    assert isinstance(success.response, SuccessResult)

    # Valid error response
    error = QueryResponse.model_validate({
        "response": {
            "result_type": "error",
            "error_code": 404,
            "message": "Not found"
        }
    })
    assert isinstance(error.response, ErrorResult)

    # Invalid discriminator value
    with pytest.raises(ValidationError):
        QueryResponse.model_validate({
            "response": {
                "result_type": "invalid",
                "data": "Something"
            }
        })


def test_conditional_validation():
    """Test dependent field validation."""
    # Valid: escalation not required, no reason needed
    action1 = ActionPlan(
        action="resolve",
        escalation_required=False
    )
    assert action1.escalation_reason == ""

    # Valid: escalation required, reason provided
    action2 = ActionPlan(
        action="escalate",
        escalation_required=True,
        escalation_reason="Needs approval"
    )
    assert action2.escalation_reason == "Needs approval"

    # Invalid: escalation required, but no reason
    with pytest.raises(ValidationError):
        ActionPlan(
            action="escalate",
            escalation_required=True,
            escalation_reason=""
        )
Key Takeaways
- Enums enforce vocabularies and make out-of-vocabulary values fail validation; use them wherever valid values are finite.
- Discriminated unions (oneOf branches tagged with a const field) route complex types based on a single field.
- Conditional schemas (if/then) express field dependencies and make relationships explicit.
- $defs and $ref eliminate schema duplication and improve maintainability.
- Recursive schemas model trees, hierarchies, and nested structures naturally.
- Combine patterns (enums + conditionals + composition) to express sophisticated domain constraints.
- Test all schema patterns; discriminators and conditionals are easy to get wrong.
Frequently Asked Questions
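When should I use oneOf vs. a discriminated union?
Use a discriminated union (Zod's discriminatedUnion, or Pydantic's Field(discriminator=...)) when one field determines the schema. Plain JSON Schema has no discriminator keyword, so express the same idea as oneOf with a const on the tag field in every branch. Use an untagged oneOf only for unions without a discriminator (rarer for LLM outputs). Discriminators are more efficient because the validator can select the right branch immediately instead of trying each one.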
Can I have optional fields within a discriminated union?
Yes. Each union variant defines its own required array. Some variants can have fewer required fields.
from typing import Literal

from pydantic import BaseModel


class SimplePath(BaseModel):
    path_type: Literal["simple"]
    steps: list[str]


class DetailedPath(BaseModel):
    path_type: Literal["detailed"]
    steps: list[str]
    estimated_time: str = ""  # Optional
    complexity: str = ""  # Optional
Do LLMs understand recursive schemas?
Yes, but keep recursion depth shallow (two or three levels). Very deep recursion tends to confuse LLMs. JSON Schema has no keyword that caps nesting depth, so state the limit in field descriptions and check it in code after parsing.
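A depth limit can be checked after parsing with a short recursive function. This is a minimal sketch over plain dicts shaped like the recursive Document schema; nesting_depth, check_depth, and the default limit of 3 are illustrative choices, not library features.

```python
def nesting_depth(doc: dict) -> int:
    """Count nesting levels; a document with no sections has depth 1."""
    children = doc.get("sections", [])
    return 1 + max((nesting_depth(child) for child in children), default=0)


def check_depth(doc: dict, limit: int = 3) -> None:
    """Raise ValueError when the document nests deeper than the limit."""
    depth = nesting_depth(doc)
    if depth > limit:
        raise ValueError(f"nesting depth {depth} exceeds limit {limit}")


doc = {
    "title": "Main",
    "content": "...",
    "sections": [
        {"title": "Section 1", "content": "..."},
        {
            "title": "Section 2",
            "content": "...",
            "sections": [{"title": "Subsection", "content": "..."}],
        },
    ],
}

depth = nesting_depth(doc)
check_depth(doc)  # passes: depth equals the default limit
```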
How do I handle schema evolution (adding fields)?
Use additionalProperties: true during a transition period, then tighten to false. For required fields, add them as optional first, then make them required in the next version.
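The transition period can be pictured as a switch between a tolerant and a strict key check. The sketch below is a stdlib-only illustration; the ticket fields, the "priority" addition, and the helper key_errors are all hypothetical.

```python
def key_errors(payload: dict, known: set, required: set, allow_extra: bool) -> list:
    """Report missing required keys and, in strict mode, unexpected keys."""
    present = set(payload)
    errors = [f"missing: {name}" for name in sorted(required - present)]
    if not allow_extra:
        errors += [f"unexpected: {name}" for name in sorted(present - known)]
    return errors


# Version 1 of a hypothetical ticket schema knows two fields.
V1_KNOWN = {"title", "status"}
V1_REQUIRED = {"title", "status"}

# A producer has started emitting a new "priority" field ahead of consumers.
payload = {"title": "Login fails", "status": "pending", "priority": "high"}

# Transition period: behaves like additionalProperties: true
transition = key_errors(payload, V1_KNOWN, V1_REQUIRED, allow_extra=True)
# After tightening: behaves like additionalProperties: false
strict = key_errors(payload, V1_KNOWN, V1_REQUIRED, allow_extra=False)
```

During the transition the v1 consumer accepts the newer payload; once strict mode is on, the same payload is rejected until the consumer's schema lists the new field.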
Can I validate cross-field constraints (e.g., startDate < endDate)?
Yes, with a Pydantic @model_validator (the v2 replacement for @root_validator) or other model-level custom logic. JSON Schema cannot compare the values of two fields, so use code validation for cross-field constraints.
from datetime import date

from pydantic import BaseModel, model_validator


class DateRange(BaseModel):
    # Parsed from ISO 8601 strings such as "2024-01-31", so comparison is by date
    start_date: date
    end_date: date

    @model_validator(mode="after")
    def check_dates(self):
        if self.start_date > self.end_date:
            raise ValueError("start_date must not be after end_date")
        return self