Validating and Repairing Failed LLM JSON Responses
Even with JSON Mode, failures happen: an LLM generates output that doesn't quite match the schema, the API returns an error, or downstream validation detects semantic issues. Production systems need graceful degradation and repair strategies. This guide teaches you how to detect failures, provide targeted retry feedback, and implement fallback logic that keeps your application running.
Validation Failure Modes
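JSON Mode with a strictly enforced schema guarantees syntactic validity: the output parses as JSON and uses the declared field names and types. Plain JSON mode without a schema guarantees only that the output parses. Semantic failures, where the data is syntactically valid but wrong, still occur: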
- Hallucinated values: The LLM invents data not in the source text.
- Partial responses: The LLM truncates output to fit schema constraints (e.g., maxLength is too tight).
- Type coercion errors: A number field is returned as a string despite the schema (mainly when the schema is not strictly enforced).
- Missing optional fields: The LLM skips optional fields instead of defaulting.
- Out-of-constraint values: A number exceeds the specified range (rare in strict mode, but possible with loose constraints).
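Several of the failure modes listed above can be told apart mechanically before deciding how to react. The sketch below is a stdlib-only illustration, not part of any library: `classify_failure`, the `FIELD_SPEC` table, and the label strings are hypothetical names chosen for this example. Hallucinated values are not covered here, because they need a comparison against the source text.

```python
import json

# Hypothetical field spec: name -> (expected type, required?, optional (min, max) range)
FIELD_SPEC = {
    "sentiment": (str, True, None),
    "confidence": (float, True, (0.0, 1.0)),
    "notes": (str, False, None),
}

def classify_failure(raw: str, spec: dict = FIELD_SPEC) -> list[str]:
    """Return failure labels for one raw model response (empty list = nothing found)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(data, dict):
        return ["not_an_object"]
    problems = []
    for name, (expected, required, bounds) in spec.items():
        if name not in data:
            kind = "missing_required" if required else "missing_optional"
            problems.append(f"{kind}:{name}")
            continue
        value = data[name]
        if expected is float:
            # Accept ints where floats are expected, but never bools
            ok_type = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok_type = isinstance(value, expected)
        if not ok_type:
            problems.append(f"wrong_type:{name}")
            continue
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            problems.append(f"out_of_range:{name}")
    return problems

print(classify_failure('{"sentiment": "positive", "confidence": "0.9"}'))
```

The labels can then drive the reaction: a wrong type or out-of-range value is a candidate for a retry with feedback, while a missing optional field may only need a default.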
Detecting Validation Failures
Pydantic Validation (Python)
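from pydantic import BaseModel, ConfigDict, Field, ValidationError
from openai import OpenAI
client = OpenAI()
class SentimentAnalysis(BaseModel):
    # Strict schemas are expected to set additionalProperties to false;
    # extra="forbid" makes Pydantic emit that in model_json_schema().
    model_config = ConfigDict(extra="forbid")
    sentiment: str = Field(..., pattern="^(positive|negative|neutral)$")
    confidence: float = Field(..., ge=0, le=1)
# Note: json_schema response formats need a model that supports them;
# change the model name if the API rejects this one.
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Analyze sentiment of: 'Great product!'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "SentimentAnalysis",
            "schema": SentimentAnalysis.model_json_schema(),
            "strict": True
        }
    }
)
try:
    # content can be None (for example on a refusal), so fall back to an empty string
    result = SentimentAnalysis.model_validate_json(response.choices[0].message.content or "")
except ValidationError as e:
    print(f"Validation failed: {e}")
    # Handle: retry, return default, escalate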
Zod Validation (TypeScript)
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import { OpenAI } from "openai";
const client = new OpenAI();
const SentimentSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1)
});
type Sentiment = z.infer<typeof SentimentSchema>;
const response = await client.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{ role: "user", content: "Analyze sentiment..." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "SentimentAnalysis",
      schema: zodToJsonSchema(SentimentSchema),
      strict: true
    }
  }
});
// content can be null, and JSON.parse throws on malformed text, so guard both
let parsed: unknown = null;
try {
  parsed = JSON.parse(response.choices[0].message.content ?? "");
} catch (err) {
  console.error("Response was not valid JSON:", err);
}
const result = SentimentSchema.safeParse(parsed);
if (!result.success) {
  console.error("Validation failed:", result.error.issues);
  // Handle: retry, return default, escalate
}
Implementing Retry Logic
Retry with exponential backoff, providing explicit error feedback to the LLM:
import time
from pydantic import BaseModel, Field, ValidationError
def get_structured_output(
    prompt: str,
    schema: type[BaseModel],
    model: str = "gpt-4-turbo",
    max_retries: int = 3,
    backoff_factor: float = 2.0
):
    """
    Attempt to get valid structured output from an LLM.
    Retries with exponential backoff and error feedback.
    Uses the `client` created earlier; the model must support json_schema output.
    """
    last_error = None
    last_output = None
    for attempt in range(1, max_retries + 1):
        # Construct the request
        messages = [{"role": "user", "content": prompt}]
        # On retries, show the model its own failed output and the specific error
        if last_error is not None:
            if last_output:
                messages.append({"role": "assistant", "content": last_output})
            error_msg = f"Previous attempt failed: {last_error}. Please ensure all fields match the schema."
            messages.append({"role": "user", "content": error_msg})
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            response_format={
                "type": "json_schema",
                "json_schema": {
                    "name": schema.__name__,
                    "schema": schema.model_json_schema(),
                    "strict": True
                }
            }
        )
        last_output = response.choices[0].message.content or ""
        try:
            # model_validate_json raises ValidationError for malformed JSON as well
            return schema.model_validate_json(last_output)
        except ValidationError as e:
            last_error = str(e)
            if attempt == max_retries:
                print(f"Max retries exceeded after {max_retries} attempts.")
                raise
            wait_time = backoff_factor ** (attempt - 1)
            print(f"Validation failed. Retrying in {wait_time}s... (attempt {attempt}/{max_retries})")
            time.sleep(wait_time)
    # Only reached when max_retries is less than 1
    raise RuntimeError(f"max_retries must be at least 1, got {max_retries}")
# Usage
class ProductReview(BaseModel):
    product_name: str
    rating: int = Field(..., ge=1, le=5)
try:
    review = get_structured_output(
        prompt="Extract product review from: 'Best laptop ever, 5 stars!'",
        schema=ProductReview
    )
except Exception as e:
    print(f"Failed to extract review: {e}")
    # Return default or escalate
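The error feedback in the retry loop above can be made more targeted by naming each failing field instead of pasting a whole exception string. A minimal sketch, assuming the errors arrive as a list of dicts with `loc` and `msg` keys (the shape returned by Pydantic's `ValidationError.errors()`); `format_retry_feedback` is a hypothetical helper, not part of Pydantic or the OpenAI SDK:

```python
def format_retry_feedback(errors: list[dict], max_items: int = 5) -> str:
    """Turn validation error dicts into a short, field-by-field correction request."""
    lines = []
    for err in errors[:max_items]:
        # "loc" is a tuple path such as ("rating",) or ("items", 0, "price")
        path = ".".join(str(part) for part in err.get("loc", ())) or "<root>"
        lines.append(f"- {path}: {err.get('msg', 'invalid value')}")
    remaining = len(errors) - max_items
    if remaining > 0:
        lines.append(f"- ...and {remaining} more issue(s)")
    header = "Your previous JSON was rejected. Fix these fields and resend the full object:"
    return header + "\n" + "\n".join(lines)

# Hand-written sample in the shape of ValidationError.errors()
sample_errors = [
    {"loc": ("rating",), "msg": "Input should be less than or equal to 5"},
    {"loc": ("product_name",), "msg": "Field required"},
]
print(format_retry_feedback(sample_errors))
```

Inside the retry loop, `format_retry_feedback(e.errors())` could stand in for `str(e)` when building the follow-up message. Capping the number of items keeps the retry prompt short when many fields fail at once.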
Fallback Strategies
When retries fail, implement graceful fallbacks:
import json
from typing import Callable, Optional
from pydantic import BaseModel, ValidationError
def get_structured_output_with_fallback(
    prompt: str,
    schema: type[BaseModel],
    fallback_factory: Optional[Callable[[], BaseModel]] = None,
    max_retries: int = 3
) -> BaseModel:
    """
    Attempt structured output; fall back to default/escalate on failure.
    """
    try:
        return get_structured_output(prompt, schema, max_retries=max_retries)
    except Exception as e:
        print(f"Structured output failed: {e}")
        # Strategy 1: Return default instance
        if fallback_factory:
            print("Using fallback factory...")
            return fallback_factory()
        # Strategy 2: Return partial result (nullify invalid fields).
        # Assumption: the failing call attached the last raw response to the
        # exception as `raw_output`. get_structured_output does not do this by
        # default, so without that change this branch is skipped.
        raw_output = getattr(e, "raw_output", None)
        if raw_output is not None:
            print("Attempting partial validation...")
            try:
                raw = json.loads(raw_output)
            except json.JSONDecodeError:
                raw = None
            if isinstance(raw, dict):
                try:
                    return schema.model_validate(raw)
                except ValidationError as ve:
                    # Keep valid fields, set invalid to None (the result is unvalidated)
                    bad = {err["loc"][0] for err in ve.errors() if err["loc"]}
                    partial = {k: None if k in bad else raw.get(k) for k in schema.model_fields}
                    return schema.model_construct(**partial)
        # Strategy 3: Escalate
        print("All fallback strategies failed. Escalating to human review.")
        raise
# Usage with fallback
def default_review():
    return ProductReview(product_name="Unknown", rating=3)
review = get_structured_output_with_fallback(
    prompt="Extract review...",
    schema=ProductReview,
    fallback_factory=default_review,
    max_retries=2
)
Semantic Validation (Post-Parse)
After parsing JSON, validate semantic correctness:
from pydantic import BaseModel, Field, ValidationError, field_validator
class EmailAddress(BaseModel):
    recipient_name: str
    recipient_email: str
    subject: str = Field(..., max_length=100)
    @field_validator("recipient_email")
    @classmethod
    def validate_email(cls, v):
        # Deliberately minimal check; real email validation needs more than this
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v
    @field_validator("subject")
    @classmethod
    def check_subject_non_empty(cls, v):
        if len(v.strip()) == 0:
            raise ValueError("Subject cannot be empty")
        return v
# This fails semantic validation even if JSON is syntactically valid
try:
    email = EmailAddress(
        recipient_name="Alice",
        recipient_email="not-an-email",  # No @ symbol
        subject=" "  # Whitespace-only subject
    )
except ValidationError as e:
    print(f"Semantic validation failed: {e}")
Detecting and Handling Hallucination
Some failures are semantic: the LLM invents plausible-sounding but false data. Detect this with source verification:
from pydantic import BaseModel, Field
class ExtractedEntity(BaseModel):
    name: str
    description: str
    source_quote: str = Field(
        ...,
        description="Direct quote from source text where this entity was mentioned"
    )
class ExtractedEntities(BaseModel):
    # Wrapper so that one structured response can carry several entities
    entities: list[ExtractedEntity]
def is_grounded(entity: ExtractedEntity, source_text: str) -> bool:
    # An empty quote is a substring of everything, so reject it explicitly
    quote = entity.source_quote.strip().lower()
    return bool(quote) and quote in source_text.lower()
def extract_entities_with_verification(source_text: str, max_retries: int = 3):
    """
    Extract entities and verify they are grounded in the source text.
    """
    prompt = f"Extract entities from: '{source_text}'"
    entities: list[ExtractedEntity] = []
    for attempt in range(max_retries):
        entities = get_structured_output(prompt=prompt, schema=ExtractedEntities).entities
        # Verify each entity is mentioned in source
        invalid_entities = [e.name for e in entities if not is_grounded(e, source_text)]
        if not invalid_entities:
            return entities  # All verified
        print(f"Attempt {attempt + 1}: Found hallucinated entities: {invalid_entities}")
        # Re-prompt with stricter instructions that name the ungrounded entities
        prompt = (
            f"Extract entities from: '{source_text}'\n"
            "Each source_quote must be copied verbatim from the text. "
            f"These entities had quotes that do not appear in it: {invalid_entities}"
        )
    # Return only verified entities
    return [e for e in entities if is_grounded(e, source_text)]
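An exact substring test like the one above can reject quotes that differ from the source only in whitespace or quote characters. The sketch below shows one more tolerant comparison using only the standard library; `normalize`, `quote_is_grounded`, and the `min_chars` threshold are hypothetical names and values chosen for illustration, and the right amount of tolerance depends on the data.

```python
import re
import unicodedata

# Map curly quotes to their straight equivalents before comparing
_QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})

def normalize(text: str) -> str:
    """Apply NFKC, straighten curly quotes, collapse whitespace runs, and lowercase."""
    text = unicodedata.normalize("NFKC", text).translate(_QUOTE_MAP)
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_grounded(quote: str, source_text: str, min_chars: int = 3) -> bool:
    """True if the normalized quote appears in the normalized source text."""
    normalized_quote = normalize(quote)
    # Very short or empty quotes match almost anything, so treat them as ungrounded
    if len(normalized_quote) < min_chars:
        return False
    return normalized_quote in normalize(source_text)

source = "Best laptop ever, 5 stars! It's great value."
print(quote_is_grounded("Best  laptop\never", source))
print(quote_is_grounded("worst laptop ever", source))
```

This still only confirms that the quote exists in the source, not that the extracted name or description follows from it, so it narrows the hallucination check without replacing review.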
Error Reporting and Observability
Log failures for monitoring and improvement:
import logging
from datetime import datetime
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def log_validation_failure(
    prompt: str,
    error: str,
    attempt_number: int,
    schema_name: str
):
    """Log validation failures for analysis."""
    logger.error(
        f"Validation failure - Schema: {schema_name}, Attempt: {attempt_number}, Error: {error[:100]}",
        extra={
            "prompt_length": len(prompt),
            "timestamp": datetime.now().isoformat(),
            "full_error": error
        }
    )
# Usage in retry loop (fragment: prompt, schema, response, and retry_count come from the loop)
try:
    result = schema.model_validate_json(response.choices[0].message.content)
except ValidationError as e:
    log_validation_failure(prompt, str(e), retry_count, schema.__name__)
Key Takeaways
- JSON Mode guarantees syntactic validity but not semantic correctness; implement post-parse validation.
- Detect failures with Pydantic's ValidationError or Zod's .safeParse() result.
- Implement exponential-backoff retry with explicit error feedback to the LLM.
- Provide graceful fallbacks: default values, partial validation, or human escalation.
- Verify semantic correctness with custom validators and source-grounding checks.
- Log all validation failures for monitoring, debugging, and prompt improvement.
- Distinguish between transient failures (retry) and persistent schema misalignment (redesign).
Frequently Asked Questions
When should I retry vs. using a fallback?
Retry if the error is transient (temporary API issue) or if the LLM might produce a valid response with adjusted prompting. Use fallback if the schema is incompatible with the LLM's reasoning or if retries are exhausted.
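The rule of thumb above can be written down as a small decision function. This is a sketch under stated assumptions: the `FailureKind` categories and the action strings are hypothetical labels for this example, and a real system would map its own exceptions onto them.

```python
from enum import Enum

class FailureKind(Enum):
    TRANSIENT_API = "transient_api"      # timeouts, rate limits, server errors
    VALIDATION = "validation"            # output failed schema or semantic checks
    SCHEMA_REJECTED = "schema_rejected"  # the API refused the schema itself

def next_action(kind: FailureKind, attempt: int, max_retries: int = 3) -> str:
    """Return "retry", "retry_with_feedback", or "fallback" for a failed attempt (1-based)."""
    if kind is FailureKind.SCHEMA_REJECTED:
        # Resending an unchanged schema cannot succeed
        return "fallback"
    if attempt >= max_retries:
        return "fallback"
    if kind is FailureKind.TRANSIENT_API:
        return "retry"
    return "retry_with_feedback"

print(next_action(FailureKind.VALIDATION, attempt=1))
print(next_action(FailureKind.VALIDATION, attempt=3))
```

Keeping the decision in one function makes the retry policy easy to test without calling the LLM.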
How many retries is reasonable?
Typically 2–3. Beyond that, the issue is usually schema design or prompt clarity, not transient failure. Further retries waste cost and latency.
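To make the latency cost concrete, the waits produced by the backoff formula used in this guide (`backoff_factor ** (attempt - 1)`) can be tabulated. The helper names and the `per_call_seconds` figure below are assumptions for illustration, not measurements.

```python
def backoff_schedule(max_retries: int = 3, backoff_factor: float = 2.0) -> list[float]:
    """Waits in seconds between attempts; there is one fewer wait than attempts."""
    return [backoff_factor ** i for i in range(max_retries - 1)]

def worst_case_seconds(max_retries: int, backoff_factor: float, per_call_seconds: float) -> float:
    """Total time if every attempt fails: all waits plus every call's own latency."""
    return sum(backoff_schedule(max_retries, backoff_factor)) + max_retries * per_call_seconds

for retries in (2, 3, 5):
    # per_call_seconds=4.0 is an assumed average latency, chosen only for this example
    print(retries, backoff_schedule(retries), worst_case_seconds(retries, 2.0, 4.0))
```

With a factor of 2.0, three attempts wait 1 s and then 2 s, while five attempts add 4 s and 8 s on top of that, on top of the cost of the calls themselves.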
Should I log all validation failures?
Yes, at least a sample. Unexpected validation patterns reveal schema design issues or LLM limitations. Use structured logging (JSON format) for downstream analysis.
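One way to get JSON-formatted log lines with only the standard library is a custom `logging.Formatter`. The sketch below is an illustration rather than a recommended setup: `JsonFormatter` and the logger name `llm_validation` are hypothetical, and dedicated structured-logging libraries offer more features.

```python
import json
import logging

# Attributes every LogRecord carries; anything else was supplied through `extra`
_STANDARD_ATTRS = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message", "asctime"}

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including fields passed via `extra`."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        extras = {k: v for k, v in record.__dict__.items() if k not in _STANDARD_ATTRS}
        payload.update(extras)
        # default=str keeps non-serializable values (datetimes, exceptions) from raising
        return json.dumps(payload, default=str)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
json_logger = logging.getLogger("llm_validation")
json_logger.addHandler(handler)
json_logger.setLevel(logging.INFO)
json_logger.propagate = False  # avoid duplicate plain-text lines from the root logger

json_logger.error("Validation failure", extra={"schema_name": "ProductReview", "attempt": 2})
```

Each line can then be parsed by a log pipeline and grouped by `schema_name` or `attempt` to spot recurring failure patterns.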
Can I use partial validation (skip invalid fields)?
Yes, but carefully. Skipping a field that you later assume exists can cause runtime errors. Explicitly mark partial results as such and handle accordingly in downstream code.
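One way to mark partial results explicitly is to wrap them in a small result type that records which fields failed. The sketch below uses a standard-library dataclass; `ExtractionResult`, its `status` values, and the `require` method are hypothetical names for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Literal

@dataclass
class ExtractionResult:
    status: Literal["ok", "partial", "fallback"]
    data: dict[str, Any]
    invalid_fields: list[str] = field(default_factory=list)

    def require(self, name: str) -> Any:
        """Return a field value, refusing to hand back one that failed validation."""
        if name in self.invalid_fields or name not in self.data:
            raise KeyError(f"Field {name!r} is unavailable (status={self.status})")
        return self.data[name]

result = ExtractionResult(
    status="partial",
    data={"product_name": "Laptop"},
    invalid_fields=["rating"],
)
print(result.require("product_name"))  # Laptop
try:
    result.require("rating")
except KeyError as exc:
    print(f"Handled missing field: {exc}")
```

Downstream code then has to ask for a field by name and handle the failure at that point, instead of discovering a `None` several calls later.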
How do I test error handling?
Mock the LLM to return invalid JSON or out-of-schema responses. Test retry logic, fallback paths, and logging separately.
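import pytest
from unittest.mock import MagicMock, patch
from pydantic import ValidationError
def test_validation_failure():
    # Mock response with invalid JSON
    bad_response = MagicMock(
        choices=[MagicMock(message=MagicMock(content="{invalid json"))]
    )
    # Patch the client instance that get_structured_output actually calls
    with patch.object(client.chat.completions, "create", return_value=bad_response):
        # model_validate_json reports malformed JSON as a ValidationError
        with pytest.raises(ValidationError):
            get_structured_output("test", ProductReview, max_retries=1)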