
Partial Parsing: Extracting Valid Data From Incomplete

Partial parsing is the art of extracting valid data from malformed, incomplete, or streaming LLM output. Instead of discarding an entire response because it's missing a closing brace or has one invalid field, partial parsing salvages what's valid. Studies show partial parsing recovers 70–80% of would-be failed extractions (Anthropic, 2025).

This article teaches you lenient parsing strategies, field recovery, and streaming-friendly techniques.

The Problem: Incomplete Output

Consider this LLM output attempting to generate customer records:

{
"customers": [
{"id": 1, "name": "Alice Chen", "email": "[email protected]"},
{"id": 2, "name": "Bob Smith", "email": "[email protected]",
{"id": 3, "name": "Carol Lee", "email": "[email protected]

The output is:

  • Truncated (Carol's email incomplete, missing closing brace)
  • Malformed (Bob's record missing closing brace)
  • Streaming (LLM was interrupted)

Strict JSON parsing fails entirely. Partial parsing recovers Alice and Bob's complete records, and Carol's partial data.
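To make the contrast concrete, here is a minimal sketch of one way to salvage the records that are still intact, using only the standard library. The helper name `recover_complete_objects` and the placeholder email addresses are assumptions for illustration, not part of the original example. It scans for opening braces and keeps every object that `json.JSONDecoder.raw_decode` can decode on its own.

```python
import json

def recover_complete_objects(text: str) -> list:
    """Return every JSON object in `text` that decodes cleanly on its own."""
    decoder = json.JSONDecoder()
    found = []
    pos = text.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Nothing decodable starts at this brace; try the next one
            pos = text.find("{", pos + 1)
            continue
        found.append(obj)
        # Resume scanning after the object that was just decoded
        pos = text.find("{", end)
    return found

# Placeholder data shaped like the truncated output above (emails are made up)
truncated = (
    '{\n"customers": [\n'
    '{"id": 1, "name": "Alice Chen", "email": "alice@example.com"},\n'
    '{"id": 2, "name": "Bob Smith", "email": "bob@example.com",\n'
    '{"id": 3, "name": "Carol Lee", "email": "carol@exam'
)
intact = recover_complete_objects(truncated)
```

On this input only the first record decodes cleanly. The second and third need field-level recovery, because one lacks its closing brace and the other is cut off mid-string. A brace that appears inside a string value can confuse this scan, so treat it as a first pass rather than a complete solution.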

Lenient Parsing with json5

The json5 library parses JSON-like syntax more leniently:

import json5

output = """{
    "id": 1,
    "name": "Alice",
    "email": '[email protected]', // Single-quoted string
    "active": true,
    // This is a comment
}"""

try:
    # json5 handles single quotes, comments, trailing commas
    data = json5.loads(output)
    print(data)  # Works!
except ValueError:
    # Parse errors surface as ValueError (or a subclass of it)
    print("Even json5 failed; fall back to partial parsing")

json5 is much more forgiving than stdlib json:

import json

# json fails on json5's lenient syntax
try:
    json.loads(output)
except json.JSONDecodeError as e:
    print(f"Failed: {e}")  # Fails on single quotes and comments

# json5 succeeds
data = json5.loads(output)

Streaming-Aware Parsing

LLMs often stream output token by token. Parse as you receive data:

import json
from typing import Any, Dict, List, Optional

def parse_streaming_json_array(
    tokens: List[str],
    expected_fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Parse JSON objects from streaming tokens; salvage a trailing partial one."""

    buffer = ""
    objects = []
    depth = 0
    in_string = False
    escape = False

    for token in tokens:
        for char in token:
            # Skip brackets, commas, and whitespace between top-level objects
            if depth == 0 and char != "{":
                continue
            buffer += char

            if escape:
                escape = False
                continue
            if in_string:
                if char == "\\":
                    escape = True
                elif char == '"':
                    in_string = False
                continue

            # Track nesting depth to identify complete objects
            if char == '"':
                in_string = True
            elif char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    # Braces balance: a complete object has arrived
                    try:
                        objects.append(json.loads(buffer))
                    except json.JSONDecodeError:
                        pass  # Balanced but invalid; drop it
                    buffer = ""

    # Whatever remains is a truncated object: extract the fields we can
    if buffer and expected_fields:
        partial = extract_fields_from_malformed(buffer, expected_fields)
        if partial:
            objects.append(partial)

    return objects

# Usage
tokens = ['{', '"id": 1', ', "name"', ': "Alice"', '}', ',', '{', '"id": 2', ', "name"', ': "Bob"']
customers = parse_streaming_json_array(tokens, expected_fields=["id", "name"])
# Returns: [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
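The function above consumes the whole token list before returning. When a consumer needs each record the moment it closes, a generator is one option. The following is a minimal sketch under that assumption; `iter_streamed_objects` is a hypothetical name, and it deliberately ignores the trailing partial object, which can still be handed to field-level recovery afterwards.

```python
import json
from typing import Any, Dict, Iterable, Iterator

def iter_streamed_objects(tokens: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield each top-level JSON object as soon as its closing brace arrives."""
    buffer = []
    depth = 0
    in_string = False
    escape = False
    for token in tokens:
        for char in token:
            if depth == 0 and char != "{":
                continue  # separators between objects
            buffer.append(char)
            if escape:
                escape = False
            elif in_string:
                if char == "\\":
                    escape = True
                elif char == '"':
                    in_string = False
            elif char == '"':
                in_string = True
            elif char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    candidate = "".join(buffer)
                    buffer.clear()
                    try:
                        obj = json.loads(candidate)
                    except json.JSONDecodeError:
                        continue  # balanced braces but invalid JSON; skip it
                    yield obj

# Usage: braces and escaped quotes inside strings do not end an object early
stream = ['[{"id": 1, "note": "a } b"}', ', {"id": 2, "q": "say \\"hi\\""}', ', {"id": 3']
received = list(iter_streamed_objects(stream))
```

Because the generator tracks string and escape state per character, token boundaries can fall anywhere, including in the middle of a key or value.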

Field-Level Recovery

Extract individual fields from malformed JSON:

import re
from typing import Any, Dict, List

def extract_fields_from_malformed(text: str, expected_fields: List[str]) -> Dict[str, Any]:
    """Extract individual fields from malformed JSON."""

    result = {}

    for field in expected_fields:
        key = re.escape(field)  # Field names may contain regex metacharacters
        # Try multiple patterns, each paired with a converter for its capture.
        # The closing quote is optional so truncated strings are still recovered.
        patterns = [
            (rf'"{key}"\s*:\s*"([^"]*)"?', str),                      # String field: "key": "value"
            (rf'"{key}"\s*:\s*(-?\d+\.\d+)', float),                  # Float field: "key": 1.5
            (rf'"{key}"\s*:\s*(-?\d+)', int),                         # Integer field: "key": 123
            (rf'"{key}"\s*:\s*(true|false)', lambda v: v == "true"),  # Boolean field: "key": true
            (rf"'{key}'\s*:\s*'([^']*)'?", str),                      # Single-quoted: 'key': 'value'
            (rf'\b{key}\s*:\s*"([^"]*)"?', str),                      # Unquoted key: key: "value"
        ]

        for pattern, convert in patterns:
            match = re.search(pattern, text)
            if match:
                # Convert according to the pattern that matched, so a quoted
                # "123" stays a string and a bare 123 becomes an int
                result[field] = convert(match.group(1))
                break

    return result

# Usage
malformed = '{"id": 1, "name": "Alice", email: "alice@example'
fields = extract_fields_from_malformed(
    malformed,
    expected_fields=["id", "name", "email"]
)
print(fields)  # {'id': 1, 'name': 'Alice', 'email': 'alice@example'}

Nested Object Recovery

For nested structures, recover what you can at each level:

import json
import re

def parse_nested_with_recovery(text: str, schema: dict) -> dict:
    """Parse nested JSON with field-level recovery."""

    # Try strict parsing first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Try lenient parsing (json5 is optional; parse errors are ValueError)
    try:
        import json5
        return json5.loads(text)
    except (ImportError, ValueError):
        pass

    # Fall back to field extraction, guided by the schema
    result = {}
    properties = schema.get("properties", {})

    for field, field_schema in properties.items():
        field_type = field_schema.get("type")
        key = re.escape(field)

        if field_type == "object":
            # Recursively recover nested object. The pattern stops at the
            # first closing brace, so it only captures flat, closed objects.
            nested_pattern = rf'"{key}"\s*:\s*(\{{[^}}]*\}})'
            match = re.search(nested_pattern, text)
            if match:
                nested_json = match.group(1)
                result[field] = parse_nested_with_recovery(
                    nested_json,
                    field_schema
                )

        elif field_type == "array":
            # Recover array elements (requires the closing bracket)
            array_pattern = rf'"{key}"\s*:\s*\[([^\]]*)\]'
            match = re.search(array_pattern, text)
            if match:
                array_content = match.group(1)
                items = extract_array_items(array_content)
                result[field] = items

        else:
            # Simple field recovery. This searches the whole text, so a
            # same-named key inside a nested object can match first.
            fields = extract_fields_from_malformed(text, [field])
            if fields:
                result.update(fields)

    return result

def extract_array_items(text: str) -> list:
    """Extract items from malformed array string."""
    # Split by comma (naive; works for simple arrays). Items come back as strings.
    items = []
    for item in text.split(","):
        item = item.strip().strip('"').strip("'")
        if item:
            items.append(item)
    return items

Comparison: Parsing Strategies

| Strategy | Strictness | Recovery Rate | Complexity | Use Case |
|---|---|---|---|---|
| json.loads | Very strict | 0% on failure | Low | Well-formed output only |
| json5.loads | Lenient | 50–70% on malformed | Low | Relaxed syntax (single quotes, comments, trailing commas) |
| Field extraction | Very lenient | 70–90% | Medium | Badly malformed or truncated output |
| Nested recovery | Adaptive | 80–95% | High | Complex nested structures |

Practical Implementation

import json

def robust_parse(text: str, schema: dict) -> dict:
    """Parse with automatic fallback to partial parsing."""

    # Strategy 1: Strict JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Strategy 2: Lenient JSON5 (optional dependency; parse errors are ValueError)
    try:
        import json5
        return json5.loads(text)
    except (ImportError, ValueError):
        pass

    # Strategy 3: Field-by-field recovery
    fields = extract_fields_from_malformed(
        text,
        schema.get("required", [])
    )
    if fields:
        return fields

    # Strategy 4: Return empty dict (all failed)
    return {}

# Usage
schema = {"required": ["id", "name", "email"]}
output = '{"id": 1, "name": "Alice", email: alice@example'
data = robust_parse(output, schema)
# The email value is unquoted and truncated, so no pattern matches it
print(data)  # {'id': 1, 'name': 'Alice'}

Key Takeaways

  • Partial parsing recovers 70–80% of would-be failed extractions.
  • json5 handles JSON-like syntax leniently (single quotes, comments, trailing commas); it does not repair truncated output.
  • Stream-aware parsing detects complete objects as they arrive.
  • Field-level recovery extracts individual fields using regex patterns.
  • Nested recovery applies strategies recursively at each level.
  • Always implement a graceful fallback (return partial data, then empty, then error).

Frequently Asked Questions

Is partial parsing safe? Could I get bad data?

Not entirely. Partial parsing risks returning incomplete or misinterpreted data, so use it only when the alternative is total failure, and validate recovered fields against schema constraints.
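As an illustration of that validation step, here is a minimal sketch that checks recovered fields against a small subset of JSON Schema (`properties`, `type`, and `required`). The function name `validate_recovered` and the type map are assumptions made for this example; a full validator would cover far more of the specification.

```python
from typing import Any, Dict, List, Tuple

# Map JSON Schema type names to Python types (a small, assumed subset)
_TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

def validate_recovered(
    fields: Dict[str, Any], schema: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Split recovered fields into accepted values and a list of problems."""
    accepted: Dict[str, Any] = {}
    problems: List[str] = []
    properties = schema.get("properties", {})

    for name, value in fields.items():
        if name not in properties:
            problems.append(f"{name}: not in schema")
            continue
        type_name = properties[name].get("type")
        expected = _TYPE_MAP.get(type_name)
        # bool is a subclass of int in Python, so reject it for numeric types
        bool_as_number = isinstance(value, bool) and expected is not bool
        if expected is not None and (bool_as_number or not isinstance(value, expected)):
            problems.append(f"{name}: expected {type_name}, got {type(value).__name__}")
            continue
        accepted[name] = value

    for name in schema.get("required", []):
        if name not in fields:
            problems.append(f"{name}: required field missing")

    return accepted, problems

# Usage with a hypothetical schema
customer_schema = {
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["id", "name", "email"],
}
accepted, problems = validate_recovered({"id": "1", "name": "Alice"}, customer_schema)
```

Returning the problems alongside the accepted values lets the caller decide whether a partially valid record is good enough or should be retried.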

How do I handle arrays in partial parsing?

Split by commas (naive) or use regex to find array boundaries. For complex nested arrays, it's often better to escalate than guess.
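A middle ground between a naive split and giving up is to split only on commas that sit outside strings and nested brackets. The sketch below shows the idea; `split_top_level` is a hypothetical helper, and it assumes double-quoted strings as in standard JSON.

```python
from typing import List

def split_top_level(array_body: str) -> List[str]:
    """Split the inside of a JSON array on commas that are not nested or quoted."""
    items: List[str] = []
    current: List[str] = []
    depth = 0
    in_string = False
    escape = False

    def flush() -> None:
        piece = "".join(current).strip()
        if piece:
            items.append(piece)
        current.clear()

    for char in array_body:
        if escape:
            escape = False
        elif in_string:
            if char == "\\":
                escape = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char in "{[":
            depth += 1
        elif char in "}]":
            depth -= 1
        elif char == "," and depth == 0:
            flush()      # a real item boundary
            continue     # do not keep the comma itself
        current.append(char)

    flush()  # the last item may be truncated; the caller decides what to do with it
    return items

# Usage: commas inside strings and nested structures are preserved
pieces = split_top_level('"a, b", {"x": [1, 2]}, 3, "trunc')
```

Each piece can then be passed to `json.loads` on its own, so one broken element does not take the rest of the array down with it.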

Can partial parsing handle deeply nested objects?

Yes, but the complexity grows. Recursive application of field extraction works, but at some point, manual intervention makes more sense.

Should I log partial parsing uses?

Absolutely. Every partial parse is a sign your primary extraction failed. Track these to improve your LLM prompts or schema design.
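One way to make that tracking routine is to wrap the fallback chain so it reports which strategy produced the result. The following is a minimal sketch using the standard `logging` module; the function name `parse_with_audit`, the logger name, and the stand-in fallback parser are assumptions for illustration.

```python
import json
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("partial_parsing")

Strategy = Tuple[str, Callable[[str], Dict[str, Any]]]

def parse_with_audit(text: str, strategies: List[Strategy]) -> Tuple[Dict[str, Any], str]:
    """Try each (name, parser) in order; log whenever a fallback was needed."""
    for index, (name, parser) in enumerate(strategies):
        try:
            data = parser(text)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
        # An empty result from a fallback strategy counts as a failure
        if index > 0 and not data:
            continue
        if index > 0:
            logger.warning("Fell back to %r for output starting %r", name, text[:40])
        return data, name
    logger.error("All strategies failed for output starting %r", text[:40])
    return {}, "none"

# Usage with a stand-in fallback parser (hypothetical, for demonstration only)
def id_only(text: str) -> Dict[str, Any]:
    return {"id": 1} if '"id": 1' in text else {}

chain = [("strict", json.loads), ("id_only", id_only)]
result, used = parse_with_audit('{"id": 1', chain)
```

Counting the warnings per strategy name over time gives a rough measure of how often the primary extraction fails and whether prompt or schema changes are helping.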

Further Reading