Handling Nested Objects and Arrays in LLM JSON

Most real-world extraction tasks require nested structures: customer profiles with multiple addresses, documents with sections and subsections, lists of entities with attributes. Nesting in JSON Mode is straightforward once you understand the schema patterns and a few pitfalls to avoid.

Basic Nested Objects

A nested object is simply an object within an object. In JSON Schema, use a property with type: object and declare its own properties and required fields.

Schema for customer with address:

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "contact": {
      "type": "object",
      "properties": {
        "email": {"type": "string"},
        "phone": {"type": "string"}
      },
      "required": ["email", "phone"]
    }
  },
  "required": ["name", "contact"]
}

Example LLM API call:

from openai import OpenAI
import json

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "contact": {
            "type": "object",
            "properties": {
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["email", "phone"],
            "additionalProperties": False
        }
    },
    "required": ["customer_name", "contact"],
    "additionalProperties": False
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # json_schema output needs a model that supports Structured Outputs
    messages=[
        {
            "role": "user",
            "content": "Extract contact info from: 'Contact Alice at [email protected] or 555-0123'"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "Customer", "schema": schema, "strict": True}
    }
)

result = json.loads(response.choices[0].message.content)
print(result["customer_name"])
print(result["contact"]["email"])  # Access nested property

Output:

{
  "customer_name": "Alice",
  "contact": {
    "email": "[email protected]",
    "phone": "555-0123"
  }
}
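When the provider does not enforce the schema strictly, a nested key can still come back missing or null, and direct indexing such as result["contact"]["email"] then raises an error. A minimal sketch of a defensive accessor follows; get_path is a hypothetical helper written for this illustration, not part of any SDK, and the sample record is made up.

```python
from typing import Any


def get_path(data: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path through nested dicts; return default if any step is missing."""
    current = data
    for key in path.split("."):
        # Stop as soon as we hit a non-dict value or an absent key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up response in which the model left out the email field.
result = {"customer_name": "Alice", "contact": {"phone": "555-0123"}}

phone = get_path(result, "contact.phone")
email = get_path(result, "contact.email", default="(missing)")
print(phone, email)
```

The dotted-path form keeps call sites short as nesting grows, at the cost of not supporting keys that themselves contain dots.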

Arrays of Simple Types

An array field is declared with type: array and an items schema describing each element.

Schema with multiple tags:

{
  "type": "object",
  "properties": {
    "document_id": {"type": "string"},
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "maxItems": 10,
      "description": "List of topic tags; max 10"
    }
  },
  "required": ["document_id", "tags"]
}

Example:

schema = {
    "type": "object",
    "properties": {
        "article_title": {"type": "string"},
        "keywords": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
            "maxItems": 8
        }
    },
    "required": ["article_title", "keywords"],
    "additionalProperties": False  # strict mode expects this on every object
}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Extract title and keywords from: 'How to Learn Python for Data Science'"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "Article", "schema": schema, "strict": True}
    }
)

result = json.loads(response.choices[0].message.content)
print(result["article_title"])
print(result["keywords"])  # e.g. ["python", "data science", "learning", ...]
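A schema constrains the type and count of array items, not their content, so a string array can still contain duplicates, stray whitespace, or empty strings. The sketch below shows one possible client-side cleanup; clean_keywords and its max_items parameter are hypothetical names introduced here, and the sample list is invented.

```python
def clean_keywords(keywords, max_items=8):
    """Lowercase, trim, drop empties and duplicates, then cap the list length."""
    seen = set()
    cleaned = []
    for keyword in keywords:
        normalized = keyword.strip().lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned[:max_items]


# Invented model output with a duplicate and an empty entry.
sample = ["Python", " python ", "Data Science", "", "learning"]
keywords = clean_keywords(sample)
print(keywords)
```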

Arrays of Objects (Most Common)

The most useful pattern is an array of objects. Each item in the array is an object with its own schema.

Schema for a list of extracted entities:

{
  "type": "object",
  "properties": {
    "entities": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "type": {"type": "string", "enum": ["person", "company", "location", "product"]},
          "mention_count": {"type": "integer", "minimum": 1}
        },
        "required": ["name", "type", "mention_count"],
        "additionalProperties": false
      },
      "minItems": 1,
      "maxItems": 50
    }
  },
  "required": ["entities"]
}

Example API call:

schema = {
    "type": "object",
    "properties": {
        "named_entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "entity": {"type": "string"},
                    "entity_type": {"type": "string", "enum": ["person", "organization", "location"]},
                    "context": {"type": "string", "maxLength": 100}
                },
                # strict mode expects every property to be listed in required
                "required": ["entity", "entity_type", "context"],
                "additionalProperties": False
            },
            "maxItems": 10
        }
    },
    "required": ["named_entities"],
    "additionalProperties": False
}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Extract named entities from: 'Apple CEO Tim Cook announced a new product in San Francisco.'"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "EntityExtraction", "schema": schema, "strict": True}
    }
)

result = json.loads(response.choices[0].message.content)
for entity in result["named_entities"]:
    print(f"{entity['entity']} ({entity['entity_type']})")

Output:

{
  "named_entities": [
    {
      "entity": "Apple",
      "entity_type": "organization",
      "context": "CEO Tim Cook announced a new product"
    },
    {
      "entity": "Tim Cook",
      "entity_type": "person",
      "context": "CEO of Apple"
    },
    {
      "entity": "San Francisco",
      "entity_type": "location",
      "context": "announcement location"
    }
  ]
}
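Once an array of objects has been parsed, downstream code usually needs it regrouped rather than in document order. A small sketch, assuming a response shaped like the output above (contexts omitted for brevity), that buckets entity names by entity_type using only the standard library:

```python
from collections import defaultdict

# Parsed response shaped like the example output above.
result = {
    "named_entities": [
        {"entity": "Apple", "entity_type": "organization"},
        {"entity": "Tim Cook", "entity_type": "person"},
        {"entity": "San Francisco", "entity_type": "location"},
    ]
}

# Map each entity_type to the list of entity names of that type.
by_type = defaultdict(list)
for item in result["named_entities"]:
    by_type[item["entity_type"]].append(item["entity"])

for entity_type, names in by_type.items():
    print(entity_type, names)
```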

Deeply Nested Structures

You can nest objects within arrays within objects. However, each level adds complexity and error risk. Prefer shallow structures where possible.

Example: Document with sections containing subsections:

{
  "type": "object",
  "properties": {
    "document_title": {"type": "string"},
    "sections": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "section_title": {"type": "string"},
          "subsections": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "heading": {"type": "string"},
                "content": {"type": "string", "maxLength": 500}
              },
              "required": ["heading", "content"]
            }
          }
        },
        "required": ["section_title", "subsections"]
      }
    }
  },
  "required": ["document_title", "sections"]
}

Real use case — parsing a research paper:

schema = {
    "type": "object",
    "properties": {
        "paper_title": {"type": "string"},
        "abstract": {"type": "string", "maxLength": 500},
        "sections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "section_name": {"type": "string"},
                    "paragraphs": {
                        "type": "array",
                        "items": {"type": "string", "maxLength": 300},
                        "maxItems": 10
                    }
                },
                "required": ["section_name", "paragraphs"],
                "additionalProperties": False
            },
            "maxItems": 8
        }
    },
    "required": ["paper_title", "abstract", "sections"],
    "additionalProperties": False  # strict mode expects this on every object
}

# Test with actual paper excerpt
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Parse this paper structure and extract: title, abstract, and sections with paragraphs..."
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "PaperStructure", "schema": schema, "strict": True}
    }
)

result = json.loads(response.choices[0].message.content)
print(f"Title: {result['paper_title']}")
for section in result["sections"]:
    print(f"Section: {section['section_name']}")
    for para in section["paragraphs"]:
        print(f" - {para[:50]}...")
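Nested loops like the ones above get repeated wherever the structure is consumed. One option is to flatten the hierarchy once into rows; the sketch below assumes a response shaped like the paper schema, and iter_paragraphs and the sample data are hypothetical, introduced only for illustration.

```python
from typing import Iterator, Tuple


def iter_paragraphs(paper: dict) -> Iterator[Tuple[str, int, str]]:
    """Yield (section_name, paragraph_index, text) for every paragraph in the paper."""
    for section in paper.get("sections", []):
        for index, text in enumerate(section.get("paragraphs", [])):
            yield section["section_name"], index, text


# Invented sample standing in for a parsed model response.
paper = {
    "paper_title": "Sample Paper",
    "sections": [
        {"section_name": "Introduction", "paragraphs": ["First point.", "Second point."]},
        {"section_name": "Methods", "paragraphs": ["Only paragraph."]},
    ],
}

rows = list(iter_paragraphs(paper))
for section_name, index, text in rows:
    print(section_name, index, text)
```

Flat rows are easier to load into a table or database than the nested form, while the nested form remains the better shape for the model to emit.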

Array Size Constraints

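Control array length with minItems and maxItems (JSON Schema spells these keywords in camelCase). Where the provider enforces them, they keep the LLM from returning unbounded lists; otherwise, check lengths client-side. The fragment below requires at least 1 and at most 5 items:

{
  "type": "array",
  "items": {...},
  "minItems": 1,
  "maxItems": 5
}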
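Because enforcement of length keywords varies between providers and modes, it can be worth applying the same bounds after parsing. A minimal sketch, assuming truncation is acceptable when the list is too long and an error is preferred when it is too short; clamp_items is a hypothetical helper and the tag list is invented.

```python
def clamp_items(items, min_items=0, max_items=None):
    """Truncate to max_items; raise ValueError if there are fewer than min_items."""
    if len(items) < min_items:
        raise ValueError(f"expected at least {min_items} items, got {len(items)}")
    if max_items is not None:
        return list(items[:max_items])
    return list(items)


# Invented output that exceeds the intended maximum of 5.
tags = ["ai", "json", "schema", "llm", "python", "parsing", "extra"]
kept = clamp_items(tags, min_items=1, max_items=5)
print(kept)
```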

Flattening vs. Nesting Trade-offs

Nested approach (cleaner semantics, more complex schema):

{
  "author": {
    "name": "Alice",
    "email": "[email protected]"
  }
}

Flat approach (simpler schema, verbose names):

{
  "author_name": "Alice",
  "author_email": "[email protected]"
}

For most extraction tasks, prefer flat when feasible. Reserve nesting for:

  • Logically distinct entities (e.g., contact object with multiple fields).
  • Repeating structures (arrays of objects).
  • Multi-level hierarchies (documents with sections).
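The two shapes can also be converted mechanically, which lets you pick whichever is easier for the model and reshape afterwards. The sketch below turns a nested object into the prefixed flat form; flatten is a hypothetical helper, and the email value is a placeholder chosen for this example.

```python
def flatten(obj, prefix="", sep="_"):
    """Collapse nested dicts into one level, joining key names with sep."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated prefix down one level.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Placeholder data mirroring the nested author example.
nested = {"author": {"name": "Alice", "email": "alice@example.com"}}
flat = flatten(nested)
print(flat)
```

Arrays are left untouched by this sketch, since repeating structures are one of the cases where nesting is worth keeping.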

Testing Nested Schemas

Test edge cases: empty arrays, missing nested fields, deeply nested values:

test_cases = [
    "Input with many entities",
    "Input with no entities (empty array)",
    "Input with minimal data",
    "Input with conflicting types"
]

for test in test_cases:
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": test}],
            response_format={"type": "json_schema", "json_schema": {"name": "Test", "schema": schema, "strict": True}}
        )
        result = json.loads(response.choices[0].message.content)
        print(f"Test: {test[:30]}... → {len(result.get('entities', []))} entities")
    except Exception as e:
        print(f"Test: {test[:30]}... → Error: {e}")
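Live calls are slow and non-deterministic, so it can help to also check canned responses offline. The sketch below is a deliberately tiny checker covering only type, required, properties, items, minItems, maxItems, and enum; validate is a hypothetical helper, and a full JSON Schema validator library would be the usual choice in practice.

```python
def validate(value, schema, path="$"):
    """Return a list of error strings for a small subset of JSON Schema."""
    errors = []
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], subschema, f"{path}.{key}"))
    elif expected == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        if len(value) < schema.get("minItems", 0):
            errors.append(f"{path}: fewer than {schema['minItems']} items")
        if "maxItems" in schema and len(value) > schema["maxItems"]:
            errors.append(f"{path}: more than {schema['maxItems']} items")
        for index, item in enumerate(value):
            errors.extend(validate(item, schema.get("items", {}), f"{path}[{index}]"))
    elif expected == "string" and not isinstance(value, str):
        errors.append(f"{path}: expected string")
    elif expected == "integer" and (not isinstance(value, int) or isinstance(value, bool)):
        errors.append(f"{path}: expected integer")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: value not in enum")
    return errors


entity_schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string", "enum": ["person", "company"]},
                    "mention_count": {"type": "integer"},
                },
                "required": ["name", "type", "mention_count"],
            },
        }
    },
    "required": ["entities"],
}

# Canned responses standing in for model output.
good = {"entities": [{"name": "Apple", "type": "company", "mention_count": 1}]}
empty = {"entities": []}
bad = {"entities": [{"name": "Apple", "type": "fruit"}]}

for label, case in [("good", good), ("empty", empty), ("bad", bad)]:
    print(label, validate(case, entity_schema))
```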

Key Takeaways

  • Nested objects use type: object with their own properties and required fields.
  • Arrays use type: array with an items schema describing each element.
  • Arrays of objects are the most common pattern: items: {type: object, properties: {...}}.
  • Use minItems and maxItems to constrain array length and prevent unbounded output.
  • Prefer shallow, flattened structures when possible; reserve nesting for logically distinct entities and repeating structures.
  • Test nested schemas with edge cases (empty arrays, deeply nested values, missing optional fields).

Frequently Asked Questions

What's the maximum nesting depth?

JSON Schema itself sets no limit, though individual providers may cap nesting depth. In practice, keep to 2–3 levels: structures nested 4 or more levels deep tend to increase error rates. Flatten where you can.
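To keep an eye on depth as a schema evolves, you can measure it programmatically. The sketch below counts levels of nested objects and treats arrays as transparent, which is one possible convention and an assumption of this example; object_depth is a hypothetical helper.

```python
def object_depth(schema):
    """Count nested object levels in a schema; arrays do not add a level."""
    kind = schema.get("type")
    if kind == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((object_depth(child) for child in children), default=0)
    if kind == "array":
        return object_depth(schema.get("items", {}))
    return 0


customer = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "contact": {"type": "object", "properties": {"email": {"type": "string"}}},
    },
}

document = {
    "type": "object",
    "properties": {
        "sections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subsections": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {"heading": {"type": "string"}},
                        },
                    }
                },
            },
        }
    },
}

customer_depth = object_depth(customer)
document_depth = object_depth(document)
print(customer_depth, document_depth)
```

A check like this can run in a unit test so that a schema change pushing past your chosen limit fails early.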

Can I have optional nested fields?

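Yes. Leave the nested object out of the parent's required list; if it is present, its own required fields must still be satisfied. In the schema below, contact is optional because the top-level required list is empty. Some strict modes expect every property to appear in required; there, optionality is usually expressed by also allowing null.

{
  "properties": {
    "contact": {
      "type": "object",
      "properties": {"email": {"type": "string"}},
      "required": ["email"]
    }
  },
  "required": []
}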
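For modes that expect every property to be listed in required, one common workaround is to list the field as required and let it be null. The sketch below rewrites a schema that way; require_all_allow_null is a hypothetical helper, it only processes the top level of the schema it is given, and it assumes each property declares a type.

```python
import copy


def require_all_allow_null(schema):
    """Return a copy where each optional top-level property is required but nullable."""
    out = copy.deepcopy(schema)
    required = list(out.get("required", []))
    for name, prop in out.get("properties", {}).items():
        if name in required:
            continue
        current = prop.get("type")
        if current is not None:
            # Normalize to a list of types, then add "null" if absent.
            types = list(current) if isinstance(current, list) else [current]
            if "null" not in types:
                types.append("null")
            prop["type"] = types
        required.append(name)
    out["required"] = required
    return out


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "contact": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
    "required": ["name"],
}

strict_schema = require_all_allow_null(schema)
print(strict_schema["required"])
print(strict_schema["properties"]["contact"]["type"])
```

Consumers then treat a null contact the same way they would treat a missing one.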

How do I handle empty arrays?

Set minItems: 0 (the default) to allow an empty array. If you always want at least one item, use minItems: 1.

Can arrays contain mixed types?

Yes, with oneOf, but this adds ambiguity. Better to use separate array fields or define a union type that covers all variants.

{
  "type": "array",
  "items": {
    "oneOf": [
      {"type": "string"},
      {"type": "number"}
    ]
  }
}
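If you do accept a mixed array, consumers typically have to branch on type for each element. A short sketch of splitting such an array back into typed lists after parsing; split_by_type is a hypothetical helper and the sample values are invented.

```python
def split_by_type(values):
    """Separate a mixed list into strings and numbers, ignoring anything else."""
    strings = [v for v in values if isinstance(v, str)]
    # bool is a subclass of int in Python, so exclude it explicitly.
    numbers = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return strings, numbers


mixed = ["a", 1, 2.5, "b", True]
strings, numbers = split_by_type(mixed)
print(strings, numbers)
```

Needing this step on every read is part of why separate array fields tend to be the simpler design.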

Further Reading