
Generate Code from Function Schemas

Manually writing schemas that match your code's type signatures is error-prone: a type changes, you forget to update the schema, and the LLM sends the wrong data. Code generation flips this: you define types in your language, and tools automatically emit the JSON Schema. Alternatively, you define the schema once and generate type-safe client code in multiple languages. This article teaches you tools and patterns for bidirectional code generation: types to schemas and schemas to code.

Approach 1: Generate Schema from Type Definitions

Start with your Python, TypeScript, or Rust types and generate the schema from them. The types are the single source of truth, so the schema stays in sync with the code as long as you regenerate it whenever the types change.
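Libraries like Pydantic do this by introspecting type annotations. To make the mechanism concrete, here is a toy, standard-library-only sketch. The names type_to_schema, object_schema, and schema_for are hypothetical, and the sketch covers only primitives, Optional[X], lists, and nested dataclasses. Use a real library in production.

```python
import dataclasses
import json
import typing
from typing import Optional, get_args, get_origin, get_type_hints

# Primitive Python types and the JSON Schema type each maps to.
_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def type_to_schema(tp, defs: dict) -> dict:
    """Translate one type annotation into a JSON Schema fragment."""
    origin = get_origin(tp)
    if origin is typing.Union:
        # Only Optional[X] (Union[X, None]) is supported in this sketch.
        args = get_args(tp)
        non_null = [a for a in args if a is not type(None)]
        if len(args) != 2 or len(non_null) != 1:
            raise TypeError(f"unsupported union: {tp!r}")
        return {"anyOf": [type_to_schema(non_null[0], defs), {"type": "null"}]}
    if origin is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": type_to_schema(item_type, defs)}
    if dataclasses.is_dataclass(tp):
        # Nested dataclasses go into $defs and are referenced by name.
        if tp.__name__ not in defs:
            defs[tp.__name__] = {}  # placeholder stops self-referencing types from looping
            defs[tp.__name__] = object_schema(tp, defs)
        return {"$ref": f"#/$defs/{tp.__name__}"}
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    raise TypeError(f"unsupported type: {tp!r}")


def object_schema(cls, defs: dict) -> dict:
    """Build an object schema from a dataclass's fields."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for f in dataclasses.fields(cls):
        prop = type_to_schema(hints[f.name], defs)
        if f.default is not dataclasses.MISSING:
            prop["default"] = f.default
        elif f.default_factory is dataclasses.MISSING:
            # No default and no default_factory: the caller must supply it.
            required.append(f.name)
        properties[f.name] = prop
    return {"type": "object", "properties": properties, "required": required}


def schema_for(cls) -> dict:
    """Entry point: the object schema plus any collected $defs."""
    defs: dict = {}
    schema = object_schema(cls, defs)
    if defs:
        schema["$defs"] = defs
    return schema


@dataclasses.dataclass
class Location:
    latitude: float
    longitude: float


@dataclasses.dataclass
class SearchRequest:
    query: str
    limit: int = 10
    location: Optional[Location] = None


print(json.dumps(schema_for(SearchRequest), indent=2))
```

Real generators add constraints, titles, and many more types, but the core loop is the same: walk the annotations and emit one schema fragment per field.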

Python: Pydantic to JSON Schema

Pydantic is the most widely used Python library for schema-based data validation. Define your models with Pydantic and call model_json_schema() to get a JSON Schema:

from pydantic import BaseModel, Field
from typing import Optional
import json

class Location(BaseModel):
    latitude: float = Field(..., ge=-90, le=90,
                            description="Latitude in degrees")
    longitude: float = Field(..., ge=-180, le=180,
                             description="Longitude in degrees")

class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=100,
                       description="Search keyword")
    # A plain int with a default may be omitted but not sent as null;
    # Optional[int] would add null to the allowed types in the schema.
    limit: int = Field(default=10, ge=1, le=100,
                       description="Max results")
    location: Optional[Location] = None

# Generate the JSON Schema (Pydantic v2 API)
schema = SearchRequest.model_json_schema()
print(json.dumps(schema, indent=2))

Output (Pydantic v2; key order may differ between versions). Note that ge/le produce the inclusive "minimum"/"maximum" keywords, and the Optional location becomes an anyOf with null:

{
  "$defs": {
    "Location": {
      "properties": {
        "latitude": {
          "description": "Latitude in degrees",
          "maximum": 90,
          "minimum": -90,
          "title": "Latitude",
          "type": "number"
        },
        "longitude": {
          "description": "Longitude in degrees",
          "maximum": 180,
          "minimum": -180,
          "title": "Longitude",
          "type": "number"
        }
      },
      "required": ["latitude", "longitude"],
      "title": "Location",
      "type": "object"
    }
  },
  "properties": {
    "query": {
      "description": "Search keyword",
      "maxLength": 100,
      "minLength": 1,
      "title": "Query",
      "type": "string"
    },
    "limit": {
      "default": 10,
      "description": "Max results",
      "maximum": 100,
      "minimum": 1,
      "title": "Limit",
      "type": "integer"
    },
    "location": {
      "anyOf": [{"$ref": "#/$defs/Location"}, {"type": "null"}],
      "default": null
    }
  },
  "required": ["query"],
  "title": "SearchRequest",
  "type": "object"
}
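The schema above keeps Location under $defs and points to it with $ref. If the tool-calling API you target doesn't resolve $ref (support varies by provider and model), one option is to inline local refs before registering the tool. The following is a minimal standard-library sketch. inline_refs is a hypothetical helper, and it assumes every ref is local to the top-level $defs and that no definition refers to itself.

```python
import json


def inline_refs(schema: dict) -> dict:
    """Return a new schema with local "#/$defs/<Name>" refs replaced inline.

    Assumes every $ref is local to the top-level $defs and that no definition
    refers to itself (recursive types would loop forever here).
    """
    defs = schema.get("$defs", {})
    prefix = "#/$defs/"

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(prefix):
                # Merge the referenced definition with any sibling keywords
                # (e.g. a "description" next to the $ref), then keep resolving.
                siblings = {k: v for k, v in node.items() if k != "$ref"}
                return resolve({**defs[ref[len(prefix):]], **siblings})
            # Rebuild the dict, dropping $defs now that refs are inlined.
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)


schema = {
    "$defs": {
        "Location": {
            "type": "object",
            "properties": {"latitude": {"type": "number"}, "longitude": {"type": "number"}},
            "required": ["latitude", "longitude"],
        }
    },
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "location": {"anyOf": [{"$ref": "#/$defs/Location"}, {"type": "null"}], "default": None},
    },
    "required": ["query"],
}
print(json.dumps(inline_refs(schema), indent=2))
```

The function builds new dicts and lists instead of mutating its input, so the original schema (with $defs) is still available for validation.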

Pass this schema to the LLM as the tool's parameter definition. When the model calls the tool, validate its arguments with the same model:

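def handle_search(data: dict):
    # Raises pydantic.ValidationError if the arguments violate the schema
    request = SearchRequest.model_validate(data)
    # perform_search is your application's search function (not shown)
    return perform_search(request.query, request.limit, request.location)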
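Pydantic does the heavy lifting in handle_search. To show what "validates against the schema" means mechanically, here is a toy validator for just the keywords that appear above: type, required, properties, and length and numeric bounds. check_instance is a hypothetical name. It deliberately skips $ref, anyOf, and every other keyword, so treat it as an illustration and use Pydantic or the jsonschema package for real traffic.

```python
def check_instance(instance, schema: dict, path: str = "$") -> list[str]:
    """Validate `instance` against a small subset of JSON Schema.

    Supports type, required, properties, minLength/maxLength, and
    minimum/maximum/exclusiveMinimum/exclusiveMaximum. No $ref, anyOf, etc.
    Returns human-readable errors; an empty list means valid.
    """
    type_checks = {
        "object": lambda v: isinstance(v, dict),
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so reject it explicitly.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "boolean": lambda v: isinstance(v, bool),
        "null": lambda v: v is None,
    }
    expected = schema.get("type")
    if isinstance(expected, str) and expected in type_checks and not type_checks[expected](instance):
        return [f"{path}: expected {expected}, got {type(instance).__name__}"]

    errors: list[str] = []
    if isinstance(instance, str):
        if "minLength" in schema and len(instance) < schema["minLength"]:
            errors.append(f"{path}: length {len(instance)} < minLength {schema['minLength']}")
        if "maxLength" in schema and len(instance) > schema["maxLength"]:
            errors.append(f"{path}: length {len(instance)} > maxLength {schema['maxLength']}")
    if isinstance(instance, (int, float)) and not isinstance(instance, bool):
        bounds = [
            ("minimum", lambda v, b: v >= b),
            ("maximum", lambda v, b: v <= b),
            ("exclusiveMinimum", lambda v, b: v > b),
            ("exclusiveMaximum", lambda v, b: v < b),
        ]
        for keyword, ok in bounds:
            if keyword in schema and not ok(instance, schema[keyword]):
                errors.append(f"{path}: {instance} violates {keyword} {schema[keyword]}")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(check_instance(instance[name], subschema, f"{path}.{name}"))
    return errors


SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "minLength": 1, "maxLength": 100},
        "limit": {"type": "integer", "minimum": 1, "maximum": 100},
    },
    "required": ["query"],
}

print(check_instance({"query": "coffee", "limit": 25}, SEARCH_SCHEMA))  # []
print(check_instance({"query": "", "limit": 500}, SEARCH_SCHEMA))
```

Returning a list of errors instead of raising on the first one makes it easy to send every problem back to the model in a single tool-error message so it can retry.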

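Tools: Pydantic is the most common choice. marshmallow, attrs, and dataclasses-json also model typed data, but JSON Schema export for them generally requires an add-on package. For plain standard-library dataclasses, Pydantic's TypeAdapter(MyDataclass).json_schema() produces a schema without converting them to models.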

TypeScript: Zod or TypeScript to JSON Schema

In TypeScript, use Zod to define types and generate schemas:

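import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const Location = z.object({
  latitude: z.number().min(-90).max(90).describe("Latitude in degrees"),
  longitude: z.number().min(-180).max(180).describe("Longitude in degrees"),
});

const SearchRequest = z.object({
  query: z.string().min(1).max(100).describe("Search keyword"),
  // .int() emits "type": "integer"; .default() already makes the field optional on input
  limit: z.number().int().min(1).max(100).default(10).describe("Max results"),
  location: Location.optional(),
});

// Generate JSON Schema (zod-to-json-schema targets Zod 3; Zod 4 also ships z.toJSONSchema())
const schema = zodToJsonSchema(SearchRequest);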

Or use typescript-json-schema to generate from TypeScript interfaces:

typescript-json-schema tsconfig.json SearchRequest > schema.json

Rust: schemars

In Rust, the schemars crate derives JSON Schema from your structs. It reads serde attributes such as default and rename, so the schema matches how serde (de)serializes the type:

// Cargo.toml dependencies: serde (with the "derive" feature), serde_json, schemars
use schemars::{schema_for, JsonSchema};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, JsonSchema)]
struct Location {
    #[schemars(description = "Latitude in degrees")]
    latitude: f64,

    #[schemars(description = "Longitude in degrees")]
    longitude: f64,
}

#[derive(Serialize, Deserialize, JsonSchema)]
struct SearchRequest {
    #[schemars(description = "Search keyword")]
    query: String,

    // serde(default) is honored by serde when deserializing AND by schemars
    // (it emits "default": 10); schemars(default) alone would not affect serde.
    #[serde(default = "default_limit")]
    limit: u32,

    #[serde(skip_serializing_if = "Option::is_none")]
    location: Option<Location>,
}

fn default_limit() -> u32 {
    10
}

fn main() {
    let schema = schema_for!(SearchRequest);
    println!("{}", serde_json::to_string_pretty(&schema).unwrap());
}

Advantages of Type-First Approach:

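  • No manual schema writing; types are the single source of truth.
  • The schema matches the code as long as you regenerate it on every change; a CI check can enforce this.
  • Types are checked automatically when deserializing. Pydantic and Zod also enforce declared constraints such as lengths and ranges; serde does not, so bounds in a schemars schema only guide the LLM unless you add a separate validator.
  • Code is self-documenting.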

Approach 2: Generate Code from Schemas

Start with a JSON Schema or OpenAPI spec and generate type-safe code in multiple languages. This is useful when the schema is the contract (e.g., a published API).

JSON Schema to Python

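Use datamodel-code-generator (its command-line entry point is datamodel-codegen):

datamodel-codegen --input schema.json --input-file-type jsonschema \
  --output-model-type pydantic_v2.BaseModel --field-constraints --output models.py

Input schema (the title becomes the class name):

{
  "title": "SearchRequest",
  "type": "object",
  "properties": {
    "query": {"type": "string", "minLength": 1},
    "limit": {"type": "integer", "minimum": 1, "maximum": 100, "default": 10}
  },
  "required": ["query"]
}

Generated code (approximate; the exact output depends on the generator version and flags):

from typing import Optional

from pydantic import BaseModel, Field


class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1)
    limit: Optional[int] = Field(10, ge=1, le=100)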
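Under the hood, schema-to-code generators walk the schema and emit source text. The following toy standard-library sketch handles only a flat object with primitive properties. emit_dataclass and its output format are hypothetical and much simpler than datamodel-code-generator's. It emits plain dataclasses and drops validation keywords such as minLength, which a real generator would keep.

```python
import keyword

# JSON Schema primitive types and the Python annotation emitted for each.
_PY_TYPES = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def emit_dataclass(schema: dict) -> str:
    """Emit dataclass source code for a flat object schema (toy generator)."""
    class_name = schema.get("title", "Model")
    required = set(schema.get("required", []))
    required_fields, optional_fields = [], []
    for prop, subschema in schema.get("properties", {}).items():
        if not prop.isidentifier() or keyword.iskeyword(prop):
            raise ValueError(f"{prop!r} is not a valid Python field name")
        py_type = _PY_TYPES[subschema["type"]]  # KeyError for nested objects/arrays
        if prop in required:
            required_fields.append(f"    {prop}: {py_type}")
        else:
            default = repr(subschema.get("default"))
            optional_fields.append(f"    {prop}: Optional[{py_type}] = {default}")
    header = [
        "from dataclasses import dataclass",
        "from typing import Optional",
        "",
        "",
        "@dataclass",
        f"class {class_name}:",
    ]
    # Dataclasses need fields without defaults before fields with defaults.
    body = required_fields + optional_fields or ["    pass"]
    return "\n".join(header + body) + "\n"


schema = {
    "title": "SearchRequest",
    "type": "object",
    "properties": {
        "query": {"type": "string", "minLength": 1},
        "limit": {"type": "integer", "minimum": 1, "maximum": 100, "default": 10},
    },
    "required": ["query"],
}
source = emit_dataclass(schema)
print(source)

# The output is ordinary Python; exec it in a fresh namespace to check it compiles.
namespace = {"__name__": "generated_models"}
exec(source, namespace)
print(namespace["SearchRequest"](query="coffee"))
```

Compiling the emitted source immediately is a cheap smoke test for any generator, real or toy.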

OpenAPI to TypeScript / Go / Rust

Use openapi-generator:

openapi-generator generate -i api.yaml -g typescript-fetch -o ./generated

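This generates typed models plus client functions that build requests and parse responses. How much runtime validation the generated code performs varies by generator, so check before relying on it.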

Practical Workflow: Schema in the Middle

The most robust approach uses a schema file as the source of truth:

  1. Define the schema once (JSON Schema or OpenAPI):
{
  "title": "SearchRequest",
  "type": "object",
  "properties": { ... },
  "required": ["query"]
}
  2. Generate code in each language:
# Python
datamodel-codegen --input schema.json --input-file-type jsonschema \
  --output-model-type pydantic_v2.BaseModel --output python/models.py

# TypeScript (the json-schema-to-typescript package provides the json2ts command)
npx --package=json-schema-to-typescript json2ts -i schema.json -o ts/models.ts

# Rust (schemars only goes Rust -> schema; typify is one schema -> Rust option)
cargo typify schema.json
  3. Share the schema with the LLM:
import json

with open("schema.json") as f:
    schema = json.load(f)

tool_definition = {
    "name": "search",
    "description": "Search for items",
    "parameters": schema,
}

# Register with LLM...
  4. Validate LLM responses:
from python.models import SearchRequest

# LLM sends JSON
model_response = {"query": "coffee", "limit": 25}

# Validate against schema; raises ValidationError if invalid
request = SearchRequest.model_validate(model_response)
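The tool definition in step 3 uses "parameters", which is the OpenAI-style key; other providers wrap the same schema differently. Below is a sketch of keeping one schema file and wrapping it per provider. The two shapes follow the OpenAI Chat Completions tools format and the Anthropic Messages API tool format as commonly documented. The helper names are hypothetical, so check your provider's current reference before relying on them.

```python
import json
from typing import Any


def openai_tool(name: str, description: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a JSON Schema as an OpenAI Chat Completions `tools` entry."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }


def anthropic_tool(name: str, description: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a JSON Schema as an Anthropic Messages API tool definition."""
    return {"name": name, "description": description, "input_schema": schema}


SEARCH_SCHEMA = {
    "title": "SearchRequest",
    "type": "object",
    "properties": {
        "query": {"type": "string", "minLength": 1},
        "limit": {"type": "integer", "minimum": 1, "maximum": 100, "default": 10},
    },
    "required": ["query"],
}

# One schema, two provider-specific wrappers.
tools_openai = [openai_tool("search", "Search for items", SEARCH_SCHEMA)]
tools_anthropic = [anthropic_tool("search", "Search for items", SEARCH_SCHEMA)]
print(json.dumps(tools_openai, indent=2))
print(json.dumps(tools_anthropic, indent=2))
```

Keeping the wrappers thin means the schema file remains the single source of truth no matter which provider you register it with.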

Integrating Code Generation into CI/CD

Add schema generation and a drift check to your CI pipeline so the build fails when code and schema diverge:

# .github/workflows/schema-validation.yml
name: Schema Validation

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install tools
        run: |
          pip install pydantic jsonschema

      - name: Generate schema from Python types
        run: |
          mkdir -p generated
          # Assumes src/types.py defines the Pydantic model SearchRequest
          python -c "import json; from src.types import SearchRequest; print(json.dumps(SearchRequest.model_json_schema(), indent=2))" > generated/schema.json

      - name: Validate schema syntax
        run: |
          python -c "import json; from jsonschema import Draft202012Validator; Draft202012Validator.check_schema(json.load(open('generated/schema.json')))"

      - name: Ensure schemas are in sync
        run: |
          diff generated/schema.json schema.json || {
            echo "Schema out of sync. Run code generation and commit."
            exit 1
          }

      - name: Generate TypeScript types
        run: |
          mkdir -p ts
          npx --yes --package=json-schema-to-typescript json2ts -i schema.json -o ts/search-request.ts
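The diff step compares text, so a reordered key or different indentation fails the build even when the two schemas are identical. If that proves noisy, a structural comparison is one alternative. The following is a minimal standard-library sketch. The script name check_schema_drift.py and the main() signature are hypothetical.

```python
import json
import sys
from pathlib import Path


def schemas_match(generated_path: str, committed_path: str) -> bool:
    """Compare two JSON files structurally, ignoring key order and whitespace."""
    generated = json.loads(Path(generated_path).read_text())
    committed = json.loads(Path(committed_path).read_text())
    # Dict equality ignores key order; list order (e.g. "required") still matters.
    return generated == committed


def main(argv: list[str]) -> int:
    """Return a process exit code: 0 in sync, 1 drifted, 2 bad usage."""
    if len(argv) != 2:
        print("usage: check_schema_drift.py GENERATED COMMITTED", file=sys.stderr)
        return 2
    if not schemas_match(argv[0], argv[1]):
        print("Schema out of sync. Run code generation and commit.", file=sys.stderr)
        return 1
    return 0
```

To use it as a CI step, save it as a script that ends with if __name__ == "__main__": sys.exit(main(sys.argv[1:])) and call it in place of diff. List order still counts as a difference, which is usually what you want because generators emit lists deterministically.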

Common Code Generation Tools

| Language | Tool | Input | Output |
|---|---|---|---|
| Python | Pydantic | Python types | JSON Schema |
| Python | datamodel-code-generator | JSON Schema | Pydantic models |
| TypeScript | zod-to-json-schema | Zod schemas | JSON Schema |
| TypeScript | typescript-json-schema | TypeScript interfaces | JSON Schema |
| Go | oapi-codegen | OpenAPI spec | Go types + client |
| Rust | schemars | Rust structs | JSON Schema |
| Multi | OpenAPI Generator | OpenAPI spec | 20+ languages |

Validation After Code Generation

Always check that the generated models import cleanly and enforce the constraints you expect:

# Test generated models
from pydantic import ValidationError
from generated.models import SearchRequest

# Valid data
request = SearchRequest(query="test", limit=10)
assert request.query == "test"

# Invalid data should fail
try:
    SearchRequest(query="")  # Empty query violates minLength
    raise AssertionError("Should have raised ValidationError")
except ValidationError:
    pass  # Expected

# LLM response validation
model_data = {"query": "valid", "limit": 25}
request = SearchRequest.model_validate(model_data)
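Hand-written valid and invalid cases tend to miss the edges. One option is to derive boundary cases from the schema itself, so the tests track the contract. The following is a minimal standard-library sketch. boundary_cases is hypothetical, and it assumes a flat object schema with integer bounds and only string, integer, and number properties.

```python
def boundary_cases(schema: dict) -> list[tuple[dict, bool]]:
    """Derive (payload, should_be_valid) pairs from a flat object schema.

    Builds a minimal valid payload from the required properties, then probes
    each bounded property at and just past its limits.
    """
    properties = schema["properties"]

    # Minimal valid payload: one in-bounds value per required property.
    base: dict = {}
    for name in schema.get("required", []):
        sub = properties[name]
        if sub["type"] == "string":
            base[name] = "x" * max(sub.get("minLength", 0), 1)
        elif sub["type"] in ("integer", "number"):
            base[name] = sub.get("minimum", 0)
        else:
            raise ValueError(f"no placeholder for type {sub['type']!r}")

    cases: list[tuple[dict, bool]] = []
    for name, sub in properties.items():
        probes = []
        if "minimum" in sub:
            probes += [(sub["minimum"], True), (sub["minimum"] - 1, False)]
        if "maximum" in sub:
            probes += [(sub["maximum"], True), (sub["maximum"] + 1, False)]
        if "minLength" in sub:
            probes.append(("x" * sub["minLength"], True))
            if sub["minLength"] > 0:
                probes.append(("x" * (sub["minLength"] - 1), False))
        if "maxLength" in sub:
            probes += [("x" * sub["maxLength"], True), ("x" * (sub["maxLength"] + 1), False)]
        # Override one property at a time on top of the valid base payload.
        cases.extend(({**base, name: value}, ok) for value, ok in probes)
    return cases


SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "minLength": 1, "maxLength": 100},
        "limit": {"type": "integer", "minimum": 1, "maximum": 100, "default": 10},
    },
    "required": ["query"],
}

for payload, should_be_valid in boundary_cases(SCHEMA):
    print(should_be_valid, {k: (v if len(str(v)) < 12 else f"<{len(v)} chars>") for k, v in payload.items()})
```

Each pair can drive a loop that calls SearchRequest.model_validate(payload) and asserts that it raises ValidationError exactly when should_be_valid is False.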

Key Takeaways

  • Start with type definitions in your language; use tools like Pydantic to auto-generate JSON Schema. This keeps code and schema in sync.
  • Alternatively, define the schema once and generate type-safe code in multiple languages using OpenAPI Generator or similar.
  • Use schemas as CI/CD gates: fail the build if code and schema diverge.
  • Validate LLM responses against generated models immediately upon receipt.
  • Share generated schemas with LLMs as the tool definition.

Frequently Asked Questions

Which approach is better: types-to-schema or schema-to-code?

Types-to-schema (Pydantic, Zod) is better for most teams: write code once, generate schema automatically. Schema-to-code is better when the schema is a published contract multiple teams depend on.

Can I mix both approaches?

Yes. Generate schema from your code, publish it, and other teams generate their client code from your published schema. This is common in platform teams.

What if the generated schema doesn't match what I want?

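Treat the generated output as a starting point. In the types-first approach, express enums, patterns, and other constraints in the types themselves (for example, a Literal or Enum type, or Field(pattern=...) in Pydantic) so they survive regeneration; hand edits to a generated schema are overwritten the next time you generate. In the schema-first approach, edit the schema, then regenerate the code and test thoroughly.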
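If a constraint can't be expressed in the types, one way to keep hand edits from being lost on regeneration is to store them in a separate overrides file and merge it in as a post-generation step. This is a minimal sketch of that hypothetical pattern, not a feature of any particular generator. The apply_overrides helper and the sort property in the example are made up for illustration.

```python
import copy


def apply_overrides(generated: dict, overrides: dict) -> dict:
    """Deep-merge hand-maintained `overrides` into a freshly generated schema.

    Dicts merge recursively; any other override value (strings, numbers,
    lists such as "enum") replaces the generated value outright.
    The input dicts are never mutated.
    """
    merged = copy.deepcopy(generated)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


generated = {"type": "object", "properties": {"sort": {"type": "string"}}, "required": []}
overrides = {"properties": {"sort": {"enum": ["relevance", "date"]}}}
print(apply_overrides(generated, overrides))
```

Because lists are replaced rather than concatenated, an override like "enum" or "required" states the final value explicitly, which keeps the merge predictable.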

Should I commit generated code?

Yes, for client libraries and external packages: committed output is versioned, and changes show up in diffs. For internal schema definitions that are regenerated on every build, you can skip committing them, but then a CI drift check has no committed copy to compare against.

How do I handle schema changes during code generation?

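Have CI regenerate code whenever the schema changes and fail if the result differs from what is committed. A pre-commit Git hook that runs the same generation step catches drift locally before you push; CI remains the backstop because hooks can be skipped.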

Further Reading