Skip to main content

Prompt Management Systems: Version Control

Hardcoding prompts directly into your API is a maintenance nightmare. When you discover that a better preamble reduces hallucinations by 5%, you want to roll out that change safely, run A/B tests, and track which version produced which response. A prompt management system is a database of versioned prompt templates, a deployment mechanism to roll out updates, and integration with your LLM abstraction layer. This article covers building a prompt registry, version control, testing, and rollback strategies.

What is a Prompt Management System?

A prompt management system is a repository and versioning system for prompts, similar to how you version application code. Each prompt template is stored with metadata (model, version, author, performance metrics) and can be deployed to production or held for testing. Instead of hardcoding "You are a helpful assistant." in your code, you store it in the registry as a named template with versions, test it against sample inputs, and deploy the best version to your API.

Prompt Registry Schema

-- Schema for prompt versioning
CREATE TABLE prompt_templates (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
name TEXT NOT NULL, -- e.g., "customer_support"
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(organization_id, name)
);

CREATE TABLE prompt_versions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
template_id UUID NOT NULL REFERENCES prompt_templates(id) ON DELETE CASCADE,
version INT NOT NULL,
author_id UUID NOT NULL REFERENCES users(id),
preamble TEXT NOT NULL, -- System prompt / instructions
template_text TEXT NOT NULL, -- Prompt template with variables
model TEXT NOT NULL,
model_config JSONB, -- {"temperature": 0.7, "max_tokens": 1024}
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
notes TEXT, -- Changelog entry
performance_score FLOAT, -- A/B test score (optional)
UNIQUE(template_id, version)
);

CREATE TABLE prompt_deployments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL,
template_id UUID NOT NULL REFERENCES prompt_templates(id),
version INT NOT NULL,
environment TEXT NOT NULL, -- "staging" or "production"
deployed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
deployed_by UUID NOT NULL REFERENCES users(id),
FOREIGN KEY (template_id, version)
REFERENCES prompt_versions(template_id, version)
);

-- Index for fast lookups
CREATE INDEX idx_prompt_versions_template ON prompt_versions(template_id, version DESC);
CREATE INDEX idx_deployments_template ON prompt_deployments(organization_id, template_id);

Creating and Versioning Prompts

# API for managing prompts
from fastapi import FastAPI, Request
from pydantic import BaseModel

app = FastAPI()

class CreatePromptVersionRequest(BaseModel):
preamble: str
template_text: str
model: str
model_config: dict
notes: str

@app.post("/api/v1/prompts/{template_name}/versions")
async def create_prompt_version(
request: Request,
template_name: str,
payload: CreatePromptVersionRequest
):
"""Create a new version of a prompt template."""

org_id = request.state.organization_id
user_id = request.state.user_id

# Get or create template
template = db.query(PromptTemplate).filter(
PromptTemplate.organization_id == org_id,
PromptTemplate.name == template_name
).first()

if not template:
template = PromptTemplate(
organization_id=org_id,
name=template_name
)
db.add(template)
db.flush()

# Get the next version number
latest_version = db.query(
func.max(PromptVersion.version)
).filter(
PromptVersion.template_id == template.id
).scalar() or 0

next_version = latest_version + 1

# Create the new version
version = PromptVersion(
template_id=template.id,
version=next_version,
author_id=user_id,
preamble=payload.preamble,
template_text=payload.template_text,
model=payload.model,
model_config=payload.model_config,
notes=payload.notes
)

db.add(version)
db.commit()

return {
"template": template_name,
"version": next_version,
"created_at": version.created_at
}

@app.get("/api/v1/prompts/{template_name}/versions")
async def list_prompt_versions(request: Request, template_name: str):
"""List all versions of a prompt template."""

org_id = request.state.organization_id

template = db.query(PromptTemplate).filter(
PromptTemplate.organization_id == org_id,
PromptTemplate.name == template_name
).first()

if not template:
return {"versions": []}

versions = db.query(PromptVersion).filter(
PromptVersion.template_id == template.id
).order_by(PromptVersion.version.desc()).all()

return {
"template": template_name,
"versions": [
{
"version": v.version,
"author": v.author.name,
"created_at": v.created_at,
"notes": v.notes,
"performance_score": v.performance_score
}
for v in versions
]
}

Template Variables and Substitution

Prompts often contain user-specific data. Use template variables to parameterize prompts safely.

# Template substitution with variable validation
from string import Template
import re

class PromptTemplate:
def __init__(self, preamble: str, template_text: str):
self.preamble = preamble
self.template_text = template_text

def extract_variables(self) -> set[str]:
"""Extract variable names from template."""
# Variables are marked with ${variable_name}
pattern = r'\${([a-zA-Z_]\w*)}'
matches = re.findall(pattern, self.template_text)
return set(matches)

def render(self, variables: dict) -> str:
"""Substitute variables into template."""

required_vars = self.extract_variables()
missing = required_vars - set(variables.keys())

if missing:
raise ValueError(f"Missing variables: {missing}")

# Sanitize variables to prevent prompt injection
sanitized = {
k: str(v)[:10000] # Truncate to 10K chars max
for k, v in variables.items()
}

template = Template(self.template_text)
return template.substitute(sanitized)

# Usage: render a prompt with customer data
prompt_version = get_deployed_prompt("customer_support", "production")
template = PromptTemplate(prompt_version.preamble, prompt_version.template_text)

filled_prompt = template.render({
"customer_name": "Alice",
"order_id": "12345",
"issue": "Product arrived damaged"
})

# filled_prompt now contains: "Help customer Alice with order 12345..."

A/B Testing Prompts

Compare two versions in production to see which performs better.

# A/B test prompts and measure performance
from enum import Enum

class PromptVariant(str, Enum):
A = "variant_a"
B = "variant_b"

@app.post("/api/v1/prompts/test")
async def test_prompt_with_ab_split(
request: Request,
template_name: str,
user_input: str,
test_id: str
):
"""Generate response and track for A/B testing."""

org_id = request.state.organization_id

# Assign user to variant (consistent by user_id or request)
variant = get_variant_for_user(request.state.user_id, test_id)

# Get the prompt version for this variant
if variant == PromptVariant.A:
prompt_version = get_version(template_name, 1, org_id)
else:
prompt_version = get_version(template_name, 2, org_id)

# Generate response
template = PromptTemplate(
prompt_version.preamble,
prompt_version.template_text
)
filled_prompt = template.render({"input": user_input})

response = await llm_provider.generate(filled_prompt, model=prompt_version.model)

# Log for A/B analysis
test_result = PromptTestResult(
organization_id=org_id,
test_id=test_id,
template_name=template_name,
variant=variant,
version=prompt_version.version,
user_input=user_input,
response=response,
created_at=datetime.utcnow()
)
db.add(test_result)
db.commit()

return {"response": response, "variant": variant}

def get_variant_for_user(user_id: str, test_id: str) -> PromptVariant:
"""Consistently assign user to a variant."""
hash_value = hash(f"{user_id}:{test_id}")
return PromptVariant.A if hash_value % 2 == 0 else PromptVariant.B

Deploying Prompts

# Deploy a prompt version to production
@app.post("/api/v1/prompts/{template_name}/deploy")
async def deploy_prompt_version(
request: Request,
template_name: str,
version: int,
environment: str = "production"
):
"""Deploy a prompt version to an environment."""

org_id = request.state.organization_id
user_id = request.state.user_id

# Check authorization (only admins can deploy)
user = db.query(User).filter(User.id == user_id).first()
if not user or user.role != "admin":
raise HTTPException(status_code=403, detail="Unauthorized")

# Verify the version exists
template = db.query(PromptTemplate).filter(
PromptTemplate.organization_id == org_id,
PromptTemplate.name == template_name
).first()

version_record = db.query(PromptVersion).filter(
PromptVersion.template_id == template.id,
PromptVersion.version == version
).first()

if not version_record:
raise HTTPException(status_code=404, detail="Version not found")

# Create deployment record
deployment = PromptDeployment(
organization_id=org_id,
template_id=template.id,
version=version,
environment=environment,
deployed_by=user_id
)

db.add(deployment)
db.commit()

# Invalidate cache so new requests use the new version
cache.delete(f"prompt:{org_id}:{template_name}:{environment}")

return {
"template": template_name,
"version": version,
"environment": environment,
"deployed_at": deployment.deployed_at
}

def get_deployed_prompt(template_name: str, org_id: str, environment: str = "production"):
"""Fetch the currently deployed prompt version."""

# Check cache first
cache_key = f"prompt:{org_id}:{template_name}:{environment}"
cached = cache.get(cache_key)
if cached:
return cached

# Query the latest deployment for this environment
deployment = db.query(PromptDeployment).join(
PromptVersion
).filter(
PromptDeployment.organization_id == org_id,
PromptDeployment.template_name == template_name,
PromptDeployment.environment == environment
).order_by(
PromptDeployment.deployed_at.desc()
).first()

if not deployment:
raise ValueError(f"No deployment found for {template_name}")

version = deployment.prompt_version
cache.set(cache_key, version, 3600) # Cache for 1 hour
return version

Prompt Management Workflow

StageActionOwnerTools
DraftWrite and test prompt locallyEngineerNotepad, LLM API
VersionSave version to registryEngineerAPI endpoint
TestRun A/B test on stagingQATest framework
ReviewGet feedback from teamTeam leadPR-like review
DeployPush to productionAdminDeployment API
MonitorTrack performance metricsAnalyticsDashboard

Key Takeaways

  • Store prompts in a versioned registry instead of hardcoding them; treat prompts like code.
  • Use template variables (e.g., ${customer_name}) to parameterize prompts and prevent injection.
  • Deploy different versions to staging and production; implement canary deployments to test with real traffic.
  • Run A/B tests to measure which prompt version performs better; track metrics like user satisfaction or task completion.
  • Cache deployed prompts in Redis with a 1-hour TTL to reduce database queries.

Frequently Asked Questions

How do I prevent prompt injection through template variables?

Sanitize all template variables: truncate to max length, strip special characters, and use parameterized templates (not string concatenation). Never concat user input directly into the prompt. Use a template engine that escapes or validates variables.

Should I version prompts in Git or in a database?

Use a database (this approach) for operational flexibility: you can deploy without code changes, run A/B tests, and roll back instantly. Keep Git in sync for audit trails and code reviews, but the source of truth should be the registry.

How do I handle multi-language prompts?

Store a language column in prompt_versions and maintain separate versions per language. Or, store the template in a "language-agnostic" form and use a translation service to render per-language variants at inference time.

What happens if I deploy a bad prompt?

Rollback instantly by deploying the previous version. This is why a registry is valuable: deployment is a one-line API call, not a code deploy. Monitor key metrics (user satisfaction, task success rate) to detect bad prompts quickly.

Further Reading