Building a Prompt Registry: Store and Access Prompts
A prompt registry is a centralized repository, backed by a database or file system, that stores every prompt version along with its metadata (creator, changelog, status) and supports fast, auditable retrieval. Instead of hardcoding prompts in your application or scattering them across Slack and Notion, a registry ensures every prompt is versioned, immutable, queryable, and tied to the environment it is deployed in.
A well-designed registry is the backbone of LLMOps. It lets teams iterate on prompts independently of application deployments, roll out changes safely, and audit every change. Without one, A/B tests, canary rollouts, and fast rollbacks are difficult to run reliably, because there is no single source of truth for which prompt version is live.
Registry Architecture: Three Patterns
Pattern 1: Git-backed (simple, small teams) — Prompts live in a Git repo as YAML or markdown files. Versions are Git commits or tags. Metadata is in YAML frontmatter or a central registry.yaml.
Pros: Free, version control built-in, trivial to code-review, offline support. Cons: Latency (must pull/parse), no role-based access, not suitable for high-frequency updates.
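As a rough sketch of Pattern 1, the snippet below parses a prompt file with simple `key: value` frontmatter using only the standard library. The file layout, the field names, and the `parse_prompt_file` helper are illustrative assumptions, and the parser handles only flat pairs; a real setup would likely use a full YAML library and take the version from the Git tag or commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptFile:
    metadata: dict
    body: str


def parse_prompt_file(text: str) -> PromptFile:
    """Split '---'-delimited frontmatter from the prompt body.

    Only flat `key: value` pairs are handled; this is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No frontmatter: treat the whole file as the prompt body.
        return PromptFile(metadata={}, body=text.strip())
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("frontmatter opened with '---' but never closed") from None
    metadata = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return PromptFile(metadata=metadata, body=body)


# Hypothetical file contents for illustration.
EXAMPLE = """---
name: customer-support
version: 2.1.0
status: production
---
You are a helpful customer support agent.
"""

prompt = parse_prompt_file(EXAMPLE)
print(prompt.metadata["version"], "-", prompt.body)
```

Because the file is parsed on every load, this approach pays the pull-and-parse latency noted above unless the result is cached.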
Pattern 2: Database with REST API (medium teams) — PostgreSQL, MongoDB, or DynamoDB stores prompts and metadata. A service exposes a REST API. Clients fetch prompts over HTTP.
Pros: Fast, role-based access control, queryable metadata, support for concurrent updates, suitable for canary rollouts (update the DB; no app redeploy). Cons: Requires operations (run a service, monitor it), more infrastructure.
Pattern 3: Third-party platform (Anthropic Workbench, LangSmith, Weights & Biases) — Outsource to a managed service that provides a UI, versioning, and API.
Pros: No ops burden, rich UI, experiment tracking, integration with evaluation. Cons: Vendor lock-in, data residency concerns, pricing.
A common path is to start with Pattern 1 and move to Pattern 2 as the team and the frequency of prompt updates grow.
Database Schema (Pattern 2)
Design a schema to support versioning and metadata:
CREATE TABLE prompts (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name          TEXT NOT NULL,            -- e.g., "customer-support"
    version       TEXT NOT NULL,            -- e.g., "2.1.0"
    created_by    TEXT NOT NULL,            -- creator email
    created_at    TIMESTAMP DEFAULT NOW(),
    updated_by    TEXT,                     -- last modifier
    updated_at    TIMESTAMP,
    description   TEXT,                     -- human-readable changelog
    system_prompt TEXT NOT NULL,            -- the actual prompt text
    token_budget  INT DEFAULT 4096,
    temperature   FLOAT DEFAULT 0.7,
    status        TEXT DEFAULT 'draft',     -- draft|staging|production
    environment   TEXT,                     -- dev|staging|prod
    UNIQUE(name, version),
    CHECK (status IN ('draft', 'staging', 'production'))
);

CREATE TABLE prompt_history (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_id   UUID REFERENCES prompts(id) ON DELETE CASCADE,
    changed_by  TEXT NOT NULL,
    changed_at  TIMESTAMP DEFAULT NOW(),
    change_type TEXT NOT NULL,              -- created|promoted|reverted
    old_value   TEXT,
    new_value   TEXT,
    reason      TEXT                        -- why the change
);

CREATE TABLE prompt_tests (
    id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_id UUID REFERENCES prompts(id),
    test_name TEXT,                         -- e.g., "customer-satisfaction-v1"
    passed    INT,                          -- number of tests passed
    failed    INT,
    avg_score FLOAT,                        -- e.g., average rating
    tested_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_prompts_name_version ON prompts(name, version DESC);
CREATE INDEX idx_prompts_status ON prompts(status);
CREATE INDEX idx_prompts_environment ON prompts(environment);
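One detail worth checking in a schema like this: because version is stored as TEXT, sorting by it in descending order compares strings, so "9.0.0" sorts ahead of "10.0.0". The sketch below shows one way to pick the latest version numerically. It uses an in-memory SQLite database with a reduced prompts table as a stand-in for PostgreSQL, and the `version_key` and `latest_prompt` helpers are hypothetical names; it assumes versions are always numeric MAJOR.MINOR.PATCH strings.

```python
import sqlite3


def version_key(version: str) -> tuple:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_prompt(conn: sqlite3.Connection, name: str, status: str):
    """Return (version, system_prompt) for the highest version, or None."""
    rows = conn.execute(
        "SELECT version, system_prompt FROM prompts WHERE name = ? AND status = ?",
        (name, status),
    ).fetchall()
    if not rows:
        return None
    return max(rows, key=lambda row: version_key(row[0]))


# Reduced stand-in for the prompts table (assumption: SQLite instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE prompts (
        name TEXT NOT NULL,
        version TEXT NOT NULL,
        system_prompt TEXT NOT NULL,
        status TEXT DEFAULT 'draft',
        UNIQUE(name, version)
    )
    """
)
conn.executemany(
    "INSERT INTO prompts (name, version, system_prompt, status) VALUES (?, ?, ?, ?)",
    [
        ("customer-support", "9.0.0", "old prompt", "production"),
        ("customer-support", "10.0.0", "new prompt", "production"),
        ("customer-support", "11.0.0", "unreleased prompt", "draft"),
    ],
)

print(latest_prompt(conn, "customer-support", "production"))
```

An alternative design is to store major, minor, and patch as separate integer columns, or to order by created_at, so the database can do the ordering itself.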
REST API Interface
Expose a simple HTTP API:
GET /api/prompts/{name}?version={version}
Returns a single prompt with metadata.
GET /api/prompts/{name}/latest?environment={env}
Returns the latest prompt for an environment (e.g., production).
GET /api/prompts/{name}/versions
Lists all versions of a prompt.
POST /api/prompts
Create a new prompt version.
Body: { "name", "system_prompt", "version", "description" }
PATCH /api/prompts/{id}/promote
Promote a prompt from staging to production.
Body: { "environment": "production", "reason": "passed QA" }
GET /api/prompts/{name}/history
Audit trail of all changes to a prompt.
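On the server side, the promote endpoint has to validate the status transition and record it in the history. The following is a minimal sketch of that logic, not a full service: an in-memory `PromptRecord` stands in for the prompts and prompt_history tables, and the one-step draft → staging → production transition map is an assumption based on the status values in the schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed status transitions (an assumption based on the schema's status values).
ALLOWED_TRANSITIONS = {"draft": "staging", "staging": "production"}


@dataclass
class PromptRecord:
    name: str
    version: str
    status: str = "draft"
    history: list = field(default_factory=list)


def promote(prompt: PromptRecord, target: str, changed_by: str, reason: str) -> PromptRecord:
    """Move a prompt one step forward and record the change for auditing."""
    expected = ALLOWED_TRANSITIONS.get(prompt.status)
    if expected != target:
        raise ValueError(
            f"cannot promote {prompt.name}@{prompt.version} "
            f"from {prompt.status!r} to {target!r}"
        )
    # Mirror the columns of the prompt_history table.
    prompt.history.append({
        "change_type": "promoted",
        "old_value": prompt.status,
        "new_value": target,
        "changed_by": changed_by,
        "changed_at": datetime.now(timezone.utc).isoformat(),
        "reason": reason,
    })
    prompt.status = target
    return prompt


record = PromptRecord(name="customer-support", version="2.1.0")
promote(record, "staging", "alice@example.com", "ready for QA")
promote(record, "production", "bob@example.com", "passed QA")
print(record.status, len(record.history))
```

Rejecting any transition that skips a step keeps a draft from reaching production without passing through staging.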
Example client code in Python:
import requests
from typing import Optional


class PromptRegistry:
    def __init__(self, base_url: str = "http://localhost:8000", timeout: float = 5.0):
        self.base_url = base_url
        self.timeout = timeout  # seconds; requests applies no timeout by default

    def fetch(self, name: str, version: Optional[str] = None,
              environment: Optional[str] = None) -> dict:
        """Fetch a prompt. If version is None, fetch latest for environment."""
        if version:
            url = f"{self.base_url}/api/prompts/{name}"
            params = {"version": version}
        else:
            url = f"{self.base_url}/api/prompts/{name}/latest"
            params = {"environment": environment or "production"}
        # Passing params lets requests URL-encode the query string.
        resp = requests.get(url, params=params, timeout=self.timeout)
        resp.raise_for_status()
        return resp.json()

    def create(self, name: str, system_prompt: str, version: str,
               description: str) -> dict:
        """Create a new prompt version."""
        resp = requests.post(
            f"{self.base_url}/api/prompts",
            json={
                "name": name,
                "system_prompt": system_prompt,
                "version": version,
                "description": description
            },
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()

    def promote(self, prompt_id: str, environment: str, reason: str) -> dict:
        """Promote a prompt to a new environment."""
        resp = requests.patch(
            f"{self.base_url}/api/prompts/{prompt_id}/promote",
            json={"environment": environment, "reason": reason},
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()


# Usage
registry = PromptRegistry()
prompt_data = registry.fetch("customer-support", environment="production")
system_prompt = prompt_data["system_prompt"]
Caching Strategy
Fetching prompts from the registry on every inference request adds avoidable latency. Cache them in the application and refresh them on an interval:
- In-memory cache: Keep fetched prompts in application memory. Check the cache first and fall back to the registry API on a miss.
- Refresh interval: Treat cached prompts as stale after 5–10 minutes and re-fetch them to pick up changes.
import time

import aiohttp


class CachedPromptRegistry:
    def __init__(self, registry_url: str, cache_ttl_seconds: int = 300):
        self.registry_url = registry_url
        self.cache_ttl = cache_ttl_seconds
        self._cache = {}
        self._cache_time = {}

    async def fetch(self, name: str, environment: str = "production") -> str:
        cache_key = f"{name}:{environment}"
        now = time.monotonic()  # monotonic clock is unaffected by wall-clock changes

        # Return cached if fresh
        if cache_key in self._cache:
            if now - self._cache_time[cache_key] < self.cache_ttl:
                return self._cache[cache_key]

        # Fetch from registry
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{self.registry_url}/api/prompts/{name}/latest",
                params={"environment": environment}
            ) as resp:
                resp.raise_for_status()  # do not cache error responses
                data = await resp.json()

        prompt_text = data["system_prompt"]
        self._cache[cache_key] = prompt_text
        self._cache_time[cache_key] = now
        return prompt_text
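The TTL logic can also be separated from the HTTP call so it is easy to test. The sketch below is one possible shape, not a drop-in replacement: `TTLPromptCache`, the injected `loader` and `clock` callables, and `refresh_all` are hypothetical names. The loader would wrap a registry client in practice; here a fake loader and a fake clock stand in for the registry and for the passage of time.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPromptCache:
    """Cache prompt text per (name, environment) key for a fixed TTL."""

    def __init__(self, loader: Callable[[str, str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader      # fetches prompt text from the registry
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get(self, name: str, environment: str = "production") -> str:
        key = (name, environment)
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]        # still fresh
        return self._load(key)

    def refresh_all(self) -> None:
        """Re-fetch every cached key, e.g. from a periodic background job."""
        for key in list(self._entries):
            self._load(key)

    def _load(self, key: Tuple[str, str]) -> str:
        text = self._loader(*key)
        self._entries[key] = (self._clock(), text)
        return text


# Usage with a fake loader and a fake clock instead of a real registry.
calls = []
fake_now = [0.0]


def fake_loader(name: str, environment: str) -> str:
    calls.append((name, environment))
    return f"prompt for {name} in {environment} (fetch #{len(calls)})"


cache = TTLPromptCache(fake_loader, ttl_seconds=300, clock=lambda: fake_now[0])
first = cache.get("customer-support")
second = cache.get("customer-support")   # served from cache
fake_now[0] = 301.0
third = cache.get("customer-support")    # TTL expired, fetched again
print(len(calls))
```

Calling `refresh_all` from a scheduled job gives the periodic refresh behavior, while `get` alone refreshes lazily on the first request after expiry.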
Organizing Prompts by Domain
For multi-domain applications (e.g., customer support + content moderation), organize the registry hierarchically:
prompts/
├── customer-support/
│   ├── v1.0.0/
│   │   ├── system_prompt.txt
│   │   ├── metadata.json
│   │   └── examples.json
│   └── v1.1.0/
├── content-moderator/
│   └── v2.0.0/
└── summarizer/
Or in the database, add a domain/category column:
ALTER TABLE prompts ADD COLUMN domain TEXT DEFAULT 'general';
-- E.g., domain='customer-support', 'content-moderation', 'code-gen'
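For the directory layout, resolving the newest version of a prompt means comparing version directory names numerically rather than as strings. A minimal sketch, assuming directories are named vMAJOR.MINOR.PATCH as in the tree above; `latest_version_dir` is a hypothetical helper, and the temporary directory exists only to make the example self-contained.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

VERSION_DIR = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_version_dir(root: Path, prompt_name: str) -> Optional[Path]:
    """Return the highest-versioned directory under root/prompt_name, if any."""
    prompt_dir = root / prompt_name
    if not prompt_dir.is_dir():
        return None
    candidates = []
    for child in prompt_dir.iterdir():
        match = VERSION_DIR.match(child.name)
        if child.is_dir() and match:
            version = tuple(int(part) for part in match.groups())
            candidates.append((version, child))
    if not candidates:
        return None
    # Compare on the numeric tuple only, so v1.10.0 beats v1.9.0.
    return max(candidates, key=lambda item: item[0])[1]


# Build a throwaway copy of the layout to show the lookup.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "prompts"
    for version in ("v1.0.0", "v1.1.0", "v1.10.0"):
        (root / "customer-support" / version).mkdir(parents=True)
    (root / "summarizer").mkdir()
    latest = latest_version_dir(root, "customer-support")
    latest_name = latest.name if latest else None
    empty = latest_version_dir(root, "summarizer")
    print(latest_name)
```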
Audit and Compliance
Log every access to the registry:
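import logging
from datetime import datetime, timezone

from flask import request  # assumption: this runs inside a Flask request handler

audit_log = logging.getLogger("prompt_audit")  # logger name is illustrative


def fetch_prompt(registry: PromptRegistry, name: str, version: str, user: str):
    result = registry.fetch(name, version)

    # Log the access
    audit_log.info({
        "action": "fetch",
        "prompt": name,
        "version": version,
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": request.remote_addr
    })
    return result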
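If the audit trail needs to be machine-readable, one option is to emit each access as a single JSON line through the standard logging module. The sketch below is illustrative: the logger name, the `audit_event` helper, and the field set are assumptions, and the user and IP are passed in explicitly rather than read from a web framework.

```python
import io
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("prompt_registry.audit")
audit_log.setLevel(logging.INFO)
audit_log.propagate = False  # keep audit lines out of the root logger


def audit_event(action: str, prompt: str, version: str, user: str, ip: str) -> dict:
    """Build an audit record and write it to the audit logger as one JSON line."""
    event = {
        "action": action,
        "prompt": prompt,
        "version": version,
        "user": user,
        "ip": ip,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.info(json.dumps(event, sort_keys=True))
    return event


# Usage: capture the log output in memory; production code would ship it elsewhere.
buffer = io.StringIO()
audit_log.addHandler(logging.StreamHandler(buffer))
audit_event("fetch", "customer-support", "2.1.0", "alice@example.com", "10.0.0.5")
logged = json.loads(buffer.getvalue())
print(logged["action"], logged["prompt"], logged["user"])
```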
Key Takeaways
- A prompt registry centralizes storage, versioning, and metadata, enabling safe iteration and auditing.
- Start with Git-backed prompts for small teams; graduate to a database + REST API as you grow.
- The database schema covers prompts, history, and test results, and supports queries by name, version, status, and environment.
- Implement caching (in-memory with TTL) to reduce latency on inference.
- Audit all access for compliance and troubleshooting.
Frequently Asked Questions
Can I version multiple prompt files (system + context examples) together?
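Yes. Store each component as a separate row and link the rows with a shared release identifier (for example, a release_id column, which the schema above would need to add). When fetching, return all components for a release. This is cleaner than packing everything into a single system_prompt column.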
How do I handle prompts with dynamic sections (user-specific instructions)?
Store the template (with placeholders like {user_role} or {company_name}) in the registry, and substitute at runtime. Version the template, not the rendered prompt. This keeps the registry clean and auditable.
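A minimal sketch of runtime substitution, assuming templates use Python's `str.format` placeholder syntax. The `placeholders` and `render` helpers are hypothetical; checking for missing values up front gives a clearer error than a failure partway through formatting.

```python
from string import Formatter


def placeholders(template: str) -> set:
    """Return the names of all {placeholder} fields in the template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, **values: str) -> str:
    """Fill in a template, failing loudly if a placeholder has no value."""
    missing = placeholders(template) - set(values)
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    return template.format(**values)


# The template is what the registry stores and versions.
TEMPLATE = "You are assisting a {user_role} at {company_name}. Be concise."

rendered = render(TEMPLATE, user_role="billing admin", company_name="Acme")
print(rendered)
```

With this syntax, literal braces in a prompt (for example, an inline JSON sample) must be doubled as `{{` and `}}`.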
Should I allow direct edits to production prompts?
No. Enforce a promotion workflow (draft → staging → production, matching the status values in the schema) and require tests to pass before each promotion. Use database triggers or application logic to prevent direct updates to production versions.
How long should I keep old versions?
Indefinitely. Storage is cheap. Keep a complete history for auditing. Archive versions older than 2 years to a cold storage system (S3 Glacier) if needed.
What if multiple teams manage prompts?
Use role-based access control (RBAC). Create roles (e.g., "customer-support team", "data science team") and assign permissions: who can create, promote, and revert prompts. Log all changes with the user.
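A permission check for this kind of RBAC can be quite small. In the sketch below, the role names, the permission sets, and the `is_allowed` and `require` helpers are all hypothetical; in practice roles would come from your identity provider and the check would run inside the registry service before any write.

```python
from typing import Dict, Set

# Hypothetical role-to-permission mapping for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "prompt-author": {"create"},
    "release-manager": {"create", "promote", "revert"},
    "viewer": set(),
}


def is_allowed(roles: Set[str], action: str) -> bool:
    """True if any of the user's roles grants the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def require(roles: Set[str], action: str, user: str) -> None:
    """Raise PermissionError when the user may not perform the action."""
    if not is_allowed(roles, action):
        raise PermissionError(f"{user} is not allowed to {action} prompts")


print(is_allowed({"prompt-author"}, "create"))
print(is_allowed({"prompt-author"}, "promote"))
```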