
Building a Prompt Registry: Store and Access Prompts

A prompt registry is a centralized repository, backed by a database or file system, that stores every prompt version along with its metadata (creator, changelog, status) and supports fast, auditable retrieval. Instead of hardcoding prompts in your application or storing them ad hoc in Slack or Notion, a registry ensures every prompt is versioned, immutable, queryable, and tied to the environment it's deployed in.

A well-designed registry is the backbone of LLMOps. It lets teams iterate on prompts independently of application deployments, roll out changes safely, and audit every change. Without a registry, A/B tests, canary rollouts, and fast rollbacks are difficult to run reliably, because there is no single versioned source to switch between.

Registry Architecture: Three Patterns

Pattern 1: Git-backed (simple, small teams) — Prompts live in a Git repo as YAML or markdown files. Versions are Git commits or tags. Metadata is in YAML frontmatter or a central registry.yaml.

Pros: Free, version control built-in, trivial to code-review, offline support. Cons: Latency (must pull/parse), no role-based access, not suitable for high-frequency updates.

Pattern 2: Database with REST API (medium teams) — PostgreSQL, MongoDB, or DynamoDB stores prompts and metadata. A service exposes a REST API. Clients fetch prompts over HTTP.

Pros: Fast, role-based access control, queryable metadata, support for concurrent updates, suitable for canary rollouts (update the DB; no app redeploy). Cons: Requires operations (run a service, monitor it), more infrastructure.

Pattern 3: Third-party platform (Anthropic Workbench, LangSmith, Weights & Biases) — Outsource to a managed service that provides a UI, versioning, and API.

Pros: No ops burden, rich UI, experiment tracking, integration with evaluation. Cons: Vendor lock-in, data residency concerns, pricing.

Most production teams start with Pattern 1 and graduate to Pattern 2 as they grow.

Database Schema (Pattern 2)

Design a schema to support versioning and metadata:

CREATE TABLE prompts (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name          TEXT NOT NULL,              -- e.g., "customer-support"
    version       TEXT NOT NULL,              -- e.g., "2.1.0"
    created_by    TEXT NOT NULL,              -- creator email
    created_at    TIMESTAMP DEFAULT NOW(),
    updated_by    TEXT,                       -- last modifier
    updated_at    TIMESTAMP,
    description   TEXT,                       -- human-readable changelog
    system_prompt TEXT NOT NULL,              -- the actual prompt text
    token_budget  INT DEFAULT 4096,
    temperature   FLOAT DEFAULT 0.7,
    status        TEXT DEFAULT 'draft',       -- draft|staging|production
    environment   TEXT,                       -- dev|staging|prod
    UNIQUE(name, version),
    CHECK (status IN ('draft', 'staging', 'production'))
);

CREATE TABLE prompt_history (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_id   UUID REFERENCES prompts(id) ON DELETE CASCADE,
    changed_by  TEXT NOT NULL,
    changed_at  TIMESTAMP DEFAULT NOW(),
    change_type TEXT NOT NULL,                -- created|promoted|reverted
    old_value   TEXT,
    new_value   TEXT,
    reason      TEXT                          -- why the change
);

CREATE TABLE prompt_tests (
    id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_id UUID REFERENCES prompts(id),
    test_name TEXT,                           -- e.g., "customer-satisfaction-v1"
    passed    INT,                            -- number of tests passed
    failed    INT,
    avg_score FLOAT,                          -- e.g., average rating
    tested_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_prompts_name_version ON prompts(name, version DESC);
CREATE INDEX idx_prompts_status ON prompts(status);
CREATE INDEX idx_prompts_environment ON prompts(environment);
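
To illustrate how the prompts and prompt_history tables work together, here is a minimal sketch of a promotion that updates the status and writes the audit row in a single transaction. It is an assumption-laden adaptation rather than the schema's reference implementation: it uses Python's built-in sqlite3 with a trimmed, SQLite-compatible copy of the tables (the original targets PostgreSQL), and the helper names create_prompt and promote are hypothetical.

```python
import sqlite3
import uuid

# Trimmed SQLite adaptation of the schema above (assumption: the original
# targets PostgreSQL, so UUIDs are generated in Python and stored as TEXT).
SCHEMA = """
CREATE TABLE prompts (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    created_by TEXT NOT NULL,
    system_prompt TEXT NOT NULL,
    status TEXT DEFAULT 'draft'
        CHECK (status IN ('draft', 'staging', 'production')),
    UNIQUE (name, version)
);
CREATE TABLE prompt_history (
    id TEXT PRIMARY KEY,
    prompt_id TEXT REFERENCES prompts(id),
    changed_by TEXT NOT NULL,
    change_type TEXT NOT NULL,
    old_value TEXT,
    new_value TEXT,
    reason TEXT
);
"""


def create_prompt(conn, name, version, system_prompt, created_by):
    """Insert a new prompt version in 'draft' status and return its id."""
    prompt_id = str(uuid.uuid4())
    with conn:
        conn.execute(
            "INSERT INTO prompts (id, name, version, created_by, system_prompt) "
            "VALUES (?, ?, ?, ?, ?)",
            (prompt_id, name, version, created_by, system_prompt),
        )
    return prompt_id


def promote(conn, prompt_id, new_status, changed_by, reason):
    """Change a prompt's status and record the change in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute(
            "SELECT status FROM prompts WHERE id = ?", (prompt_id,)
        ).fetchone()
        if row is None:
            raise KeyError(prompt_id)
        conn.execute(
            "UPDATE prompts SET status = ? WHERE id = ?", (new_status, prompt_id)
        )
        conn.execute(
            "INSERT INTO prompt_history "
            "(id, prompt_id, changed_by, change_type, old_value, new_value, reason) "
            "VALUES (?, ?, ?, 'promoted', ?, ?, ?)",
            (str(uuid.uuid4()), prompt_id, changed_by, row[0], new_status, reason),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
pid = create_prompt(conn, "customer-support", "2.1.0",
                    "You are a support agent.", "alice@example.com")
promote(conn, pid, "staging", "alice@example.com", "passed QA")
```

Because the status update and the history insert share a transaction, a rejected status (one that fails the CHECK constraint) leaves no partial audit row behind.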

REST API Interface

Expose a simple HTTP API:

GET /api/prompts/{name}?version={version}
Returns a single prompt with metadata.

GET /api/prompts/{name}/latest?environment={env}
Returns the latest prompt for an environment (e.g., production).

GET /api/prompts/{name}/versions
Lists all versions of a prompt.

POST /api/prompts
Create a new prompt version.
Body: { "name", "system_prompt", "version", "description" }

PATCH /api/prompts/{id}/promote
Promote a prompt version to a target environment (for example, from staging to production).
Body: { "environment": "production", "reason": "passed QA" }

GET /api/prompts/{name}/history
Audit trail of all changes to a prompt.
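
One detail the /latest endpoint has to settle is what "latest" means. Since version is stored as TEXT, ordering it as a string ranks "9.0.0" above "10.0.0". The sketch below shows one possible way to resolve the latest version numerically. It assumes versions are plain MAJOR.MINOR.PATCH strings with no pre-release suffixes, and the names semver_key and latest_version are hypothetical helpers, not part of the API above.

```python
from typing import Iterable, Optional, Tuple


def semver_key(version: str) -> Tuple[int, ...]:
    """Turn "2.10.0" into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_version(rows: Iterable[dict], name: str,
                   environment: str) -> Optional[dict]:
    """Return the highest-versioned row for a prompt name and environment."""
    candidates = [
        r for r in rows
        if r["name"] == name and r["environment"] == environment
    ]
    return max(candidates, key=lambda r: semver_key(r["version"]), default=None)


rows = [
    {"name": "customer-support", "version": "9.0.0", "environment": "production"},
    {"name": "customer-support", "version": "10.0.0", "environment": "production"},
    {"name": "customer-support", "version": "11.0.0", "environment": "staging"},
]

print(latest_version(rows, "customer-support", "production")["version"])  # 10.0.0
print(max(r["version"] for r in rows[:2]))  # 9.0.0 (plain string comparison)
```

An alternative under the same assumption is to order by created_at instead, which sidesteps version parsing entirely.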

Example client code in Python:

import requests
from typing import Optional

class PromptRegistry:
    def __init__(self, base_url: str = "http://localhost:8000", timeout: float = 5.0):
        self.base_url = base_url
        self.timeout = timeout  # seconds; requests applies no timeout by default

    def fetch(self, name: str, version: Optional[str] = None,
              environment: Optional[str] = None) -> dict:
        """Fetch a prompt. If version is None, fetch latest for environment."""
        if version:
            url = f"{self.base_url}/api/prompts/{name}"
            params = {"version": version}
        else:
            url = f"{self.base_url}/api/prompts/{name}/latest"
            params = {"environment": environment or "production"}

        # Passing params lets requests URL-encode the query string
        resp = requests.get(url, params=params, timeout=self.timeout)
        resp.raise_for_status()
        return resp.json()

    def create(self, name: str, system_prompt: str, version: str,
               description: str) -> dict:
        """Create a new prompt version."""
        resp = requests.post(
            f"{self.base_url}/api/prompts",
            json={
                "name": name,
                "system_prompt": system_prompt,
                "version": version,
                "description": description
            },
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()

    def promote(self, prompt_id: str, environment: str, reason: str) -> dict:
        """Promote a prompt to a new environment."""
        resp = requests.patch(
            f"{self.base_url}/api/prompts/{prompt_id}/promote",
            json={"environment": environment, "reason": reason},
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()

# Usage
registry = PromptRegistry()
prompt_data = registry.fetch("customer-support", environment="production")
system_prompt = prompt_data["system_prompt"]

Caching Strategy

Fetching prompts from the registry on every inference request adds a network round trip to each call. Cache prompts in the application and bound how stale they can get:

  1. In-memory cache: Keep fetched prompts in application memory. Check the cache first and fall back to the registry API on a miss.
  2. Refresh interval: Expire or refresh cached prompts periodically (every 5–10 minutes) to pick up changes.

The example uses a TTL of 300 seconds (5 minutes) and requires the aiohttp package:

import aiohttp
from datetime import datetime, timedelta

class CachedPromptRegistry:
    def __init__(self, registry_url: str, cache_ttl_seconds: int = 300):
        self.registry_url = registry_url
        self.cache_ttl = cache_ttl_seconds
        self._cache = {}       # cache_key -> prompt text
        self._cache_time = {}  # cache_key -> time the entry was stored

    async def fetch(self, name: str, environment: str = "production") -> str:
        cache_key = f"{name}:{environment}"
        now = datetime.now()

        # Return cached if fresh
        if cache_key in self._cache:
            cached_time = self._cache_time[cache_key]
            if now - cached_time < timedelta(seconds=self.cache_ttl):
                return self._cache[cache_key]

        # Fetch from registry
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{self.registry_url}/api/prompts/{name}/latest",
                params={"environment": environment}
            ) as resp:
                resp.raise_for_status()  # do not cache error responses
                data = await resp.json()

        prompt_text = data["system_prompt"]
        self._cache[cache_key] = prompt_text
        self._cache_time[cache_key] = now
        return prompt_text
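
The refresh-interval idea from the list above can also be run as a background task, so inference requests never wait on the registry. The following is a minimal sketch under stated assumptions: RefreshingPromptCache is a hypothetical class, the fetcher is any async callable you supply (for example, a wrapper around the registry's /latest endpoint), and keeping the previous value when a refresh fails is one possible policy, not the only one.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

Fetcher = Callable[[str], Awaitable[str]]


class RefreshingPromptCache:
    """Keeps prompts in memory and re-fetches them on a fixed interval."""

    def __init__(self, fetcher: Fetcher, refresh_seconds: float = 300.0):
        self._fetcher = fetcher
        self._refresh_seconds = refresh_seconds
        self._prompts: Dict[str, str] = {}

    async def refresh_once(self, names: Iterable[str]) -> None:
        """Fetch each prompt once; on failure keep the previously cached text."""
        for name in names:
            try:
                self._prompts[name] = await self._fetcher(name)
            except Exception:
                # Assumption: serving a stale prompt beats failing the request.
                continue

    async def run(self, names: Iterable[str]) -> None:
        """Refresh forever; start with asyncio.create_task(cache.run(...))."""
        names = list(names)
        while True:
            await self.refresh_once(names)
            await asyncio.sleep(self._refresh_seconds)

    def get(self, name: str) -> str:
        """Return the cached prompt; raises KeyError if never fetched."""
        return self._prompts[name]


async def _demo() -> None:
    async def fake_fetch(name: str) -> str:  # stand-in for an HTTP call
        return f"prompt text for {name}"

    cache = RefreshingPromptCache(fake_fetch, refresh_seconds=300)
    await cache.refresh_once(["customer-support"])
    print(cache.get("customer-support"))


asyncio.run(_demo())
```

Compared with expiring entries lazily on read, this trades a small amount of background traffic for predictable request latency.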

Organizing Prompts by Domain

For multi-domain applications (e.g., customer support + content moderation), organize the registry hierarchically:

prompts/
├── customer-support/
│   ├── v1.0.0/
│   │   ├── system_prompt.txt
│   │   ├── metadata.json
│   │   └── examples.json
│   └── v1.1.0/
├── content-moderator/
│   └── v2.0.0/
└── summarizer/
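
A layout like this can be read with a few lines of standard-library code. The sketch below is illustrative only: load_prompt is a hypothetical helper, it assumes version directories are named "v" plus the version string as in the tree above, and it treats metadata.json as optional. The demo builds a throwaway directory so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_prompt(root: Path, name: str, version: str) -> dict:
    """Load one prompt version from <root>/<name>/v<version>/."""
    version_dir = root / name / f"v{version}"
    metadata_path = version_dir / "metadata.json"
    # Assumption: metadata.json is optional; a missing file means no metadata.
    if metadata_path.exists():
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    else:
        metadata = {}
    return {
        "name": name,
        "version": version,
        "system_prompt": (version_dir / "system_prompt.txt").read_text(encoding="utf-8"),
        "metadata": metadata,
    }


# Demo: create a temporary prompts/ tree and load a version from it.
with tempfile.TemporaryDirectory() as tmp:
    version_dir = Path(tmp) / "customer-support" / "v1.0.0"
    version_dir.mkdir(parents=True)
    (version_dir / "system_prompt.txt").write_text(
        "You are a support agent.", encoding="utf-8")
    (version_dir / "metadata.json").write_text(
        json.dumps({"status": "production"}), encoding="utf-8")
    loaded = load_prompt(Path(tmp), "customer-support", "1.0.0")

print(loaded["metadata"]["status"])  # production
```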

Or in the database, add a domain/category column:

ALTER TABLE prompts ADD COLUMN domain TEXT DEFAULT 'general';
-- E.g., domain='customer-support', 'content-moderation', 'code-gen'

Audit and Compliance

Log every access to the registry:

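import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("prompt_registry.audit")

def fetch_prompt(registry: PromptRegistry, name: str, version: str,
                 user: str, ip: str) -> dict:
    """Fetch a prompt and record who accessed it."""
    result = registry.fetch(name, version)

    # Log the access. The caller passes the client IP
    # (for example, request.remote_addr in a Flask handler).
    audit_log.info({
        "action": "fetch",
        "prompt": name,
        "version": version,
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": ip
    })

    return result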

Key Takeaways

  • A prompt registry centralizes storage, versioning, and metadata, enabling safe iteration and auditing.
  • Start with Git-backed prompts for small teams; graduate to a database + REST API as you grow.
  • The database schema covers prompts, history, and test results, and supports queries by name, version, status, and environment.
  • Implement caching (in-memory with TTL) to reduce latency on inference.
  • Audit all access for compliance and troubleshooting.

Frequently Asked Questions

Can I version multiple prompt files (system + context examples) together?

Yes. Store them as separate rows in the database, linked by a shared release_id column (an addition to the schema shown earlier). When fetching, return all components for a release. This is cleaner than packing everything into a single system_prompt column.

How do I handle prompts with dynamic sections (user-specific instructions)?

Store the template (with placeholders like {user_role} or {company_name}) in the registry, and substitute at runtime. Version the template, not the rendered prompt. This keeps the registry clean and auditable.
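
As a rough illustration of runtime substitution, the sketch below renders a stored template with Python's built-in str.format and fails loudly when a placeholder has no value. The helper names required_fields and render are hypothetical, and the template text is made up for the example.

```python
from string import Formatter


def required_fields(template: str) -> set:
    """Return the placeholder names used in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def render(template: str, **values: str) -> str:
    """Substitute values into the template, rejecting missing placeholders."""
    missing = required_fields(template) - set(values)
    if missing:
        raise KeyError(f"missing template values: {sorted(missing)}")
    return template.format(**values)


template = "You are assisting a {user_role} at {company_name}."
print(render(template, user_role="billing admin", company_name="Acme"))
# You are assisting a billing admin at Acme.
```

One caveat with this approach: literal braces in a prompt (for example, a JSON snippet) must be escaped as {{ and }} so that str.format does not treat them as placeholders.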

Should I allow direct edits to production prompts?

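No. Enforce a promotion workflow: draft → test → staging → production. Use database triggers or application logic to reject direct updates to production versions. Note that the status column in the schema above allows only draft, staging, and production, so the test stage would be tracked through prompt_tests results (or an added status value) rather than as an existing status.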
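
In application logic, the workflow can be enforced with a small transition table checked before any status update. This is a sketch under assumptions: it uses the three status values from the schema's CHECK constraint, the specific allowed moves (including demoting staging back to draft, and treating production as final) are illustrative policy choices, and validate_transition is a hypothetical helper.

```python
# Hypothetical transition table; statuses come from the schema's CHECK constraint.
ALLOWED_TRANSITIONS = {
    "draft": {"staging"},
    "staging": {"production", "draft"},
    "production": set(),  # assumption: production versions are never edited in place
}


class InvalidTransition(Exception):
    """Raised when a status change skips or reverses the workflow."""


def validate_transition(current: str, target: str) -> None:
    """Raise InvalidTransition unless current -> target is an allowed move."""
    if current not in ALLOWED_TRANSITIONS:
        raise InvalidTransition(f"unknown status: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"cannot move from {current!r} to {target!r}")


validate_transition("draft", "staging")  # allowed, returns None

try:
    validate_transition("draft", "production")  # skips staging
except InvalidTransition as exc:
    print(exc)  # cannot move from 'draft' to 'production'
```

Under this policy, a rollback is done by promoting an older version again rather than by editing the production row.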

How long should I keep old versions?

Indefinitely. Storage is cheap. Keep a complete history for auditing. Archive versions older than 2 years to a cold storage system (S3 Glacier) if needed.

What if multiple teams manage prompts?

Use role-based access control (RBAC). Create roles (e.g., "customer-support team", "data science team") and assign permissions: who can create, promote, and revert prompts. Log all changes with the user.
