
Multi-Tenant Semantic Caching and Data Isolation

Semantic caching in multi-tenant systems (SaaS platforms serving multiple organizations or users) introduces a critical security risk: serving a cached response intended for Organization A to a user in Organization B. Unlike traditional databases with row-level access control, semantic caches match by similarity, not by identity. A query from Tenant B that semantically matches a cached query from Tenant A could inadvertently leak sensitive data.

This article covers how to design and enforce strict multi-tenant cache isolation: namespacing cache entries by tenant, scoping searches to the current tenant's data, implementing audit logging, and testing isolation boundaries. By the end, you will know how to deploy a semantic cache in a SaaS system with isolation guarantees comparable to the row-level access control of a relational database.

The Multi-Tenant Cache Security Risk

Consider a medical SaaS platform caching responses to patient queries. User Alice (Organization Acme Health) asks, "Which medications is my patient allergic to?" The LLM analyzes Alice's tenant-specific medical database and returns sensitive patient data. This response is cached.

Later, User Bob (Organization Beta Clinic, different tenant) asks a semantically similar question: "Which medications might my patient be sensitive to?" Without tenant isolation, the cache serves Alice's cached response—which contains Acme Health's patient data—to Bob. This is a data breach.

The solution: tenant-scoped caching. Every cache entry is bound to exactly one tenant (organization, workspace, or user), and search operations filter to only that tenant's cache.
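The toy sketch below makes the failure mode concrete. It compares an unscoped cache with a tenant-scoped one. The two-dimensional vectors stand in for real embeddings of Alice's and Bob's questions, and `UnscopedCache` and `ScopedCache` are hypothetical names. Only the lookup logic matters here; this is not a production cache.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class UnscopedCache:
    """Matches on similarity alone: any tenant can hit any entry."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def store(self, emb: list[float], response: str, tenant: str) -> None:
        self.entries.append((emb, response))  # the tenant is never recorded

    def lookup(self, emb: list[float], tenant: str) -> Optional[str]:
        for cached_emb, response in self.entries:
            if cosine(emb, cached_emb) >= self.threshold:
                return response
        return None


class ScopedCache:
    """Matches on similarity only inside the requesting tenant's bucket."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.buckets: dict[str, list[tuple[list[float], str]]] = {}

    def store(self, emb: list[float], response: str, tenant: str) -> None:
        self.buckets.setdefault(tenant, []).append((emb, response))

    def lookup(self, emb: list[float], tenant: str) -> Optional[str]:
        for cached_emb, response in self.buckets.get(tenant, []):
            if cosine(emb, cached_emb) >= self.threshold:
                return response
        return None


alice_query = [0.80, 0.60]  # stand-in embedding for Alice's question (Acme Health)
bob_query = [0.78, 0.63]    # stand-in embedding for Bob's paraphrase (Beta Clinic)

results = {}
for cache_cls in (UnscopedCache, ScopedCache):
    cache = cache_cls()
    cache.store(alice_query, "Acme Health patient allergies", tenant="acme")
    results[cache_cls.__name__] = cache.lookup(bob_query, tenant="beta")

print(results)
```

The two vectors have a cosine similarity of about 0.999, so the unscoped cache hands Acme Health's response to Beta Clinic. The scoped cache returns `None` because Beta Clinic's bucket is empty.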

Tenant Isolation Architecture

A production multi-tenant semantic cache has three layers:

  1. Tenant context: Extract the current tenant (organization ID, user ID) from the request.
  2. Tenant-scoped storage: Tag every cache entry with a tenant identifier; filter searches by tenant.
  3. Isolation enforcement: Prevent queries and administrative operations across tenant boundaries.
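Layer 3 can be implemented as a second, independent check instead of relying on the bucket lookup alone. The sketch below assumes each cached entry records the `org_id` and `user_id` it was stored under. `CachedEntry`, `enforce_owner`, and `TenantIsolationError` are hypothetical names. Before a hit is returned, the entry's owner is compared with the requester, and any mismatch fails closed.

```python
from dataclasses import dataclass


class TenantIsolationError(Exception):
    """Raised when a cache hit is owned by a different tenant than the requester."""


@dataclass(frozen=True)
class CachedEntry:
    response: str
    org_id: str   # owner recorded at store time
    user_id: str


def enforce_owner(entry: CachedEntry, org_id: str, user_id: str) -> str:
    """Return the cached response only if the requester owns the entry.

    This deliberately repeats the bucket-level scoping: if a bug ever routes a
    lookup to the wrong bucket, the hit is rejected (fail closed) instead of served.
    """
    if (entry.org_id, entry.user_id) != (org_id, user_id):
        # Keep the message generic so the error itself reveals nothing about the owner.
        raise TenantIsolationError("cache entry does not belong to the requesting tenant")
    return entry.response


entry = CachedEntry("Acme Health patient allergies", org_id="acme", user_id="alice")
served = enforce_owner(entry, "acme", "alice")

try:
    enforce_owner(entry, "acme", "bob")  # same org, different user
    blocked = False
except TenantIsolationError:
    blocked = True
```

One reasonable policy is to treat a `TenantIsolationError` as a security event. Log it and fall through to a cache miss, rather than returning an error that hints at another tenant's data.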

Example: Tenant-scoped cache implementation

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class TenantContext:
    """Information about the current user/tenant making a request."""
    org_id: str      # Organization identifier (required)
    user_id: str     # User within the org (required; may be "system" for shared queries)
    request_id: str  # For audit logging
    timestamp: str   # Request timestamp (ISO 8601)


class MultiTenantSemanticCache:
    """Semantic cache enforcing strict tenant isolation."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        # Cache structure: tenant_key -> list of (embedding, response, metadata)
        self.cache_by_tenant: dict[tuple[str, str], list] = {}
        self.access_log: list[dict] = []  # Audit log of all cache operations

    def _tenant_key(self, tenant_context: TenantContext) -> tuple[str, str]:
        """Build a unique, deterministic tenant key from org_id and user_id.

        A tuple keeps the two IDs separate; a joined string such as
        f"{org_id}#{user_id}" could collide if an ID contains the separator.
        """
        # Include both org and user for fine-grained isolation
        return (tenant_context.org_id, tenant_context.user_id)

    def _ensure_tenant_bucket(self, tenant_key: tuple[str, str]) -> None:
        """Lazily create a cache bucket for a new tenant."""
        self.cache_by_tenant.setdefault(tenant_key, [])

    def store(self, query: str, embedding: np.ndarray, response: str,
              tenant_context: TenantContext) -> None:
        """
        Store a cache entry for a specific tenant.
        All future searches for this entry are scoped to the same tenant.
        """
        tenant_key = self._tenant_key(tenant_context)
        self._ensure_tenant_bucket(tenant_key)

        metadata = {
            "query": query,
            "tenant_key": tenant_key,
            "user_id": tenant_context.user_id,
            "created_at": tenant_context.timestamp,
            "request_id": tenant_context.request_id,
        }

        self.cache_by_tenant[tenant_key].append((embedding, response, metadata))

        # Audit log: record the query length, not the (possibly sensitive) text
        self._log_access("STORE", tenant_context, success=True,
                         details=f"Cached query (len={len(query)})")

    def find_similar(self, query_embedding: np.ndarray,
                     tenant_context: TenantContext) -> Optional[tuple[str, float]]:
        """
        Search for similar responses within the current tenant's cache only.
        Returns: (response, similarity) or None if no match above threshold.

        Assumes embeddings are unit-normalized, so the dot product equals
        cosine similarity.
        """
        tenant_key = self._tenant_key(tenant_context)

        # Only this tenant's bucket is ever searched
        tenant_cache = self.cache_by_tenant.get(tenant_key)
        if tenant_cache is None:
            self._log_access("SEARCH", tenant_context, success=True,
                             details="Cache miss (no tenant bucket)")
            return None

        best_match = None
        best_similarity = self.threshold

        for cached_embedding, cached_response, _metadata in tenant_cache:
            similarity = float(np.dot(query_embedding, cached_embedding))
            if similarity > best_similarity:
                best_similarity = similarity
                best_match = (cached_response, similarity)

        if best_match is not None:
            self._log_access("SEARCH", tenant_context, success=True,
                             details=f"Cache hit (similarity={best_similarity:.4f})")
        else:
            self._log_access("SEARCH", tenant_context, success=True,
                             details="Cache miss")

        return best_match

    def _log_access(self, operation: str, tenant_context: TenantContext,
                    success: bool, details: str) -> None:
        """Log all cache operations for audit and compliance."""
        log_entry = {
            "timestamp": tenant_context.timestamp,
            "operation": operation,  # STORE, SEARCH, DELETE, ADMIN
            "org_id": tenant_context.org_id,
            "user_id": tenant_context.user_id,
            "request_id": tenant_context.request_id,
            "success": success,
            "details": details,
        }
        self.access_log.append(log_entry)

    def get_tenant_stats(self, tenant_context: TenantContext) -> dict:
        """Get cache statistics for a specific tenant only."""
        tenant_key = self._tenant_key(tenant_context)
        tenant_cache = self.cache_by_tenant.get(tenant_key, [])

        return {
            "tenant_key": tenant_key,
            "cached_entries": len(tenant_cache),
            # Counts response text only; embeddings and metadata are excluded
            "storage_bytes": sum(
                len(r.encode("utf-8")) for _, r, _ in tenant_cache
            ),
        }

    def export_audit_log(self, org_id: str) -> list[dict]:
        """
        Compliance export: return all audit logs for an organization.
        Useful for SOC2, HIPAA, GDPR audits.
        """
        return [
            log for log in self.access_log
            if log["org_id"] == org_id
        ]
```
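`find_similar` scores candidates with `np.dot`, which equals cosine similarity only when both embeddings are unit-normalized. If an embedding model returns unnormalized vectors, a dot product can exceed 1.0 and clear a 0.95 threshold even when the vectors point in different directions. The small stdlib sketch below illustrates this. The helper names `normalize` and `dot` are illustrative, and normalizing at the boundary where embeddings enter the cache is one reasonable choice.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so its dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


cached = normalize([1.0, 0.0])
query_raw = [3.0, 4.0]  # unnormalized; its direction differs from `cached`

raw_score = dot(query_raw, cached)             # 3.0: wrongly passes a 0.95 threshold
cos_score = dot(normalize(query_raw), cached)  # 0.6: correctly a miss
print(raw_score, cos_score)
```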

Enforcing Isolation at Request Entry

Every LLM request must carry tenant context. Extract it early, validate it, and enforce it throughout:

```python
import uuid
from datetime import datetime, timezone

from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel

app = FastAPI()
cache = MultiTenantSemanticCache()

# embed_text(text) and call_llm_for_tenant(text, tenant_context) are
# application-specific helpers assumed to be defined elsewhere.


class QueryRequest(BaseModel):
    # Query text travels in the JSON body rather than the URL, where it
    # could end up in proxy and access logs.
    query_text: str


def extract_tenant_context(request: Request) -> TenantContext:
    """
    Extract and validate tenant context from request headers/JWT.
    Raises HTTPException if tenant context is missing or invalid.
    """
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing authorization")

    # MOCK ONLY: client-supplied headers can be forged. In production, verify
    # the bearer token (e.g., decode and validate the JWT) and take org_id and
    # user_id from its claims, never from headers the client controls.
    org_id = request.headers.get("X-Org-ID")
    user_id = request.headers.get("X-User-ID")

    if not org_id or not user_id:
        raise HTTPException(status_code=400, detail="Missing tenant context")

    return TenantContext(
        org_id=org_id,
        user_id=user_id,
        request_id=request.headers.get("X-Request-ID", str(uuid.uuid4())),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


@app.post("/query")
async def query_with_cache(request: Request, body: QueryRequest):
    """
    Main endpoint: query with semantic caching and tenant isolation.
    """
    # Step 1: Extract and validate tenant
    tenant_context = extract_tenant_context(request)

    # Step 2: Embed query
    query_embedding = embed_text(body.query_text)

    # Step 3: Search cache (scoped to tenant)
    match = cache.find_similar(query_embedding, tenant_context)
    if match:
        cached_response, similarity = match
        return {"response": cached_response, "cached": True,
                "similarity": float(similarity)}

    # Step 4: Cache miss, so compute via LLM (also tenant-scoped)
    llm_response = call_llm_for_tenant(body.query_text, tenant_context)

    # Step 5: Store in cache (tenant-scoped)
    cache.store(body.query_text, query_embedding, llm_response, tenant_context)

    return {"response": llm_response, "cached": False}


@app.get("/audit/logs")
async def get_audit_logs(request: Request):
    """Export audit logs for compliance (scoped to the caller's org)."""
    tenant_context = extract_tenant_context(request)
    # In production, also require an admin or auditor role: this export
    # includes activity from every user in the organization.
    logs = cache.export_audit_log(tenant_context.org_id)
    return {"audit_logs": logs, "count": len(logs)}
```
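Reading `X-Org-ID` and `X-User-ID` straight from headers is only acceptable in a mock, because any client can set them. In production, the tenant should come from claims the server can verify. The stdlib sketch below illustrates that principle with a deliberately simple, hypothetical token format: `base64url(JSON claims)` followed by `.` and a hex HMAC-SHA256 signature. The names `issue_token`, `tenant_from_token`, and `SECRET` are illustrative. This is not a JWT and omits expiry, audience checks, and key rotation, so a real deployment should use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical signing key; in practice load it from a secret store and rotate it.
SECRET = b"replace-with-a-server-side-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(org_id: str, user_id: str) -> str:
    """Create a signed token carrying tenant claims (illustrative format only)."""
    claims = json.dumps({"org_id": org_id, "user_id": user_id})
    payload = base64.urlsafe_b64encode(claims.encode()).decode()
    return f"{payload}.{_sign(payload)}"


def tenant_from_token(token: str) -> tuple[str, str]:
    """Verify the signature first; only then trust the org_id/user_id claims."""
    payload, sep, sig = token.rpartition(".")
    if not sep:
        raise PermissionError("malformed token")
    # compare_digest runs in constant time, so timing does not leak signature bytes.
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        raise PermissionError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["org_id"], claims["user_id"]


token = issue_token("acme", "alice")
print(tenant_from_token(token))  # ('acme', 'alice')

# Reusing Acme's signature on Beta's claims fails verification.
beta_payload = issue_token("beta", "alice").rpartition(".")[0]
forged = f"{beta_payload}.{token.rpartition('.')[2]}"
try:
    tenant_from_token(forged)
    forged_rejected = False
except PermissionError:
    forged_rejected = True
```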

Testing Isolation Boundaries

Unit tests should verify that isolation holds at every boundary: between organizations, and between users within the same organization.

Example: Isolation unit tests

```python
# embed_text() is the same application helper used by the endpoint; these
# tests assume it returns unit-normalized embeddings.


def test_cache_isolation_prevents_cross_tenant_access():
    """Verify that Tenant B cannot access Tenant A's cached responses."""
    cache = MultiTenantSemanticCache(threshold=0.95)

    # Tenant A caches a response
    tenant_a = TenantContext(org_id="org_a", user_id="user_1",
                             request_id="req1", timestamp="2026-06-02T00:00:00")
    query_a = "My secret project details are..."
    emb_a = embed_text(query_a)
    cache.store(query_a, emb_a, "SENSITIVE DATA: Acme Project", tenant_a)

    # Sanity check: Tenant A still hits its own entry, so the miss below
    # is caused by isolation rather than a broken cache
    assert cache.find_similar(emb_a, tenant_a) is not None

    # Tenant B searches with a similar query
    tenant_b = TenantContext(org_id="org_b", user_id="user_2",
                             request_id="req2", timestamp="2026-06-02T00:00:01")
    query_b = "My secret project..."  # Paraphrase of query_a
    emb_b = embed_text(query_b)

    # Search should return None (no match in tenant_b's bucket)
    match = cache.find_similar(emb_b, tenant_b)
    assert match is None, "Tenant B leaked access to Tenant A's cache!"


def test_cache_isolation_user_level():
    """Verify user-level isolation within the same org."""
    cache = MultiTenantSemanticCache(threshold=0.95)

    # User 1 in Org A caches data
    user_1 = TenantContext(org_id="org_a", user_id="user_1",
                           request_id="req1", timestamp="2026-06-02T00:00:00")
    cache.store("My personal info: SSN 123-45-6789",
                embed_text("My personal info"), "USER_1_DATA", user_1)

    # User 2 in the same Org A tries to access it
    user_2 = TenantContext(org_id="org_a", user_id="user_2",
                           request_id="req2", timestamp="2026-06-02T00:00:01")
    match = cache.find_similar(embed_text("My personal info"), user_2)
    assert match is None, "User 2 leaked access to User 1's personal cache!"


def test_audit_log_completeness():
    """Verify all cache operations are logged."""
    cache = MultiTenantSemanticCache()
    tenant = TenantContext(org_id="org_x", user_id="user_x",
                           request_id="req_x", timestamp="2026-06-02T00:00:00")

    # Perform operations
    cache.store("Q1", embed_text("Q1"), "R1", tenant)
    cache.find_similar(embed_text("Q1"), tenant)

    # Audit log should have entries, in order
    logs = cache.export_audit_log("org_x")
    assert len(logs) >= 2, f"Expected >=2 log entries, got {len(logs)}"
    assert logs[0]["operation"] == "STORE"
    assert logs[1]["operation"] == "SEARCH"
```
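How the tenant key is built is also worth testing. A string key that joins `org_id` and `user_id` with a separator such as `#` lets two different tenants land in the same bucket if IDs may contain the separator. The test sketch below contrasts such a key with a tuple key. `string_key` and `tuple_key` are hypothetical helpers written for illustration.

```python
def string_key(org_id: str, user_id: str) -> str:
    """Joined-string key: ambiguous if IDs may contain '#'."""
    return f"{org_id}#{user_id}"


def tuple_key(org_id: str, user_id: str) -> tuple[str, str]:
    """Tuple key: components stay separate, so distinct tenants never collide."""
    return (org_id, user_id)


def test_joined_string_key_can_collide():
    # Two different (org, user) pairs produce the same string: "acme#ops#bob"
    assert string_key("acme#ops", "bob") == string_key("acme", "ops#bob")


def test_tuple_key_keeps_tenants_distinct():
    assert tuple_key("acme#ops", "bob") != tuple_key("acme", "ops#bob")


test_joined_string_key_can_collide()
test_tuple_key_keeps_tenants_distinct()
```

Rejecting IDs that contain the separator at authentication time would also close the gap. A tuple key simply removes the need for that rule.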

Key Takeaways

  • Multi-tenant caching requires strict namespacing: tag every entry with org_id and user_id, and filter all searches to the current tenant only.
  • Extract tenant context from requests early (JWT, headers) and validate it; pass it through all cache operations.
  • Audit all cache operations (STORE, SEARCH, DELETE) for compliance (SOC2, HIPAA, GDPR); export logs by organization.
  • Test isolation boundaries with unit tests that attempt cross-tenant and cross-user access and assert that it fails.
  • A single isolation bug can leak sensitive data to competitors or bad actors; prioritize isolation over performance.

Frequently Asked Questions

Can I share a cache entry across multiple tenants?

Only if the cached response is explicitly marked "public" and the query is non-sensitive. For example, "What is the capital of France?" has the same answer for every tenant, so you could store it once and serve it across orgs. This requires a separate cache tier and explicit opt-in; most responses are tenant-specific.
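The sketch below shows one way to build that separate tier. It makes several assumptions: entries reach the public tier only through an explicit `publish` step, lookups check the organization's own tier first, and the tenant is the organization. `TwoTierCache` is a hypothetical name. Exact-match dictionaries stand in for similarity search to keep the control flow visible.

```python
from typing import Optional


class TwoTierCache:
    """Tenant-private tier plus an opt-in public tier (exact match for brevity)."""

    def __init__(self) -> None:
        self.private: dict[tuple[str, str], str] = {}  # (org_id, query) -> response
        self.public: dict[str, str] = {}               # query -> response

    def store_private(self, org_id: str, query: str, response: str) -> None:
        self.private[(org_id, query)] = response

    def publish(self, query: str, response: str, approved: bool) -> None:
        """Only explicitly approved, non-sensitive answers enter the shared tier."""
        if not approved:
            raise ValueError("public entries require explicit opt-in")
        self.public[query] = response

    def lookup(self, org_id: str, query: str) -> Optional[str]:
        # The tenant's own tier first, so an org's private answer always wins.
        if (org_id, query) in self.private:
            return self.private[(org_id, query)]
        # Fall back to the public tier; other orgs' private entries are never read.
        return self.public.get(query)


cache = TwoTierCache()
cache.publish("What is the capital of France?", "Paris", approved=True)
cache.store_private("acme", "Q3 revenue?", "Acme-only figure")

try:
    cache.publish("Acme roadmap?", "confidential", approved=False)
    rejected = False
except ValueError:
    rejected = True
```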

What happens if a tenant's user context is compromised?

An attacker can query the victim tenant's cache but cannot reach other tenants' caches. Implement rate limiting, access-pattern anomaly detection, and re-authentication for sensitive queries. This is equivalent to a compromised database user; standard database mitigations apply.
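A per-tenant sliding-window limiter is a simple starting point for the rate-limiting mitigation. The sketch below assumes limits are keyed by `(org_id, user_id)` and that in-process state is acceptable. A multi-instance deployment would need a shared store, such as Redis, instead. `SlidingWindowLimiter` and its parameters are hypothetical names, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds per (org_id, user_id)."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.hits: dict[tuple[str, str], deque] = defaultdict(deque)

    def allow(self, org_id: str, user_id: str) -> bool:
        now = self.clock()
        q = self.hits[(org_id, user_id)]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_s=60, clock=lambda: fake_now[0])

results = [limiter.allow("acme", "alice") for _ in range(3)]  # third call is denied
other_user = limiter.allow("acme", "bob")                      # separate budget
fake_now[0] = 61.0
after_window = limiter.allow("acme", "alice")                  # old hits expired
```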

How do I handle shared queries across a team within an organization?

Use a shared user ID (e.g., user_id="team_shared") for team-level cached queries and individual user IDs for personal queries. This requires careful access-control logic to decide who may read from and write to the shared bucket, but it allows sharing within an organization while still preventing cross-org leaks.
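The sketch below shows one way to express that access-control logic. It assumes each request declares a `scope` of either `"personal"` or `"team"`, and that team membership comes from a directory service, stubbed here as a dict. `resolve_cache_user`, `TEAM_MEMBERS`, and `SHARED_USER_ID` are hypothetical names. The server chooses the user ID used for cache keying, and only verified team members may use the shared bucket.

```python
# Hypothetical directory data: which users may use each org's shared team cache.
TEAM_MEMBERS: dict[str, set[str]] = {
    "acme": {"alice", "carol"},
}

SHARED_USER_ID = "team_shared"


def resolve_cache_user(org_id: str, user_id: str, scope: str) -> str:
    """Return the user_id to key the cache with, enforcing team membership."""
    if scope == "personal":
        return user_id
    if scope == "team":
        # Membership is looked up per org, so "team_shared" in one org
        # never grants access to another org's shared bucket.
        if user_id not in TEAM_MEMBERS.get(org_id, set()):
            raise PermissionError("user is not allowed to use the shared team cache")
        return SHARED_USER_ID
    raise ValueError(f"unknown scope: {scope!r}")


shared = resolve_cache_user("acme", "alice", "team")
personal = resolve_cache_user("acme", "alice", "personal")
try:
    resolve_cache_user("beta", "alice", "team")  # alice is not on Beta's team
    denied = False
except PermissionError:
    denied = True
```

The org ID stays part of the cache key. As a result, two organizations that both use `"team_shared"` still get separate buckets.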

What are the compliance implications?

Semantic caches are data stores and must meet the same requirements as your other stores, such as GDPR (deletion on request), HIPAA (encryption), and SOC2 (audit logs). The audit log is essential for compliance; include it in your data-retention policy and export procedures.
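The deletion requirement calls for an erasure path that removes every entry a user or organization owns and records the action in the audit log. The minimal sketch below works over an in-memory store keyed by `(org_id, user_id)`. `erase`, `store`, `audit_log`, and the log fields are hypothetical names. A real deployment would also have to purge replicas, backups, and any copies held in a vector index.

```python
from datetime import datetime, timezone
from typing import Optional

# (org_id, user_id) -> list of cached responses
store: dict[tuple[str, str], list[str]] = {
    ("acme", "alice"): ["r1", "r2"],
    ("acme", "bob"): ["r3"],
    ("beta", "alice"): ["r4"],
}
audit_log: list[dict] = []


def erase(org_id: str, user_id: Optional[str] = None) -> int:
    """Delete one user's entries, or the whole org's if user_id is None.

    Returns the number of entries removed and appends a DELETE audit record.
    """
    doomed = [
        key for key in store
        if key[0] == org_id and (user_id is None or key[1] == user_id)
    ]
    removed = sum(len(store.pop(key)) for key in doomed)
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": "DELETE",
        "org_id": org_id,
        "user_id": user_id,
        "details": f"Erased {removed} entries",
    })
    return removed


print(erase("acme", "alice"))  # 2
print(sorted(store))           # [('acme', 'bob'), ('beta', 'alice')]
```

Matching on the full `(org_id, user_id)` pair matters. The `alice` in Beta Clinic is a different data subject from the `alice` in Acme Health, and her entries survive.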

Can I migrate a single-tenant cache to multi-tenant?

Yes, but carefully. Add a migration tool that re-tags all existing entries with a default org_id. Test isolation before going live. A missed entry could leak data.
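The sketch below outlines such a migration under several assumptions. Legacy entries are dicts without an `org_id`. The single-tenant deployment belonged to one known organization, represented by the hypothetical placeholder `LEGACY_ORG_ID`. User-level keying is used, as in the class above, so any entry whose owning user is unknown is dropped rather than guessed. A final check refuses to finish if anything is left untagged.

```python
LEGACY_ORG_ID = "acme"  # hypothetical: the org that owned the single-tenant deployment


def migrate(legacy_entries: list[dict]) -> tuple[list[dict], int]:
    """Tag legacy entries with tenant fields; drop any that cannot be attributed.

    Returns (migrated_entries, dropped_count).
    """
    migrated, dropped = [], 0
    for entry in legacy_entries:
        user_id = entry.get("user_id")
        if not user_id:
            # Without a known owner the entry cannot be scoped safely, so discard it.
            dropped += 1
            continue
        migrated.append({**entry, "org_id": LEGACY_ORG_ID, "user_id": user_id})

    # Post-migration check: every entry must carry both tenant fields.
    untagged = [e for e in migrated if not e.get("org_id") or not e.get("user_id")]
    if untagged:
        raise RuntimeError(f"{len(untagged)} entries left untagged; aborting migration")
    return migrated, dropped


legacy = [
    {"query": "Q1", "response": "R1", "user_id": "alice"},
    {"query": "Q2", "response": "R2"},  # owner unknown
]
migrated, dropped = migrate(legacy)
```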

Further Reading