Agent Sandboxing: Secure Tool Execution Boundaries
An agent can call any tool you give it. If you expose delete_database, a model that behaves unexpectedly can delete your database. Sandboxing is the practice of isolating tool execution within strict boundaries: limiting filesystem access, capping API calls, restricting network destinations, and auditing every invocation. Done well, it limits the damage any single tool call can cause and makes every failure traceable.
Sandboxing Principles
Principle 1: Least privilege. Give tools only the permissions they need. A tool that fetches user data does not need write permissions.
Principle 2: Isolation. Run each tool in its own process or container so a crash or exploit in one tool does not affect others.
Principle 3: Audit trails. Log every tool call with its timestamp, arguments, result, and status (success or error). This is critical for compliance and debugging.
Principle 4: Rate limiting. Cap how many times the agent can call each tool per minute or per hour to prevent resource exhaustion.
Principle 5: Whitelist, not blacklist. Explicitly allow specific actions; forbid everything else by default.
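Principles 1 and 5 can be sketched together as a default-deny tool registry. This is a minimal illustration, not a prescribed design: ToolRegistry, register, call, and the get_user tool are hypothetical names introduced here.

```python
from typing import Any, Callable, Dict, FrozenSet


class ToolRegistry:
    """Default-deny registry: only registered tools and declared arguments pass."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._allowed_args: Dict[str, FrozenSet[str]] = {}

    def register(
        self, name: str, impl: Callable[..., Any], allowed_args: FrozenSet[str]
    ) -> None:
        """Explicitly allow one tool and the argument names it may receive."""
        self._tools[name] = impl
        self._allowed_args[name] = allowed_args

    def call(self, name: str, **kwargs: Any) -> Any:
        # Anything that was never registered is refused by default.
        if name not in self._tools:
            raise PermissionError(f"Tool {name!r} is not registered")
        unexpected = set(kwargs) - self._allowed_args[name]
        if unexpected:
            raise PermissionError(
                f"Arguments not allowed for {name!r}: {sorted(unexpected)}"
            )
        return self._tools[name](**kwargs)


def get_user(user_id: int) -> Dict[str, Any]:
    """Hypothetical read-only tool used only for illustration."""
    return {"user_id": user_id, "name": "example"}


registry = ToolRegistry()
registry.register("get_user", get_user, frozenset({"user_id"}))
```

Because delete_database was never registered, registry.call("delete_database") raises PermissionError without any rule having to name it.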
Strategy 1: Permission Wrappers
Wrap tool implementations with permission checks. Before execution, verify the arguments are safe.
from typing import Any, Callable, Dict
import datetime


class PermissionWrapper:
    def __init__(self, impl: Callable, permissions: Dict[str, Any]):
        self.impl = impl
        self.permissions = permissions
        self.call_count = 0
        self.last_reset = datetime.datetime.now()

    def check_rate_limit(self):
        """Raise PermissionError if the per-minute rate limit is exceeded."""
        rate_limit = self.permissions.get("rate_limit_per_minute", 60)
        now = datetime.datetime.now()
        # total_seconds() covers the full elapsed time; .seconds ignores whole days.
        if (now - self.last_reset).total_seconds() >= 60:
            self.call_count = 0
            self.last_reset = now
        if self.call_count >= rate_limit:
            raise PermissionError(
                f"Rate limit exceeded: {rate_limit} calls per minute"
            )
        self.call_count += 1

    def check_arguments(self, args: Dict[str, Any]):
        """Validate arguments against the whitelist."""
        allowed_fields = self.permissions.get("allowed_arguments", {})
        for key, value in args.items():
            if key not in allowed_fields:
                raise PermissionError(f"Argument {key} is not allowed")
            allowed_values = allowed_fields[key]
            # A list restricts the values; any other entry allows the argument with any value.
            if isinstance(allowed_values, list) and value not in allowed_values:
                raise PermissionError(
                    f"Value {value} not in whitelist for {key}"
                )

    def __call__(self, **kwargs):
        """Execute the tool with permission checks."""
        # Validate arguments first so rejected calls do not use up the rate limit.
        self.check_arguments(kwargs)
        self.check_rate_limit()
        return self.impl(**kwargs)


# Usage
def delete_user(user_id: int):
    """Delete a user from the database."""
    print(f"Deleting user {user_id}...")
    return {"status": "deleted"}


# Wrap with strict permissions
wrapped_delete = PermissionWrapper(
    delete_user,
    permissions={
        "rate_limit_per_minute": 1,  # Max 1 deletion per minute
        "allowed_arguments": {
            "user_id": [123, 456, 789]  # Only allow deleting specific users
        }
    }
)

# This succeeds
try:
    result = wrapped_delete(user_id=123)
    print("✓", result)
except PermissionError as e:
    print("✗", e)

# This fails (user_id not whitelisted)
try:
    result = wrapped_delete(user_id=999)
except PermissionError as e:
    print("✗", e)
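Permission wrappers act as a first line of defense. They are cheap to implement, but they run in the same process as the tool: they validate what the agent asks for and do not contain what the tool's code does once it runs.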
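A fixed one-minute counter lets a burst straddle the reset boundary, and a plain counter is not safe when several threads call the same tool. The sketch below shows one alternative, a thread-safe sliding-window limiter. SlidingWindowLimiter and its clock parameter are hypothetical names introduced here; the injectable clock exists only to make the behavior easy to test.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds-long interval."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._calls: Deque[float] = deque()  # timestamps of accepted calls
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Record one call, or raise PermissionError if the window is full."""
        now = self._clock()
        with self._lock:
            # Drop timestamps that have aged out of the window.
            while self._calls and now - self._calls[0] >= self.window_seconds:
                self._calls.popleft()
            if len(self._calls) >= self.max_calls:
                raise PermissionError(
                    f"Rate limit exceeded: {self.max_calls} calls "
                    f"per {self.window_seconds:g}s"
                )
            self._calls.append(now)
```

A wrapper could call acquire() in place of a counter-based check. time.monotonic is used as the default clock because it is unaffected by system clock adjustments.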
Strategy 2: Audit Logging
Every tool call must be logged, with enough detail to reconstruct what happened.
import datetime
import json
import logging
from typing import Callable

logging.basicConfig(filename="agent_audit.log", level=logging.INFO)


class AuditedToolWrapper:
    def __init__(self, tool_name: str, impl: Callable):
        self.tool_name = tool_name
        self.impl = impl

    def __call__(self, **kwargs):
        """Execute the tool and log the call."""
        # Log the call
        log_entry = {
            "timestamp": datetime.datetime.now().isoformat(),
            "tool": self.tool_name,
            "arguments": kwargs,
            "status": "started"
        }
        # default=str keeps logging from failing on values that are not JSON-serializable
        logging.info(json.dumps(log_entry, default=str))
        try:
            result = self.impl(**kwargs)
            # Log success with a fresh timestamp
            log_entry["timestamp"] = datetime.datetime.now().isoformat()
            log_entry["status"] = "success"
            log_entry["result"] = result
            logging.info(json.dumps(log_entry, default=str))
            return result
        except Exception as e:
            # Log failure, then re-raise so the caller still sees the error
            log_entry["timestamp"] = datetime.datetime.now().isoformat()
            log_entry["status"] = "error"
            log_entry["error"] = str(e)
            logging.error(json.dumps(log_entry, default=str))
            raise


# Usage
audited_delete = AuditedToolWrapper("delete_user", delete_user)
audited_delete(user_id=123)

# Log output (the default logging format prefixes each line with "INFO:root:"):
# {"timestamp": "2026-06-02T10:30:45.123456", "tool": "delete_user", "arguments": {"user_id": 123}, "status": "started"}
# {"timestamp": "2026-06-02T10:30:45.234567", "tool": "delete_user", "arguments": {"user_id": 123}, "status": "success", "result": {...}}
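Audit logs are invaluable for compliance, debugging, and security incident response. Treat them as immutable records: write them to append-only storage that neither the agent nor its tools can modify, and redact secrets from arguments before logging them.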
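One way to make immutability checkable is to chain entries by hash, so that editing an earlier entry invalidates every later one. The following is a minimal sketch under that assumption; append_entry and verify_chain are hypothetical helpers, and the chain is held in memory purely for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, entry: Dict[str, Any]) -> str:
    """Hash the previous record's hash together with this entry's canonical JSON."""
    body = json.dumps(entry, sort_keys=True, default=str)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], entry: Dict[str, Any]) -> Dict[str, Any]:
    """Append an audit entry linked to the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {"entry": entry, "prev_hash": prev_hash, "hash": _digest(prev_hash, entry)}
    chain.append(record)
    return record


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Return True only if no record was altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```

A hash chain only detects tampering by someone who cannot rewrite the whole chain, so the most recent hash should be stored somewhere the agent cannot reach.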
Strategy 3: Containerized Execution
For untrusted tools or high-risk operations, run them in isolated containers or VMs.
import json
import subprocess
from typing import Any, Dict


def execute_tool_in_container(
    tool_name: str,
    arguments: Dict[str, Any],
    docker_image: str = "agent-tools:latest",
):
    """Execute a tool in a Docker container."""
    # Prepare the tool call as JSON; the image's entrypoint is assumed to
    # read it from its first argument and print a JSON result to stdout
    payload = {
        "tool": tool_name,
        "arguments": arguments
    }
    # Run in container with strict limits
    result = subprocess.run(
        [
            "docker", "run",
            "--rm",
            "--memory=512m",   # 512 MB RAM limit
            "--cpus=1",        # 1 CPU limit
            "--read-only",     # Filesystem is read-only
            "--network=none",  # No network access
            docker_image,
            json.dumps(payload)
        ],
        capture_output=True,
        text=True,
        # Raises subprocess.TimeoutExpired after 30 seconds; this stops the
        # docker client, and the container itself may need separate cleanup
        timeout=30
    )
    if result.returncode != 0:
        raise RuntimeError(f"Tool failed: {result.stderr}")
    return json.loads(result.stdout)


# Usage (with --network=none the tool can only query data available inside the container)
result = execute_tool_in_container(
    "fetch_data",
    {"query": "SELECT * FROM users"},
    docker_image="agent-tools:v1"
)
Containerized execution is heavyweight but provides strong isolation. Use it for high-risk tools or untrusted tool code.
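The resource and network limits can be combined with flags that reduce what the process inside the container is allowed to do. The sketch below only builds the command line, so it can be inspected without Docker installed. build_docker_command is a hypothetical helper, and the specific values (64 processes, a 16 MB /tmp, UID/GID 65534) are illustrative assumptions, not recommendations from the text.

```python
import json
from typing import Any, Dict, List


def build_docker_command(
    tool_name: str,
    arguments: Dict[str, Any],
    docker_image: str = "agent-tools:latest",
) -> List[str]:
    """Build a `docker run` command with resource limits and reduced privileges."""
    payload = json.dumps({"tool": tool_name, "arguments": arguments})
    return [
        "docker", "run",
        "--rm",
        "--memory=512m",                          # RAM limit
        "--cpus=1",                               # CPU limit
        "--read-only",                            # read-only root filesystem
        "--network=none",                         # no network access
        "--cap-drop=ALL",                         # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--pids-limit=64",                        # cap the number of processes
        "--user", "65534:65534",                  # run as an unprivileged UID:GID
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=16m",  # small writable scratch space
        docker_image,
        payload,
    ]


command = build_docker_command("fetch_data", {"query": "SELECT 1"}, "agent-tools:v1")
```

The resulting list can be passed to subprocess.run in place of a hand-written argument list.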
Strategy 4: Network Sandboxing
Restrict which URLs and IP addresses tools can reach.
import requests
from urllib.parse import urlparse

ALLOWED_DOMAINS = [
    "api.example.com",
    "data.example.com",
    "public-api.github.com"
]


def fetch_url_safely(url: str):
    """Fetch a URL, allowing only whitelisted domains."""
    parsed = urlparse(url)
    # hostname excludes any port or userinfo and is lowercased, unlike netloc
    if parsed.hostname not in ALLOWED_DOMAINS:
        raise PermissionError(f"Domain {parsed.hostname} is not whitelisted")
    # Fetch with a timeout; do not follow redirects, which could lead to a
    # domain that is not whitelisted
    response = requests.get(url, timeout=10, allow_redirects=False)
    return response.text


# This passes the domain check
try:
    result = fetch_url_safely("https://api.example.com/data")
    print("✓", result)
except PermissionError as e:
    print("✗", e)

# This fails (domain not whitelisted)
try:
    result = fetch_url_safely("https://attacker.com/malware")
except PermissionError as e:
    print("✗", e)
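Network sandboxing makes it much harder for tools to exfiltrate data or reach internal services. An application-level check like this one inspects only the URL, so pair it with network-level egress rules (firewall or container network policy) for stronger guarantees.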
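A domain check looks only at the URL string, so an allowed name that resolves to an internal address would still get through. A complementary check, sketched here with the standard ipaddress and socket modules, is to resolve the host and refuse private, loopback, and link-local addresses. is_public_ip and resolve_public_addresses are hypothetical helper names.

```python
import ipaddress
import socket
from typing import List


def is_public_ip(address: str) -> bool:
    """Return True only for addresses outside private and special-purpose ranges."""
    # Strip an IPv6 zone identifier such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%")[0])
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def resolve_public_addresses(hostname: str, port: int = 443) -> List[str]:
    """Resolve hostname and raise PermissionError if any address is internal."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    blocked = [a for a in addresses if not is_public_ip(a)]
    if blocked:
        raise PermissionError(
            f"{hostname} resolves to non-public address(es): {blocked}"
        )
    return addresses
```

This still leaves a gap between the check and the actual request, because DNS answers can change in between. Enforcing the same rule at the network layer closes that gap.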
Strategy 5: Timeout and Resource Limits
Prevent tools from consuming unbounded time or resources.
import signal
import time


class ToolTimeoutError(Exception):
    """Raised when a tool runs longer than its time limit."""
    # Named to avoid shadowing Python's built-in TimeoutError


def execute_with_timeout(func, timeout_seconds=10, **kwargs):
    """Execute a function with a timeout (Unix only, main thread only)."""

    def timeout_handler(signum, frame):
        raise ToolTimeoutError(f"Tool execution exceeded {timeout_seconds}s limit")

    # SIGALRM exists on Unix only, and handlers can be set only in the main thread
    previous_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(timeout_seconds)
    try:
        return func(**kwargs)
    finally:
        signal.alarm(0)  # Cancel the alarm, even if func raised
        signal.signal(signal.SIGALRM, previous_handler)


# Usage
def slow_operation(n):
    time.sleep(n)
    return {"status": "done"}


try:
    result = execute_with_timeout(slow_operation, timeout_seconds=5, n=10)
except ToolTimeoutError as e:
    print("✗", e)
Timeouts prevent runaway tools from blocking the agent.
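Signal-based timeouts work only on Unix and only in the main thread. Where that is a constraint, a portable alternative is to run the tool in a worker thread and stop waiting after a deadline, as in the sketch below. run_with_deadline and nap are hypothetical names. Note the trade-off: the caller is unblocked, but Python cannot forcibly stop the worker thread, so the tool keeps running in the background until it finishes.

```python
import concurrent.futures
import time
from typing import Any, Callable


def run_with_deadline(
    func: Callable[..., Any], timeout_seconds: float, **kwargs: Any
) -> Any:
    """Run func in a worker thread and stop waiting after timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(
            f"Tool execution exceeded {timeout_seconds}s limit"
        ) from None
    finally:
        # Do not block on the worker; a timed-out call continues in the background.
        executor.shutdown(wait=False)


def nap(seconds: float) -> str:
    """Hypothetical slow tool used only for illustration."""
    time.sleep(seconds)
    return "done"
```

For tools that must actually be stopped, running them in a separate process or container that can be killed is the safer choice.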
Comprehensive Sandbox Example
from typing import Any, Dict

# Assumes PermissionWrapper, AuditedToolWrapper, and delete_user from the sections above


class SandboxedAgent:
    def __init__(self, tools_config):
        self.tools = {}
        for tool_name, cfg in tools_config.items():
            # Wrap with permissions
            wrapped = PermissionWrapper(cfg["impl"], cfg["permissions"])
            # Wrap with audit logging on the outside, so denied calls are logged too
            audited = AuditedToolWrapper(tool_name, wrapped)
            self.tools[tool_name] = audited

    def call_tool(self, tool_name: str, arguments: Dict[str, Any]):
        """Call a tool within the sandbox."""
        if tool_name not in self.tools:
            raise ValueError(f"Unknown tool: {tool_name}")
        try:
            result = self.tools[tool_name](**arguments)
            return {"status": "success", "data": result}
        except PermissionError as e:
            return {"status": "permission_denied", "error": str(e)}
        except Exception as e:
            return {"status": "error", "error": str(e)}


# Usage
tools_config = {
    "delete_user": {
        "impl": delete_user,
        "permissions": {
            "rate_limit_per_minute": 1,
            "allowed_arguments": {"user_id": [123, 456]}
        }
    }
}
sandbox = SandboxedAgent(tools_config)
result = sandbox.call_tool("delete_user", {"user_id": 123})
Key Takeaways
- Apply least-privilege: tools only get permissions they need.
- Audit every tool call with timestamp, arguments, and result.
- Use permission wrappers to validate arguments before execution.
- Implement rate limits to prevent resource exhaustion.
- For high-risk tools, use containerized or network-sandboxed execution.
Frequently Asked Questions
Should I sandbox all tools or just risky ones?
Sandbox all tools. The risk is not just from malicious models, but from unexpected behaviors. A tool that reads from a database could accidentally read the entire table; sandboxing limits the damage.
How do I balance security and usability?
Start restrictive (deny by default, in line with Principles 1 and 5) and loosen permissions as legitimate needs appear. If a tool is abused, restrict it further. Monitor audit logs for denied calls and anomalies, and adjust permissions accordingly.
Can sandboxing slow down tools?
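Yes, but usually only slightly. In-process permission checks and logging add a small cost per call, while containerized execution adds container startup time to every call; measure the overhead for your own tools. For most agents, this is acceptable. For performance-critical tools, use lighter-weight sandboxing (audit logging only, no containerization).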
What permissions should I set by default?
Start with: rate_limit=10/minute, read-only access, no network access to internal IPs, 10s timeout. Expand based on tool needs.
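These defaults can be kept in one place and overridden per tool. A minimal sketch follows. Only rate_limit_per_minute matches a key used earlier in this article; the other key names and the permissions_for helper are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Restrictive defaults; key names other than rate_limit_per_minute are hypothetical.
DEFAULT_PERMISSIONS: Dict[str, Any] = {
    "rate_limit_per_minute": 10,
    "read_only": True,
    "allow_internal_network": False,
    "timeout_seconds": 10,
}


def permissions_for(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the defaults with per-tool overrides applied on top."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_PERMISSIONS)
    if unknown:
        # Refuse misspelled keys instead of silently ignoring them.
        raise ValueError(f"Unknown permission keys: {sorted(unknown)}")
    return {**DEFAULT_PERMISSIONS, **overrides}


# A tool that needs a higher rate limit and a longer timeout
export_permissions = permissions_for({"rate_limit_per_minute": 30, "timeout_seconds": 60})
```

Rejecting unknown keys means a typo in a permission name fails loudly rather than leaving the permissive behavior in place.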
How do I handle a tool that needs to read files?
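Use a whitelist: "allowed_paths": ["/data/public/*", "/logs/*"]. Forbid everything else by default, which covers sensitive paths such as /etc and /var/secrets. Resolve each requested path before checking it, so that ".." segments and symlinks cannot escape the allowed directories.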
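A path whitelist is only as good as the comparison behind it, because a string-prefix or glob match can be fooled by ".." segments. The sketch below resolves the path first and then checks that it sits under an allowed directory. check_path_allowed and ALLOWED_ROOTS are hypothetical names, and the directories mirror the example above.

```python
from pathlib import Path
from typing import Iterable

# Allowed directories, mirroring the example whitelist above
ALLOWED_ROOTS = [Path("/data/public"), Path("/logs")]


def check_path_allowed(
    requested: str, allowed_roots: Iterable[Path] = ALLOWED_ROOTS
) -> Path:
    """Return the resolved path if it lies under an allowed root, else raise."""
    # resolve() collapses ".." segments and follows symlinks where they exist.
    resolved = Path(requested).resolve()
    for root in allowed_roots:
        root_resolved = root.resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return resolved
    raise PermissionError(f"Path {requested} is outside the allowed directories")
```

With this check, "/data/public/../../etc/passwd" resolves to a location outside both roots and is refused, even though the string starts with an allowed prefix.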