
Sandboxing

Limit what the tool layer can do regardless of what the agent intends. Docker, firewall, process limits, or all three.

TL;DR

At some point the agent will do something stupid. Sandboxing is the layer that says "…but it can't escape." Three bounds matter: filesystem (where it can write), network (which hosts it can reach), and processes/syscalls (which kernel calls are permitted). Docker covers all three with one config; process-level sandboxing is faster but trades safety for speed. The interesting question for governance work is which boundary you can prove held for a given session.


The agent will, at some point, do something stupid. By hallucination, by prompt injection, by genuine task ambiguity. Whatever the cause, the failure mode is inevitable. Sandboxing is the bound that says …but it can’t escape.

flowchart TB
Host[Host machine]
subgraph SB[Sandbox boundary]
  direction TB
  Agent[Agent process]
  Tools[Tool processes]
  FS[Workspace FS]
  Agent --> Tools
  Tools --> FS
end
SB -.->|allowlisted hosts| Net[(Internet)]
SB -.->|read-only| HostFS[Host /etc /home]
Host --> SB
class SB,Agent,Tools,FS sb
Three bounds: filesystem, network, processes. Docker covers all three; layered approaches cover them piecemeal.

Three bounds

Filesystem

Where can the agent read and write? Bound the agent to a workspace directory; mount everything else read-only. A workable convention from the corpus:

/workspace/        # rw, agent-writable, ephemeral
/workspace/repo/   # the user's repo, mounted read-write
/var/agent/state/  # rw, persisted across sessions
/                  # everything else: read-only

The agent’s Read and Write tools enforce paths starting with /workspace. Schema validation (Layer 2 from Guardrails) is the first enforcement; the filesystem mount is the second.
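That first enforcement layer amounts to a path check. A minimal sketch (the `/workspace` root and the `in_workspace` helper are illustrative assumptions, not code from any of the projects here):

```python
from pathlib import Path

WORKSPACE = Path("/workspace")

def in_workspace(raw_path: str) -> bool:
    """Reject any path that escapes the workspace after resolving
    '..' segments and symlinks. Requires Python 3.9+ (is_relative_to)."""
    resolved = Path(raw_path).resolve()  # collapses '..' and symlinks
    return resolved.is_relative_to(WORKSPACE)

# A Write tool would call this before touching the filesystem:
# in_workspace("/workspace/repo/a.py")      -> True
# in_workspace("/workspace/../etc/passwd")  -> False
# in_workspace("/etc/passwd")               -> False
```

The read-only mount remains the backstop: the path check is the schema layer, and the mount is what holds if the check is bypassed.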

Network

The lever that prevents data exfiltration. Three policy levels:

| Policy | What it permits | What it blocks |
| --- | --- | --- |
| Allowlist hosts | LLM API, GitHub, npm registry, etc. | Random internet |
| Allowlist + DNS via proxy | As above, plus name resolution | DNS exfiltration |
| Block all egress except via signed proxy | Identifies every request | Direct internet |

Strix and OpenHands ship default allowlists tuned to their tool needs. Claude Code’s devcontainer reloads a firewall script at startup that locks outbound HTTP to a small set.
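A host allowlist is the core of the first two policy levels. A minimal sketch, with illustrative hostnames and a hypothetical `egress_allowed` helper that a proxy or tool wrapper could call before opening a connection:

```python
from urllib.parse import urlparse

# Illustrative allowlist; real deployments tune this to their tool needs.
ALLOWED_HOSTS = {"api.anthropic.com", "github.com", "registry.npmjs.org"}

def egress_allowed(url: str) -> bool:
    """Exact-match allowlist on the hostname."""
    host = urlparse(url).hostname
    return host in ALLOWED_HOSTS

# egress_allowed("https://github.com/org/repo") -> True
# egress_allowed("https://evil.example/upload") -> False
```

Exact hostname matching avoids lookalike tricks such as `github.com.evil.example`; wildcard subdomains widen the exfiltration surface.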

Processes / syscalls

Which kernel calls are permitted? Bounded via seccomp / AppArmor profiles, or eliminated outright by a container’s namespace isolation.
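For reference, a Docker seccomp profile is a JSON document with a default action and an allowlist of syscall names. A deliberately tiny, illustrative fragment (a real workload needs many more syscalls; Docker's default profile is the sane starting point):

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "openat", "close", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

A profile like this is passed at container start via `docker run --security-opt seccomp=profile.json`.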

The two dominant patterns

Docker container per session

# docker-compose.yml (sketch)
services:
  agent:
    image: agent:latest
    cap_drop: [ALL]                 # drop every Linux capability
    read_only: true                 # root filesystem is read-only
    tmpfs: ['/tmp', '/workspace']   # the only writable paths, ephemeral
    networks: [agent_net]
networks:
  agent_net:
    driver_opts:
      com.docker.network.bridge.enable_ip_masquerade: false   # no NAT out to the internet

Why: a simple mental model (one container, one boundary), cleanup is docker rm, and the approach is well-supported by tooling. Cost: container startup takes 500 ms–2 s, and image management is operational overhead. Seen in: Strix, OpenHands.

Process sandbox + firewall (Claude Code)

Run inside a devcontainer (which is itself a Docker boundary, but for the dev environment, not the agent). Inside that, a firewall script restricts outbound HTTP to Anthropic + a small allowlist. Capability flags (NET_ADMIN, NET_RAW) are kept inside the container so the firewall can reload itself; nothing escapes outside.

Why: faster startup, full host I/O performance. Suits a developer-tool model where the user controls the surrounding environment. Cost: the boundary is the devcontainer, not the agent. Less defensible if your threat model is “the agent itself escapes.”

Resource limits

The bounds above are about what; resource limits are about how much.

  • CPU/memory caps. A misbehaving agent shouldn’t take down the host.
  • Disk caps. Especially for tools that download — a runaway pip install can fill /tmp.
  • Per-tool timeouts. Most tools complete in seconds; anything longer hangs the loop.
  • Wall-clock per session. Even a well-behaved agent shouldn’t run for 24 hours; cap sessions at a few hours.
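Several of these caps can be applied at the point a tool is spawned. A sketch using Python's standard library, assuming tools run as subprocesses on Linux (the `run_tool` wrapper and its default limits are illustrative):

```python
import resource
import subprocess

def run_tool(argv, timeout_s=30, cpu_s=10, mem_bytes=512 * 1024 * 1024):
    """Run one tool invocation with a wall-clock timeout, plus CPU-time
    and address-space caps applied in the child before exec."""
    def limits():
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(
        argv,
        preexec_fn=limits,   # runs in the forked child, before exec
        timeout=timeout_s,   # wall clock; raises TimeoutExpired on breach
        capture_output=True,
        text=True,
    )
```

Disk caps and session-level wall-clock need an outer layer (container storage options, a supervisor), since they outlive any single tool call.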

Browser-only sandbox

For agents that run in-browser (Open Design, some workflow builders), the sandbox is the browser: Web Workers for tool execution, postMessage for IPC, iframe for untrusted preview. No Docker, no firewall — but no kernel access either.

When this is enough: agents that don’t read/write the user’s filesystem and only call APIs. When it’s not: agents that do anything resembling code execution.

Pick a sandbox

What's your threat model and ops budget?

  • Multi-tenant production, regulated industry: Docker per session + outbound allowlist, with audit logs from both layers (relevant under the EU AI Act).
  • Single-tenant developer tool: devcontainer + firewall. Faster, cheaper, slightly less isolated.
  • Agent that only calls APIs, no code execution: browser sandbox / Web Worker.
  • Prototype on your laptop: skip it. Just don't ship it.

Recommended default for anything customer-facing: Docker per session + outbound allowlist. Treat the cost as a feature, not overhead; your auditor will thank you.

Anti-patterns

  • --privileged in Docker. Defeats the point.
  • Mounting /. Same.
  • Reusing one container across users. Cross-tenant contamination.
  • Allowlist that includes a writable wiki/pastebin. Exfiltration channel.
  • Sandbox the agent but not its tools. A bash tool that can curl arbitrary hosts while the agent’s “process” is contained is a fig leaf.

Projects that implement this

  • OpenHands (v0) — All-hands AI v0 — autonomous software engineer agent. Event-sourced state, microagents, controller-level guardrails.
  • Strix — Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.
  • OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All-Hands agent.