At some point the agent will do something stupid. Sandboxing is the layer that says "…but it can't escape." Three bounds matter: the filesystem (where it can write), the network (which hosts it can reach), and processes/syscalls (which kernel calls are permitted). Docker covers all three with one config; process-level sandboxing trades safety for speed. The interesting question for governance work is which boundary you can prove held for a given session.
Sandboxing
The agent will, at some point, do something stupid. By hallucination, by prompt injection, by genuine task ambiguity. Whatever the cause, the failure mode is inevitable. Sandboxing is the bound that says …but it can’t escape.
flowchart TB
    Host[Host machine]
    subgraph SB[Sandbox boundary]
        direction TB
        Agent[Agent process]
        Tools[Tool processes]
        FS[Workspace FS]
        Agent --> Tools
        Tools --> FS
    end
    SB -.->|allowlisted hosts| Net[(Internet)]
    SB -.->|read-only| HostFS[Host /etc /home]
    Host --> SB
    class SB,Agent,Tools,FS sb
Three bounds
Filesystem
Where can the agent read and write? Bound the agent to a workspace directory; mount everything else read-only. A workable convention from the corpus:
/workspace/ # rw, agent-writable, ephemeral
/workspace/repo/ # the user's repo, mounted read-write
/var/agent/state/ # rw, persisted across sessions
/ # everything else: read-only
The agent’s Read and Write tools enforce paths starting with /workspace. Schema validation (Layer 2 from Guardrails) is the first enforcement; the filesystem mount is the second.
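That first enforcement layer can be sketched in a few lines of Python. The function name, the error type, and the exact check are illustrative, not taken from any of the projects discussed here:

```python
from pathlib import Path

WORKSPACE = Path("/workspace")  # the workspace root from the convention above

def check_writable(path_str: str) -> Path:
    """Resolve a tool-supplied path and refuse anything outside /workspace.

    resolve() normalizes '..' segments (and symlinks, where they exist)
    before the prefix check, so '/workspace/../etc/passwd' is rejected.
    """
    p = Path(path_str)
    if not p.is_absolute():
        p = WORKSPACE / p
    p = p.resolve()
    if p != WORKSPACE and WORKSPACE not in p.parents:
        raise PermissionError(f"write outside workspace: {p}")
    return p
```

The resolve-then-compare order matters: checking the string prefix before normalization would let `..` traversal slip through.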
Network
The network boundary is the main lever against data exfiltration. Three policy levels, from loosest to strictest:
| Policy | What it permits | What it blocks |
|---|---|---|
| Allowlist hosts | LLM API, GitHub, npm registry, etc. | random internet |
| Allowlist + DNS via proxy | as above + name resolution | DNS exfiltration |
| Block all egress except via signed proxy | identifies every request | direct internet |
Strix and OpenHands ship default allowlists tuned to their tool needs. Claude Code's devcontainer runs a firewall script at startup that locks outbound HTTP to a small set of hosts.
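The first policy level amounts to a host check in front of every outbound request. A minimal Python sketch, with an allowlist whose contents are purely illustrative:

```python
from urllib.parse import urlparse

# Illustrative allowlist; a real deployment derives this from tool needs.
ALLOWED_HOSTS = {"api.anthropic.com", "github.com", "registry.npmjs.org"}

def egress_allowed(url: str) -> bool:
    """True only if the URL's hostname is on the allowlist."""
    host = urlparse(url).hostname
    return host is not None and host.lower() in ALLOWED_HOSTS
```

The stricter levels in the table move this check out of the agent process and into a proxy, so a compromised tool cannot simply skip it.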
Processes / syscalls
Which kernel calls are permitted? Bounded via seccomp / AppArmor profiles, or eliminated outright by a container’s namespace isolation.
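As a sketch, a Docker seccomp profile (passed via `--security-opt seccomp=profile.json`) looks like this. The syscall list below is far too short for any real workload and is purely illustrative of the shape:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "openat", "close", "fstat", "mmap", "brk", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Everything not named returns an error rather than executing, which is the deny-by-default posture the rest of this section assumes.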
The two dominant patterns
Docker container per session
# docker-compose.yml (sketch)
services:
agent:
image: agent:latest
cap_drop: [ALL]
read_only: true
tmpfs: ['/tmp', '/workspace']
networks: [agent_net]
networks:
agent_net:
driver_opts:
com.docker.network.bridge.enable_ip_masquerade: false
Why: simple mental model — one container, one boundary. Cleanup is docker rm. Well-supported by tooling.
Cost: container startup is 500ms–2s. Image management is operational overhead.
Seen in: Strix, OpenHands.
Process sandbox + firewall (Claude Code)
Run inside a devcontainer (itself a Docker boundary, but for the dev environment, not the agent). Inside it, a firewall script restricts outbound HTTP to Anthropic plus a small allowlist. Capability flags (NET_ADMIN, NET_RAW) are granted inside the container so the firewall can reload itself; they never extend to the host.
Why: faster startup, full host I/O performance. Suits a developer-tool model where the user controls the surrounding environment. Cost: the boundary is the devcontainer, not the agent. Less defensible if your threat model is “the agent itself escapes.”
Resource limits
The bounds above are about what; resource limits are about how much.
- CPU/memory caps. A misbehaving agent shouldn't take down the host.
- Disk caps. Especially for tools that download; a runaway pip install can fill /tmp.
- Per-tool timeouts. Most tools complete in seconds; anything longer hangs the loop.
- Wall-clock cap per session. Even a well-behaved agent shouldn't run for 24 hours; cap at hours.
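A per-tool timeout needs nothing beyond the standard library. A minimal sketch; the helper name and the 30-second default are assumptions, not from any project above:

```python
import subprocess

def run_tool(cmd: list[str], timeout_s: int = 30):
    """Run one tool invocation with a hard wall-clock timeout.

    Returns the CompletedProcess on success, or None if the tool
    exceeded its budget (subprocess kills it on expiry).
    """
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
```

Returning None (rather than raising) lets the agent loop record the timeout as a tool failure and move on instead of hanging.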
Browser-only sandbox
For agents that run in-browser (Open Design, some workflow builders), the sandbox is the browser: Web Workers for tool execution, postMessage for IPC, iframe for untrusted preview. No Docker, no firewall — but no kernel access either.
When this is enough: agents that don’t read/write the user’s filesystem and only call APIs. When it’s not: agents that do anything resembling code execution.
Pick a sandbox
- Multi-tenant production, regulated industry: Docker per session + outbound allowlist. Audit logs from both layers (relevant under the AI Act).
- Single-tenant developer tool: devcontainer + firewall. Faster, cheaper, slightly less isolated.
- Agent only calls APIs, no code execution: browser sandbox / Web Worker.
- Prototype on your laptop: skip the sandbox. Just don't ship it.
Recommended default for anything customer-facing: Docker per session + outbound allowlist. Treat the cost as a feature, not overhead; your auditor will thank you.
Anti-patterns
- --privileged in Docker. Defeats the point.
- Mounting /. Same.
- Reusing one container across users. Cross-tenant contamination.
- An allowlist that includes a writable wiki/pastebin. Exfiltration channel.
- Sandboxing the agent but not its tools. A bash tool that can curl arbitrary hosts while the agent's "process" is contained is a fig leaf.
Projects that implement this
- OpenHands (v0) — All-hands AI v0 — autonomous software engineer agent. Event-sourced state, microagents, controller-level guardrails.
- Strix — Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.
- OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All-Hands agent.