Runtime & Docker Sandbox
Every Strix scan boots a dedicated Docker container that runs the real pentest tooling (Kali + Caido + Playwright). The host process talks to it over HTTP with a bearer token. This doc covers the container lifecycle, the FastAPI tool server, the Kali image, networking, isolation, and failure modes.
1. The Host/Container Split
┌─ Host process ────────────────────────────────────────┐
│ Agent orchestration, LLM calls, state management │
│ + DockerRuntime (container lifecycle) │
│ + httpx.AsyncClient (tool-call transport) │
└───────────────────────────────────────────────────────┘
│ HTTP POST /execute
│ Authorization: Bearer <token>
▼
┌─ Docker container (strix-scan-<id>) ──────────────────┐
│ Kali rolling + security tools │
│ FastAPI tool_server (port 48081 in container) │
│ Caido proxy (port 48080 in container) │
│ /workspace = target code (tar-uploaded) │
│ pentester user with passwordless sudo │
└───────────────────────────────────────────────────────┘
Security boundary. The container runs with NET_ADMIN + NET_RAW
capabilities (strix/runtime/docker_runtime.py:144) so tools like nmap
can craft raw packets. The host retains API keys, agent state, and
orchestration, so a hostile target that compromises a tool reaches only the sandbox.
Bearer-token auth is enforced on every request to /execute
(runtime/tool_server.py:42-57).
Trade-off. All agents in a single scan share the same container.
Isolation is per-scan, not per-agent. Strix documents this explicitly in
the system prompt (agents/StrixAgent/system_prompt.jinja:233-238).
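A minimal sketch of the host side of this split, assuming the JSON body carries the agent_id, tool_name, and kwargs fields that section 3.1 lists for /execute. It uses the standard library's urllib.request rather than httpx so it runs without extra dependencies. The helper name build_execute_request, the port 32768, the token value, and the tool name terminal_execute are all hypothetical.

```python
import json
import urllib.request


def build_execute_request(base_url, token, agent_id, tool_name, kwargs):
    """Build (but do not send) a POST /execute request with bearer auth."""
    body = json.dumps(
        {"agent_id": agent_id, "tool_name": tool_name, "kwargs": kwargs}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/execute",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: the real port and token come from sandbox_info.
request = build_execute_request(
    "http://127.0.0.1:32768",
    "example-token",
    "agent-1",
    "terminal_execute",
    {"command": "id"},
)
```

The token travels only in the Authorization header, while agent_id travels in the body; that matches the document's description of how one shared token serves several agents.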
2. Container Lifecycle
2.1 Creation (strix/runtime/docker_runtime.py:111-173)
flowchart TD
A[DockerRuntime.create_sandbox] --> B{Container already<br/>tagged with scan_id?}
B -- yes --> C[start if stopped,<br/>recover token/ports]
B -- no --> D[_find_available_port × 2<br/>mint 32-byte bearer token]
D --> E[docker.containers.run<br/>sleep infinity]
E --> F[Copy target into /workspace<br/>via put_archive tar upload]
F --> G[_wait_for_tool_server<br/>ping /health up to 30×]
C --> G
G --> H[Return sandbox_info<br/>{sandbox_id, port, token}]

Key bits:
- Container name pattern: strix-scan-{scan_id} (docker_runtime.py:175-220), with a label so re-runs with the same scan_id can find an existing container.
- Random host ports mapped to container 48080 (Caido) and 48081 (tool server), chosen by _find_available_port() (:43).
- 32-byte URL-safe token generated in _tool_server_token (:131); passed to the container as the TOOL_SERVER_TOKEN env var.
- STRIX_SANDBOX_EXECUTION_TIMEOUT env var (default 120s, :150), which becomes the per-tool timeout enforced inside the server.
- Capabilities: NET_ADMIN, NET_RAW; host.docker.internal alias for reaching the host.
- Up to 2 retries on creation failure with exponential backoff.
- Docker client timeout: 60s (:23).
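The port selection, token minting, and health polling steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in docker_runtime.py: the function names are hypothetical, the 1-second delay between probes is a guess, and the probe is injected as a callable so the sketch does not depend on a running container.

```python
import secrets
import socket
import time


def find_available_port():
    """Ask the OS for a free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def mint_token():
    """32 random bytes, URL-safe base64 encoded (43 characters)."""
    return secrets.token_urlsafe(32)


def wait_until_healthy(probe, attempts=30, delay=1.0):
    """Call probe() until it returns True or the attempts run out."""
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # server is not listening yet; try again
        time.sleep(delay)
    return False
```

A port found this way can be taken by another process before Docker binds it, which is one reason a creation step like this benefits from the retries the list mentions.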
2.2 Target Upload (docker_runtime.py:222-248)
_copy_local_directory_to_container tars the local directory in memory,
uploads via put_archive(), then chowns to pentester:pentester.
Multi-source support (:266) — multiple --target flags can each be
uploaded to distinct subdirs under /workspace.
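A minimal sketch of the in-memory tar step, assuming each target is archived under its own top-level directory name so that several targets can coexist under /workspace. The function name and the arcname parameter are hypothetical; the resulting bytes are what the Docker SDK's put_archive(path, data) expects.

```python
import io
import tarfile
from pathlib import Path


def tar_directory_in_memory(local_dir, arcname="target"):
    """Return a tar archive of local_dir as bytes, built entirely in memory.

    arcname is the top-level directory name inside the archive, so
    extracting into /workspace would yield /workspace/<arcname>/...
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        archive.add(str(Path(local_dir)), arcname=arcname)
    return buffer.getvalue()
```

Building the archive in memory avoids temporary files, at the cost of holding the whole target in RAM during the upload.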
2.3 Startup Sequence Inside the Container
containers/docker-entrypoint.sh runs on container launch:
- Caido starts on port 48080, entrypoint waits for GraphQL readiness (lines 24-46).
- Guest-login token fetched from Caido GraphQL (lines 50-74).
- Temporary Caido project created + selected (lines 79-94).
- System-wide proxy env vars written to /etc/profile.d/proxy.sh, /etc/environment, /etc/wgetrc, shell RCs (lines 115-144), so every subsequent tool inherits http_proxy=http://127.0.0.1:48080.
- Caido's self-signed CA certificate imported into the system trust store + Firefox NSS DB (lines 149-152) so HTTPS is transparently decrypted.
- Tool server launched via uvicorn on the configured port with Bearer-token auth (lines 154-180).
- /health poll before returning (line 169).
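To make the proxy step concrete, here is a sketch of the kind of variables such a script could export. Only http_proxy=http://127.0.0.1:48080 is stated in the source; the https_proxy and upper-case variants are assumptions based on common tool behavior, and both function names are hypothetical. The real entrypoint is a shell script; Python is used here only for illustration.

```python
def proxy_environment(host="127.0.0.1", port=48080):
    """Return proxy variables pointing at the in-container Caido listener."""
    proxy_url = f"http://{host}:{port}"
    return {
        "http_proxy": proxy_url,   # stated in the source
        "https_proxy": proxy_url,  # assumption
        "HTTP_PROXY": proxy_url,   # assumption: some tools read upper case
        "HTTPS_PROXY": proxy_url,  # assumption
    }


def render_profile_script(env):
    """Render the variables as export lines for a profile.d-style script."""
    return "".join(f'export {name}="{value}"\n' for name, value in env.items())
```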
2.4 Teardown (docker_runtime.py:334-352)
cleanup() spawns a detached docker rm -f subprocess, so teardown does not
block exit. A container that outlives its host process is found again by
scan_id on the next run.
3. Tool Server (strix/runtime/tool_server.py)
A small FastAPI app that runs inside the container and dispatches LLM tool calls.
3.1 Endpoints
| Endpoint | Purpose |
|---|---|
| POST /execute (:86-127) | Core dispatch: receives {agent_id, tool_name, kwargs}, looks up the tool in the registry, calls it, returns {result} or {error} |
| POST /register_agent (:130-135) | Lightweight presence registration, used by the agent when a new subagent starts |
| GET /health (:138-147) | Liveness probe: returns {status, sandbox_mode, auth_config, active_agents} |
All endpoints require Authorization: Bearer <TOOL_SERVER_TOKEN>
(:42-57). No endpoint accepts unauthenticated requests.
3.2 Execution Model
    await asyncio.wait_for(
        asyncio.to_thread(fn, **kwargs),
        timeout=REQUEST_TIMEOUT,
    )

- Runs the tool function in a thread pool, so blocking pentest tools (nmap, sqlmap) don't block the event loop.
- Timeout at REQUEST_TIMEOUT (configured from STRIX_SANDBOX_EXECUTION_TIMEOUT, default 120s).
- On timeout, asyncio.wait_for raises asyncio.TimeoutError and the server returns {"error": "Tool timed out after Xs"}.
- Per-agent cancellation (:94-97): if the same agent_id hits /execute while a prior call is in-flight, the prior is cancelled. This is how "user hits Escape, kill in-flight tool" works cleanly.
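The execution model can be sketched end to end as follows. This is a simplified illustration, not the code in tool_server.py: the in-flight registry, the function names, and the cancellation error message are assumptions. asyncio.wait_for signals a timeout by raising asyncio.TimeoutError, and a newer request for the same agent_id cancels the task that wraps the older one.

```python
import asyncio
import time

# Hypothetical registry: agent_id -> task wrapping that agent's current call.
_in_flight: dict[str, asyncio.Task] = {}


async def _run_tool(fn, kwargs, timeout):
    """Run a blocking tool in a worker thread, bounded by a timeout."""
    try:
        result = await asyncio.wait_for(
            asyncio.to_thread(fn, **kwargs), timeout=timeout
        )
        return {"result": result}
    except asyncio.TimeoutError:
        return {"error": f"Tool timed out after {timeout}s"}
    except Exception as exc:  # tool errors become strings, not HTTP 500s
        return {"error": str(exc)}


async def execute(agent_id, fn, kwargs, timeout=120):
    """Dispatch one tool call, cancelling the agent's previous one if any."""
    previous = _in_flight.get(agent_id)
    if previous is not None and not previous.done():
        previous.cancel()
    task = asyncio.create_task(_run_tool(fn, kwargs, timeout))
    _in_flight[agent_id] = task
    try:
        return await task
    except asyncio.CancelledError:
        # Simplification: treats every cancellation as "superseded".
        return {"error": "Cancelled by a newer request"}
    finally:
        if _in_flight.get(agent_id) is task:
            del _in_flight[agent_id]


def slow_tool(seconds):
    """Stand-in for a blocking tool such as a scanner invocation."""
    time.sleep(seconds)
    return "done"
```

Cancelling or timing out the awaiting task does not stop the worker thread: Python threads cannot be interrupted from outside, so the blocking function runs to completion in the background.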
3.3 Transport
Request / response are both JSON. No streaming. The host side waits up
to 150s (120s server timeout + 30s buffer, strix/tools/executor.py:25)
with a 10s connect timeout. Exceptions inside tools are caught and
returned as strings in the response, never as HTTP 500s
(tool_server.py:119-123).
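A small sketch of the host-side arithmetic and response handling described above, assuming the response body is a JSON object with either a result or an error field. The constant names, the ToolExecutionError class, and both functions are hypothetical.

```python
import json

SERVER_TIMEOUT_S = 120   # in-container per-tool timeout (document default)
HOST_BUFFER_S = 30       # extra slack so the server can report its own timeout
CONNECT_TIMEOUT_S = 10


class ToolExecutionError(RuntimeError):
    """Raised on the host when the server reports an error string."""


def host_timeouts(server_timeout=SERVER_TIMEOUT_S):
    """Return the host-side read and connect timeouts in seconds."""
    return {"read": server_timeout + HOST_BUFFER_S, "connect": CONNECT_TIMEOUT_S}


def unwrap_response(raw_body):
    """Parse a tool-server response and surface its error field, if any."""
    payload = json.loads(raw_body)
    if payload.get("error"):
        raise ToolExecutionError(payload["error"])
    return payload.get("result")
```

Keeping the host timeout longer than the server timeout means a slow tool is normally reported by the server as an error string rather than surfacing as a transport-level timeout on the host.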
4. The Kali Image
containers/Dockerfile, base kalilinux/kali-rolling:latest.
4.1 Installed Tools (by category)
Network + reconnaissance: nmap (with CAP_NET_RAW/CAP_NET_ADMIN
setcap, Dockerfile:49), ncat, ndiff, dnsutils, whois, naabu,
subfinder.
Web scanning + fuzzing: nuclei, httpx, katana, gospider, ffuf, arjun, dirsearch, wafw00f, wapiti.
Proxy: Caido v0.48.0 with a pre-generated 10-year self-signed CA (built during image build).
Browser: Playwright + Chromium (line 201).
Code analysis: tree-sitter parsers (Java, JS, Python, Go, Bash,
JSON, YAML, TypeScript), semgrep, bandit, retire (JS deps), eslint,
ast-grep (sg).
Secrets: trufflesecurity/trufflehog, gitleaks.
Image/container: trivy.
Custom: JS-Snooper, jsniper, jwt_tool, interactsh-client.
Runtimes: Python 3, uv, Go, Node/npm.
Utilities: tmux, parallel, jq, ripgrep, gdb.
4.2 Packaging
- uv sync --frozen --no-dev --extra sandbox (line 200): only the sandbox-extra Python deps, no dev tools.
- apt-get clean + rm -rf /var/lib/apt/lists/* after each install block to slim the layer (lines 166-168).
- Non-root user pentester with passwordless sudo for CLI tools that need it.
4.3 Size
Not aggressively minimized — expect 10–15GB uncompressed. The README notes "first run automatically pulls the sandbox Docker image". On a fast connection that's a 5–10 minute upfront cost.
5. Networking
5.1 Outbound
Container has full internet access via Docker's default bridge network. No egress policy beyond whatever Docker/host firewall provides.
5.2 Proxy Interception
Every tool that honors http_proxy/https_proxy (curl, wget, httpx,
requests, etc.) automatically routes through Caido on 127.0.0.1:48080.
Caido decrypts HTTPS via the pre-installed CA and persists the traffic
to a project DB, which the proxy tool queries via GraphQL.
Playwright / Chromium honor the same env vars plus the NSS cert import, so browser traffic is also intercepted.
5.3 Ports
- 48080 in container → random host port → Caido
- 48081 in container → random host port → tool_server
Both bound to 0.0.0.0 inside the container, accessible only to the
host (docker.containers.run(ports=...)).
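The ports argument of the Docker SDK's containers.run is a dict keyed by "port/protocol", whose values are either a host port or a (host_ip, host_port) tuple. The sketch below builds both forms. The function name is hypothetical, and whether Strix passes a host IP is not stated in the source. With a bare integer, Docker publishes the port on all host interfaces; a ("127.0.0.1", port) tuple restricts it to loopback.

```python
CAIDO_CONTAINER_PORT = "48080/tcp"
TOOL_SERVER_CONTAINER_PORT = "48081/tcp"


def build_port_bindings(caido_host_port, tool_host_port, host_ip=None):
    """Build a ports mapping in the shape docker.containers.run accepts.

    host_ip=None publishes on all host interfaces; passing "127.0.0.1"
    limits the published ports to the host's loopback interface.
    """
    def binding(port):
        return port if host_ip is None else (host_ip, port)

    return {
        CAIDO_CONTAINER_PORT: binding(caido_host_port),
        TOOL_SERVER_CONTAINER_PORT: binding(tool_host_port),
    }
```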
6. File System
- /workspace: target code, tar-uploaded at container start. Multiple --target paths each land in a subdir.
- /home/pentester: default home.
- /home/pentester/output: conventional scan-output location (Dockerfile line 17).
Nothing is automatically persisted back to the host mid-run. The host
tracks findings in strix_runs/<run_name>/*.json via the tracer, which
records the agent's LLM conversation — the raw tool output is stored in
those JSON logs rather than synced from the container FS. If the
container is destroyed before the host extracts its files, they're
gone.
7. Multi-Agent Isolation
All subagents reuse the root agent's sandbox. When create_agent
spawns a child, it:
- Copies sandbox_id, sandbox_token, sandbox_info from the parent's AgentState into the child's (agents_graph_actions.py:441-461).
- Does not call DockerRuntime.create_sandbox again.
- The child's executor targets the same tool_server instance, with the same bearer token; the server distinguishes agents by the agent_id field in the request body.
Per-agent state inside the container:
- Browser sessions are per-agent_id (managed by tab_manager.py).
- Terminal sessions and Python sessions are keyed by the session_id argument, which is convention-managed by the agents.
- Caido project, /workspace, /tmp: global. Agents can see each other's proxy traffic and files. This is documented as a feature for collaboration, but it also means agents can stomp on each other (overwrite files, close each other's tabs).
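The inheritance step can be illustrated with a stand-in state object. The AgentState dataclass below is a hypothetical simplification that carries only the three sandbox fields named above; it is not the Strix class, and spawn_child is an invented helper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    """Hypothetical stand-in holding only the sandbox-related fields."""
    agent_id: str
    sandbox_id: Optional[str] = None
    sandbox_token: Optional[str] = None
    sandbox_info: dict = field(default_factory=dict)


def spawn_child(parent, child_id):
    """Create a child state that reuses the parent's sandbox credentials."""
    return AgentState(
        agent_id=child_id,
        sandbox_id=parent.sandbox_id,
        sandbox_token=parent.sandbox_token,
        sandbox_info=dict(parent.sandbox_info),  # copy, not a shared reference
    )
```

Only agent_id differs between parent and child, which is why the server has to rely on the request body, not the token, to tell agents apart.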
8. Failure Modes
| Failure | Behavior |
|---|---|
| Container dies mid-run | Next tool call fails with ConnectError; host re-POSTs on retry; no auto-restart |
| Tool exceeds timeout | asyncio.wait_for raises TimeoutError; server returns "Tool timed out after Ns" as the error field; the worker thread cannot be interrupted, so the tool keeps running in the background until it finishes or the container is removed |
| Docker daemon unreachable | DockerRuntime raises on create_sandbox(); host exits with a clear message |
| Host process dies | Container keeps running — orphaned. Next scan with same scan_id reuses it (_get_or_create_container). cleanup() spawns a detached docker rm -f subprocess |
| Bearer-token mismatch | 401 from tool_server; host treats as RuntimeError; usually indicates a stale sandbox cache |
| Playwright browser crash | Browser instance auto-relaunched on next browser_action |
| Caido proxy crash | Entrypoint restart pattern — the entrypoint exits, container gets restarted by Docker if policy allows; otherwise traffic capture stops silently |
State recovery: _recover_container_state (docker_runtime.py:72-85)
extracts TOOL_SERVER_TOKEN + port mappings from the container's env /
bindings metadata — so you can reattach to a running container across
host process restarts.
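A sketch of what such a recovery step could look like, operating on the dict that docker inspect (or the SDK's container.attrs) returns. In that structure Config.Env is a list of "KEY=VALUE" strings and NetworkSettings.Ports maps "port/protocol" to a list of host bindings. The function name and the returned keys are hypothetical, not the signature of _recover_container_state.

```python
def recover_container_state(attrs):
    """Extract the bearer token and published host ports from inspect data."""
    env = dict(
        item.split("=", 1) for item in attrs["Config"]["Env"] if "=" in item
    )
    ports = attrs["NetworkSettings"]["Ports"]

    def host_port(container_port):
        # Unpublished ports appear as None rather than an empty list.
        bindings = ports.get(container_port) or []
        return int(bindings[0]["HostPort"]) if bindings else None

    return {
        "token": env.get("TOOL_SERVER_TOKEN"),
        "caido_port": host_port("48080/tcp"),
        "tool_server_port": host_port("48081/tcp"),
    }
```

Because the token lives in the container's environment, any process that can inspect the container can read it, which is consistent with treating access to the Docker daemon as fully trusted.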
9. Design Observations
Good ideas:
- Kali base image. Rather than cherry-picking tools, just take the canonical pentest distribution. Lower maintenance, broader coverage, familiar to users.
- Transparent Caido interception. Setting system-wide proxy env + installing the CA means the agent doesn't need to know about the proxy — every tool it spawns is auto-captured. Huge DX win; the agent can later ask "show me the last request" without special-casing each tool.
- Bearer token minted per-scan. Not shared, not reused, written only to container env + host memory. Compromise-of-one-scan doesn't compromise others.
- Detached cleanup. Destroying containers on exit is async via subprocess, so the CLI exits promptly.
- FastAPI with a single dispatch endpoint (/execute, alongside /register_agent and /health): minimal surface, trivial to audit.
Potential pitfalls:
- Shared container across agents is a DX convenience but a correctness risk. Two agents both editing /workspace/src/routes.py, both driving Caido scope changes, both spawning tmux sessions: it is easy to construct pathological patterns where agents collide. A per-agent container option would be useful for high-assurance scans.
- Image is heavy. 10–15GB first pull is a real friction point for CI. A stripped "quick mode" image (no full Kali) would help.
- No egress controls by default. A scan that finds an RCE against its target can launch outbound attacks. For paranoid use, users need to wrap the container in a Docker network with egress rules, something Strix could expose as a flag.
- Tool results aren't pulled off the container on demand. If you need a large file (full semgrep JSON, a PCAP) into the LLM context, the tool has to produce it as a string. No file-copy-back tool.
- No checkpoint/resume. If the container dies and you re-run the same scan_id, the container is reused but the agent's in-process state is lost (the events.jsonl survives, but replaying it isn't automatic).