Runtime & Docker Sandbox

Every Strix scan boots a dedicated Docker container that runs the real pentest tooling (Kali + Caido + Playwright). The host process talks to it over HTTP with a bearer token. This doc covers the container lifecycle, the FastAPI tool server, the Kali image, networking, isolation, and failure modes.

1. The Host/Container Split

┌─ Host process ────────────────────────────────────────┐
│ Agent orchestration, LLM calls, state management       │
│ + DockerRuntime (container lifecycle)                  │
│ + httpx.AsyncClient (tool-call transport)              │
└───────────────────────────────────────────────────────┘
                    │ HTTP POST /execute
                    │ Authorization: Bearer <token>
                    ▼
┌─ Docker container (strix-scan-<id>) ──────────────────┐
│ Kali rolling + security tools                          │
│ FastAPI tool_server (port 48081 in container)          │
│ Caido proxy (port 48080 in container)                  │
│ /workspace = target code (tar-uploaded)                │
│ pentester user with passwordless sudo                  │
└───────────────────────────────────────────────────────┘

Security boundary. The container runs with NET_ADMIN + NET_RAW capabilities (strix/runtime/docker_runtime.py:144) so tools like nmap can craft raw packets. The host retains API keys, agent state, and orchestration, so a target that manages to compromise the tooling lands in the sandbox rather than on the host. Bearer-token auth is enforced on every request to /execute (strix/runtime/tool_server.py:42-57).

Trade-off. All agents in a single scan share the same container. Isolation is per-scan, not per-agent. Strix documents this explicitly in the system prompt (agents/StrixAgent/system_prompt.jinja:233-238).

2. Container Lifecycle

2.1 Creation (`strix/runtime/docker_runtime.py:111-173`)

flowchart TD
    A[DockerRuntime.create_sandbox] --> B{Container already<br/>tagged with scan_id?}
    B -- yes --> C[start if stopped,<br/>recover token/ports]
    B -- no --> D[_find_available_port × 2<br/>mint 32-byte bearer token]
    D --> E[docker.containers.run<br/>sleep infinity]
    E --> F[Copy target into /workspace<br/>via put_archive tar upload]
    F --> G[_wait_for_tool_server<br/>ping /health up to 30×]
    C --> G
    G --> H[Return sandbox_info<br/>{sandbox_id, port, token}]
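
The last step of the flow, waiting for the tool server, amounts to a bounded polling loop. A minimal sketch, assuming a one-second delay between attempts and an injected `probe` callable (both are illustrative; the source states only the 30-attempt cap):

```python
import time
from typing import Callable


def wait_for_tool_server(
    probe: Callable[[], bool],
    attempts: int = 30,          # the cap stated in the flow above
    delay: float = 1.0,          # assumed; the real interval is not documented here
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() until it returns True; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return attempt
        except OSError:
            # Connection refused/reset is expected while the server is still booting.
            pass
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"tool server not healthy after {attempts} attempts")
```

In practice `probe` would issue `GET /health` with the bearer token and return True on a 200 response. Injecting it keeps the loop testable without a running container.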

Key bits:

Container name pattern: strix-scan-{scan_id} (docker_runtime.py:175-220) with a label so re-runs with the same scan_id can find an existing container.
Random host ports mapped to container 48080 (Caido) and 48081 (tool server) — _find_available_port() (:43).
32-byte URL-safe token generated in _tool_server_token (:131); passed to container as TOOL_SERVER_TOKEN env.
STRIX_SANDBOX_EXECUTION_TIMEOUT env var (default 120s, :150) — becomes the per-tool timeout enforced inside the server.
Capabilities: NET_ADMIN, NET_RAW, host.docker.internal alias for reaching the host.
Up to 2 retries on creation failure with exponential backoff.
Docker client timeout: 60s (:23).
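
Port selection and token minting can be sketched with the standard library alone. The function names below are illustrative rather than the ones in docker_runtime.py, and passing the timeout into the container as an env var is an assumption based on the description above:

```python
import secrets
import socket


def find_available_port() -> int:
    """Ask the OS for a free TCP port by binding to port 0, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def mint_token() -> str:
    """32 random bytes, URL-safe base64 encoded (43 characters, no padding)."""
    return secrets.token_urlsafe(32)


def sandbox_env(token: str, timeout_s: int = 120) -> dict:
    """Environment handed to the container (assumed shape)."""
    return {
        "TOOL_SERVER_TOKEN": token,
        "STRIX_SANDBOX_EXECUTION_TIMEOUT": str(timeout_s),
    }
```

The bind-then-release approach has a small race: another process can claim the port before Docker binds it, which is one reason a retry around container creation is useful.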

2.2 Target Upload (`docker_runtime.py:222-248`)

_copy_local_directory_to_container tars the local directory in memory, uploads via put_archive(), then chowns to pentester:pentester. Multi-source support (:266) — multiple --target flags can each be uploaded to distinct subdirs under /workspace.
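
Building the in-memory archive needs only `tarfile` and `io`. This is a sketch of the technique, not the project's code; `tar_directory` and the `arcname` convention for per-target subdirectories are assumptions:

```python
import io
import tarfile
from pathlib import Path


def tar_directory(local_dir: Path, arcname: str) -> bytes:
    """Pack local_dir into an uncompressed tar held in memory.

    arcname is the top-level directory name inside the archive, so several
    targets can be extracted side by side under /workspace.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        tar.add(str(local_dir), arcname=arcname)
    return buffer.getvalue()


# With the Docker SDK the bytes would then be uploaded roughly like this:
#   container.put_archive("/workspace", tar_directory(path, "app"))
# followed by a chown inside the container.
```

Holding the whole archive in memory is simple but scales with the size of the target; very large repositories may call for a streamed upload instead.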

2.3 Startup Sequence Inside the Container

containers/docker-entrypoint.sh runs on container launch:

Caido starts on port 48080, entrypoint waits for GraphQL readiness (lines 24-46).
Guest-login token fetched from Caido GraphQL (lines 50-74).
Temporary Caido project created + selected (lines 79-94).
System-wide proxy env vars written to /etc/profile.d/proxy.sh, /etc/environment, /etc/wgetrc, shell RCs (lines 115-144) — every subsequent tool inherits http_proxy=http://127.0.0.1:48080.
Caido's self-signed CA certificate imported into the system trust store + Firefox NSS DB (lines 149-152) so HTTPS is transparently decrypted.
Tool server launched via uvicorn on the configured port with Bearer-token auth (lines 154-180).
/health poll before returning (line 169).

2.4 Teardown (`docker_runtime.py:334-352`)

cleanup() spawns a detached docker rm -f subprocess, so exit is not blocked on container removal. If cleanup never runs (for example, the host process crashes), the container survives and is picked up again by scan_id on the next run.

3. Tool Server (`strix/runtime/tool_server.py`)

A small FastAPI app that runs inside the container and dispatches LLM tool calls.

3.1 Endpoints

Endpoint	Purpose
`POST /execute` (`:86-127`)	Core dispatch — receives `{agent_id, tool_name, kwargs}`, looks up the tool in the registry, calls it, returns `{result}` or `{error}`
`POST /register_agent` (`:130-135`)	Lightweight presence registration, called when a new subagent starts
`GET /health` (`:138-147`)	Liveness probe — returns `{status, sandbox_mode, auth_config, active_agents}`

All endpoints require Authorization: Bearer <TOOL_SERVER_TOKEN> (:42-57). No endpoint accepts unauthenticated requests.
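
The header check itself is small. A sketch of one reasonable way to do it, using a constant-time comparison; whether tool_server.py uses `hmac.compare_digest` is not stated in the source, and `is_authorized` is a hypothetical name:

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only for 'Bearer <token>' where <token> matches exactly."""
    if not authorization_header:
        return False
    scheme, _, credentials = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not credentials:
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(credentials.encode(), expected_token.encode())
```

In a FastAPI app this would typically sit in a dependency that raises a 401 when the check fails.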

3.2 Execution Model

await asyncio.wait_for(
    asyncio.to_thread(fn, **kwargs),
    timeout=REQUEST_TIMEOUT,
)

Runs the tool function in a thread pool — blocking pentest tools (nmap, sqlmap) don't block the event loop.
Timeout at REQUEST_TIMEOUT (configured from STRIX_SANDBOX_EXECUTION_TIMEOUT, default 120s).
On timeout → asyncio.wait_for raises asyncio.TimeoutError → server returns {"error": "Tool timed out after Xs"}.
Per-agent cancellation (:94-97): if the same agent_id hits /execute while a prior call is in-flight, the prior is cancelled. This is how "user hits Escape, kill in-flight tool" works cleanly.

3.3 Transport

Request / response are both JSON. No streaming. The host side waits up to 150s (120s server timeout + 30s buffer, strix/tools/executor.py:25) with a 10s connect timeout. Exceptions inside tools are caught and returned as strings in the response, never as HTTP 500s (tool_server.py:119-123).
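
The request shape and timeout arithmetic can be illustrated without any third-party dependency. The sketch below uses `urllib` in place of httpx purely to stay self-contained; the loopback host address and the helper names are assumptions:

```python
import json
import urllib.request

SERVER_TIMEOUT_S = 120   # per-tool timeout inside the container
BUFFER_S = 30            # headroom so the server's own timeout error arrives first
HOST_TIMEOUT_S = SERVER_TIMEOUT_S + BUFFER_S  # 150
CONNECT_TIMEOUT_S = 10
# With httpx the equivalent would be httpx.Timeout(150.0, connect=10.0).


def build_execute_request(host_port: int, token: str, agent_id: str,
                          tool_name: str, kwargs: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /execute request."""
    body = json.dumps(
        {"agent_id": agent_id, "tool_name": tool_name, "kwargs": kwargs}
    ).encode()
    return urllib.request.Request(
        f"http://127.0.0.1:{host_port}/execute",  # loopback host is assumed
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_execute_response(raw: bytes) -> str:
    """Return the tool result, or raise if the server reported an error string."""
    payload = json.loads(raw)
    if payload.get("error"):
        raise RuntimeError(payload["error"])
    return payload["result"]
```

Because tool exceptions come back as an `error` string with a 200 status, the caller has to inspect the body rather than rely on the HTTP status code.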

4. The Kali Image

containers/Dockerfile, base kalilinux/kali-rolling:latest.

4.1 Installed Tools (by category)

Network + reconnaissance: nmap (with CAP_NET_RAW/CAP_NET_ADMIN setcap, Dockerfile:49), ncat, ndiff, dnsutils, whois, naabu, subfinder.

Web scanning + fuzzing: nuclei, httpx, katana, gospider, ffuf, arjun, dirsearch, wafw00f, wapiti.

Proxy: Caido v0.48.0 with a pre-generated 10-year self-signed CA (built during image build).

Browser: Playwright + Chromium (line 201).

Code analysis: tree-sitter parsers (Java, JS, Python, Go, Bash, JSON, YAML, TypeScript), semgrep, bandit, retire (JS deps), eslint, ast-grep (sg).

Secrets: trufflesecurity/trufflehog, gitleaks.

Image/container: trivy.

Custom: JS-Snooper, jsniper, jwt_tool, interactsh-client.

Runtimes: Python 3, uv, Go, Node/npm.

Utilities: tmux, parallel, jq, ripgrep, gdb.

4.2 Packaging

uv sync --frozen --no-dev --extra sandbox (line 200) — only the sandbox-extra Python deps, no dev tools.
apt-get clean + rm -rf /var/lib/apt/lists/* after each install block to slim the layer (lines 166-168).
Non-root user pentester with passwordless sudo for CLI tools that need it.

4.3 Size

Not aggressively minimized — expect 10–15GB uncompressed. The README notes "first run automatically pulls the sandbox Docker image". On a fast connection that's a 5–10 minute upfront cost.

5. Networking

5.1 Outbound

Container has full internet access via Docker's default bridge network. No egress policy beyond whatever Docker/host firewall provides.

5.2 Proxy Interception

Every tool that honors http_proxy/https_proxy (curl, wget, httpx, requests, etc.) automatically routes through Caido on 127.0.0.1:48080. Caido decrypts HTTPS via the pre-installed CA and persists the traffic to a project DB, which the proxy tool queries via GraphQL.

Playwright / Chromium honor the same env vars plus the NSS cert import, so browser traffic is also intercepted.
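
For a tool launched from Python, the same effect can be reproduced by setting the proxy variables in the child environment. A small sketch; the uppercase variants are an assumption added for tools that only read those, and `proxy_env` is a hypothetical helper:

```python
import os
from typing import Mapping, Optional

CAIDO_PROXY = "http://127.0.0.1:48080"


def proxy_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with all traffic pointed at Caido."""
    env = dict(os.environ if base is None else base)
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        env[name] = CAIDO_PROXY
    return env


# Example: subprocess.run(["curl", "-s", "https://example.com"], env=proxy_env())
```

Inside the sandbox this is unnecessary because the entrypoint already writes the variables system-wide. It is mainly useful for reproducing the capture behavior outside the container.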

5.3 Ports

48080 in container → random host port → Caido
48081 in container → random host port → tool_server

Both bind to 0.0.0.0 inside the container and are published to the host through docker.containers.run(ports=...). Whether other machines can reach them depends on the host interface the mapping binds to.
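
The Docker SDK's `ports` argument accepts an `(address, port)` tuple to pin a mapping to one host interface. The sketch below builds such a mapping bound to loopback; whether Strix binds this way is not confirmed by the source, and `port_bindings` is a hypothetical helper:

```python
CAIDO_PORT = 48080
TOOL_SERVER_PORT = 48081


def port_bindings(caido_host_port: int, tool_host_port: int,
                  host_ip: str = "127.0.0.1") -> dict:
    """Map the two container ports to host ports on a single host interface."""
    return {
        f"{CAIDO_PORT}/tcp": (host_ip, caido_host_port),
        f"{TOOL_SERVER_PORT}/tcp": (host_ip, tool_host_port),
    }


# Example: client.containers.run(image, ports=port_bindings(32768, 32769), ...)
```

Binding to 127.0.0.1 keeps both services unreachable from other machines even when the host has no firewall. A plain integer host port publishes on all interfaces instead.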

6. File System

/workspace — target code, tar-uploaded at container start. Multiple --target paths each land in a subdir.
/home/pentester — default home.
/home/pentester/output — conventional scan-output location (Dockerfile line 17).

Nothing is automatically persisted back to the host mid-run. The host tracks findings in strix_runs/<run_name>/*.json via the tracer, which records the agent's LLM conversation — the raw tool output is stored in those JSON logs rather than synced from the container FS. If the container is destroyed before the host extracts its files, they're gone.

7. Multi-Agent Isolation

All subagents reuse the root agent's sandbox. When create_agent spawns a child, it:

Copies sandbox_id, sandbox_token, sandbox_info from parent's AgentState into the child's (agents_graph_actions.py:441-461).
Does not call DockerRuntime.create_sandbox again.
The child's executor targets the same tool_server instance, with the same bearer token — the server distinguishes agents by the agent_id field in the request body.
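
The inheritance step reduces to copying three fields. A sketch with a stand-in `AgentState` dataclass; the real class has more fields, and `spawn_child` is a hypothetical name:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    """Stand-in for the real AgentState; only the sandbox fields are modeled."""
    agent_id: str
    sandbox_id: Optional[str] = None
    sandbox_token: Optional[str] = None
    sandbox_info: dict = field(default_factory=dict)


def spawn_child(parent: AgentState, child_id: str) -> AgentState:
    """Create a child that reuses the parent's sandbox; no new container is made."""
    return AgentState(
        agent_id=child_id,
        sandbox_id=parent.sandbox_id,
        sandbox_token=parent.sandbox_token,
        sandbox_info=dict(parent.sandbox_info),  # shallow copy, same values
    )
```

Only `agent_id` differs between parent and child, which is why the tool server relies on that field to tell callers apart.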

Per-agent state inside the container:

Browser sessions are per-agent_id (managed by tab_manager.py).
Terminal sessions and Python sessions are keyed by the session_id argument, which the agents manage by convention.
Caido project, /workspace, /tmp — global. Agents can see each other's proxy traffic and files. This is documented as a feature for collaboration, but it also means agents can stomp on each other (overwrite files, close each other's tabs).

8. Failure Modes

Failure	Behavior
Container dies mid-run	Next tool call fails with `ConnectError`; host re-POSTs on retry; no auto-restart
Tool exceeds timeout	`asyncio.wait_for` raises `TimeoutError`; server returns `"Tool timed out after Ns"` as the `error` field; the worker thread cannot be interrupted, so the tool keeps running in the background until it finishes or the container is removed
Docker daemon unreachable	`DockerRuntime` raises on `create_sandbox()`; host exits with a clear message
Host process dies	Container keeps running, orphaned, because `cleanup()` (which spawns the detached `docker rm -f` subprocess) only runs on a normal exit. The next scan with the same scan_id reuses it (`_get_or_create_container`)
Bearer-token mismatch	401 from tool_server; host treats as `RuntimeError`; usually indicates a stale sandbox cache
Playwright browser crash	Browser instance auto-relaunched on next `browser_action`
Caido proxy crash	Entrypoint restart pattern — the entrypoint exits, container gets restarted by Docker if policy allows; otherwise traffic capture stops silently

State recovery: _recover_container_state (docker_runtime.py:72-85) extracts TOOL_SERVER_TOKEN + port mappings from the container's env / bindings metadata — so you can reattach to a running container across host process restarts.
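
Recovery works because `docker inspect` metadata already contains both pieces. A sketch operating on the inspect dictionary (what the Docker SDK exposes as `container.attrs`); the function name and return shape are illustrative:

```python
def recover_container_state(attrs: dict) -> dict:
    """Pull the bearer token and host port mappings out of inspect metadata."""
    # Config.Env is a list of "KEY=value" strings.
    env = dict(item.split("=", 1) for item in attrs["Config"]["Env"] if "=" in item)
    ports = attrs["NetworkSettings"]["Ports"]

    def host_port(container_port: int) -> int:
        bindings = ports.get(f"{container_port}/tcp") or []
        if not bindings:
            raise RuntimeError(f"port {container_port} is not published")
        return int(bindings[0]["HostPort"])

    return {
        "token": env["TOOL_SERVER_TOKEN"],
        "caido_port": host_port(48080),
        "tool_server_port": host_port(48081),
    }
```

One consequence is that anyone who can run `docker inspect` on the host can read the token, so the token protects against other network clients, not against local users with Docker access.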

9. Design Observations

Good ideas:

Kali base image. Rather than cherry-picking tools, just take the canonical pentest distribution. Lower maintenance, broader coverage, familiar to users.
Transparent Caido interception. Setting system-wide proxy env + installing the CA means the agent doesn't need to know about the proxy — every tool it spawns is auto-captured. Huge DX win; the agent can later ask "show me the last request" without special-casing each tool.
Bearer token minted per-scan. Not shared, not reused, written only to container env + host memory. Compromise-of-one-scan doesn't compromise others.
Detached cleanup. Destroying containers on exit is async via subprocess, so the CLI exits promptly.
FastAPI with a single /execute endpoint — minimal surface, trivial to audit.

Potential pitfalls:

Shared container across agents is a DX convenience but a correctness risk. Two agents both editing /workspace/src/routes.py, both driving Caido scope changes, both spawning tmux sessions — you can craft pathological patterns where agents collide. A per-agent container option would be useful for high-assurance scans.
Image is heavy. 10–15GB first pull is a real friction point for CI. A stripped "quick mode" image (no full Kali) would help.
No egress controls by default. A scan that finds an RCE against its target can launch outbound attacks. For paranoid use, users need to wrap the container in a Docker network with egress rules — something Strix could expose as a flag.
Tool results aren't pulled off the container on demand. If you need a large file (full semgrep JSON, a PCAP) into the LLM context, the tool has to produce it as a string. No file-copy-back tool.
No checkpoint/resume. If the container dies and you re-run the same scan_id, the container is reused but the agent's in-process state is lost (the events.jsonl survives, but replaying it isn't automatic).