Architecture
Strix is a host/sandbox split autonomous-agent framework with three conceptual layers stacked around a classic LLM ReAct loop:
- Orchestration layer — agents, state, LLM wrapper, multi-agent graph.
- Tool layer — registry + executor that dispatches LLM-emitted tool calls either locally (coordination, thinking) or remotely via HTTP to the sandbox.
- Sandbox layer — a single Kali Docker container running a FastAPI tool server plus Caido/Playwright/tmux backing services.
On top of these sits the interface layer (CLI + Textual TUI + streaming parser + tracer); alongside them sits a prompt/skills layer (markdown playbooks injected into the system prompt).
1. Component Diagram
```mermaid
flowchart TB
subgraph HOST["Host process (strix CLI)"]
CLI["strix.interface.main<br/>argparse + config<br/>main.py:540-637"]
TUI["Textual TUI<br/>interface/tui.py"]
Tracer["Tracer<br/>telemetry/tracer.py"]
Agent["StrixAgent<br/>(extends BaseAgent)<br/>agents/base_agent.py"]
State["AgentState (pydantic)<br/>agents/state.py"]
LLM["LLM wrapper<br/>llm/llm.py (litellm)"]
MemC["MemoryCompressor<br/>llm/memory_compressor.py"]
Dedupe["dedupe.py<br/>LLM-based vuln dedupe"]
Registry["Tool registry<br/>tools/registry.py"]
Exec["Tool executor<br/>tools/executor.py"]
Runtime["DockerRuntime<br/>runtime/docker_runtime.py"]
Skills[("Skills /<br/>strix/skills/*.md")]
Prompt[("system_prompt.jinja")]
end
subgraph BOX["Docker sandbox (strix-scan-<id>)"]
Server["FastAPI tool_server<br/>runtime/tool_server.py<br/>port 48081"]
Caido["Caido proxy<br/>port 48080"]
Tmux["tmux terminal<br/>sessions"]
Browser["Playwright<br/>Chromium"]
Py["IPython REPLs"]
Bin["Kali CLI tools<br/>(nmap, nuclei,<br/>sqlmap, …)"]
WS[("/workspace<br/>target code")]
end
CLI --> Tracer
CLI --> TUI
CLI --> Agent
Agent --> State
Agent --> LLM
LLM --> MemC
LLM --> Prompt
Prompt -.injects.-> Skills
Agent --> Exec
Exec --> Registry
Exec -- local --> Agent
Exec -- "HTTPS POST /execute<br/>Bearer token" --> Server
Runtime -. lifecycle .-> BOX
Server --> Tmux & Browser & Py & Caido
Caido <-- "transparent<br/>proxy" --> Browser
Server --> WS
Bin --> Caido
Agent --> Dedupe
Agent --> Tracer
```
2. Data Flow — One Agent Step
What happens between "user presses Enter" and "next assistant message"?
```mermaid
sequenceDiagram
actor User
participant TUI
participant Agent as StrixAgent.agent_loop
participant LLM as LLM.generate (streaming)
participant SP as streaming_parser
participant Exec as execute_tool
participant Srv as tool_server (sandbox)
participant Tool as tool impl
User->>TUI: input
TUI->>Agent: add user message to state.messages
loop until completed / max_iters / stop
Agent->>LLM: messages + system_prompt + compression
LLM-->>Agent: streamed chunks
LLM-->>SP: chunks fed to streaming parser (for TUI)
SP-->>TUI: render partial tool-call as it arrives
LLM->>LLM: detect </function> → stop streaming early
LLM->>LLM: parse_tool_invocations() → [{toolName,args}]
Agent->>Exec: process_tool_invocations(actions, history, state)
alt local tool (agents_graph, think, finish, load_skill, web_search)
Exec->>Tool: call in-process
else sandbox tool
Exec->>Srv: POST /execute {agent_id, tool_name, kwargs} + Bearer
Srv->>Tool: asyncio.to_thread(fn, **kwargs)
Tool-->>Srv: result dict
Srv-->>Exec: {result} or {error}
end
Exec-->>Agent: append <tool_result>…</tool_result> to messages
Agent->>Agent: state.iteration++
end
Agent-->>TUI: scan completed / findings
```
What that shows:
- The agent loop is an async loop driven by the LLM stream (base_agent.py:152-260).
- The LLM layer parses tool calls out of text, not from a provider "tool_call" field — this is provider-agnostic (llm/utils.py:80-107).
- The executor makes a dynamic host-vs-sandbox routing decision for every call (tools/executor.py:30, 273-277); a minimal routing sketch follows this list.
- Results come back as XML (`<tool_result><tool_name>…<result>…</result>`) which the next LLM turn can reference verbatim.
- Screenshots (browser) are extracted from the dict and attached as vision messages — multimodal tool results (tools/executor.py:227-256).
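A minimal sketch of that per-call routing decision, assuming an httpx client and illustrative names (`dispatch`, `LOCAL_TOOLS`, the `root` agent id) rather than the real executor.py internals:
```python
import httpx

# Coordination tools named in the sequence diagram run on the host.
LOCAL_TOOLS = {"agents_graph", "think", "finish", "load_skill", "web_search"}

async def dispatch(tool_name, local_fn, kwargs, sandbox_url, token):
    if tool_name in LOCAL_TOOLS:
        # In-process call on the host: no network hop, no sandbox state.
        return local_fn(**kwargs)
    # Everything else is forwarded to the sandbox tool server.
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(
            f"{sandbox_url}/execute",
            json={"agent_id": "root", "tool_name": tool_name, "kwargs": kwargs},
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        return resp.json()  # {"result": ...} or {"error": ...}
```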
3. Core Abstractions
BaseAgent (strix/agents/base_agent.py)
The generic async agent loop. Owns:
- `state: AgentState` — full run state (messages, iteration, todos, notes).
- `llm: LLM` — the litellm wrapper for this agent.
- `system_prompt: str` — rendered jinja, cached, swapped when skills load.
- A shared agent-graph module-level registry (`_agent_graph`, `_agent_instances`, `_agent_states`, `_agent_messages`; base_agent.py:119-150, 456) that is the coordination substrate for multi-agent runs.
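A skeletal version of that loop, with made-up method names and a callable `llm` standing in for the real LLM wrapper; it only illustrates the shape (stream, parse one tool call, execute, append result, repeat):
```python
class MiniAgent:
    """Toy ReAct loop; not the real BaseAgent API."""

    def __init__(self, llm, execute_tool, max_iterations=300):
        self.messages: list[dict] = []        # full scrollback, like AgentState.messages
        self.iteration = 0
        self.llm = llm                        # async callable: messages -> (text, tool_calls)
        self.execute_tool = execute_tool      # async callable: tool_call -> result string
        self.max_iterations = max_iterations

    async def agent_loop(self, user_input: str) -> list[dict]:
        self.messages.append({"role": "user", "content": user_input})
        while self.iteration < self.max_iterations:
            text, tool_calls = await self.llm(self.messages)
            self.messages.append({"role": "assistant", "content": text})
            if not tool_calls:                # agent finished or produced no action
                break
            result = await self.execute_tool(tool_calls[0])   # one tool per message
            self.messages.append(
                {"role": "user", "content": f"<tool_result>{result}</tool_result>"}
            )
            self.iteration += 1
        return self.messages
```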
StrixAgent (strix/agents/StrixAgent/strix_agent.py)
Specialization that pins the jinja template and the default skill set. The
codebase is structured to allow additional agent types (a directory per
agent), but only StrixAgent is shipped.
AgentState (strix/agents/state.py)
A strict Pydantic model. Not just "last message" — carries full scrollback,
tool-call/observation history, errors, todos, waiting-for-input flags,
sandbox handle (`sandbox_id`, `sandbox_token`, `sandbox_info`), parent ID
for subagents. Serializable via `model_dump()` for graph storage.
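An illustrative subset of that state as a Pydantic model (field names are guessed from the description above, not copied from state.py):
```python
from pydantic import BaseModel, Field

class AgentStateSketch(BaseModel):
    messages: list[dict] = Field(default_factory=list)  # full scrollback incl. tool results
    iteration: int = 0
    todos: list[str] = Field(default_factory=list)
    errors: list[str] = Field(default_factory=list)
    waiting_for_input: bool = False
    sandbox_id: str | None = None
    sandbox_token: str | None = None
    parent_id: str | None = None                         # set on subagents

state = AgentStateSketch()
snapshot = state.model_dump()   # plain dict, suitable for the agent-graph storage
```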
LLM (strix/llm/llm.py)
Thin wrapper over litellm.acompletion with Strix-specific logic:
- Resolves custom model names via `STRIX_MODEL_MAP` (llm/utils.py:34-44).
- Streams; watches the stream for `</function>` and stops a few chunks after (saves tokens — no point generating after the single allowed tool call); see the sketch after this list.
- Parses XML tool calls out of the completed text.
- Applies prompt caching (Anthropic ephemeral cache control blocks) on the system message for long-running agents.
- Accounts tokens + cost from `response.usage` and `completion_cost()`.
- Hands long histories to `MemoryCompressor` before every call.
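A sketch of the early-stop behaviour, assuming `stream` is an async iterator of text chunks (the helper name and the grace-chunk count are illustrative):
```python
async def collect_until_tool_close(stream, grace_chunks: int = 3) -> str:
    """Accumulate streamed text, then cut off shortly after the tool call closes."""
    text, remaining = "", None
    async for chunk in stream:
        text += chunk
        if remaining is None:
            if "</function>" in text:
                remaining = grace_chunks      # allow a few trailing chunks, then stop
        else:
            remaining -= 1
            if remaining <= 0:
                break
    return text
```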
MemoryCompressor (strix/llm/memory_compressor.py)
Triggers at 90k tokens. Keeps system messages + last 15 turns + most recent 3 images verbatim; older turns are chunked (10 at a time) and summarized by a secondary LLM call using a prompt that explicitly lists what to preserve (vulns, creds, payloads, URLs, error messages).
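A sketch of that policy with the thresholds from the text; `count_tokens` and `summarize_chunk` (the secondary LLM call) are caller-supplied stand-ins, and image retention is omitted:
```python
TOKEN_THRESHOLD = 90_000      # compression trigger
KEEP_RECENT_TURNS = 15        # kept verbatim
CHUNK_SIZE = 10               # older turns summarized 10 at a time

def compress_history(messages, count_tokens, summarize_chunk):
    if count_tokens(messages) < TOKEN_THRESHOLD:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent, older = rest[-KEEP_RECENT_TURNS:], rest[:-KEEP_RECENT_TURNS]
    summaries = [
        # The summarization prompt insists on keeping vulns, creds, payloads,
        # URLs and error messages verbatim.
        {"role": "user", "content": summarize_chunk(older[i:i + CHUNK_SIZE])}
        for i in range(0, len(older), CHUNK_SIZE)
    ]
    return system + summaries + recent
```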
Tool registry + executor (strix/tools/)
- Registration is decorator-based: `@register_tool(sandbox_execution=...)`. Decorator side effects: loads a matching `*_schema.xml` alongside the module, parses `<parameters>` for validation, appends to a module-level `tools` list. A minimal registration sketch follows this list.
- Execution is async; picks local vs. remote; runs argument coercion via function-signature inspection; wraps results in XML for the LLM; attaches extracted images.
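A stripped-down registration decorator in the same spirit (schema loading and validation omitted; names are illustrative):
```python
tools: list[dict] = []   # module-level registry, populated at import time

def register_tool(sandbox_execution: bool = True):
    def wrap(fn):
        tools.append({"name": fn.__name__, "fn": fn, "sandbox": sandbox_execution})
        return fn
    return wrap

@register_tool(sandbox_execution=True)
def list_workspace(path: str = "/workspace") -> dict:
    """Example sandbox tool: list files under the mounted target code."""
    import subprocess
    proc = subprocess.run(["ls", "-la", path], capture_output=True, text=True)
    return {"stdout": proc.stdout, "returncode": proc.returncode}
```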
DockerRuntime (strix/runtime/docker_runtime.py)
Creates the container, allocates random host ports for the tool server and
Caido proxy, mints a per-run 32-byte bearer token, waits for /health,
copies target code into /workspace via tar upload. Idempotent — can reuse
running containers tagged by scan_id.
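A sketch of that bring-up using the docker SDK; the image name, environment variable name, and the port numbers for the tool server and Caido are assumptions drawn from the figures above:
```python
import secrets
import docker

def create_sandbox(scan_id: str):
    client = docker.from_env()
    token = secrets.token_hex(32)                        # per-run bearer token (32 bytes)
    container = client.containers.run(
        "strix-sandbox",                                  # assumed image name
        detach=True,
        name=f"strix-scan-{scan_id}",
        labels={"scan_id": scan_id},                      # lets orphaned containers be found later
        cap_add=["NET_ADMIN", "NET_RAW"],
        ports={"48081/tcp": None, "48080/tcp": None},     # None = random host ports
        environment={"SANDBOX_TOKEN": token},             # assumed variable name
    )
    return container, token
```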
Skills (strix/skills/)
Markdown files. No Python. Nine categories. Loaded into the system prompt
via a jinja `{% for skill_name in loaded_skill_names %}…` block. See
05_skills_and_prompts.md.
4. Key Architectural Decisions
Host/sandbox split
All offensive tooling runs inside Docker with NET_ADMIN/NET_RAW
capabilities (docker_runtime.py:144). The host keeps only API keys, agent
state, and orchestration code. A compromised target can attack the
container, not the host. However: all agents in a scan share one
container, so no per-agent isolation. (runtime/docker_runtime.py)
XML tool calls (not JSON, not provider tool_use)
- Works across every provider without relying on native tool support.
- Streams gracefully — the parser can start rendering `<parameter=x>` before the value is complete.
- One-tool-per-message enforced by the parser dropping everything after the first `</function>` (llm/utils.py:64-77); a minimal parsing sketch follows this list.
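A rough parser in that spirit, with the tag shapes approximated from the fragments quoted above (`<parameter=x>`, `</function>`); the real logic lives in llm/utils.py:
```python
import re

def parse_first_tool_call(text: str) -> dict | None:
    head, closed, _ = text.partition("</function>")   # anything after the first call is dropped
    if not closed:
        return None
    m = re.search(r"<function=(\w+)>(.*)", head, re.S)
    if not m:
        return None
    name, body = m.groups()
    args = dict(re.findall(r"<parameter=(\w+)>(.*?)</parameter>", body, re.S))
    return {"toolName": name, "args": args}
```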
Skills as markdown (prompt-as-data)
Contributors add attack techniques without touching Python. The same markdown
doubles as user documentation. A `load_skill` tool lets the running agent
pull additional playbooks on demand, with a hard cap of 5 per agent to keep
context small.
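A sketch of how the markdown playbooks could be folded into the system prompt through a jinja loop like the one mentioned under Skills above (template text and helper names are illustrative; the 5-skill cap comes from this paragraph):
```python
from pathlib import Path
from jinja2 import Template

MAX_SKILLS_PER_AGENT = 5   # hard cap described above

template = Template(
    "{% for skill_name in loaded_skill_names %}"
    "## Skill: {{ skill_name }}\n{{ skills[skill_name] }}\n"
    "{% endfor %}"
)

def render_skills(names: list[str], skills_dir: str = "strix/skills") -> str:
    names = names[:MAX_SKILLS_PER_AGENT]                      # enforce the cap
    skills = {n: Path(skills_dir, f"{n}.md").read_text() for n in names}
    return template.render(loaded_skill_names=names, skills=skills)
```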
LLM-driven memory compression and dedupe
Rather than heuristic rules, Strix uses secondary LLM calls for:
- Compression: summarize old turns while preserving security-critical details (memory_compressor.py:15-43 — full preservation prompt).
- Deduplication: decide if two reports share a root cause (llm/dedupe.py:142-213).
This costs tokens but avoids hard-to-tune heuristics in a domain where false-merges and false-splits both hurt.
One-tool-per-message discipline
The system prompt explicitly instructs the LLM to emit exactly one tool call per assistant message, and the executor dispatches only the first. This makes it straightforward to interleave LLM reasoning with tool observations and keeps the conversation history linear.
Decorator registration, no runtime plugin discovery
Tools self-register at import time via `@register_tool`. The central
`tools/__init__.py` imports every tool module, so registration is
synchronous at process start. There is no runtime plugin discovery — which
keeps startup fast and the tool surface fixed.
5. Concurrency Model
- Host: single Python process, `asyncio`-based. Multi-agent = multiple background `asyncio.Task` objects, each with its own `BaseAgent` instance but sharing module-level registries for inter-agent messaging.
- Tool execution: each tool call is wrapped in an `asyncio.Task` so it can be cancelled on user interrupt (base_agent.py:214-230).
- Sandbox: FastAPI + uvicorn. Tool calls are dispatched via `asyncio.to_thread` so blocking CLI tools (nmap, sqlmap) don't block the event loop. Per-agent task cancellation: if the same `agent_id` hits `/execute` while a previous call is still running, the prior task is cancelled (tool_server.py:94-97). A dispatch sketch follows this list.
- TUI: Textual async app; the agent runs in a background thread while Textual polls the tracer every 350 ms for UI updates.
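A compressed sketch of the sandbox-side dispatch described in the third bullet, with an illustrative FastAPI route, a toy tool registry, and state names that are not the real tool_server.py internals:
```python
import asyncio
from fastapi import FastAPI

app = FastAPI()
TOOLS = {"echo": lambda text="": {"stdout": text}}   # stand-in for the real registry
running: dict[str, asyncio.Task] = {}                # agent_id -> in-flight task

@app.post("/execute")
async def execute(payload: dict):
    agent_id = payload["agent_id"]
    if (prev := running.get(agent_id)) and not prev.done():
        prev.cancel()                                 # same agent pre-empts its previous call
    fn = TOOLS[payload["tool_name"]]
    # Blocking tools run in a worker thread so the event loop stays responsive.
    task = asyncio.create_task(asyncio.to_thread(fn, **payload.get("kwargs", {})))
    running[agent_id] = task
    try:
        return {"result": await asyncio.wait_for(task, timeout=120)}
    except asyncio.TimeoutError:
        return {"error": "Tool execution timed out after 120s"}
```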
6. Execution Flow — Startup to Shutdown
```mermaid
stateDiagram-v2
[*] --> Parse: main()
Parse --> Validate: parse_arguments()
Validate --> Warmup: validate_environment() + LLM "OK" test
Warmup --> RunGen: generate_run_name()
RunGen --> Clone: clone repo / collect local sources
Clone --> DiffScope: resolve_diff_scope_context()
DiffScope --> Dispatch: interactive?
Dispatch --> CLI: run_cli (linear output)
Dispatch --> TUI: run_tui (Textual app)
CLI --> InitRuntime
TUI --> InitRuntime
InitRuntime --> Container: DockerRuntime.create_sandbox()
Container --> Agent: StrixAgent(config, state, sandbox_info)
Agent --> Loop
state Loop {
[*] --> Check
Check --> Think: not stopped
Think --> LLMCall
LLMCall --> Parse2: stream until </function>
Parse2 --> Execute: tool call
Execute --> Record: append result
Record --> Check
}
Loop --> Finish: finish_scan / max_iter / completed
Finish --> Cleanup: destroy or persist container
Cleanup --> Flush: tracer.cleanup() + JSONL artifacts
Flush --> [*]
```
7. Failure Modes & Observability
| Failure | Handling | Where |
|---|---|---|
| LLM request error | tenacity-style exp. backoff, max 5 retries, then enters waiting (interactive) or aborts (headless) | llm/llm.py:156-172, base_agent.py:568-601 |
| Container dies mid-run | next tool call returns ConnectError; agent can retry; orphaned containers cleanable via scan_id label | runtime/docker_runtime.py:175-220 |
| Tool timeout | 120s default, enforced by asyncio.wait_for in tool_server; agent sees "Tool timed out…" | runtime/tool_server.py:100-110 |
| Empty LLM response | executor injects corrective user message telling the agent it must use a tool | base_agent.py:379-393 |
| Max iterations (default 300) | warnings at 85% and iter 297; forced into finish_scan or agent_finish | base_agent.py:186-211 |
| User Ctrl+C | Quit modal in TUI; SIGINT handler in CLI; tracer flushes; runtime cleanup on atexit | interface/tui.py:766-781, interface/cli.py:111-125 |
| Context bloat | compress at 90k tokens, keep last 15 turns + 3 images | llm/memory_compressor.py:12-13, 208 |
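For the first row, the retry policy could look roughly like this with tenacity (the real code may implement the backoff by hand; the wait parameters here are illustrative):
```python
import litellm
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(min=2, max=30), reraise=True)
async def completion_with_retry(**kwargs):
    # Exponential backoff, five attempts, then the error propagates to the
    # agent loop (which waits in interactive mode or aborts headless).
    return await litellm.acompletion(**kwargs)
```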
Observability: every event lands in strix_runs/<run_id>/events.jsonl via
the tracer, and optionally exports as OpenTelemetry spans to Traceloop.
PostHog records aggregate scan start/end/finding events.