Architecture
Strix is a host/sandbox split autonomous-agent framework with three conceptual layers stacked around a classic LLM ReAct loop:
- Orchestration layer — agents, state, LLM wrapper, multi-agent graph.
- Tool layer — registry + executor that dispatches LLM-emitted tool calls either locally (coordination, thinking) or remotely via HTTP to the sandbox.
- Sandbox layer — a single Kali Docker container running a FastAPI tool server plus Caido/Playwright/tmux backing services.
On top of these sits the interface layer (CLI + Textual TUI + streaming parser + tracer); alongside them sits a prompt/skills layer (markdown playbooks injected into the system prompt).
1. Component Diagram
```mermaid
flowchart TB
subgraph HOST["Host process (strix CLI)"]
CLI["strix.interface.main<br/>argparse + config<br/>main.py:540-637"]
TUI["Textual TUI<br/>interface/tui.py"]
Tracer["Tracer<br/>telemetry/tracer.py"]
Agent["StrixAgent<br/>(extends BaseAgent)<br/>agents/base_agent.py"]
State["AgentState (pydantic)<br/>agents/state.py"]
LLM["LLM wrapper<br/>llm/llm.py (litellm)"]
MemC["MemoryCompressor<br/>llm/memory_compressor.py"]
Dedupe["dedupe.py<br/>LLM-based vuln dedupe"]
Registry["Tool registry<br/>tools/registry.py"]
Exec["Tool executor<br/>tools/executor.py"]
Runtime["DockerRuntime<br/>runtime/docker_runtime.py"]
Skills[("Skills /<br/>strix/skills/*.md")]
Prompt[("system_prompt.jinja")]
end
subgraph BOX["Docker sandbox (strix-scan-<id>)"]
Server["FastAPI tool_server<br/>runtime/tool_server.py<br/>port 48081"]
Caido["Caido proxy<br/>port 48080"]
Tmux["tmux terminal<br/>sessions"]
Browser["Playwright<br/>Chromium"]
Py["IPython REPLs"]
Bin["Kali CLI tools<br/>(nmap, nuclei,<br/>sqlmap, …)"]
WS[("/workspace<br/>target code")]
end
CLI --> Tracer
CLI --> TUI
CLI --> Agent
Agent --> State
Agent --> LLM
LLM --> MemC
LLM --> Prompt
Prompt -.injects.-> Skills
Agent --> Exec
Exec --> Registry
Exec -- local --> Agent
Exec -- "HTTPS POST /execute<br/>Bearer token" --> Server
Runtime -. lifecycle .-> BOX
Server --> Tmux & Browser & Py & Caido
Caido <-- "transparent<br/>proxy" --> Browser
Server --> WS
Bin --> Caido
Agent --> Dedupe
Agent --> Tracer
```
2. Data Flow — One Agent Step
What happens between "user presses Enter" and "next assistant message"?
```mermaid
sequenceDiagram
actor User
participant TUI
participant Agent as StrixAgent.agent_loop
participant LLM as LLM.generate (streaming)
participant SP as streaming_parser
participant Exec as execute_tool
participant Srv as tool_server (sandbox)
participant Tool as tool impl
User->>TUI: input
TUI->>Agent: add user message to state.messages
loop until completed / max_iters / stop
Agent->>LLM: messages + system_prompt + compression
LLM-->>Agent: streamed chunks
LLM-->>SP: chunks fed to streaming parser (for TUI)
SP-->>TUI: render partial tool-call as it arrives
LLM->>LLM: detect </function> → stop streaming early
LLM->>LLM: parse_tool_invocations() → [{toolName,args}]
Agent->>Exec: process_tool_invocations(actions, history, state)
alt local tool (agents_graph, think, finish, load_skill, web_search)
Exec->>Tool: call in-process
else sandbox tool
Exec->>Srv: POST /execute {agent_id, tool_name, kwargs} + Bearer
Srv->>Tool: asyncio.to_thread(fn, **kwargs)
Tool-->>Srv: result dict
Srv-->>Exec: {result} or {error}
end
Exec-->>Agent: append <tool_result>…</tool_result> to messages
Agent->>Agent: state.iteration++
end
Agent-->>TUI: scan completed / findings
```
What that shows:
- The agent loop is an async loop driven by the LLM stream (base_agent.py:152-260).
- The LLM layer parses tool calls out of text, not from a provider "tool_call" field — this is provider-agnostic (llm/utils.py:80-107).
- The executor makes a dynamic host-vs-sandbox routing decision for every call (tools/executor.py:30, 273-277); a minimal routing sketch follows this list.
- Results come back as XML (`<tool_result><tool_name>…<result>…</result>`) which the next LLM turn can reference verbatim.
- Screenshots (browser) are extracted from the dict and attached as vision messages — multimodal tool results (tools/executor.py:227-256).
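A minimal sketch of that per-call routing decision, assuming an httpx client and illustrative names (`dispatch`, `LOCAL_TOOLS`, the `root` agent id) rather than the real executor.py internals:
```python
import httpx

# Coordination tools named in the sequence diagram run on the host.
LOCAL_TOOLS = {"agents_graph", "think", "finish", "load_skill", "web_search"}

async def dispatch(tool_name, local_fn, kwargs, sandbox_url, token):
    if tool_name in LOCAL_TOOLS:
        # In-process call on the host: no network hop, no sandbox state.
        return local_fn(**kwargs)
    # Everything else is forwarded to the sandbox tool server.
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(
            f"{sandbox_url}/execute",
            json={"agent_id": "root", "tool_name": tool_name, "kwargs": kwargs},
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        return resp.json()  # {"result": ...} or {"error": ...}
```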
3. Core Abstractions
BaseAgent (strix/agents/base_agent.py)
The generic async agent loop. Owns:
- `state: AgentState` — full run state (messages, iteration, todos, notes).
- `llm: LLM` — the litellm wrapper for this agent.
- `system_prompt: str` — rendered jinja, cached, swapped when skills load.
- A shared agent-graph module-level registry (`_agent_graph`, `_agent_instances`, `_agent_states`, `_agent_messages`; base_agent.py:119-150, 456) that is the coordination substrate for multi-agent runs.
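A skeletal version of that loop, with made-up method names and a callable `llm` standing in for the real LLM wrapper; it only illustrates the shape (stream, parse one tool call, execute, append result, repeat):
```python
class MiniAgent:
    """Toy ReAct loop; not the real BaseAgent API."""

    def __init__(self, llm, execute_tool, max_iterations=300):
        self.messages: list[dict] = []        # full scrollback, like AgentState.messages
        self.iteration = 0
        self.llm = llm                        # async callable: messages -> (text, tool_calls)
        self.execute_tool = execute_tool      # async callable: tool_call -> result string
        self.max_iterations = max_iterations

    async def agent_loop(self, user_input: str) -> list[dict]:
        self.messages.append({"role": "user", "content": user_input})
        while self.iteration < self.max_iterations:
            text, tool_calls = await self.llm(self.messages)
            self.messages.append({"role": "assistant", "content": text})
            if not tool_calls:                # agent finished or produced no action
                break
            result = await self.execute_tool(tool_calls[0])   # one tool per message
            self.messages.append(
                {"role": "user", "content": f"<tool_result>{result}</tool_result>"}
            )
            self.iteration += 1
        return self.messages
```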
StrixAgent (strix/agents/StrixAgent/strix_agent.py)
Specialization that pins the jinja template and the default skill set. The
codebase is structured to allow additional agent types (a directory per
agent), but only StrixAgent is shipped.
AgentState (strix/agents/state.py)
A strict Pydantic model. Not just "last message" — carries full scrollback,
tool-call/observation history, errors, todos, waiting-for-input flags,
sandbox handle (`sandbox_id`, `sandbox_token`, `sandbox_info`), parent ID
for subagents. Serializable via `model_dump()` for graph storage.
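An illustrative subset of that state as a Pydantic model (field names are guessed from the description above, not copied from state.py):
```python
from pydantic import BaseModel, Field

class AgentStateSketch(BaseModel):
    messages: list[dict] = Field(default_factory=list)  # full scrollback incl. tool results
    iteration: int = 0
    todos: list[str] = Field(default_factory=list)
    errors: list[str] = Field(default_factory=list)
    waiting_for_input: bool = False
    sandbox_id: str | None = None
    sandbox_token: str | None = None
    parent_id: str | None = None                         # set on subagents

state = AgentStateSketch()
snapshot = state.model_dump()   # plain dict, suitable for the agent-graph storage
```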
LLM (strix/llm/llm.py)
Thin wrapper over litellm.acompletion with Strix-specific logic:
- Resolves custom model names via `STRIX_MODEL_MAP` (llm/utils.py:34-44).
- Streams; watches the stream for `</function>` and stops a few chunks after (saves tokens — no point generating after the single allowed tool call); see the sketch after this list.
- Parses XML tool calls out of the completed text.
- Applies prompt caching (Anthropic ephemeral cache control blocks) on the system message for long-running agents.
- Accounts tokens + cost from `response.usage` and `completion_cost()`.
- Hands long histories to `MemoryCompressor` before every call.
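A sketch of the early-stop behaviour, assuming `stream` is an async iterator of text chunks (the helper name and the grace-chunk count are illustrative):
```python
async def collect_until_tool_close(stream, grace_chunks: int = 3) -> str:
    """Accumulate streamed text, then cut off shortly after the tool call closes."""
    text, remaining = "", None
    async for chunk in stream:
        text += chunk
        if remaining is None:
            if "</function>" in text:
                remaining = grace_chunks      # allow a few trailing chunks, then stop
        else:
            remaining -= 1
            if remaining <= 0:
                break
    return text
```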
MemoryCompressor (strix/llm/memory_compressor.py)
Triggers at 90k tokens. Keeps system messages + last 15 turns + most recent 3 images verbatim; older turns are chunked (10 at a time) and summarized by a secondary LLM call using a prompt that explicitly lists what to preserve (vulns, creds, payloads, URLs, error messages).
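A sketch of that policy with the thresholds from the text; `count_tokens` and `summarize_chunk` (the secondary LLM call) are caller-supplied stand-ins, and image retention is omitted:
```python
TOKEN_THRESHOLD = 90_000      # compression trigger
KEEP_RECENT_TURNS = 15        # kept verbatim
CHUNK_SIZE = 10               # older turns summarized 10 at a time

def compress_history(messages, count_tokens, summarize_chunk):
    if count_tokens(messages) < TOKEN_THRESHOLD:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent, older = rest[-KEEP_RECENT_TURNS:], rest[:-KEEP_RECENT_TURNS]
    summaries = [
        # The summarization prompt insists on keeping vulns, creds, payloads,
        # URLs and error messages verbatim.
        {"role": "user", "content": summarize_chunk(older[i:i + CHUNK_SIZE])}
        for i in range(0, len(older), CHUNK_SIZE)
    ]
    return system + summaries + recent
```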
Tool registry + executor (strix/tools/)
- Registration is decorator-based: `@register_tool(sandbox_execution=...)`. Decorator side effects: loads a matching `*_schema.xml` alongside the module, parses `<parameters>` for validation, appends to a module-level `tools` list. A minimal registration sketch follows this list.
- Execution is async; picks local vs. remote; runs argument coercion via function-signature inspection; wraps results in XML for the LLM; attaches extracted images.
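A stripped-down registration decorator in the same spirit (schema loading and validation omitted; names are illustrative):
```python
tools: list[dict] = []   # module-level registry, populated at import time

def register_tool(sandbox_execution: bool = True):
    def wrap(fn):
        tools.append({"name": fn.__name__, "fn": fn, "sandbox": sandbox_execution})
        return fn
    return wrap

@register_tool(sandbox_execution=True)
def list_workspace(path: str = "/workspace") -> dict:
    """Example sandbox tool: list files under the mounted target code."""
    import subprocess
    proc = subprocess.run(["ls", "-la", path], capture_output=True, text=True)
    return {"stdout": proc.stdout, "returncode": proc.returncode}
```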
DockerRuntime (strix/runtime/docker_runtime.py)
Creates the container, allocates random host ports for the tool server and
Caido proxy, mints a per-run 32-byte bearer token, waits for /health,
copies target code into /workspace via tar upload. Idempotent — can reuse
running containers tagged by scan_id.
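A sketch of that bring-up using the docker SDK; the image name, environment variable name, and the port numbers for the tool server and Caido are assumptions drawn from the figures above:
```python
import secrets
import docker

def create_sandbox(scan_id: str):
    client = docker.from_env()
    token = secrets.token_hex(32)                        # per-run bearer token (32 bytes)
    container = client.containers.run(
        "strix-sandbox",                                  # assumed image name
        detach=True,
        name=f"strix-scan-{scan_id}",
        labels={"scan_id": scan_id},                      # lets orphaned containers be found later
        cap_add=["NET_ADMIN", "NET_RAW"],
        ports={"48081/tcp": None, "48080/tcp": None},     # None = random host ports
        environment={"SANDBOX_TOKEN": token},             # assumed variable name
    )
    return container, token
```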
Skills (strix/skills/)
Markdown files. No Python. Nine categories. Loaded into the system prompt
via a jinja `{% for skill_name in loaded_skill_names %}…` block. See
05_skills_and_prompts.md.
4. Key Architectural Decisions
Host/sandbox split
All offensive tooling runs inside Docker with NET_ADMIN/NET_RAW
capabilities (docker_runtime.py:144). The host keeps only API keys, agent
state, and orchestration code. A compromised target can attack the
container, not the host. However: all agents in a scan share one
container, so no per-agent isolation. (runtime/docker_runtime.py)
XML tool calls (not JSON, not provider tool_use)
- Works across every provider without relying on native tool support.
- Streams gracefully — the parser can start rendering `<parameter=x>` before the value is complete.
- One-tool-per-message enforced by the parser dropping everything after the first `</function>` (llm/utils.py:64-77); a minimal parsing sketch follows this list.
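A rough parser in that spirit, with the tag shapes approximated from the fragments quoted above (`<parameter=x>`, `</function>`); the real logic lives in llm/utils.py:
```python
import re

def parse_first_tool_call(text: str) -> dict | None:
    head, closed, _ = text.partition("</function>")   # anything after the first call is dropped
    if not closed:
        return None
    m = re.search(r"<function=(\w+)>(.*)", head, re.S)
    if not m:
        return None
    name, body = m.groups()
    args = dict(re.findall(r"<parameter=(\w+)>(.*?)</parameter>", body, re.S))
    return {"toolName": name, "args": args}
```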
Skills as markdown (prompt-as-data)
Contributors add attack techniques without touching Python. The same markdown
doubles as user documentation. A `load_skill` tool lets the running agent
pull additional playbooks on demand, with a hard cap of 5 per agent to keep
context small.
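A sketch of how the markdown playbooks could be folded into the system prompt through a jinja loop like the one mentioned under Skills above (template text and helper names are illustrative; the 5-skill cap comes from this paragraph):
```python
from pathlib import Path
from jinja2 import Template

MAX_SKILLS_PER_AGENT = 5   # hard cap described above

template = Template(
    "{% for skill_name in loaded_skill_names %}"
    "## Skill: {{ skill_name }}\n{{ skills[skill_name] }}\n"
    "{% endfor %}"
)

def render_skills(names: list[str], skills_dir: str = "strix/skills") -> str:
    names = names[:MAX_SKILLS_PER_AGENT]                      # enforce the cap
    skills = {n: Path(skills_dir, f"{n}.md").read_text() for n in names}
    return template.render(loaded_skill_names=names, skills=skills)
```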
LLM-driven memory compression and dedupe
Rather than heuristic rules, Strix uses secondary LLM calls for:
- Compression: summarize old turns while preserving security-critical details (memory_compressor.py:15-43 — full preservation prompt).
- Deduplication: decide if two reports share a root cause (llm/dedupe.py:142-213).
This costs tokens but avoids hard-to-tune heuristics in a domain where false-merges and false-splits both hurt.
One-tool-per-message discipline
The system prompt explicitly instructs the LLM to emit exactly one tool call per assistant message, and the executor dispatches only the first. This makes it straightforward to interleave LLM reasoning with tool observations and keeps the conversation history linear.
Decorator registration, no runtime plugin discovery
Tools self-register at import time via `@register_tool`. The central
`tools/__init__.py` imports every tool module, so registration is
synchronous at process start. There is no runtime plugin discovery — which
keeps startup fast and the tool surface fixed.
5. Concurrency Model
- Host: single Python process, `asyncio`-based. Multi-agent = multiple background `asyncio.Task` objects, each with its own `BaseAgent` instance but sharing module-level registries for inter-agent messaging.
- Tool execution: each tool call is wrapped in an `asyncio.Task` so it can be cancelled on user interrupt (base_agent.py:214-230).
- Sandbox: FastAPI + uvicorn. Tool calls are dispatched via `asyncio.to_thread` so blocking CLI tools (nmap, sqlmap) don't block the event loop. Per-agent task cancellation: if the same `agent_id` hits `/execute` while a previous call is still running, the prior task is cancelled (tool_server.py:94-97). A dispatch sketch follows this list.
- TUI: Textual async app; the agent runs in a background thread while Textual polls the tracer every 350 ms for UI updates.
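A compressed sketch of the sandbox-side dispatch described in the third bullet, with an illustrative FastAPI route, a toy tool registry, and state names that are not the real tool_server.py internals:
```python
import asyncio
from fastapi import FastAPI

app = FastAPI()
TOOLS = {"echo": lambda text="": {"stdout": text}}   # stand-in for the real registry
running: dict[str, asyncio.Task] = {}                # agent_id -> in-flight task

@app.post("/execute")
async def execute(payload: dict):
    agent_id = payload["agent_id"]
    if (prev := running.get(agent_id)) and not prev.done():
        prev.cancel()                                 # same agent pre-empts its previous call
    fn = TOOLS[payload["tool_name"]]
    # Blocking tools run in a worker thread so the event loop stays responsive.
    task = asyncio.create_task(asyncio.to_thread(fn, **payload.get("kwargs", {})))
    running[agent_id] = task
    try:
        return {"result": await asyncio.wait_for(task, timeout=120)}
    except asyncio.TimeoutError:
        return {"error": "Tool execution timed out after 120s"}
```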
6. Execution Flow — Startup to Shutdown
```mermaid
stateDiagram-v2
[*] --> Parse: main()
Parse --> Validate: parse_arguments()
Validate --> Warmup: validate_environment() + LLM "OK" test
Warmup --> RunGen: generate_run_name()
RunGen --> Clone: clone repo / collect local sources
Clone --> DiffScope: resolve_diff_scope_context()
DiffScope --> Dispatch: interactive?
Dispatch --> CLI: run_cli (linear output)
Dispatch --> TUI: run_tui (Textual app)
CLI --> InitRuntime
TUI --> InitRuntime
InitRuntime --> Container: DockerRuntime.create_sandbox()
Container --> Agent: StrixAgent(config, state, sandbox_info)
Agent --> Loop
state Loop {
[*] --> Check
Check --> Think: not stopped
Think --> LLMCall
LLMCall --> Parse2: stream until </function>
Parse2 --> Execute: tool call
Execute --> Record: append result
Record --> Check
}
Loop --> Finish: finish_scan / max_iter / completed
Finish --> Cleanup: destroy or persist container
Cleanup --> Flush: tracer.cleanup() + JSONL artifacts
Flush --> [*]
```
7. Failure Modes & Observability
| Failure | Handling | Where |
|---|---|---|
| LLM request error | tenacity-style exp. backoff, max 5 retries, then enters waiting (interactive) or aborts (headless) | llm/llm.py:156-172, base_agent.py:568-601 |
| Container dies mid-run | next tool call returns ConnectError; agent can retry; orphaned containers cleanable via scan_id label | runtime/docker_runtime.py:175-220 |
| Tool timeout | 120s default, enforced by asyncio.wait_for in tool_server; agent sees "Tool timed out…" | runtime/tool_server.py:100-110 |
| Empty LLM response | executor injects corrective user message telling the agent it must use a tool | base_agent.py:379-393 |
| Max iterations (default 300) | warnings at 85% and iter 297; forced into finish_scan or agent_finish | base_agent.py:186-211 |
| User Ctrl+C | Quit modal in TUI; SIGINT handler in CLI; tracer flushes; runtime cleanup on atexit | interface/tui.py:766-781, interface/cli.py:111-125 |
| Context bloat | compress at 90k tokens, keep last 15 turns + 3 images | llm/memory_compressor.py:12-13, 208 |
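For the first row, the retry policy could look roughly like this with tenacity (the real code may implement the backoff by hand; the wait parameters here are illustrative):
```python
import litellm
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(min=2, max=30), reraise=True)
async def completion_with_retry(**kwargs):
    # Exponential backoff, five attempts, then the error propagates to the
    # agent loop (which waits in interactive mode or aborts headless).
    return await litellm.acompletion(**kwargs)
```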
Observability: every event lands in strix_runs/<run_id>/events.jsonl via
the tracer, and optionally exports as OpenTelemetry spans to Traceloop.
PostHog records aggregate scan start/end/finding events.