CodeDocs Vault

Architecture

Strix is a host/sandbox-split autonomous-agent framework with three conceptual layers stacked around a classic LLM ReAct loop:

  1. Orchestration layer — agents, state, LLM wrapper, multi-agent graph.
  2. Tool layer — registry + executor that dispatches LLM-emitted tool calls either locally (coordination, thinking) or remotely via HTTP to the sandbox.
  3. Sandbox layer — a single Kali Docker container running a FastAPI tool server plus Caido/Playwright/tmux backing services.

On top of these sits the interface layer (CLI + Textual TUI + streaming parser + tracer) and beside them a prompt/skills layer (markdown playbooks injected into the system prompt).


1. Component Diagram

flowchart TB
    subgraph HOST["Host process (strix CLI)"]
        CLI["strix.interface.main<br/>argparse + config<br/>main.py:540-637"]
        TUI["Textual TUI<br/>interface/tui.py"]
        Tracer["Tracer<br/>telemetry/tracer.py"]
        Agent["StrixAgent<br/>(extends BaseAgent)<br/>agents/base_agent.py"]
        State["AgentState (pydantic)<br/>agents/state.py"]
        LLM["LLM wrapper<br/>llm/llm.py (litellm)"]
        MemC["MemoryCompressor<br/>llm/memory_compressor.py"]
        Dedupe["dedupe.py<br/>LLM-based vuln dedupe"]
        Registry["Tool registry<br/>tools/registry.py"]
        Exec["Tool executor<br/>tools/executor.py"]
        Runtime["DockerRuntime<br/>runtime/docker_runtime.py"]
        Skills[("Skills /<br/>strix/skills/*.md")]
        Prompt[("system_prompt.jinja")]
    end
 
    subgraph BOX["Docker sandbox (strix-scan-<id>)"]
        Server["FastAPI tool_server<br/>runtime/tool_server.py<br/>port 48081"]
        Caido["Caido proxy<br/>port 48080"]
        Tmux["tmux terminal<br/>sessions"]
        Browser["Playwright<br/>Chromium"]
        Py["IPython REPLs"]
        Bin["Kali CLI tools<br/>(nmap, nuclei,<br/>sqlmap, …)"]
        WS[("/workspace<br/>target code")]
    end
 
    CLI --> Tracer
    CLI --> TUI
    CLI --> Agent
    Agent --> State
    Agent --> LLM
    LLM --> MemC
    LLM --> Prompt
    Prompt -.injects.-> Skills
    Agent --> Exec
    Exec --> Registry
    Exec -- local --> Agent
    Exec -- "HTTPS POST /execute<br/>Bearer token" --> Server
    Runtime -. lifecycle .-> BOX
    Server --> Tmux & Browser & Py & Caido
    Caido <-- "transparent<br/>proxy" --> Browser
    Server --> WS
    Bin --> Caido
    Agent --> Dedupe
    Agent --> Tracer

2. Data Flow — One Agent Step

What happens between "user presses Enter" and "next assistant message"?

sequenceDiagram
    actor User
    participant TUI
    participant Agent as StrixAgent.agent_loop
    participant LLM as LLM.generate (streaming)
    participant SP as streaming_parser
    participant Exec as execute_tool
    participant Srv as tool_server (sandbox)
    participant Tool as tool impl
 
    User->>TUI: input
    TUI->>Agent: add user message to state.messages
    loop until completed / max_iters / stop
        Agent->>LLM: messages + system_prompt + compression
        LLM-->>Agent: streamed chunks
        LLM-->>SP: chunks fed to streaming parser (for TUI)
        SP-->>TUI: render partial tool-call as it arrives
        LLM->>LLM: detect </function> → stop streaming early
        LLM->>LLM: parse_tool_invocations() → [{toolName,args}]
        Agent->>Exec: process_tool_invocations(actions, history, state)
        alt local tool (agents_graph, think, finish, load_skill, web_search)
            Exec->>Tool: call in-process
        else sandbox tool
            Exec->>Srv: POST /execute {agent_id, tool_name, kwargs} + Bearer
            Srv->>Tool: asyncio.to_thread(fn, **kwargs)
            Tool-->>Srv: result dict
            Srv-->>Exec: {result} or {error}
        end
        Exec-->>Agent: append <tool_result>…</tool_result> to messages
        Agent->>Agent: state.iteration++
    end
    Agent-->>TUI: scan completed / findings

What that shows:

  - The response streams: the parser renders a partial tool call in the TUI before the model has finished generating it.
  - Streaming stops early as soon as </function> closes the first tool call, which saves tokens.
  - Local tools (coordination, thinking, skill loading, web search) run in-process; everything else crosses the HTTP boundary into the sandbox.
  - Every observation is appended to state.messages as a <tool_result> block, so the next LLM call sees the full linear history.

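A minimal runnable sketch of the same step. The collaborating pieces are passed in as plain callables; names mirror the diagram, and everything here is illustrative rather than the real base_agent.py code:

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """Tiny stand-in for AgentState (see section 3)."""
    messages: list = field(default_factory=list)
    iteration: int = 0

async def agent_step(state: State, generate, parse_tool_invocations, execute) -> None:
    # 1. Stream the next assistant message (stops early at </function>).
    text = await generate(state.messages)
    state.messages.append({"role": "assistant", "content": text})

    # 2. Pull tool calls out of the XML in the assistant text.
    actions = parse_tool_invocations(text)  # [{"toolName": ..., "args": ...}]

    # 3. Dispatch only the first action (one tool call per message).
    result = await execute(actions[0]) if actions else "You must use a tool."

    # 4. Record the observation so the next LLM call sees it.
    state.messages.append(
        {"role": "user", "content": f"<tool_result>{result}</tool_result>"}
    )
    state.iteration += 1
```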

3. Core Abstractions

BaseAgent (strix/agents/base_agent.py)

The generic async agent loop. Owns:

  - the AgentState instance and the append-only message history
  - the LLM wrapper and the per-iteration call into it
  - dispatch into the tool executor and recording of tool results
  - iteration bookkeeping: max-iteration warnings, stop/finish conditions, waiting-for-input handling
  - tracer hooks for each step

StrixAgent (strix/agents/StrixAgent/strix_agent.py)

Specialization that pins the jinja template and the default skill set. The codebase is structured to allow additional agent types (a directory per agent), but only StrixAgent is shipped.

AgentState (strix/agents/state.py)

A strict Pydantic model. Not just "last message" — carries full scrollback, tool-call/observation history, errors, todos, waiting-for-input flags, sandbox handle (sandbox_id, sandbox_token, sandbox_info), parent ID for subagents. Serializable via model_dump() for graph storage.
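
A hedged sketch of that model's shape; the sandbox fields are named in the text above, while the remaining field names are illustrative guesses:

```python
from pydantic import BaseModel, ConfigDict

class AgentState(BaseModel):
    model_config = ConfigDict(extra="forbid")  # strict: unknown fields rejected

    agent_id: str
    parent_id: str | None = None        # set when this agent is a subagent
    messages: list[dict] = []           # full scrollback incl. tool results
    errors: list[str] = []
    todos: list[str] = []
    waiting_for_input: bool = False
    iteration: int = 0

    # Sandbox handle (field names from the text):
    sandbox_id: str | None = None
    sandbox_token: str | None = None
    sandbox_info: dict | None = None

state = AgentState(agent_id="root")
snapshot = state.model_dump()           # dict for multi-agent graph storage
```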

LLM (strix/llm/llm.py)

Thin wrapper over litellm.acompletion with Strix-specific logic:

  - streams the response and cuts the stream as soon as </function> closes the first tool call
  - feeds chunks to the streaming parser so the TUI can render partial tool calls
  - extracts tool invocations from the assistant text via parse_tool_invocations()
  - retries failed requests with exponential backoff (max 5 attempts, see section 7)
  - runs the message history through MemoryCompressor before each call

MemoryCompressor (strix/llm/memory_compressor.py)

Triggers at 90k tokens. Keeps system messages + last 15 turns + most recent 3 images verbatim; older turns are chunked (10 at a time) and summarized by a secondary LLM call using a prompt that explicitly lists what to preserve (vulns, creds, payloads, URLs, error messages).
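
A minimal sketch of that keep/summarize split, with count_tokens and summarize_chunk as hypothetical stand-ins (the 3-image retention rule is elided):

```python
TRIGGER_TOKENS = 90_000   # compression threshold
KEEP_RECENT_TURNS = 15    # kept verbatim
CHUNK_SIZE = 10           # older turns summarized 10 at a time

def compress(messages, count_tokens, summarize_chunk):
    if count_tokens(messages) < TRIGGER_TOKENS:
        return messages

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    older, recent = rest[:-KEEP_RECENT_TURNS], rest[-KEEP_RECENT_TURNS:]

    # summarize_chunk wraps the secondary LLM call; its prompt lists what
    # to preserve (vulns, creds, payloads, URLs, error messages).
    summaries = [
        {"role": "user", "content": summarize_chunk(older[i : i + CHUNK_SIZE])}
        for i in range(0, len(older), CHUNK_SIZE)
    ]
    return system + summaries + recent
```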

Tool registry + executor (strix/tools/)

The registry maps tool names to implementations; tools self-register at import time via @register_tool (see section 4). The executor takes parsed invocations, runs local tools (agents_graph, think, finish, load_skill, web_search) in-process, and forwards everything else to the sandbox tool server as POST /execute with the per-run bearer token. Only the first invocation in a message is dispatched.
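
A hedged sketch of the remote branch of that dispatch; the endpoint, payload shape, and bearer header come from the sequence diagram, while the URL key and timeout margin are assumptions:

```python
import httpx

LOCAL_TOOLS = {"agents_graph", "think", "finish", "load_skill", "web_search"}

async def dispatch(action: dict, state, local_registry: dict):
    name, kwargs = action["toolName"], action["args"]
    if name in LOCAL_TOOLS:
        return await local_registry[name](state, **kwargs)

    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{state.sandbox_info['tool_server_url']}/execute",  # key assumed
            headers={"Authorization": f"Bearer {state.sandbox_token}"},
            json={"agent_id": state.agent_id, "tool_name": name, "kwargs": kwargs},
            timeout=130,  # a bit above the server-side 120s tool timeout
        )
    body = resp.json()
    return body.get("result", body.get("error"))
```
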
DockerRuntime (strix/runtime/docker_runtime.py)

Creates the container, allocates random host ports for the tool server and Caido proxy, mints a per-run 32-byte bearer token, waits for /health, copies target code into /workspace via tar upload. Idempotent — can reuse running containers tagged by scan_id.
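
A hedged sketch of what that plausibly looks like with the Docker SDK for Python; the capabilities, token size, label, tar upload, and random ports come from the text, while the image name and environment variable are assumptions (the real code also waits on /health before returning):

```python
import io
import secrets
import tarfile

import docker  # Docker SDK for Python

def create_sandbox(scan_id: str, target_dir: str):
    client = docker.from_env()
    token = secrets.token_hex(32)  # per-run 32-byte bearer token

    container = client.containers.run(
        "strix-sandbox:latest",                        # image name assumed
        name=f"strix-scan-{scan_id}",
        detach=True,
        cap_add=["NET_ADMIN", "NET_RAW"],
        labels={"scan_id": scan_id},                   # enables orphan cleanup
        environment={"STRIX_SANDBOX_TOKEN": token},    # variable name assumed
        ports={"48081/tcp": None, "48080/tcp": None},  # None = random host port
    )

    # Copy target code into /workspace via a tar upload.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(target_dir, arcname=".")
    container.put_archive("/workspace", buf.getvalue())

    return container, token
```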

Skills (strix/skills/)

Markdown files. No Python. Nine categories. Loaded into the system prompt via a jinja {% for skill_name in loaded_skill_names %}… block. See 05_skills_and_prompts.md.
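
A minimal sketch of that injection, assuming hypothetical file-layout details around the quoted jinja block:

```python
from pathlib import Path

from jinja2 import Template

MAX_SKILLS = 5  # hard cap per agent (see section 4)

# Illustrative fragment mirroring the loop quoted above.
TEMPLATE = Template(
    "{% for skill_name in loaded_skill_names %}"
    "## Skill: {{ skill_name }}\n{{ skills[skill_name] }}\n"
    "{% endfor %}"
)

def render_skills(skill_dir: str, loaded_skill_names: list[str]) -> str:
    skills = {p.stem: p.read_text() for p in Path(skill_dir).glob("*.md")}
    return TEMPLATE.render(
        loaded_skill_names=loaded_skill_names[:MAX_SKILLS], skills=skills
    )
```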


4. Key Architectural Decisions

Host/sandbox split

All offensive tooling runs inside Docker with NET_ADMIN/NET_RAW capabilities (docker_runtime.py:144). The host keeps only API keys, agent state, and orchestration code, so a compromised target can attack the container, not the host. However, all agents in a scan share one container, so there is no per-agent isolation. (runtime/docker_runtime.py)

XML tool calls (not JSON, not provider tool_use)

Tool calls are emitted as XML tags inside ordinary assistant text instead of provider-native tool_use blocks. That keeps the format identical across every litellm backend, lets the host cut the stream the moment </function> arrives, and lets the streaming parser render a partial call in the TUI while it is still being generated.
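
A hedged sketch of what parse_tool_invocations might do; only the </function> close tag is confirmed by the diagrams, the rest of the tag grammar is assumed:

```python
import re

# Assumed shape: <function=name><param=key>value</param>…</function>
CALL_RE = re.compile(r"<function=(\w+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<param=(\w+)>(.*?)</param>", re.DOTALL)

def parse_tool_invocations(text: str) -> list[dict]:
    return [
        {"toolName": name, "args": dict(PARAM_RE.findall(body))}
        for name, body in CALL_RE.findall(text)
    ]
```
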
Skills as markdown (prompt-as-data)

Contributors add attack techniques without touching Python. The same markdown doubles as user documentation. A load_skill tool lets the running agent pull additional playbooks on demand, with a hard cap of 5 per agent to keep context small.

LLM-driven memory compression and dedupe

Rather than heuristic rules, Strix uses secondary LLM calls for:

  - memory compression: summarizing older turns while preserving vulns, creds, payloads, URLs, and error messages (llm/memory_compressor.py)
  - vulnerability dedupe: deciding whether two findings describe the same issue (dedupe.py)
This costs tokens but avoids hard-to-tune heuristics in a domain where false-merges and false-splits both hurt.

One-tool-per-message discipline

The system prompt explicitly instructs the LLM to emit exactly one tool call per assistant message, and the executor only dispatches the first. This keeps reasoning and tool observations strictly interleaved and the conversation history linear.

Decorator registration, no import-side effects at runtime

Tools self-register at import time via @register_tool. The central tools/__init__.py imports every tool module, so registration is synchronous at process start. There is no runtime plugin discovery — which keeps startup fast and the tool surface fixed.
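
A minimal sketch of that pattern; the real decorator in tools/registry.py presumably also records metadata (schemas, sandbox-vs-local flags), which is elided here:

```python
from typing import Callable

TOOL_REGISTRY: dict[str, Callable] = {}

def register_tool(fn: Callable) -> Callable:
    TOOL_REGISTRY[fn.__name__] = fn   # import-time side effect, by design
    return fn

@register_tool
def think(thought: str) -> str:
    return f"noted: {thought}"

# tools/__init__.py imports every tool module, so TOOL_REGISTRY is complete
# and fixed as soon as the process is up; no runtime plugin discovery.
```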


5. Concurrency Model

Host side, everything runs on one asyncio event loop: BaseAgent.agent_loop is async, LLM responses stream, and sandbox tool calls are awaited HTTP round-trips. Subagents spawned through the agents graph run in the same host process and share the single sandbox container. Sandbox side, the FastAPI tool server keeps its event loop responsive by offloading synchronous tool functions to worker threads with asyncio.to_thread, bounding each call with asyncio.wait_for (120s, see section 7).
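
A minimal sketch of the sandbox-side pattern; the endpoint and payload shape come from the sequence diagram, the 120s bound from section 7, and the bearer-token check is elided:

```python
import asyncio

from fastapi import FastAPI, HTTPException

app = FastAPI()
TOOL_TIMEOUT = 120      # seconds (section 7)
TOOL_REGISTRY: dict = {}  # filled at import time via @register_tool (section 4)

@app.post("/execute")
async def execute(payload: dict):
    fn = TOOL_REGISTRY.get(payload["tool_name"])
    if fn is None:
        raise HTTPException(status_code=404, detail="unknown tool")
    try:
        # Run the synchronous tool in a worker thread so the event loop
        # stays responsive, and bound the call with a hard timeout.
        result = await asyncio.wait_for(
            asyncio.to_thread(fn, **payload["kwargs"]), timeout=TOOL_TIMEOUT
        )
        return {"result": result}
    except asyncio.TimeoutError:
        return {"error": f"Tool timed out after {TOOL_TIMEOUT}s"}
    except Exception as exc:
        return {"error": str(exc)}
```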

6. Execution Flow — Startup to Shutdown

stateDiagram-v2
    [*] --> Parse: main()
    Parse --> Validate: parse_arguments()
    Validate --> Warmup: validate_environment() + LLM "OK" test
    Warmup --> RunGen: generate_run_name()
    RunGen --> Clone: clone repo / collect local sources
    Clone --> DiffScope: resolve_diff_scope_context()
    DiffScope --> Dispatch: interactive?
    Dispatch --> CLI: run_cli (linear output)
    Dispatch --> TUI: run_tui (Textual app)
    CLI --> InitRuntime
    TUI --> InitRuntime
    InitRuntime --> Container: DockerRuntime.create_sandbox()
    Container --> Agent: StrixAgent(config, state, sandbox_info)
    Agent --> Loop
    state Loop {
        [*] --> Check
        Check --> Think: not stopped
        Think --> LLMCall
        LLMCall --> Parse2: stream until </function>
        Parse2 --> Execute: tool call
        Execute --> Record: append result
        Record --> Check
    }
    Loop --> Finish: finish_scan / max_iter / completed
    Finish --> Cleanup: destroy or persist container
    Cleanup --> Flush: tracer.cleanup() + JSONL artifacts
    Flush --> [*]

7. Failure Modes & Observability

| Failure | Handling | Where |
|---|---|---|
| LLM request error | tenacity-style exponential backoff, max 5 retries; then enters waiting (interactive) or aborts (headless) | llm/llm.py:156-172, base_agent.py:568-601 |
| Container dies mid-run | next tool call returns ConnectError; agent can retry; orphaned containers cleanable via scan_id label | runtime/docker_runtime.py:175-220 |
| Tool timeout | 120s default, enforced by asyncio.wait_for in tool_server; agent sees "Tool timed out…" | runtime/tool_server.py:100-110 |
| Empty LLM response | executor injects corrective user message telling the agent it must use a tool | base_agent.py:379-393 |
| Max iterations (default 300) | warnings at 85% and iteration 297; forced into finish_scan or agent_finish | base_agent.py:186-211 |
| User Ctrl+C | quit modal in TUI; SIGINT handler in CLI; tracer flushes; runtime cleanup on atexit | interface/tui.py:766-781, interface/cli.py:111-125 |
| Context bloat | compress at 90k tokens, keep last 15 turns + 3 images | llm/memory_compressor.py:12-13, 208 |
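
For the first row, a hedged sketch of that retry shape. The text says "tenacity-style", so tenacity itself may or may not be in use; everything here is illustrative except the 5-attempt cap:

```python
import litellm
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    wait=wait_exponential(multiplier=1, max=30),  # illustrative bounds
    stop=stop_after_attempt(5),                   # max 5 retries, per the table
)
async def call_llm(**kwargs):
    return await litellm.acompletion(**kwargs)
```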

Observability: every event lands in strix_runs/<run_id>/events.jsonl via the tracer, and optionally exports as OpenTelemetry spans to Traceloop. PostHog records aggregate scan start/end/finding events.
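
A minimal sketch of the JSONL event sink; the path comes from the text, the event shape is an assumption:

```python
import json
import time
from pathlib import Path

class Tracer:
    """Append-only JSONL event sink. The real tracer also handles the
    OpenTelemetry export and PostHog aggregates mentioned above."""

    def __init__(self, run_id: str):
        self.path = Path("strix_runs") / run_id / "events.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def emit(self, kind: str, **data):
        event = {"ts": time.time(), "kind": kind, **data}
        with self.path.open("a") as f:
            f.write(json.dumps(event) + "\n")
```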