CodeDocs Vault

Strix — Repository Analysis

Generated walkthrough of usestrix/strix (commit 15c9571) — the open-source "AI hacker" agent framework. All line numbers below correspond to the tree at the time of writing.

This directory contains a layered tour of the codebase. Read in order, or jump to a specific subsystem.

| #  | File | Scope |
|----|------|-------|
| 00 | 00_overview.md | Purpose, problem, users, tech stack (this file) |
| 01 | 01_architecture.md | High-level architecture, components, data flow |
| 02 | 02_agent_and_llm.md | Agent loop, state, LLM wrapper, memory compression, dedupe |
| 03 | 03_tools.md | Tool registry, executor, XML schema contract, per-tool map |
| 04 | 04_runtime_sandbox.md | Docker sandbox, FastAPI tool server, transport, isolation |
| 05 | 05_skills_and_prompts.md | Skills taxonomy, scan modes, system prompt, prompt engineering |
| 06 | 06_interface_cli_tui.md | CLI entry point, Textual TUI, streaming parser, artifacts |
| 07 | 07_config_telemetry_packaging.md | Config, telemetry, packaging, tests, CI |
| 08 | 08_llm_leverage_patterns.md | Synthesis: patterns and techniques for leveraging LLMs |
| 09 | 09_file_map.md | File-to-responsibility map for quick navigation |

1. Purpose & Problem

What it is. Strix is an autonomous multi-agent framework that behaves as an "AI pentester" — it takes a target (GitHub repo, local codebase, URL, domain, or IP), spins up a sandboxed Kali container full of real offensive tools, and iteratively probes the target through an LLM-driven agent loop until it has collected validated vulnerability findings with working proofs of concept (README.md:43-52).
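The iterative loop described above can be sketched as follows. Every name here is illustrative, not Strix's real API (which lives under strix/agents/); the stubs stand in for the actual LLM call and sandbox execution:

```python
# Hypothetical sketch of the probe-until-done agent loop. All names and
# stubs are illustrative; the real implementation is in strix/agents/.

def call_llm(messages):
    # Stub: the real loop goes through the litellm wrapper.
    return "<function=finish></function>"

def execute_tool(name, reply):
    # Stub: the real executor runs tools locally or via the sandbox HTTP API.
    return {"output": ""}

def run_scan(target, max_steps=50):
    """Drive the LLM until it emits a finish tool call or runs out of steps."""
    messages = [{"role": "system", "content": f"Authorized target: {target}"}]
    findings = []
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if "<function=finish>" in reply:   # agent decided it is done
            break
        result = execute_tool("terminal", reply)
        messages.append({"role": "user", "content": str(result)})
    return findings
```

With the stubs above, the loop terminates on the first step; the shape of the loop (LLM turn, tool execution, result fed back as a new message) is the point.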

What it solves. Two adjacent pain points:

  1. Manual pentesting is slow and expensive. A real pentest takes weeks. Strix aims to reduce that to hours (README.md:67-69).
  2. Static analysis is noisy. SAST tools generate false positives because they can't run the code. Strix validates findings through actual exploitation, not just pattern matching (README.md:50-52, 82-86).

The headline metric: 96% solve rate on the XBEN CTF benchmark (100/104 challenges), avg ~19 min / ~$3.37 per challenge (benchmarks/README.md).

Who it's for.

The project is Apache-2.0 licensed (LICENSE) and maintained alongside a hosted commercial platform at app.strix.ai (README.md:96-107).


2. Tech Stack

From pyproject.toml:35-49:

Runtime-critical dependencies

| Dependency | Role | Why it was picked |
|------------|------|-------------------|
| litellm 1.81.x | Unified LLM client — one API for OpenAI, Anthropic, Vertex, Bedrock, Azure, Ollama, OpenRouter, custom endpoints | Ships with >100 providers, handles streaming, retries, cost accounting, prompt caching; Strix supports BYO-model by design |
| textual ≥6.0 | Reactive TUI framework for the interactive scan view | Only widely-used async Python TUI with good animation + mouse support |
| docker ≥7.1 | Python Docker SDK — spins up the Kali sandbox per scan | Standard for programmatic container control |
| pydantic ≥2.11 | Data validation for AgentState, tool args, config | Strict mypy-friendly models; FastAPI interop on the sandbox side |
| rich | Non-interactive CLI output (tables, panels, live stats) | Complements Textual; shared rendering primitives |
| xmltodict / defusedxml | Parse XML tool-call bodies, skill YAML frontmatter | Strix's tool-call syntax is XML, not JSON |
| tenacity | Retry/backoff primitives | Used inside the LLM wrapper |
| cvss | Compute CVSS scores on findings | Appears in the reporting tool |
| scrubadub | Scrub PII from telemetry payloads | Used in strix/telemetry/utils.py |
| traceloop-sdk + opentelemetry-exporter-otlp-proto-http | Observability — export agent traces via OTEL | Optional remote Traceloop backend; always-on local JSONL span export |
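The XML tool-call contract behind the xmltodict row can be illustrated with a minimal, non-streaming parser. The exact grammar here is an assumption inferred from the `<function=tool><parameter=x>…</function>` shape Strix emits; the real parser (strix/interface/streaming_parser.py) additionally handles partial input arriving mid-stream:

```python
# Toy parser for XML-style tool calls of the (assumed) form:
#   <function=NAME><parameter=KEY>VALUE</parameter>...</function>
# Strix's real parser is streaming-aware; this one only handles complete replies.
import re

CALL_RE = re.compile(r"<function=(\w+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=(\w+)>(.*?)</parameter>", re.DOTALL)

def parse_tool_calls(reply: str) -> list[dict]:
    """Extract every tool call and its parameters from one LLM reply."""
    calls = []
    for name, body in CALL_RE.findall(reply):
        calls.append({"name": name, "args": dict(PARAM_RE.findall(body))})
    return calls
```

Note this format is deliberately not strict XML (`function=NAME` is not a legal attribute), which is one reason a bespoke parser exists alongside xmltodict.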

Optional / sandbox-only

Installed inside the Docker image, not on the host (pyproject.toml:54-66):

Language & tooling

External binary tools (baked into the sandbox image)

All installed in containers/Dockerfile: nmap, nuclei, subfinder, naabu, httpx, katana, ffuf, sqlmap, semgrep, bandit, gitleaks, trufflehog, trivy, ast-grep (sg), tree-sitter parsers (Java/JS/Python/Go/Bash/JSON/YAML/TS), wapiti, zaproxy, wafw00f, dirsearch, gospider, arjun, interactsh, jwt_tool, Caido v0.48.0, Playwright/Chromium. Base image is kalilinux/kali-rolling:latest.


3. Repository Layout (two-level)

```
strix/
├── strix/                 # Python package
│   ├── agents/            # BaseAgent + StrixAgent + shared state
│   ├── llm/               # litellm wrapper, memory compression, dedupe
│   ├── tools/             # Tool registry, executor, XML schemas
│   ├── runtime/           # Docker lifecycle + FastAPI tool server
│   ├── interface/         # CLI, Textual TUI, streaming parser
│   ├── skills/            # Markdown skill library (the "playbooks")
│   ├── config/            # Config class + ~/.strix persistence
│   ├── telemetry/         # PostHog + OTEL tracer, sanitization
│   └── utils/             # Resource-path helper (pip vs. pyinstaller)
├── containers/            # Dockerfile + entrypoint for the sandbox
├── benchmarks/            # README pointing at XBEN external suite
├── docs/                  # mintlify.json docs site
├── scripts/               # build.sh, install.sh, docker.sh
├── tests/                 # ~23 unit test files (no E2E)
├── strix.spec             # PyInstaller freeze spec
├── pyproject.toml         # strict ruff/mypy/pyright config
└── Makefile               # setup-dev, check-all, test, etc.
```

See 09_file_map.md for a detailed file→responsibility map.


4. Top-Level Mental Model

```
┌──────────────────── Host (user's machine) ─────────────────────────────┐
│                                                                        │
│  strix CLI  ──▶  StrixAgent ──▶  LLM (litellm → OpenAI/Anthropic/...)  │
│                     │                                                  │
│                     │ emits <function=tool><parameter=x>…</function>   │
│                     ▼                                                  │
│                  Tool Executor                                         │
│                     │                                                  │
│    ┌────────────────┴──────────────────┐                               │
│    │ local-only tools                  │ sandbox-execution tools       │
│    │ (agents_graph, think, finish,     │ (terminal, browser, python,   │
│    │  load_skill, web_search)          │  proxy, file_edit, notes,     │
│    │                                   │  reporting)                   │
│    └───────────────────────────────────┘              │                │
│                                                       │ HTTP POST     │
│                                                       │ /execute      │
└───────────────────────────────────────────────────────┼────────────────┘
                                                        │
┌────────────────── Docker sandbox (Kali) ──────────────▼────────────────┐
│                                                                        │
│  FastAPI tool_server (48081)  ───▶  local tool implementations         │
│                                        │                               │
│                                        ├── tmux terminal sessions      │
│                                        ├── Playwright browser          │
│                                        ├── Caido HTTPS proxy (48080)   │
│                                        ├── IPython REPL sessions       │
│                                        └── nmap/nuclei/sqlmap/…        │
│                                                                        │
│  /workspace = tar-uploaded target code                                 │
└────────────────────────────────────────────────────────────────────────┘
```
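The sandbox half of the diagram can be approximated with a stdlib-only stand-in. The real tool server uses FastAPI; the request payload shape and the `dispatch` routing below are assumptions for illustration, only the `/execute` endpoint name comes from the diagram:

```python
# Stdlib-only stand-in for the sandbox tool server. The real server is
# FastAPI on port 48081; the JSON payload shape here is an assumption.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def dispatch(tool: str, args: dict) -> dict:
    # Stub dispatcher; the real server routes to terminal/browser/python/...
    if tool == "echo":
        return {"output": args.get("text", "")}
    return {"error": f"unknown tool: {tool}"}

class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/execute":
            self.send_error(404)
            return
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        result = dispatch(body.get("tool", ""), body.get("args", {}))
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass
```

The host-side executor then only needs an HTTP client: every sandboxed tool invocation is one POST, which is what keeps the host/container boundary narrow.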

Key observations are covered in detail in later docs.


5. What's Interesting / Clever

Short preview; each appears again in later docs with file:line evidence.

  1. Markdown-as-prompt-library. Instead of embedding domain knowledge in code, Strix ships 30+ markdown files that function as dynamic prompt modules. Contributors can add skills without touching Python. This is the single biggest architectural bet.
  2. XML tool-call format with a streaming-aware parser (strix/interface/streaming_parser.py) that renders partial tool calls as the LLM emits them.
  3. Two-tier refusal override in the system prompt — explicit clauses that preempt the model's built-in safety hedging for in-scope authorized testing (system_prompt.jinja:65-76). Risky but necessary for the use case.
  4. LLM-based dedupe of findings. Rather than hashing reports, a separate LLM call judges whether two vulnerability reports are the same root cause (strix/llm/dedupe.py).
  5. LLM-based memory compression. At 90k tokens, older messages get summarized by a cheaper LLM pass, preserving vulns/credentials/payloads (strix/llm/memory_compressor.py).
  6. Kali + Caido + Playwright + tree-sitter + semgrep in one image — a heavy (~12-15GB) but comprehensive toolbelt. The entrypoint wires system-wide http_proxy/https_proxy through Caido so every tool is auto-intercepted.
  7. Scope enforcement via jinja-injected authorized-targets block — targets come from the hosting platform, not user chat, so the agent cannot be social-engineered out of scope.
  8. Screenshot-as-tool-result — browser tool results embed base64 screenshots that get forwarded as vision messages to the LLM (strix/tools/executor.py:227-256).
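Item 5's compression strategy can be sketched as a threshold check over the message history. The 90k-token trigger comes from the text above; the token estimate, the KEEP_RECENT window, and the summarize() stub are assumptions, since the real pass (strix/llm/memory_compressor.py) uses a cheaper LLM to write the summary while preserving vulns, credentials, and payloads:

```python
# Illustrative threshold-based memory compression. Only the 90k trigger is
# from the source text; everything else is a simplifying assumption.

TOKEN_LIMIT = 90_000
KEEP_RECENT = 10  # keep the newest messages verbatim (assumed window size)

def rough_tokens(messages):
    # Crude ~4-chars-per-token estimate; a real system would use a tokenizer.
    return sum(len(m["content"]) // 4 for m in messages)

def summarize(messages):
    # Stub: the real compressor asks a cheaper LLM for a summary that keeps
    # vulnerabilities, credentials, and payloads intact.
    return {"role": "system",
            "content": f"[summary of {len(messages)} earlier messages]"}

def compress_if_needed(messages):
    """Replace the oldest messages with one summary once over budget."""
    if rough_tokens(messages) <= TOKEN_LIMIT:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [summarize(old), *recent]
```

The design choice worth noting: compression is lossy by construction, so what the summarizer is told to preserve (findings, credentials, payloads) is the real contract, not the token math.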