# Strix — Repository Analysis

Generated walkthrough of usestrix/strix (commit 15c9571) — the open-source "AI hacker" agent framework. All line numbers below correspond to the tree at the time of writing.
This directory contains a layered tour of the codebase. Read in order, or jump to a specific subsystem.
| # | File | Scope |
|---|---|---|
| 00 | 00_overview.md | Purpose, problem, users, tech stack (this file) |
| 01 | 01_architecture.md | High-level architecture, components, data flow |
| 02 | 02_agent_and_llm.md | Agent loop, state, LLM wrapper, memory compression, dedupe |
| 03 | 03_tools.md | Tool registry, executor, XML schema contract, per-tool map |
| 04 | 04_runtime_sandbox.md | Docker sandbox, FastAPI tool server, transport, isolation |
| 05 | 05_skills_and_prompts.md | Skills taxonomy, scan modes, system prompt, prompt engineering |
| 06 | 06_interface_cli_tui.md | CLI entry point, Textual TUI, streaming parser, artifacts |
| 07 | 07_config_telemetry_packaging.md | Config, telemetry, packaging, tests, CI |
| 08 | 08_llm_leverage_patterns.md | Synthesis: patterns and techniques for leveraging LLMs |
| 09 | 09_file_map.md | File-to-responsibility map for quick navigation |
## 1. Purpose & Problem
**What it is.** Strix is an autonomous multi-agent framework that behaves as an
"AI pentester" — it takes a target (GitHub repo, local codebase, URL, domain,
or IP), spins up a sandboxed Kali container full of real offensive tools, and
iteratively probes the target through an LLM-driven agent loop until it has
collected validated vulnerability findings with working proof-of-concepts
(README.md:43-52).
**What it solves.** Two adjacent pain points:

- Manual pentesting is slow and expensive. A real pentest takes weeks; Strix aims to reduce that to hours (README.md:67-69).
- Static analysis is noisy. SAST tools generate false positives because they can't run the code. Strix validates findings through actual exploitation, not just pattern matching (README.md:50-52, 82-86).
The headline metric: 96% solve rate on the XBEN CTF benchmark (100/104
challenges), avg ~19 min / ~$3.37 per challenge (benchmarks/README.md).
**Who it's for.**

- Application-security teams needing continuous testing
- Developers integrating security checks into CI/CD (GitHub Actions first-class, README.md:193-224)
- Bug-bounty hunters automating recon + PoC generation
- Security researchers doing rapid assessment
The project is Apache-2.0 licensed (LICENSE) and maintained alongside a hosted
commercial platform at app.strix.ai (README.md:96-107).
## 2. Tech Stack
From pyproject.toml:35-49:
### Runtime-critical dependencies
| Dependency | Role | Why it was picked |
|---|---|---|
| litellm 1.81.x | Unified LLM client — one API for OpenAI, Anthropic, Vertex, Bedrock, Azure, Ollama, OpenRouter, custom endpoints | Ships with >100 providers, handles streaming, retries, cost accounting, prompt caching; Strix supports BYO-model by design |
| textual ≥6.0 | Reactive TUI framework for the interactive scan view | Only widely-used async Python TUI with good animation + mouse support |
| docker ≥7.1 | Python Docker SDK — spins up the Kali sandbox per scan | Standard for programmatic container control |
| pydantic ≥2.11 | Data validation for AgentState, tool args, config | Strict mypy-friendly models; FastAPI interop on the sandbox side |
| rich | Non-interactive CLI output (tables, panels, live stats) | Complements Textual; shared rendering primitives |
| xmltodict / defusedxml | Parse XML tool-call bodies, skill YAML frontmatter | Strix's tool-call syntax is XML, not JSON |
| tenacity | Retry/backoff primitives | Used inside the LLM wrapper |
| cvss | Compute CVSS scores on findings | Appears in the reporting tool |
| scrubadub | Scrub PII from telemetry payloads | Used in strix/telemetry/utils.py |
| traceloop-sdk + opentelemetry-exporter-otlp-proto-http | Observability — export agent traces via OTEL | Optional remote Traceloop backend; always-on local JSONL span export |
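The BYO-model design rests on litellm's convention of routing by the provider prefix in the model string, so the same call shape works against any backend. A minimal sketch of that idea — the helper name `make_completion_kwargs` is invented for illustration and is not Strix's actual LLM wrapper:

```python
# Sketch of litellm's provider-by-prefix routing (illustrative helper,
# not Strix's wrapper in strix/llm/). litellm.completion() accepts the
# same kwargs regardless of backend; the "provider/model" prefix picks it.

def make_completion_kwargs(model: str, system: str, user: str) -> dict:
    """Build provider-agnostic kwargs for litellm.completion()."""
    return {
        "model": model,  # e.g. "openai/gpt-4o" or "ollama/llama3"
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "stream": True,  # Strix streams tool calls as they are emitted
    }

# Only the model string changes across providers:
for model in ("openai/gpt-4o", "anthropic/claude-3-5-sonnet", "ollama/llama3"):
    kwargs = make_completion_kwargs(
        model, "You are a pentest agent.", "Scan the target.")
    # litellm.completion(**kwargs)  # uncomment with credentials configured
```

Swapping providers is therefore a one-line config change, which is why litellm sits at the center of the dependency table above.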
### Optional / sandbox-only
Installed inside the Docker image, not on the host (pyproject.toml:54-66):
- fastapi + uvicorn — the tool server exposed on port 48081 of the sandbox
- playwright — browser automation for XSS, auth flows, JS-heavy apps
- openhands-aci — file-editor primitives (re-used from OpenHands)
- ipython — persistent Python REPL sessions for custom exploit dev
- libtmux + pyte — tmux-backed terminal sessions with ANSI-parsed replay
- gql[requests] — GraphQL client used to drive Caido's proxy API
- numpydoc — source-aware code parsing helper
### Language & tooling

- Python ≥3.12 required, strictly typed throughout (pyproject.toml:96-148 enables strict mypy, pyright strict, full ruff rule set).
- uv as package manager (migrated from Poetry — commit 38b2700).
- hatchling build backend.
- pyinstaller produces standalone binaries for mac arm64/x86, linux x86, windows (strix.spec, .github/workflows/build-release.yml). This is how `curl install | bash` delivers the CLI.
### External binary tools (baked into the sandbox image)
All installed in containers/Dockerfile: nmap, nuclei, subfinder,
naabu, httpx, katana, ffuf, sqlmap, semgrep, bandit, gitleaks,
trufflehog, trivy, ast-grep (sg), tree-sitter parsers (Java/JS/Python/Go/
Bash/JSON/YAML/TS), wapiti, zaproxy, wafw00f, dirsearch, gospider,
arjun, interactsh, jwt_tool, Caido v0.48.0, Playwright/Chromium. Base
image is kalilinux/kali-rolling:latest.
## 3. Repository Layout (two-level)
strix/
├── strix/ # Python package
│ ├── agents/ # BaseAgent + StrixAgent + shared state
│ ├── llm/ # litellm wrapper, memory compression, dedupe
│ ├── tools/ # Tool registry, executor, XML schemas
│ ├── runtime/ # Docker lifecycle + FastAPI tool server
│ ├── interface/ # CLI, Textual TUI, streaming parser
│ ├── skills/ # Markdown skill library (the "playbooks")
│ ├── config/ # Config class + ~/.strix persistence
│ ├── telemetry/ # PostHog + OTEL tracer, sanitization
│ └── utils/ # Resource-path helper (pip vs. pyinstaller)
├── containers/ # Dockerfile + entrypoint for the sandbox
├── benchmarks/ # README pointing at XBEN external suite
├── docs/ # mintlify.json docs site
├── scripts/ # build.sh, install.sh, docker.sh
├── tests/ # ~23 unit test files (no E2E)
├── strix.spec # PyInstaller freeze spec
├── pyproject.toml # strict ruff/mypy/pyright config
└── Makefile # setup-dev, check-all, test, etc.
See 09_file_map.md for a detailed file→responsibility map.
## 4. Top-Level Mental Model
┌──────────────────── Host (user's machine) ─────────────────────────────┐
│ │
│ strix CLI ──▶ StrixAgent ──▶ LLM (litellm → OpenAI/Anthropic/...) │
│ │ │
│ │ emits <function=tool><parameter=x>…</function> │
│ ▼ │
│ Tool Executor │
│ │ │
│ ┌────────────────┴──────────────────┐ │
│ │ local-only tools │ sandbox-execution tools │
│ │ (agents_graph, think, finish, │ (terminal, browser, python, │
│ │ load_skill, web_search) │ proxy, file_edit, notes, │
│ │ │ reporting) │
│ └───────────────────────────────────┘ │ │
│ │ HTTP POST │
│ │ /execute │
└───────────────────────────────────────────────────────┼────────────────┘
│
┌────────────────── Docker sandbox (Kali) ──────────────▼────────────────┐
│ │
│ FastAPI tool_server (48081) ───▶ local tool implementations │
│ │ │
│ ├── tmux terminal sessions │
│ ├── Playwright browser │
│ ├── Caido HTTPS proxy (48080) │
│ ├── IPython REPL sessions │
│ └── nmap/nuclei/sqlmap/… │
│ │
│ /workspace = tar-uploaded target code │
└────────────────────────────────────────────────────────────────────────┘
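The local-vs-sandbox split in the diagram can be sketched as a single dispatch function. This is a hedged illustration, not the repo's actual executor API: the `/execute` path, port 48081, and the tool lists come from this walkthrough, while the function name, payload fields, and bearer-token header shape are assumptions:

```python
import json
import urllib.request

# Tools the executor runs in-process on the host (per the diagram above).
LOCAL_TOOLS = {"agents_graph", "think", "finish", "load_skill", "web_search"}

def dispatch_tool(name: str, args: dict, sandbox_url: str,
                  sandbox_token: str, local_impls: dict):
    """Route a parsed tool call: host-local Python vs. sandbox HTTP."""
    if name in LOCAL_TOOLS:
        return local_impls[name](**args)
    # Everything else is POSTed to the FastAPI tool server in the sandbox.
    payload = json.dumps({"tool": name, "args": args}).encode()
    req = urllib.request.Request(
        f"{sandbox_url}/execute",  # e.g. http://localhost:48081/execute
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Hypothetical header shape for the sandbox_token auth.
            "Authorization": f"Bearer {sandbox_token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Under this shape, subagents sharing the parent's sandbox falls out naturally: every agent's executor is handed the same `sandbox_url`/token pair.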
Key observations — covered in detail in later docs:

- **One sandbox per scan, shared by all agents** (root + subagents). Isolation is per-scan, not per-agent. Subagents reuse the parent's `sandbox_token` and hit the same tool server.
- **Tool calls are XML, not JSON.** Strix chose XML-style function tags (`<function=tool_name><parameter=x>value</parameter></function>`) because they stream better and are more robust when the model emits partial output.
- **Skills are pure markdown.** The "brains" of the system live in `strix/skills/*.md` — scan methodologies, vulnerability playbooks, tool usage guides. They're loaded into the system prompt at startup (based on scan mode) and optionally added at runtime via the `load_skill` tool (capped at 5 per agent).
- **Multi-agent coordination is built-in.** `create_agent`, `agent_finish`, `send_message_to_agent`, `wait_for_message`, `view_agent_graph` are first-class tools. Root agents orchestrate; subagents specialize.
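The runtime skill-loading behavior can be sketched as follows. This is a hand-wavy stand-in for illustration only: the cap of 5 comes from this walkthrough, but the class, function names, and in-memory structure are invented, not the loader that actually lives under `strix/skills/`:

```python
from pathlib import Path

MAX_RUNTIME_SKILLS = 5  # per-agent cap on load_skill calls (per this doc)

def load_skill_library(skills_dir: Path) -> dict[str, str]:
    """Read every markdown skill file into a name -> text map."""
    return {p.stem: p.read_text() for p in sorted(skills_dir.glob("*.md"))}

class AgentSkills:
    """Tracks which extra skills an agent has pulled in at runtime."""

    def __init__(self, library: dict[str, str]):
        self.library = library
        self.loaded: list[str] = []

    def load_skill(self, name: str) -> str:
        """Return a skill's markdown body, enforcing the per-agent cap."""
        if name not in self.library:
            raise KeyError(f"unknown skill: {name}")
        if name not in self.loaded:
            if len(self.loaded) >= MAX_RUNTIME_SKILLS:
                raise RuntimeError("skill cap reached (5 per agent)")
            self.loaded.append(name)
        # The returned text is appended to the agent's context window.
        return self.library[name]
```

Because skills are plain markdown keyed by filename, adding a playbook never touches Python code — the architectural bet called out in the next section.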
## 5. What's Interesting / Clever
Short preview; each appears again in later docs with file:line evidence.

- **Markdown-as-prompt-library.** Instead of embedding domain knowledge in code, Strix ships 30+ markdown files that function as dynamic prompt modules. Contributors can add skills without touching Python. This is the single biggest architectural bet.
- **XML tool-call format with a streaming-aware parser** (`strix/interface/streaming_parser.py`) that renders partial tool calls as the LLM emits them.
- **Two-tier refusal override in the system prompt** — explicit clauses that preempt the model's built-in safety hedging for in-scope authorized testing (`system_prompt.jinja:65-76`). Risky but necessary for the use case.
- **LLM-based dedupe of findings.** Rather than hashing reports, a separate LLM call judges whether two vulnerability reports share the same root cause (`strix/llm/dedupe.py`).
- **LLM-based memory compression.** At 90k tokens, older messages get summarized by a cheaper LLM pass, preserving vulns/credentials/payloads (`strix/llm/memory_compressor.py`).
- **Kali + Caido + Playwright + tree-sitter + semgrep in one image** — a heavy (~12-15 GB) but comprehensive toolbelt. The entrypoint wires system-wide `http_proxy`/`https_proxy` through Caido so every tool is auto-intercepted.
- **Scope enforcement via a jinja-injected authorized-targets block** — targets come from the hosting platform, not user chat, so the agent cannot be social-engineered out of scope.
- **Screenshot-as-tool-result** — browser tool results embed base64 screenshots that get forwarded as vision messages to the LLM (`strix/tools/executor.py:227-256`).
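Note that `<function=tool_name>` is not well-formed XML (the name sits after an `=` inside the tag), so a streaming parser has to work over the raw text rather than a DOM. A minimal regex-based sketch of the idea — an illustration of the technique, not the actual `strix/interface/streaming_parser.py`:

```python
import re

# Tag grammar from the tool-call contract described above.
CALL_RE = re.compile(
    r"<function=(?P<name>[\w-]+)>(?P<body>.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(
    r"<parameter=(?P<key>[\w-]+)>(?P<val>.*?)</parameter>", re.DOTALL)
OPEN_RE = re.compile(r"<function=(?P<name>[\w-]+)>")

class StreamingToolCallParser:
    """Feed LLM text chunks; yields complete calls, exposes partial state."""

    def __init__(self) -> None:
        self.buffer = ""
        self.partial: str | None = None  # tool currently being emitted

    def feed(self, chunk: str) -> list[tuple[str, dict[str, str]]]:
        """Return any tool calls completed by this chunk."""
        self.buffer += chunk
        calls, last_end = [], 0
        for m in CALL_RE.finditer(self.buffer):
            params = {p["key"]: p["val"] for p in PARAM_RE.finditer(m["body"])}
            calls.append((m["name"], params))
            last_end = m.end()
        self.buffer = self.buffer[last_end:]  # drop consumed text
        # Track a call that has opened but not yet closed, so a TUI can
        # render it while the model is still streaming parameters.
        open_m = OPEN_RE.search(self.buffer)
        self.partial = open_m["name"] if open_m else None
        return calls
```

The `partial` attribute is what makes the format "stream better" than JSON: a renderer can show the tool name and grow the argument view chunk by chunk, instead of waiting for a full JSON object to become parseable.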