CodeDocs Vault

Strix — Repository Analysis

Generated walkthrough of usestrix/strix (commit 15c9571) — the open-source "AI hacker" agent framework. All line numbers below correspond to the tree at the time of writing.

This directory contains a layered tour of the codebase. Read in order, or jump to a specific subsystem.

| #  | File | Scope |
|----|------|-------|
| 00 | 00_overview.md | Purpose, problem, users, tech stack (this file) |
| 01 | 01_architecture.md | High-level architecture, components, data flow |
| 02 | 02_agent_and_llm.md | Agent loop, state, LLM wrapper, memory compression, dedupe |
| 03 | 03_tools.md | Tool registry, executor, XML schema contract, per-tool map |
| 04 | 04_runtime_sandbox.md | Docker sandbox, FastAPI tool server, transport, isolation |
| 05 | 05_skills_and_prompts.md | Skills taxonomy, scan modes, system prompt, prompt engineering |
| 06 | 06_interface_cli_tui.md | CLI entry point, Textual TUI, streaming parser, artifacts |
| 07 | 07_config_telemetry_packaging.md | Config, telemetry, packaging, tests, CI |
| 08 | 08_llm_leverage_patterns.md | Synthesis: patterns and techniques for leveraging LLMs |
| 09 | 09_file_map.md | File-to-responsibility map for quick navigation |

1. Purpose & Problem

What it is. Strix is an autonomous multi-agent framework that behaves as an "AI pentester" — it takes a target (GitHub repo, local codebase, URL, domain, or IP), spins up a sandboxed Kali container full of real offensive tools, and iteratively probes the target through an LLM-driven agent loop until it has collected validated vulnerability findings with working proofs of concept (README.md:43-52).
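The iterative loop described above can be sketched as follows. Every name here is illustrative, not Strix's real API (which lives under strix/agents/); the stubs stand in for the actual LLM call and sandbox execution:

```python
# Hypothetical sketch of the probe-until-done agent loop. All names and
# stubs are illustrative; the real implementation is in strix/agents/.

def call_llm(messages):
    # Stub: the real loop goes through the litellm wrapper.
    return "<function=finish></function>"

def execute_tool(name, reply):
    # Stub: the real executor runs tools locally or via the sandbox HTTP API.
    return {"output": ""}

def run_scan(target, max_steps=50):
    """Drive the LLM until it emits a finish tool call or runs out of steps."""
    messages = [{"role": "system", "content": f"Authorized target: {target}"}]
    findings = []
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if "<function=finish>" in reply:   # agent decided it is done
            break
        result = execute_tool("terminal", reply)
        messages.append({"role": "user", "content": str(result)})
    return findings
```

With the stubs above, the loop terminates on the first step; the shape of the loop (LLM turn, tool execution, result fed back as a new message) is the point.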

What it solves. Two adjacent pain points:

  1. Manual pentesting is slow and expensive. A real pentest takes weeks. Strix aims to reduce that to hours (README.md:67-69).
  2. Static analysis is noisy. SAST tools generate false positives because they can't run the code. Strix validates findings through actual exploitation, not just pattern matching (README.md:50-52, 82-86).

The headline metric: 96% solve rate on the XBEN CTF benchmark (100/104 challenges), avg ~19 min / ~$3.37 per challenge (benchmarks/README.md).

Who it's for.

The project is Apache-2.0 licensed (LICENSE) and maintained alongside a hosted commercial platform at app.strix.ai (README.md:96-107).


2. Tech Stack

From pyproject.toml:35-49:

Runtime-critical dependencies

| Dependency | Role | Why it was picked |
|------------|------|-------------------|
| litellm 1.81.x | Unified LLM client — one API for OpenAI, Anthropic, Vertex, Bedrock, Azure, Ollama, OpenRouter, custom endpoints | Ships with >100 providers, handles streaming, retries, cost accounting, prompt caching; Strix supports BYO-model by design |
| textual ≥6.0 | Reactive TUI framework for the interactive scan view | Only widely-used async Python TUI with good animation + mouse support |
| docker ≥7.1 | Python Docker SDK — spins up the Kali sandbox per scan | Standard for programmatic container control |
| pydantic ≥2.11 | Data validation for AgentState, tool args, config | Strict mypy-friendly models; FastAPI interop on the sandbox side |
| rich | Non-interactive CLI output (tables, panels, live stats) | Complements Textual; shared rendering primitives |
| xmltodict / defusedxml | Parse XML tool-call bodies, skill YAML frontmatter | Strix's tool-call syntax is XML, not JSON |
| tenacity | Retry/backoff primitives | Used inside the LLM wrapper |
| cvss | Compute CVSS scores on findings | Appears in the reporting tool |
| scrubadub | Scrub PII from telemetry payloads | Used in strix/telemetry/utils.py |
| traceloop-sdk + opentelemetry-exporter-otlp-proto-http | Observability — export agent traces via OTEL | Optional remote Traceloop backend; always-on local JSONL span export |
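The XML tool-call contract behind the xmltodict row can be illustrated with a minimal, non-streaming parser. The exact grammar here is an assumption inferred from the `<function=tool><parameter=x>…</function>` shape Strix emits; the real parser (strix/interface/streaming_parser.py) additionally handles partial input arriving mid-stream:

```python
# Toy parser for XML-style tool calls of the (assumed) form:
#   <function=NAME><parameter=KEY>VALUE</parameter>...</function>
# Strix's real parser is streaming-aware; this one only handles complete replies.
import re

CALL_RE = re.compile(r"<function=(\w+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=(\w+)>(.*?)</parameter>", re.DOTALL)

def parse_tool_calls(reply: str) -> list[dict]:
    """Extract every tool call and its parameters from one LLM reply."""
    calls = []
    for name, body in CALL_RE.findall(reply):
        calls.append({"name": name, "args": dict(PARAM_RE.findall(body))})
    return calls
```

Note this format is deliberately not strict XML (`function=NAME` is not a legal attribute), which is one reason a bespoke parser exists alongside xmltodict.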

Optional / sandbox-only

Installed inside the Docker image, not on the host (pyproject.toml:54-66):

Language & tooling

External binary tools (baked into the sandbox image)

All installed in containers/Dockerfile: nmap, nuclei, subfinder, naabu, httpx, katana, ffuf, sqlmap, semgrep, bandit, gitleaks, trufflehog, trivy, ast-grep (sg), tree-sitter parsers (Java/JS/Python/Go/Bash/JSON/YAML/TS), wapiti, zaproxy, wafw00f, dirsearch, gospider, arjun, interactsh, jwt_tool, Caido v0.48.0, Playwright/Chromium. Base image is kalilinux/kali-rolling:latest.


3. Repository Layout (two-level)

```
strix/
├── strix/                 # Python package
│   ├── agents/            # BaseAgent + StrixAgent + shared state
│   ├── llm/               # litellm wrapper, memory compression, dedupe
│   ├── tools/             # Tool registry, executor, XML schemas
│   ├── runtime/           # Docker lifecycle + FastAPI tool server
│   ├── interface/         # CLI, Textual TUI, streaming parser
│   ├── skills/            # Markdown skill library (the "playbooks")
│   ├── config/            # Config class + ~/.strix persistence
│   ├── telemetry/         # PostHog + OTEL tracer, sanitization
│   └── utils/             # Resource-path helper (pip vs. pyinstaller)
├── containers/            # Dockerfile + entrypoint for the sandbox
├── benchmarks/            # README pointing at XBEN external suite
├── docs/                  # mintlify.json docs site
├── scripts/               # build.sh, install.sh, docker.sh
├── tests/                 # ~23 unit test files (no E2E)
├── strix.spec             # PyInstaller freeze spec
├── pyproject.toml         # strict ruff/mypy/pyright config
└── Makefile               # setup-dev, check-all, test, etc.
```

See 09_file_map.md for a detailed file→responsibility map.


4. Top-Level Mental Model

```
┌──────────────────── Host (user's machine) ─────────────────────────────┐
│                                                                        │
│  strix CLI  ──▶  StrixAgent ──▶  LLM (litellm → OpenAI/Anthropic/...)  │
│                     │                                                  │
│                     │ emits <function=tool><parameter=x>…</function>   │
│                     ▼                                                  │
│                  Tool Executor                                         │
│                     │                                                  │
│    ┌────────────────┴──────────────────┐                               │
│    │ local-only tools                  │ sandbox-execution tools       │
│    │ (agents_graph, think, finish,     │ (terminal, browser, python,   │
│    │  load_skill, web_search)          │  proxy, file_edit, notes,     │
│    │                                   │  reporting)                   │
│    └───────────────────────────────────┘              │                │
│                                                       │ HTTP POST     │
│                                                       │ /execute      │
└───────────────────────────────────────────────────────┼────────────────┘
                                                        │
┌────────────────── Docker sandbox (Kali) ──────────────▼────────────────┐
│                                                                        │
│  FastAPI tool_server (48081)  ───▶  local tool implementations         │
│                                        │                               │
│                                        ├── tmux terminal sessions      │
│                                        ├── Playwright browser          │
│                                        ├── Caido HTTPS proxy (48080)   │
│                                        ├── IPython REPL sessions       │
│                                        └── nmap/nuclei/sqlmap/…        │
│                                                                        │
│  /workspace = tar-uploaded target code                                 │
└────────────────────────────────────────────────────────────────────────┘
```
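The sandbox half of the diagram can be approximated with a stdlib-only stand-in. The real tool server uses FastAPI; the request payload shape and the `dispatch` routing below are assumptions for illustration, only the `/execute` endpoint name comes from the diagram:

```python
# Stdlib-only stand-in for the sandbox tool server. The real server is
# FastAPI on port 48081; the JSON payload shape here is an assumption.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def dispatch(tool: str, args: dict) -> dict:
    # Stub dispatcher; the real server routes to terminal/browser/python/...
    if tool == "echo":
        return {"output": args.get("text", "")}
    return {"error": f"unknown tool: {tool}"}

class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/execute":
            self.send_error(404)
            return
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        result = dispatch(body.get("tool", ""), body.get("args", {}))
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass
```

The host-side executor then only needs an HTTP client: every sandboxed tool invocation is one POST, which is what keeps the host/container boundary narrow.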

Key observations are covered in detail in later docs.


5. What's Interesting / Clever

Short preview; each appears again in later docs with file:line evidence.

  1. Markdown-as-prompt-library. Instead of embedding domain knowledge in code, Strix ships 30+ markdown files that function as dynamic prompt modules. Contributors can add skills without touching Python. This is the single biggest architectural bet.
  2. XML tool-call format with a streaming-aware parser (strix/interface/streaming_parser.py) that renders partial tool calls as the LLM emits them.
  3. Two-tier refusal override in the system prompt — explicit clauses that preempt the model's built-in safety hedging for in-scope authorized testing (system_prompt.jinja:65-76). Risky but necessary for the use case.
  4. LLM-based dedupe of findings. Rather than hashing reports, a separate LLM call judges whether two vulnerability reports are the same root cause (strix/llm/dedupe.py).
  5. LLM-based memory compression. At 90k tokens, older messages get summarized by a cheaper LLM pass, preserving vulns/credentials/payloads (strix/llm/memory_compressor.py).
  6. Kali + Caido + Playwright + tree-sitter + semgrep in one image — a heavy (~12-15GB) but comprehensive toolbelt. The entrypoint wires system-wide http_proxy/https_proxy through Caido so every tool is auto-intercepted.
  7. Scope enforcement via jinja-injected authorized-targets block — targets come from the hosting platform, not user chat, so the agent cannot be social-engineered out of scope.
  8. Screenshot-as-tool-result — browser tool results embed base64 screenshots that get forwarded as vision messages to the LLM (strix/tools/executor.py:227-256).
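Item 5's compression strategy can be sketched as a threshold check over the message history. The 90k-token trigger comes from the text above; the token estimate, the KEEP_RECENT window, and the summarize() stub are assumptions, since the real pass (strix/llm/memory_compressor.py) uses a cheaper LLM to write the summary while preserving vulns, credentials, and payloads:

```python
# Illustrative threshold-based memory compression. Only the 90k trigger is
# from the source text; everything else is a simplifying assumption.

TOKEN_LIMIT = 90_000
KEEP_RECENT = 10  # keep the newest messages verbatim (assumed window size)

def rough_tokens(messages):
    # Crude ~4-chars-per-token estimate; a real system would use a tokenizer.
    return sum(len(m["content"]) // 4 for m in messages)

def summarize(messages):
    # Stub: the real compressor asks a cheaper LLM for a summary that keeps
    # vulnerabilities, credentials, and payloads intact.
    return {"role": "system",
            "content": f"[summary of {len(messages)} earlier messages]"}

def compress_if_needed(messages):
    """Replace the oldest messages with one summary once over budget."""
    if rough_tokens(messages) <= TOKEN_LIMIT:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [summarize(old), *recent]
```

The design choice worth noting: compression is lossy by construction, so what the summarizer is told to preserve (findings, credentials, payloads) is the real contract, not the token math.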