Config, Telemetry, Packaging
The infrastructure around the agent loop: how settings are resolved, what data leaves the machine, how the CLI is built and shipped, and what's tested.
1. Config (strix/config/config.py)
Centralized Config class with class-level attributes. Three-level
precedence:
- Env vars (`os.environ`) — highest.
- `~/.strix/cli-config.json` — persisted from prior sessions.
- Class defaults — lowest.
1.1 Tracked Settings
The _tracked_names() helper (config.py:59-69) auto-discovers every
lowercase string-or-None attribute — so adding a setting is as easy as
adding a class attribute:
```python
strix_llm: str | None = None
strix_reasoning_effort: str = "high"
strix_llm_max_retries: str = "5"
strix_memory_compressor_timeout: str = "30"
llm_timeout: str = "300"
strix_telemetry: str | None = None
strix_otel_telemetry: str | None = None
strix_posthog_telemetry: str | None = None
strix_sandbox_execution_timeout: str = "120"
strix_sandbox_connect_timeout: str = "10"
strix_disable_browser: str | None = None
strix_image: str | None = None
strix_runtime_backend: str | None = None
perplexity_api_key: str | None = None
```
`traceloop_base_url` / `traceloop_api_key` / `traceloop_headers`, …
1.2 resolve_llm_config (config.py:190-216)
Special case: model names starting with `strix/` (e.g., `strix/claude`) auto-select `https://models.strix.ai/api/v1` as the base URL. Other providers fall back to provider-specific base-URL settings in order: `llm_api_base`, `openai_api_base`, `litellm_base_url`, `ollama_api_base`.
1.3 Load / Save / Apply
- load (:102-111): reads `~/.strix/cli-config.json`; silently skips on missing/corrupt.
- save (:114-124): writes JSON, attempts `chmod 0600` on Unix.
- apply_saved_config (:127-154): fills in env vars from saved config only if they aren't already set and haven't been explicitly cleared (tombstone pattern).
- capture_and_persist (:157-179): snapshots current state → disk.
- _llm_env_changed (:72-83): detects when env vars diverge from saved config and clears stale saved values.
This gives the "remembers your API key across runs" UX without clobbering explicit overrides.
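The tombstone idea behind apply_saved_config can be sketched like this. The names (`_cleared`, `clear_setting`, `apply_saved`) are illustrative, not the real API:

```python
import os

# Tombstone set: remembers settings the user explicitly cleared this session,
# so a saved value doesn't silently resurrect them.
_cleared: set[str] = set()

def clear_setting(name: str) -> None:
    os.environ.pop(name, None)
    _cleared.add(name)  # tombstone: record the explicit clear

def apply_saved(saved: dict[str, str]) -> None:
    # Apply a saved value only if the env var is neither set nor tombstoned.
    for name, value in saved.items():
        if name not in os.environ and name not in _cleared:
            os.environ[name] = value
```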
2. Telemetry
Two channels, both opt-out:
2.1 PostHog — Event Analytics (strix/telemetry/posthog.py)
HTTP POST to https://us.i.posthog.com with the public key
phc_7rO3XRuNT5sgSKAl6HDIrWdSGh1COzxw0vxVIAR6vVZ. Events:
| Event | Payload |
|---|---|
| scan_started | model, scan_mode, scan_type (whitebox/blackbox), interactive flag, instruction presence, first_run marker (:76-94) |
| finding_reported | severity only (:97-104) |
| scan_ended | duration, vulnerability counts by severity, agent count, tool-execution count, token usage, cost (:107-130) |
| error | error_type, optional error_msg (:133-137) |
Base properties (always attached, :67-73): OS, architecture, Python
version, Strix version.
Session ID is random per run — not a persistent user identifier
(:18).
2.2 OpenTelemetry / Traceloop (strix/telemetry/tracer.py)
The Tracer class is initialized per run:
- Local JSONL export: `strix_runs/<run_id>/events.jsonl` — always written when telemetry is on. Each span becomes a JSONL line (tracer.py:95-104).
- Optional remote Traceloop export: if `TRACELOOP_BASE_URL` and `TRACELOOP_API_KEY` are set, OTEL spans also stream to the configured collector (tracer.py:116-148). Custom headers via `TRACELOOP_HEADERS` (JSON or `key=value` pairs).
Event types emitted (tracer.py:187-268):
run.started, run.configured, agent.created,
agent.status.updated, chat.message, tool.execution.started,
tool.execution.updated, finding.created, finding.reviewed,
run.completed.
Every event carries: trace_id, span_id, parent_span_id, actor
(agent_id/name, tool_name), payload, status, error.
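Because the local export is plain JSONL (one span per line), a run can be replayed with a few lines of Python. This helper is a sketch, not part of Strix:

```python
import json
from pathlib import Path

def load_events(run_dir: Path) -> list[dict]:
    # Read strix_runs/<run_id>/events.jsonl: one JSON object per line.
    events = []
    with (run_dir / "events.jsonl").open() as fh:
        for line in fh:
            if line.strip():  # skip blank lines defensively
                events.append(json.loads(line))
    return events
```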
2.3 Sanitization (strix/telemetry/utils.py)
TelemetrySanitizer uses scrubadub for PII + custom regex for
secrets/tokens.
- Regex detectors (utils.py:27-39):
  - Key-based: `api_key`, `token`, `secret`, `password`, `authorization`, `cookie`, `credential`, `private_key`.
  - Value-based: `bearer`, `sk-*`, GitHub tokens (`gh[pousr]_*`), Slack tokens (`xox*`).
  - Screenshot keys: redacted entirely (key and value).
- Recursive walk (utils.py:71-103):
  - dict → sanitize by key + value
  - list/tuple → element-wise
  - string + sensitive-key hint → redacted
- scrubadub pass removes email/phone/SSN-like patterns.
- Placeholder cleanup strips scrubadub's `{{...}}` markers.
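The walk reduces to a short recursive function. This is an illustrative simplification of the patterns listed above; the real sanitizer additionally runs scrubadub for PII:

```python
import re

# Simplified key-name and value-pattern detectors from the lists above.
_SENSITIVE_KEY = re.compile(
    r"api_key|token|secret|password|authorization|cookie|credential|private_key", re.I
)
_SENSITIVE_VALUE = re.compile(r"\bbearer\b|\bsk-\w+|\bgh[pousr]_\w+|\bxox\w-", re.I)

def sanitize(obj, key_hint: str = ""):
    if isinstance(obj, dict):           # dict → sanitize by key + value
        return {k: sanitize(v, key_hint=k) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):  # list/tuple → element-wise
        return type(obj)(sanitize(v) for v in obj)
    if isinstance(obj, str) and (
        _SENSITIVE_KEY.search(key_hint) or _SENSITIVE_VALUE.search(obj)
    ):
        return "[REDACTED]"             # sensitive key hint or value pattern
    return obj
```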
2.4 Feature Flags (strix/telemetry/flags.py:1-24)
Three knobs (`0`/`false`/`no`/`off` disables):
- `STRIX_TELEMETRY` — master kill switch.
- `STRIX_OTEL_TELEMETRY` — OTEL only.
- `STRIX_POSTHOG_TELEMETRY` — PostHog only.
The per-channel flags fall back to the master if unset.
2.5 OTEL Attribute Pruning (telemetry/utils.py:183-203)
To keep the local JSONL compact, large LLM payloads (`llm.input`, `llm.output`, `gen_ai.prompt.*`, `gen_ai.completion.*`, `llm.input_messages.*`, `llm.output_messages.*`) are dropped from spans before export. The count of filtered attributes is stored in `strix.filtered_attributes_count`, so you can see that something was dropped — just not what.
This is essential — traceloop instrumentation on litellm would otherwise record every full prompt + completion, which would bloat files and leak secrets past the sanitizer.
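The filter itself is small. The sketch below uses the attribute prefixes named above; the function name is illustrative (real code: telemetry/utils.py:183-203):

```python
# Prefixes to drop before export; "llm.input"/"llm.output" also cover
# llm.input_messages.* and llm.output_messages.*.
_PRUNED_PREFIXES = (
    "llm.input", "llm.output",
    "gen_ai.prompt", "gen_ai.completion",
)

def prune_attributes(attrs: dict) -> dict:
    kept = {k: v for k, v in attrs.items() if not k.startswith(_PRUNED_PREFIXES)}
    dropped = len(attrs) - len(kept)
    if dropped:
        # Record that pruning happened, without keeping the payloads.
        kept["strix.filtered_attributes_count"] = dropped
    return kept
```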
3. Resource Paths (strix/utils/resource_paths.py:1-14)
The helper that makes dual pip/pyinstaller deployment work:
```python
def get_strix_resource_path(*parts: str) -> Path:
    frozen_base = getattr(sys, "_MEIPASS", None)  # PyInstaller temp dir
    if frozen_base:
        base = Path(frozen_base) / "strix"
        if base.exists():
            return base.joinpath(*parts)
    # Development / pip-install mode
    base = Path(__file__).resolve().parent.parent  # repo_root/strix
    return base.joinpath(*parts)
```
Used to locate:
- Skill markdown files (`strix/skills/**/*.md`)
- Jinja templates (`strix/agents/**/*.jinja`)
- Tool schemas (`strix/tools/**/*_schema.xml`)
- TUI stylesheets (`*.tcss`)
One code path; works whether Strix was pip-installed (files under
site-packages/) or pyinstaller-frozen (files under _MEIPASS/strix).
4. Packaging (strix.spec)
PyInstaller spec that produces standalone binaries.
4.1 Data Files (strix.spec:10-27)
Bundled alongside the binary:
- `strix/skills/**/*.md`
- `strix/agents/**/*.jinja`
- `strix/tools/**/*.xml`
- `strix/interface/**/*.tcss`
- Textual's own assets, tiktoken data, litellm data
4.2 Hidden Imports (strix.spec:35-146)
Explicit list because PyInstaller can't statically detect the dynamic imports litellm uses for each provider (OpenAI, Anthropic, Vertex, Bedrock, Azure), plus Strix's own modules (agents, llm, runtime, telemetry, tools, skills).
4.3 Exclusions (strix.spec:148-199)
Kept out of the frozen binary (reduces size + surface):
- Sandbox-only deps (Playwright, IPython, libtmux) — these live in the Kali image, not the host binary.
- Cloud SDKs (google-cloud, grpc).
- Dev tools (pytest, mypy, ruff, bandit).
- Unnecessary scientific stack (scipy, numpy, cv2, tkinter).
5. Tests (tests/)
~23 Python files across 9 subdirs. Coverage is uneven — focused on infrastructure, sparse on the agent core.
| Area | Files | Notes |
|---|---|---|
| config/ | 1 | Config loading, LLM env change detection, Traceloop var persistence |
| telemetry/ | 3 | Tracer JSONL output, sanitization, flags, OTEL span pruning |
| tools/ | 5 | Skill loading, argument parsing, agent graph (whitebox mode), tool registration |
| llm/ | 2 | LLM init side-effect tests, OTEL callback isolation |
| interface/ | 1 | Git diff-scope resolution |
| agents/ | (empty) | No direct tests |
| skills/ | (empty) | No direct tests |
| runtime/ | (empty) | No direct tests |
Patterns
- pytest + `monkeypatch` for env var manipulation.
- `tmp_path` for isolated config / artifact testing.
- `_reset_tracer_globals` autouse fixture (tests/telemetry/test_tracer.py:21-90) wipes global tracer state between tests.
- Telemetry tests load the emitted JSONL and assert both structure and absence of sensitive tokens.
- Tool tests use dummy LLM/Agent classes (tests/tools/test_load_skill_tool.py:1-60) to test skill loading without running agents.
- LLM tests verify no global side-effects on `litellm.callbacks` (tests/llm/test_llm_otel.py:8-16) — an important bug magnet when multiple agents share a process.
What's not tested:
- No end-to-end agent run.
- No LLM-response mocking for tool dispatch.
- No snapshot tests for the rendered system prompt.
- No Docker integration tests.
- No skill content validation (e.g., frontmatter presence).
The agent loop itself relies heavily on the XBEN benchmark suite (external repo) for regression testing.
6. Benchmarks (benchmarks/README.md)
Points at the usestrix/benchmarks repo. The README reports:
- XBEN — 104 CTF-style web security challenges.
- Success rate: 96% (100/104 solved).
- By difficulty: L1 45/45 (100%), L2 49/51 (96%), L3 6/8 (75%).
- Resource usage: ~19 min avg solve time, ~$3.37 per challenge, ~$337 total.
No in-repo harness — benchmarks run as a separate project.
7. CI / CD (.github/workflows/build-release.yml)
Triggered by:
- Git tags (`refs/tags/v*`)
- Manual workflow dispatch
Build matrix (4 platforms)
- macOS ARM64 (`macos-latest`)
- macOS x86_64 (`macos-15-intel`)
- Linux x86_64 (`ubuntu-latest`)
- Windows x86_64 (`windows-latest`)
Steps
- Checkout, set up Python 3.12, install `uv`
- `uv sync --frozen`
- `pyinstaller strix.spec --noconfirm`
- Extract version from `pyproject.toml`
- Create archive (`.tar.gz` / `.zip`)
- Attach to GitHub release
What's missing
No PR/push test workflow is committed. Linting/testing is expected to run via the pre-commit config and the Makefile during local dev. This is a notable gap — merges land without CI validation.
8. Makefile
All commands use uv run for reproducible execution:
| Target | Purpose |
|---|---|
| setup-dev | Install dev deps + pre-commit hooks |
| install | Production deps only (`uv sync --no-dev`) |
| dev-install | All deps (`uv sync`) |
| format | Ruff formatter |
| lint | Ruff + pylint |
| type-check | mypy + pyright (both strict) |
| security | Bandit audit |
| check-all | format + lint + type-check + security |
| test / test-cov | pytest, with or without coverage |
| pre-commit | Run pre-commit on all files |
| dev | format + lint + type-check + test |
| clean | Remove caches, `.pyc`, `htmlcov/` |
9. Scripts (scripts/)
- `build.sh` — platform detection, PyInstaller invocation, archive creation, binary smoke-test (`--help`); output to `dist/release/strix-<version>-<platform>.<archive>`.
- `install.sh` — pulls the latest release from GitHub, unpacks to `~/.strix/bin/`, validates the version. This is what the `curl -sSL https://strix.ai/install | bash` one-liner runs.
- `docker.sh` — sandbox image management (pull / rebuild).
10. Observations
Good ideas:
- Config via class attributes with auto-discovery (`_tracked_names`) — zero-boilerplate way to add settings; type-hintable.
- Env/saved/default three-way precedence with change detection — handles the common UX footgun where a user changes `STRIX_LLM` but their old `LLM_API_KEY` sticks.
- Two-channel telemetry (PostHog for product analytics + OTEL for deep traces) — they serve different audiences.
- Aggressive OTEL attribute pruning — without this, traceloop on litellm would capture every prompt, defeating sanitization.
- Single resource-path helper for pip vs. pyinstaller — avoids the typical "it works on my machine" packaging split.
- Layered sanitization (key-name regex + value-pattern regex + scrubadub) — defense in depth against leaking secrets to third parties.
Pitfalls:
- No end-to-end test coverage of the agent loop. Benchmarks catch behavioral regressions but tests don't — so subtle refactors can slip through.
- No PR-time CI — tests only run locally (via pre-commit + Makefile). A contributor can ship a broken refactor through a merge.
- PostHog public key hardcoded in source. Opt-out is clearly documented, but telemetry is on by default; a first-run consent prompt would be friendlier.
- Traceloop export has no local queuing. If the endpoint is slow, spans could back up. Not a typical problem but worth knowing.
- No automated skill-file validation. Adding a skill with bad frontmatter or a missing required section won't fail CI. A lightweight `tests/skills/test_structure.py` that validates frontmatter and section headings would harden contributions.