Config, Telemetry, Packaging
The infrastructure around the agent loop: how settings are resolved, what data leaves the machine, how the CLI is built and shipped, and what's tested.
1. Config (strix/config/config.py)
Centralized Config class with class-level attributes. Three-level
precedence:
- Env vars (`os.environ`) — highest.
- `~/.strix/cli-config.json` — persisted from prior sessions.
- Class defaults — lowest.
1.1 Tracked Settings
The _tracked_names() helper (config.py:59-69) auto-discovers every
lowercase string-or-None attribute — so adding a setting is as easy as
adding a class attribute:
```python
strix_llm: str | None = None
strix_reasoning_effort: str = "high"
strix_llm_max_retries: str = "5"
strix_memory_compressor_timeout: str = "30"
llm_timeout: str = "300"
strix_telemetry: str | None = None
strix_otel_telemetry: str | None = None
strix_posthog_telemetry: str | None = None
strix_sandbox_execution_timeout: str = "120"
strix_sandbox_connect_timeout: str = "10"
strix_disable_browser: str | None = None
strix_image: str | None = None
strix_runtime_backend: str | None = None
perplexity_api_key: str | None = None
```
`traceloop_base_url` / `traceloop_api_key` / `traceloop_headers`, …
1.2 resolve_llm_config (config.py:190-216)
Special case: model names starting with `strix/` (e.g., `strix/claude`) auto-select `https://models.strix.ai/api/v1` as the base URL. Other providers fall back to provider-specific base-URL settings in order: `llm_api_base`, `openai_api_base`, `litellm_base_url`, `ollama_api_base`.
1.3 Load / Save / Apply
- load (:102-111): reads `~/.strix/cli-config.json`; silently skips on missing/corrupt.
- save (:114-124): writes JSON, attempts `chmod 0600` on Unix.
- apply_saved_config (:127-154): fills in env vars from saved config only if they aren't already set and haven't been explicitly cleared (tombstone pattern).
- capture_and_persist (:157-179): snapshots current state → disk.
- _llm_env_changed (:72-83): detects when env vars diverge from saved config and clears stale saved values.
This gives the "remembers your API key across runs" UX without clobbering explicit overrides.
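The tombstone idea behind apply_saved_config can be sketched like this. The names (`_cleared`, `clear_setting`, `apply_saved`) are illustrative, not the real API:

```python
import os

# Tombstone set: remembers settings the user explicitly cleared this session,
# so a saved value doesn't silently resurrect them.
_cleared: set[str] = set()

def clear_setting(name: str) -> None:
    os.environ.pop(name, None)
    _cleared.add(name)  # tombstone: record the explicit clear

def apply_saved(saved: dict[str, str]) -> None:
    # Apply a saved value only if the env var is neither set nor tombstoned.
    for name, value in saved.items():
        if name not in os.environ and name not in _cleared:
            os.environ[name] = value
```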
2. Telemetry
Two channels, both opt-out:
2.1 PostHog — Event Analytics (strix/telemetry/posthog.py)
HTTP POST to https://us.i.posthog.com with the public key
phc_7rO3XRuNT5sgSKAl6HDIrWdSGh1COzxw0vxVIAR6vVZ. Events:
| Event | Payload |
|---|---|
| scan_started | model, scan_mode, scan_type (whitebox/blackbox), interactive flag, instruction presence, first_run marker (:76-94) |
| finding_reported | severity only (:97-104) |
| scan_ended | duration, vulnerability counts by severity, agent count, tool-execution count, token usage, cost (:107-130) |
| error | error_type, optional error_msg (:133-137) |
Base properties (always attached, :67-73): OS, architecture, Python
version, Strix version.
Session ID is random per run — not a persistent user identifier
(:18).
2.2 OpenTelemetry / Traceloop (strix/telemetry/tracer.py)
The Tracer class is initialized per run:
- Local JSONL export: `strix_runs/<run_id>/events.jsonl` — always written when telemetry is on. Each span becomes a JSONL line (tracer.py:95-104).
- Optional remote Traceloop export: if `TRACELOOP_BASE_URL` and `TRACELOOP_API_KEY` are set, OTEL spans also stream to the configured collector (tracer.py:116-148). Custom headers via `TRACELOOP_HEADERS` (JSON or `key=value` pairs).
Event types emitted (tracer.py:187-268):
run.started, run.configured, agent.created,
agent.status.updated, chat.message, tool.execution.started,
tool.execution.updated, finding.created, finding.reviewed,
run.completed.
Every event carries: trace_id, span_id, parent_span_id, actor
(agent_id/name, tool_name), payload, status, error.
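Because the local export is plain JSONL (one span per line), a run can be replayed with a few lines of Python. This helper is a sketch, not part of Strix:

```python
import json
from pathlib import Path

def load_events(run_dir: Path) -> list[dict]:
    # Read strix_runs/<run_id>/events.jsonl: one JSON object per line.
    events = []
    with (run_dir / "events.jsonl").open() as fh:
        for line in fh:
            if line.strip():  # skip blank lines defensively
                events.append(json.loads(line))
    return events
```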
2.3 Sanitization (strix/telemetry/utils.py)
TelemetrySanitizer uses scrubadub for PII + custom regex for
secrets/tokens.
- Regex detectors (utils.py:27-39):
  - Key-based: `api_key`, `token`, `secret`, `password`, `authorization`, `cookie`, `credential`, `private_key`.
  - Value-based: `bearer`, `sk-*`, GitHub tokens (`gh[pousr]_*`), Slack tokens (`xox*`).
  - Screenshot keys: redacted entirely (key and value).
- Recursive walk (utils.py:71-103):
  - dict → sanitize by key + value
  - list/tuple → element-wise
  - string + sensitive-key hint → redacted
- scrubadub pass removes email/phone/SSN-like patterns.
- Placeholder cleanup strips scrubadub's `{{...}}` markers.
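The walk reduces to a short recursive function. This is an illustrative simplification of the patterns listed above; the real sanitizer additionally runs scrubadub for PII:

```python
import re

# Simplified key-name and value-pattern detectors from the lists above.
_SENSITIVE_KEY = re.compile(
    r"api_key|token|secret|password|authorization|cookie|credential|private_key", re.I
)
_SENSITIVE_VALUE = re.compile(r"\bbearer\b|\bsk-\w+|\bgh[pousr]_\w+|\bxox\w-", re.I)

def sanitize(obj, key_hint: str = ""):
    if isinstance(obj, dict):           # dict → sanitize by key + value
        return {k: sanitize(v, key_hint=k) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):  # list/tuple → element-wise
        return type(obj)(sanitize(v) for v in obj)
    if isinstance(obj, str) and (
        _SENSITIVE_KEY.search(key_hint) or _SENSITIVE_VALUE.search(obj)
    ):
        return "[REDACTED]"             # sensitive key hint or value pattern
    return obj
```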
2.4 Feature Flags (strix/telemetry/flags.py:1-24)
Three knobs (`0`/`false`/`no`/`off` disables):
- `STRIX_TELEMETRY` — master kill switch.
- `STRIX_OTEL_TELEMETRY` — OTEL only.
- `STRIX_POSTHOG_TELEMETRY` — PostHog only.
The per-channel flags fall back to the master if unset.
2.5 OTEL Attribute Pruning (telemetry/utils.py:183-203)
To keep the local JSONL compact, large LLM payloads (`llm.input`, `llm.output`, `gen_ai.prompt.*`, `gen_ai.completion.*`, `llm.input_messages.*`, `llm.output_messages.*`) are dropped from spans before export. The count of filtered attributes is stored in `strix.filtered_attributes_count`, so you can see that something was dropped — just not what.
This is essential — traceloop instrumentation on litellm would otherwise record every full prompt + completion, which would bloat files and leak secrets past the sanitizer.
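The filter itself is small. The sketch below uses the attribute prefixes named above; the function name is illustrative (real code: telemetry/utils.py:183-203):

```python
# Prefixes to drop before export; "llm.input"/"llm.output" also cover
# llm.input_messages.* and llm.output_messages.*.
_PRUNED_PREFIXES = (
    "llm.input", "llm.output",
    "gen_ai.prompt", "gen_ai.completion",
)

def prune_attributes(attrs: dict) -> dict:
    kept = {k: v for k, v in attrs.items() if not k.startswith(_PRUNED_PREFIXES)}
    dropped = len(attrs) - len(kept)
    if dropped:
        # Record that pruning happened, without keeping the payloads.
        kept["strix.filtered_attributes_count"] = dropped
    return kept
```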
3. Resource Paths (strix/utils/resource_paths.py:1-14)
The helper that makes dual pip/pyinstaller deployment work:
```python
def get_strix_resource_path(*parts: str) -> Path:
    frozen_base = getattr(sys, "_MEIPASS", None)  # PyInstaller temp dir
    if frozen_base:
        base = Path(frozen_base) / "strix"
        if base.exists():
            return base.joinpath(*parts)
    # Development / pip-install mode
    base = Path(__file__).resolve().parent.parent  # repo_root/strix
    return base.joinpath(*parts)
```
Used to locate:
- Skill markdown files (`strix/skills/**/*.md`)
- Jinja templates (`strix/agents/**/*.jinja`)
- Tool schemas (`strix/tools/**/*_schema.xml`)
- TUI stylesheets (`*.tcss`)
One code path; works whether Strix was pip-installed (files under
site-packages/) or pyinstaller-frozen (files under _MEIPASS/strix).
4. Packaging (strix.spec)
PyInstaller spec that produces standalone binaries.
4.1 Data Files (strix.spec:10-27)
Bundled alongside the binary:
- `strix/skills/**/*.md`
- `strix/agents/**/*.jinja`
- `strix/tools/**/*.xml`
- `strix/interface/**/*.tcss`
- Textual's own assets, tiktoken data, litellm data
4.2 Hidden Imports (strix.spec:35-146)
Explicit list because PyInstaller can't statically detect the dynamic imports litellm uses for each provider (OpenAI, Anthropic, Vertex, Bedrock, Azure), plus Strix's own modules (agents, llm, runtime, telemetry, tools, skills).
4.3 Exclusions (strix.spec:148-199)
Kept out of the frozen binary (reduces size + surface):
- Sandbox-only deps (Playwright, IPython, libtmux) — these live in the Kali image, not the host binary.
- Cloud SDKs (google-cloud, grpc).
- Dev tools (pytest, mypy, ruff, bandit).
- Unnecessary scientific stack (scipy, numpy, cv2, tkinter).
5. Tests (tests/)
~23 Python files across 9 subdirs. Coverage is uneven — focused on infrastructure, sparse on the agent core.
| Area | Files | Notes |
|---|---|---|
| config/ | 1 | Config loading, LLM env change detection, Traceloop var persistence |
| telemetry/ | 3 | Tracer JSONL output, sanitization, flags, OTEL span pruning |
| tools/ | 5 | Skill loading, argument parsing, agent graph (whitebox mode), tool registration |
| llm/ | 2 | LLM init side-effect tests, OTEL callback isolation |
| interface/ | 1 | Git diff-scope resolution |
| agents/ | (empty) | No direct tests |
| skills/ | (empty) | No direct tests |
| runtime/ | (empty) | No direct tests |
Patterns
- pytest + `monkeypatch` for env var manipulation.
- `tmp_path` for isolated config / artifact testing.
- `_reset_tracer_globals` autouse fixture (tests/telemetry/test_tracer.py:21-90) wipes global tracer state between tests.
- Telemetry tests load the emitted JSONL and assert both structure and absence of sensitive tokens.
- Tool tests use dummy LLM/Agent classes (tests/tools/test_load_skill_tool.py:1-60) to test skill loading without running agents.
- LLM tests verify no global side-effects on `litellm.callbacks` (tests/llm/test_llm_otel.py:8-16) — an important bug magnet when multiple agents share a process.
What's not tested:
- No end-to-end agent run.
- No LLM-response mocking for tool dispatch.
- No snapshot tests for the rendered system prompt.
- No Docker integration tests.
- No skill content validation (e.g., frontmatter presence).
The agent loop itself relies heavily on the XBEN benchmark suite (external repo) for regression testing.
6. Benchmarks (benchmarks/README.md)
Points at the usestrix/benchmarks repo. The README reports:
- XBEN — 104 CTF-style web security challenges.
- Success rate: 96% (100/104 solved).
- By difficulty: L1 45/45 (100%), L2 49/51 (96%), L3 6/8 (75%).
- Resource usage: ~19 min avg solve time, ~$3.37 per challenge, ~$337 total.
No in-repo harness — benchmarks run as a separate project.
7. CI / CD (.github/workflows/build-release.yml)
Triggered by:
- Git tags (`refs/tags/v*`)
- Manual workflow dispatch
Build matrix (4 platforms)
- macOS ARM64 (`macos-latest`)
- macOS x86_64 (`macos-15-intel`)
- Linux x86_64 (`ubuntu-latest`)
- Windows x86_64 (`windows-latest`)
Steps
- Checkout, set up Python 3.12, install `uv`
- `uv sync --frozen`
- `pyinstaller strix.spec --noconfirm`
- Extract version from `pyproject.toml`
- Create archive (`.tar.gz` / `.zip`)
- Attach to GitHub release
What's missing
No PR/push test workflow is committed. Linting/testing is expected to run via the pre-commit config and the Makefile during local dev. This is a notable gap — merges land without CI validation.
8. Makefile
All commands use uv run for reproducible execution:
| Target | Purpose |
|---|---|
| setup-dev | Install dev deps + pre-commit hooks |
| install | Production deps only (`uv sync --no-dev`) |
| dev-install | All deps (`uv sync`) |
| format | Ruff formatter |
| lint | Ruff + pylint |
| type-check | mypy + pyright (both strict) |
| security | Bandit audit |
| check-all | format + lint + type-check + security |
| test / test-cov | pytest, with or without coverage |
| pre-commit | Run pre-commit on all files |
| dev | format + lint + type-check + test |
| clean | Remove caches, `.pyc`, `htmlcov/` |
9. Scripts (scripts/)
- `build.sh` — platform detection, PyInstaller invocation, archive creation, binary smoke-test (`--help`); output to `dist/release/strix-<version>-<platform>.<archive>`.
- `install.sh` — pulls the latest release from GitHub, unpacks to `~/.strix/bin/`, validates the version. This is what the `curl -sSL https://strix.ai/install | bash` one-liner runs.
- `docker.sh` — sandbox image management (pull / rebuild).
10. Observations
Good ideas:
- Config via class attributes with auto-discovery (`_tracked_names`) — zero-boilerplate way to add settings; type-hintable.
- Env/saved/default three-way precedence with change detection — handles the common UX footgun where a user changes `STRIX_LLM` but their old `LLM_API_KEY` sticks.
- Two-channel telemetry (PostHog for product analytics + OTEL for deep traces) — they serve different audiences.
- Aggressive OTEL attribute pruning — without this, traceloop on litellm would capture every prompt, defeating sanitization.
- Single resource-path helper for pip vs. pyinstaller — avoids the typical "it works on my machine" packaging split.
- Layered sanitization (key-name regex + value-pattern regex + scrubadub) — defense in depth against leaking secrets to third parties.
Pitfalls:
- No end-to-end test coverage of the agent loop. Benchmarks catch behavioral regressions but tests don't — so subtle refactors can slip through.
- No PR-time CI — tests only run locally (via pre-commit + Makefile). A contributor can ship a broken refactor through a merge.
- PostHog public key hardcoded in source. Opt-out is clearly documented, but telemetry is on by default; a first-run consent prompt would be friendlier.
- Traceloop export has no local queuing. If the endpoint is slow, spans could back up. Not a typical problem but worth knowing.
- No automated skill-file validation. Adding a skill with bad frontmatter or a missing required section won't fail CI. A lightweight `tests/skills/test_structure.py` that validates frontmatter and section headings would harden contributions.