Tools Subsystem
Tools are the LLM's effector system. Strix ships ~15 tool modules covering shells, browsers, HTTP proxy, Python runtime, file editing, notes, reporting, coordination, thinking, finish, todo, skill loading, and web search. This doc covers the common plumbing and then each module in turn.
1. Core Plumbing
1.1 Registry (strix/tools/registry.py)
Tools register at import time via a decorator:

```python
@register_tool(sandbox_execution=True)
async def terminal_execute(agent_state, command, timeout=60, ...):
    ...
```

The decorator (registry.py:190-251) does four things:
- Capability gating (:175-187). Skips registration if the tool requires a browser and `STRIX_DISABLE_BROWSER` is set, if the tool is `web_search` and `PERPLEXITY_API_KEY` is missing, or if running in "sandbox mode" (the FastAPI side) vs. host mode changes what's available.
- Schema loading — for `foo_actions.py` it reads `foo_actions_schema.xml` from the same directory (`_load_xml_schema`, :47-88).
- Parameter parsing — extracts `<parameter name="..." required="...">` entries for runtime validation (`_parse_param_schema`, :90-115).
- Indexing — appends to the global `tools` list and two lookup dicts: `_tools_by_name`, `_tool_param_schemas` (:239-240).
A ContextVar-based current-agent tracker lives at
strix/tools/context.py:1-13 for tools that need to find "which agent
called me" across asyncio.to_thread boundaries. Most tools don't use it
— they receive agent_state as a parameter when the executor detects the
parameter in their signature (registry.py:265-270).
1.2 Executor (strix/tools/executor.py)
The entry point for every tool call. execute_tool() (:29-115):

- Route decision — `should_execute_in_sandbox(tool_name)` (:273-277). If the tool's `sandbox_execution=True` and we're not already inside the sandbox (`STRIX_SANDBOX_MODE` env flag), go remote.
- Local path (:101-115):
  - Look up the function from the registry.
  - `convert_arguments(fn, kwargs)` — type-coerce strings from the LLM into the declared Python types (handles `Union`, `Optional`, JSON for lists/dicts, literal fallbacks). argument_parser.py:15-47.
  - Inject `agent_state` if the function signature asks for it (`needs_agent_state`, registry.py:265-270).
  - Await if coroutine, call otherwise.
- Remote path (:39-98):
  - httpx.AsyncClient POST to `{sandbox_url}/execute` with JSON `{"agent_id", "tool_name", "kwargs"}`. Authorization: `Bearer {agent_state.sandbox_token}`.
  - 150s total timeout (120s server timeout + 30s buffer), 10s connect timeout.
  - On `{"error": ...}`, raise `RuntimeError`.
- Result formatting — `_format_tool_result` (:227-256):
  - If the result is `{"screenshot": "<base64>", …}`, extract the image into a vision content block and strip it from the text result.
  - Truncate results >10KB to the first 4KB + ellipsis + last 4KB.
  - Wrap in `<tool_result><tool_name>X</tool_name><result>Y</result></tool_result>`.
- History update — append to `conversation_history` as a user-role message; if images were extracted, the message is multi-part (:313-342).
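The result-formatting step can be sketched as a single function. This is a simplified illustration of the behavior described above, not the real `_format_tool_result`:

```python
from typing import Any

def format_tool_result(tool_name: str, result: Any,
                       max_len: int = 10_000, keep: int = 4_000) -> tuple[str, list[str]]:
    """Illustrative sketch: pull screenshots out, truncate, wrap in XML."""
    images: list[str] = []
    # A base64 screenshot becomes a vision content block, not text.
    if isinstance(result, dict) and "screenshot" in result:
        images.append(result.pop("screenshot"))
    text = str(result)
    # Keep head and tail of oversized results so context isn't blown.
    if len(text) > max_len:
        text = text[:keep] + "\n…[truncated]…\n" + text[-keep:]
    wrapped = (f"<tool_result><tool_name>{tool_name}</tool_name>"
               f"<result>{text}</result></tool_result>")
    return wrapped, images
```

The head+tail truncation is a deliberate choice: the start of a tool's output usually carries the command echo and headers, while the end carries the final status or error.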
1.3 Argument Parser (strix/tools/argument_parser.py:15-47)
Strings from the LLM are coerced to the declared types:

- `int`/`float` — `int(v)` / `float(v)`.
- `bool` — `v.lower() in {"true","1","yes"}`.
- `list`/`dict` — `json.loads(v)` first, then `ast.literal_eval` fallback.
- `Union[str, int]` — tries str first, falls back to int parsing.
- `Optional[X]` — empty string → `None`.
Not a full schema validator — it relies on the LLM following the XML
schema + its own system-prompt instructions. The validation that does
exist is at executor.py:130-162: checks required params are present
and no unknown ones, returning human-readable error messages with schema
hints back to the LLM.
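The coercion rules above can be sketched in one recursive function. This is an assumption-laden illustration of the described behavior, not the actual `argument_parser.py` code:

```python
import ast
import json
from typing import Any, Optional, Union, get_args, get_origin

def coerce(value: str, annotation: Any) -> Any:
    """Illustrative sketch of the coercion rules described above."""
    if get_origin(annotation) is Union:  # covers Optional[X] too
        if type(None) in get_args(annotation) and value == "":
            return None  # Optional[X]: empty string means "not provided"
        for arg in (a for a in get_args(annotation) if a is not type(None)):
            try:
                return coerce(value, arg)  # e.g. Union[str, int] tries str first
            except (ValueError, json.JSONDecodeError, SyntaxError):
                continue
        raise ValueError(f"cannot coerce {value!r} to {annotation}")
    if annotation is bool:
        return value.lower() in {"true", "1", "yes"}
    if annotation in (int, float):
        return annotation(value)
    if annotation in (list, dict) or get_origin(annotation) in (list, dict):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return ast.literal_eval(value)  # tolerant fallback for Python-ish literals
    return value  # str and unknown types pass through unchanged
```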
1.4 The Schema Contract
Every tool module foo_actions.py has a sibling foo_actions_schema.xml.
Shape:

```xml
<tools>
  <tool name="terminal_execute">
    <description>Execute a shell command in a persistent tmux session.</description>
    <parameters>
      <parameter name="command" type="string" required="true">
        <description>The shell command to run.</description>
      </parameter>
      <parameter name="timeout" type="integer" required="false">
        <description>Max seconds to wait before returning.</description>
      </parameter>
    </parameters>
    <returns type="Dict[str, Any]">…</returns>
    <examples>…</examples>
  </tool>
</tools>
```

The registry assembles these into the tools prompt via get_tools_prompt() (registry.py:280-300):
- Groups by module (`agents_graph`, `browser`, `terminal`, `proxy`, ...).
- Wraps each module's tools in a module tag: `<agents_graph_tools>…</agents_graph_tools>`.
- Concatenates everything, injected into the system prompt via the jinja `get_tools_prompt` callback in llm.py:100-106.
This means the LLM sees the full XML spec of every available tool on
every turn (modulo prompt caching — Anthropic ephemeral blocks let
providers reuse the cached system prompt).
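The module-wrapping step can be sketched like this. It is a minimal illustration of the grouping described above, not the real `get_tools_prompt`:

```python
import xml.etree.ElementTree as ET

# A toy schema standing in for a real foo_actions_schema.xml file.
SCHEMA = """<tools>
  <tool name="terminal_execute">
    <description>Execute a shell command.</description>
  </tool>
</tools>"""

def build_module_prompt(module: str, schema_xml: str) -> str:
    """Wrap a module's raw <tool> specs in a <module>_tools tag."""
    root = ET.fromstring(schema_xml)
    body = "".join(ET.tostring(tool, encoding="unicode") for tool in root)
    return f"<{module}_tools>{body}</{module}_tools>"

prompt = build_module_prompt("terminal", SCHEMA)
```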
2. Tool Modules
Each module has `*_actions.py` (implementation) and `*_actions_schema.xml` (the LLM-facing spec). The `sandbox_execution` flag in the decorator determines routing.
2.1 agents_graph — Multi-agent coordination (local)
strix/tools/agents_graph/agents_graph_actions.py. Local-only — needs
direct access to the module-level agent graph.
| Action | Purpose |
|---|---|
| `create_agent(task, name, inherit_context, skills)` | Spawn a subagent in a background thread with a focused task and up to 5 skills. Inherits the parent's sandbox handle. (:384-492) |
| `agent_finish(result_summary, findings, …)` | Subagent signals completion; the result propagates to the parent via an inter-agent message. (:567-685) |
| `send_message_to_agent(to, content, priority)` | Push a message into another agent's mailbox. |
| `wait_for_message(timeout)` | Idle until the mailbox has something (interactive-mode idle). |
| `view_agent_graph()` | Dump the full tree with statuses — used by root agents to decide when to close out. |
2.2 browser — Playwright automation (sandbox)
strix/tools/browser/. A single tool browser_action(action, url, ...) with ~22 sub-actions: launch, goto, click, type, fill, scroll, execute_js, view_source, screenshot, save_pdf, wait_for, new_tab, switch_tab, close_tab, evaluate, intercept_requests, …

- A persistent multi-tab browser instance is kept across calls (browser_instance.py, tab_manager.py).
- Every action captures a screenshot into the result — the executor then promotes it to a vision message.
- Runs Chromium pre-installed in the image; NSS certs from Caido are injected at entrypoint so HTTPS is MITM-intercepted by default (docker-entrypoint.sh:149-152).
2.3 terminal — tmux sessions (sandbox)
strix/tools/terminal/. Tool: terminal_execute(command, is_input, timeout, terminal_id).

- Backed by tmux; state (CWD, env, running jobs) persists across calls.
- `is_input=true` sends text into a running foreground process (interacting with sqlmap prompts, etc.).
- Special key syntax (`C-c`, `C-d`, `Enter`, `F1`) is handled without the `is_input` flag.
- `timeout` goes up to 60s; the command keeps running in the background if it exceeds the timeout, so the agent can poll again.
- Terminal output is ANSI-parsed server-side by pyte (a dependency) to feed the TUI a clean replay.
2.4 proxy — Caido HTTP proxy (sandbox)
strix/tools/proxy/. Interacts with Caido's GraphQL API (port 48080).
| Tool | Purpose |
|---|---|
| `list_requests` | HttpQL-filtered request log with pagination |
| `view_request(id, part)` | View a captured request or response (`part` = "request" or "response") |
| `send_request` / `repeat_request` | Craft or replay requests |
| `scope_rules` | Manage Caido scope for noise reduction |
All system traffic (curl, httpx, browser) flows through Caido because the
entrypoint sets http_proxy/https_proxy system-wide
(docker-entrypoint.sh:115-144).
2.5 python — persistent IPython REPLs (sandbox)
strix/tools/python/. python_action(action, code, session_id):

- `new_session` → fresh IPython kernel
- `execute` → run code in that kernel (state persists: variables, imports)
- `close` → kill the kernel

Pre-imports proxy helpers so agents can analyze/replay captured traffic from inside Python.
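The session lifecycle can be sketched with the stdlib `code.InteractiveInterpreter` standing in for the real IPython kernels (an assumption made purely to keep the sketch dependency-free):

```python
import code
import contextlib
import io

# session_id -> interpreter; state (variables, imports) persists per session.
_sessions: dict[str, code.InteractiveInterpreter] = {}

def python_action(action: str, code_str: str = "", session_id: str = "default") -> str:
    if action == "new_session":
        _sessions[session_id] = code.InteractiveInterpreter()
        return f"session {session_id} created"
    if action == "execute":
        interp = _sessions[session_id]
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            interp.runsource(code_str)  # executes in the session's namespace
        return buf.getvalue()
    if action == "close":
        _sessions.pop(session_id, None)
        return f"session {session_id} closed"
    raise ValueError(f"unknown action: {action}")
```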
2.6 file_edit — OpenHands ACI (sandbox)
strix/tools/file_edit/. Three tools:
| Tool | Purpose |
|---|---|
| `str_replace_editor(command, path, ...)` | view, create, str_replace, insert, undo_edit |
| `list_files(path, recursive)` | Directory listing |
| `search_files(path, regex, file_pattern)` | ripgrep-backed search |
Reuses the editor primitives from the OpenHands project (openhands-aci, a sandbox-only dependency).
2.7 notes — agent scratchpad (sandbox)
strix/tools/notes/. CRUD on categorized notes, persisted to a JSONL in
the run directory. Categories: general, findings, methodology,
questions, plan, wiki.
The wiki category is the shared repo memory that the
source_aware_whitebox skill mandates — a single note per repository
that every subagent reads-then-updates to share architecture/routing/sink
maps.
2.8 reporting — Vulnerability reports (sandbox)
strix/tools/reporting/. create_vulnerability_report with title,
severity, CVSS, endpoint, PoC code, remediation steps, code locations.
Uses the cvss dependency to compute scores. Routed through
llm/dedupe.py before being appended to the run's findings list.
2.9 finish — Scan completion (local)
strix/tools/finish/finish_actions.py. finish_scan(executive_summary, methodology, technical_analysis, recommendations):

- Only callable by the root agent.
- Validates that all subagents are `completed` before accepting — forces the root to clean up its tree.
- Writes the final report and flips the tracer to the completed state.
Subagents use agent_finish (the agents_graph module) instead.
2.10 thinking — Chain-of-thought scratchpad (local)
strix/tools/thinking/. think(thought) — a no-op tool whose only
purpose is to record the agent's reasoning step without it counting as a
substantive action. Encourages explicit planning.
2.11 todo — Structured task list (sandbox)
strix/tools/todo/. Create/update/complete todo items. The system
prompt instructs the root agent to maintain a todo list as part of
orchestration.
2.12 load_skill — Dynamic skill loading (local)
strix/tools/load_skill/load_skill_actions.py. The agent can pull additional markdown playbooks into its context mid-run:

- Validates that the requested skills exist.
- Caps total loaded skills at 5 (skills/__init__.py:63-78).
- Rebuilds the system prompt with the new skill set (`llm.add_skills` → `_load_system_prompt`).
- Updates `state.context["loaded_skills"]` for observability.
2.13 web_search — Perplexity (local)
strix/tools/web_search/. web_search(query) hits Perplexity's
sonar-reasoning-pro model. Registration is gated on
PERPLEXITY_API_KEY being set (registry.py:175-187). Useful for
"what's the latest CVE for Foo 1.2?"-style runtime queries.
3. Routing Summary: Host vs. Sandbox
| Tool | Routing | Why |
|---|---|---|
| `agents_graph.*` | Host | Needs direct access to `_agent_graph` globals |
| `thinking.think` | Host | Introspective only |
| `finish.finish_scan` | Host | Synchronous subagent validation |
| `load_skill` | Host | Swaps the in-process system prompt |
| `web_search` | Host | External HTTP call, no sandbox dep |
| `terminal.*`, `python.*`, `browser.*`, `proxy.*`, `file_edit.*`, `notes.*`, `reporting.*`, `todo.*` | Sandbox | Need filesystem/process isolation and the proxied network |
Note: the same Python implementation is reused on both sides — the
tool_server imports strix.tools and dispatches to the exact function
the executor would have called locally. The routing decision happens at
the caller, not the tool.
4. Transport Details
HTTP POST /execute request body:

```json
{
  "agent_id": "agent_abc123",
  "tool_name": "terminal_execute",
  "kwargs": {"command": "nmap -sV target.tld", "timeout": 60}
}
```

Response:

```json
{"result": {"stdout": "…", "exit_code": 0}, "error": null}
```

Per-agent cancellation (runtime/tool_server.py:94-97): if the same agent_id submits a new call while a previous one is in-flight, the server cancels the prior task. This stops long-running tools from bleeding into the next iteration when the user interrupts.
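The cancellation pattern can be sketched as follows. This is a self-contained illustration of the idea, not the tool_server's actual code:

```python
import asyncio

# agent_id -> in-flight task; a new call for the same agent cancels the old one.
_inflight: dict[str, asyncio.Task] = {}

async def execute_for_agent(agent_id: str, coro) -> object:
    prior = _inflight.get(agent_id)
    if prior is not None and not prior.done():
        prior.cancel()  # kill the previous in-flight call for this agent
    task = asyncio.create_task(coro)
    _inflight[agent_id] = task
    try:
        return await task
    finally:
        _inflight.pop(agent_id, None)

async def demo() -> tuple[str, bool]:
    async def slow() -> str:
        await asyncio.sleep(10)
        return "slow"

    async def fast() -> str:
        return "fast"

    first = asyncio.create_task(execute_for_agent("a1", slow()))
    await asyncio.sleep(0.01)  # let the slow call start
    second = await execute_for_agent("a1", fast())  # cancels the slow call
    try:
        await first
        cancelled = False
    except asyncio.CancelledError:
        cancelled = True
    return second, cancelled

second, first_cancelled = asyncio.run(demo())
```

Keying the in-flight table by agent_id is what makes this safe under multi-agent concurrency: one agent's new call never cancels a sibling's work.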
5. Design Observations
Good ideas:
- Schemas as data, not docstrings. XML files are the single source of truth for what the LLM sees; the Python functions can be refactored without touching the LLM contract.
- Same code, two routes. Tools are routing-agnostic — they don't know if they're running locally or over HTTP, which keeps the implementation simple.
- Per-agent cancellation in the server. Solves the "kill in-flight when user hits Esc" problem without needing a side-channel.
- Screenshot-as-result. Cleanest multimodal integration — the agent asks for a click, gets back text + an image, and can reason over what appeared on screen without any extra tool.
- Tool result truncation + XML wrap. Prevents huge tool outputs (a wordlist fuzz, a semgrep run) from blowing the context window while keeping machine-parseable structure.
Potential pitfalls:
- Only the first tool call per message is honored. If the LLM emits two (which happens with some providers under pressure), the second is silently dropped. The parser at least stops early on `</function>`, which minimizes waste, but no warning is emitted to the LLM.
- Argument type coercion is generous. Passing a string where an int is expected sometimes succeeds, sometimes fails with a bare `ValueError`. A stricter pre-check with a helpful error message would probably improve the LLM's self-correction.
- No circuit breaker on repeated tool errors. If a tool keeps failing (wrong path, bad syntax), the agent can burn iterations without the framework intervening.
- Tool schemas live as XML alongside the code but are not checked against the function signature at import time, so drift between XML and Python is possible. A lightweight `schema == signature` test would catch this.
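Such a drift check could look like the following sketch (the schema snippet and helper are hypothetical; the real test would iterate over every registered tool):

```python
import inspect
import xml.etree.ElementTree as ET

SCHEMA = """<tools>
  <tool name="terminal_execute">
    <parameters>
      <parameter name="command" type="string" required="true"/>
      <parameter name="timeout" type="integer" required="false"/>
    </parameters>
  </tool>
</tools>"""

def terminal_execute(agent_state, command, timeout=60):
    ...

def schema_matches_signature(fn, schema_xml: str, tool_name: str) -> bool:
    """XML parameter names must equal the signature's LLM-facing parameters."""
    root = ET.fromstring(schema_xml)
    tool = root.find(f"./tool[@name='{tool_name}']")
    xml_params = {p.get("name") for p in tool.findall("./parameters/parameter")}
    sig_params = {
        name for name in inspect.signature(fn).parameters
        if name != "agent_state"  # injected by the executor, not LLM-facing
    }
    return xml_params == sig_params

ok = schema_matches_signature(terminal_execute, SCHEMA, "terminal_execute")
```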