CodeDocs Vault

Tools Subsystem

Tools are the LLM's effector system. Strix ships ~15 tool modules covering shells, browsers, HTTP proxy, Python runtime, file editing, notes, reporting, coordination, thinking, finish, todo, skill loading, and web search. This doc covers the common plumbing and then each module in turn.


1. Core Plumbing

1.1 Registry (strix/tools/registry.py)

Tools register at import time via a decorator:

@register_tool(sandbox_execution=True)
async def terminal_execute(agent_state, command, timeout=60, ...):
    ...

The decorator (registry.py:190-251) does four things:

  1. Capability gating (:175-187). Skips registration if:
    • tool requires browser and STRIX_DISABLE_BROWSER is set,
    • tool is web_search and PERPLEXITY_API_KEY is missing,
    • the tool is not available in the current mode: "sandbox mode" (the FastAPI side) and host mode expose different tool sets.
  2. Schema loading — for foo_actions.py it reads foo_actions_schema.xml from the same directory (_load_xml_schema, :47-88).
  3. Parameter parsing — extracts <parameter name="..." required="..."> entries for runtime validation (_parse_param_schema, :90-115).
  4. Indexing — appends to the global tools list and two lookup dicts: _tools_by_name, _tool_param_schemas (:239-240).

A ContextVar-based current-agent tracker lives at strix/tools/context.py:1-13 for tools that need to find "which agent called me" across asyncio.to_thread boundaries. Most tools don't use it — they receive agent_state as a parameter when the executor detects the parameter in their signature (registry.py:265-270).
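
A short sketch of why a ContextVar works across asyncio.to_thread: the current context is copied into the worker thread, so a blocking tool still sees the value set by its caller. The variable name current_agent_id is hypothetical; the actual tracker in context.py may be shaped differently.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

# Hypothetical name, used only for this illustration.
current_agent_id: ContextVar[Optional[str]] = ContextVar("current_agent_id", default=None)


def blocking_tool() -> Optional[str]:
    # Runs in a worker thread, yet still sees the caller's context value.
    return current_agent_id.get()


async def call_as(agent_id: str) -> Optional[str]:
    current_agent_id.set(agent_id)
    # asyncio.to_thread copies the current context into the worker thread.
    return await asyncio.to_thread(blocking_tool)


async def main() -> list:
    # gather() wraps each coroutine in its own task with its own context copy,
    # so the two agents do not overwrite each other's value.
    return await asyncio.gather(call_as("agent_a"), call_as("agent_b"))
```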

1.2 Executor (strix/tools/executor.py)

The entry point for every tool call. execute_tool() (:29-115):

  1. Route decision: should_execute_in_sandbox(tool_name) (:273-277). If the tool has sandbox_execution=True and we're not already inside the sandbox (STRIX_SANDBOX_MODE env flag), go remote.
  2. Local path (:101-115)
    • Look up function from registry.
    • convert_arguments(fn, kwargs) — type-coerce strings from the LLM into the declared Python types (handles Union, Optional, JSON for lists/dicts, literal fallbacks). argument_parser.py:15-47.
    • Inject agent_state if the function signature asks for it (needs_agent_state, registry.py:265-270).
    • Await if coroutine, call otherwise.
  3. Remote path (:39-98)
    • httpx.AsyncClient POST to {sandbox_url}/execute with JSON {"agent_id", "tool_name", "kwargs"}.
    • Authorization: Bearer {agent_state.sandbox_token}.
    • 150s total timeout (120s server timeout + 30s buffer), 10s connect timeout.
    • On {"error": ...} raise RuntimeError.
  4. Result formatting: _format_tool_result (:227-256)
    • If result is {"screenshot": "<base64>", …}, extract the image into a vision content block and strip from the text result.
    • Truncate results >10KB to first 4KB + ellipsis + last 4KB.
    • Wrap in <tool_result><tool_name>X</tool_name><result>Y</result></tool_result>.
  5. History update — append to conversation_history as a user-role message; if images were extracted, the message is multi-part (:313-342).
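
The result-formatting step can be sketched as below. This is an illustration under assumptions: the thresholds are treated as character counts rather than bytes, and the exact ellipsis marker and return shape are not taken from executor.py.

```python
from typing import Any

MAX_RESULT_CHARS = 10_000  # assumption: "10KB" treated as a character count
KEEP_CHARS = 4_000         # head and tail kept when truncating


def format_tool_result(tool_name: str, result: Any) -> tuple[str, list[str]]:
    """Return the XML-wrapped text result plus any extracted base64 images."""
    images: list[str] = []
    if isinstance(result, dict) and "screenshot" in result:
        result = dict(result)  # copy so the caller's dict is not mutated
        images.append(result.pop("screenshot"))
    text = str(result)
    if len(text) > MAX_RESULT_CHARS:
        text = text[:KEEP_CHARS] + "\n...\n" + text[-KEEP_CHARS:]
    wrapped = (
        f"<tool_result><tool_name>{tool_name}</tool_name>"
        f"<result>{text}</result></tool_result>"
    )
    return wrapped, images
```

Keeping both the head and the tail preserves the start of the output and the final status lines, which is usually where errors show up.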

1.3 Argument Parser (strix/tools/argument_parser.py:15-47)

Strings from the LLM are coerced to the declared Python types, with handling for Union, Optional, JSON-encoded lists/dicts, and literal fallbacks.
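
A minimal sketch of this kind of coercion, assuming type hints from the typing module. The coerce helper and its rules (for example which strings count as true, and falling back to the raw string) are illustrative assumptions, not the logic in argument_parser.py.

```python
import json
from typing import Any, Callable, Union, get_args, get_origin, get_type_hints


def coerce(value: str, annotation: Any) -> Any:
    origin = get_origin(annotation)
    if origin is Union:
        # Optional[X] is Union[X, None]: try each non-None member in order.
        for member in get_args(annotation):
            if member is type(None):
                continue
            try:
                return coerce(value, member)
            except (ValueError, TypeError):
                continue
        return value
    if annotation is bool:
        return value.strip().lower() in {"true", "1", "yes"}
    if annotation in (int, float):
        return annotation(value)
    if annotation in (list, dict) or origin in (list, dict):
        return json.loads(value)
    return value  # fallback (assumed here): keep the raw string


def convert_arguments(fn: Callable[..., Any], kwargs: dict[str, Any]) -> dict[str, Any]:
    hints = get_type_hints(fn)
    return {
        name: coerce(val, hints[name]) if isinstance(val, str) and name in hints else val
        for name, val in kwargs.items()
    }
```

For a function declared as f(command: str, timeout: int = 60), the call convert_arguments(f, {"command": "ls", "timeout": "30"}) yields an integer timeout and leaves command untouched.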

The parser is not a full schema validator; it relies on the LLM following the XML schema and its own system-prompt instructions. The validation that does exist is at executor.py:130-162: it checks that required params are present and that no unknown ones are passed, and returns human-readable error messages with schema hints to the LLM.
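
The presence check described above might look like this sketch. The function name, the schema shape (parameter name mapped to a required flag), and the message wording are all assumptions made for illustration.

```python
from typing import Any, Optional


def validate_call(tool_name: str, kwargs: dict[str, Any], schema: dict[str, bool]) -> Optional[str]:
    """Return None if the call is valid, else an error message for the LLM.

    `schema` maps each parameter name to whether it is required.
    """
    missing = sorted(name for name, required in schema.items() if required and name not in kwargs)
    unknown = sorted(name for name in kwargs if name not in schema)
    if not missing and not unknown:
        return None
    problems = []
    if missing:
        problems.append("missing required parameter(s): " + ", ".join(missing))
    if unknown:
        problems.append("unknown parameter(s): " + ", ".join(unknown))
    # Schema hint so the model can correct itself on the next turn.
    hint = ", ".join(f"{n} ({'required' if r else 'optional'})" for n, r in schema.items())
    return f"Invalid call to {tool_name}: " + "; ".join(problems) + f". Expected: {hint}"
```

Returning the message instead of raising keeps the failure inside the conversation, where the model can read it and retry.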

1.4 The Schema Contract

Every tool module foo_actions.py has a sibling foo_actions_schema.xml. Shape:

<tools>
  <tool name="terminal_execute">
    <description>Execute a shell command in a persistent tmux session.</description>
    <parameters>
      <parameter name="command" type="string" required="true">
        <description>The shell command to run.</description>
      </parameter>
      <parameter name="timeout" type="integer" required="false">
        <description>Max seconds to wait before returning.</description>
      </parameter>
    </parameters>
    <returns type="Dict[str, Any]">…</returns>
    <examples>…</examples>
  </tool>
</tools>

The registry assembles these into the tools prompt via get_tools_prompt() (registry.py:280-300).

This means the LLM sees the full XML spec of every available tool on every turn (modulo prompt caching — Anthropic ephemeral blocks let providers reuse the cached system prompt).
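
To make the schema contract concrete, here is a sketch of pulling parameter requirements out of a schema shaped like the one above, using the standard library's ElementTree. The function name and return shape are assumptions; the real _parse_param_schema may differ.

```python
import xml.etree.ElementTree as ET

SCHEMA = """
<tools>
  <tool name="terminal_execute">
    <description>Execute a shell command in a persistent tmux session.</description>
    <parameters>
      <parameter name="command" type="string" required="true">
        <description>The shell command to run.</description>
      </parameter>
      <parameter name="timeout" type="integer" required="false">
        <description>Max seconds to wait before returning.</description>
      </parameter>
    </parameters>
  </tool>
</tools>
"""


def parse_param_schema(xml_text: str) -> dict[str, dict[str, bool]]:
    """Map each tool name to {parameter name: is_required}."""
    root = ET.fromstring(xml_text)
    schemas: dict[str, dict[str, bool]] = {}
    for tool in root.iter("tool"):
        schemas[tool.get("name")] = {
            param.get("name"): param.get("required") == "true"
            for param in tool.iter("parameter")
        }
    return schemas
```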


2. Tool Modules

Each module has *_actions.py (implementation) and *_actions_schema.xml (LLM-facing spec). The sandbox_execution flag in the decorator determines routing.

2.1 agents_graph — Multi-agent coordination (local)

strix/tools/agents_graph/agents_graph_actions.py. Local-only — needs direct access to the module-level agent graph.

Action                                             Purpose
create_agent(task, name, inherit_context, skills)  Spawn a subagent in a background thread with a focused task and up to 5 skills. Inherits parent's sandbox handle. (:384-492)
agent_finish(result_summary, findings, …)          Subagent signals completion; result propagates to parent via inter-agent message. (:567-685)
send_message_to_agent(to, content, priority)       Push a message into another agent's mailbox.
wait_for_message(timeout)                          Idle until the mailbox has something (interactive-mode idle).
view_agent_graph()                                 Dump the full tree with statuses; used by root agents to decide when to close out.
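
The mailbox pattern behind send_message_to_agent and wait_for_message can be sketched with a thread-safe queue per agent, which suits subagents running in background threads. The module-level dict and the explicit agent_id argument are assumptions for illustration; the real tools presumably derive the caller from agent_state.

```python
import queue
from typing import Optional

# Hypothetical module-level state: one thread-safe mailbox per agent ID.
_mailboxes: dict[str, queue.Queue] = {}


def _mailbox(agent_id: str) -> queue.Queue:
    # setdefault keeps mailbox creation safe when threads race on the same ID.
    return _mailboxes.setdefault(agent_id, queue.Queue())


def send_message_to_agent(to: str, content: str, priority: str = "normal") -> None:
    _mailbox(to).put({"content": content, "priority": priority})


def wait_for_message(agent_id: str, timeout: float) -> Optional[dict]:
    """Block until a message arrives, or return None after `timeout` seconds."""
    try:
        return _mailbox(agent_id).get(timeout=timeout)
    except queue.Empty:
        return None
```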

2.2 browser — Playwright automation (sandbox)

strix/tools/browser/. Single tool browser_action(action, url, ...) with ~22 sub-actions: launch, goto, click, type, fill, scroll, execute_js, view_source, screenshot, save_pdf, wait_for, new_tab, switch_tab, close_tab, evaluate, intercept_requests, …

2.3 terminal — tmux sessions (sandbox)

strix/tools/terminal/. Tool: terminal_execute(command, is_input, timeout, terminal_id).

2.4 proxy — Caido HTTP proxy (sandbox)

strix/tools/proxy/. Interacts with Caido's GraphQL API (port 48080).

Tool                                         Purpose
list_requests                                HttpQL-filtered request log with pagination
view_request(id, part="request"|"response")  View the request or response part of a logged entry
send_request / repeat_request                Craft or replay requests
scope_rules                                  Manage Caido scope for noise reduction

All system traffic (curl, httpx, browser) flows through Caido because the entrypoint sets http_proxy/https_proxy system-wide (docker-entrypoint.sh:115-144).

2.5 python — persistent IPython REPLs (sandbox)

strix/tools/python/. Tool: python_action(action, code, session_id).

Pre-imports proxy helpers so agents can analyze/replay captured traffic from inside Python.

2.6 file_edit — OpenHands ACI (sandbox)

strix/tools/file_edit/. Three tools:

Tool                                     Purpose
str_replace_editor(command, path, ...)   Commands: view, create, str_replace, insert, undo_edit
list_files(path, recursive)              Directory listing
search_files(path, regex, file_pattern)  ripgrep-backed search

Reuses the editor primitives from the OpenHands project (openhands-aci, a sandbox-only dependency).

2.7 notes — agent scratchpad (sandbox)

strix/tools/notes/. CRUD on categorized notes, persisted to a JSONL in the run directory. Categories: general, findings, methodology, questions, plan, wiki.

The wiki category is the shared repo memory that the source_aware_whitebox skill mandates — a single note per repository that every subagent reads-then-updates to share architecture/routing/sink maps.
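
A sketch of category-checked notes persisted as JSONL, one JSON object per line. The record fields and function names are hypothetical; only the category list and the JSONL format come from the description above.

```python
import json
from pathlib import Path
from typing import Optional

CATEGORIES = {"general", "findings", "methodology", "questions", "plan", "wiki"}


def append_note(path: Path, title: str, content: str, category: str = "general") -> None:
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    record = {"title": title, "content": content, "category": category}
    # Appending one line per note keeps writes cheap and the file easy to tail.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def list_notes(path: Path, category: Optional[str] = None) -> list[dict]:
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        notes = [json.loads(line) for line in fh if line.strip()]
    return [n for n in notes if category is None or n["category"] == category]
```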

2.8 reporting — Vulnerability reports (sandbox)

strix/tools/reporting/. create_vulnerability_report takes a title, severity, CVSS, endpoint, PoC code, remediation steps, and code locations. It uses the cvss dependency to compute scores. Each report is routed through llm/dedupe.py before being appended to the run's findings list.

2.9 finish — Scan completion (local)

strix/tools/finish/finish_actions.py. Tool: finish_scan(executive_summary, methodology, technical_analysis, recommendations).

Subagents use agent_finish (from the agents_graph module) instead.

2.10 thinking — Chain-of-thought scratchpad (local)

strix/tools/thinking/. think(thought) — a no-op tool whose only purpose is to record the agent's reasoning step without it counting as a substantive action. Encourages explicit planning.

2.11 todo — Structured task list (sandbox)

strix/tools/todo/. Create/update/complete todo items. The system prompt instructs the root agent to maintain a todo list as part of orchestration.

2.12 load_skill — Dynamic skill loading (local)

strix/tools/load_skill/load_skill_actions.py. The agent can pull additional markdown playbooks into its context mid-run.

2.13 web_search — Perplexity (local)

strix/tools/web_search/. web_search(query) hits Perplexity's sonar-reasoning-pro model. Registration is gated on PERPLEXITY_API_KEY being set (registry.py:175-187). Useful for "what's the latest CVE for Foo 1.2?"-style runtime queries.


3. Routing Summary: Host vs. Sandbox

Tool                                                                                 Routing  Why
agents_graph.*                                                                       Host     Needs direct access to _agent_graph globals
thinking.think                                                                       Host     Introspective only
finish.finish_scan                                                                   Host     Synchronous subagent validation
load_skill                                                                           Host     Swaps in-process system prompt
web_search                                                                           Host     External HTTP call, no sandbox dep
terminal.*, python.*, browser.*, proxy.*, file_edit.*, notes.*, reporting.*, todo.*  Sandbox  Needs filesystem/process isolation and proxied network

Note: the same Python implementation is reused on both sides — the tool_server imports strix.tools and dispatches to the exact function the executor would have called locally. The routing decision happens at the caller, not the tool.
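
The caller-side routing decision can be sketched as a single predicate. Two assumptions are made here: the flag table is a hypothetical view of the registry, and STRIX_SANDBOX_MODE is taken to hold the string "true" inside the sandbox. The env parameter exists only so the sketch can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional

# Hypothetical registry view: tool name -> sandbox_execution flag.
SANDBOX_FLAGS = {"terminal_execute": True, "think": False}


def should_execute_in_sandbox(tool_name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    # Assumption: the flag is the string "true" when running inside the sandbox.
    already_inside = env.get("STRIX_SANDBOX_MODE", "false").lower() == "true"
    return SANDBOX_FLAGS.get(tool_name, False) and not already_inside
```

Inside the sandbox the predicate is always false, which is what lets the tool server call the same function locally instead of forwarding it again.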


4. Transport Details

HTTP POST /execute request body:

{
  "agent_id": "agent_abc123",
  "tool_name": "terminal_execute",
  "kwargs": {"command": "nmap -sV target.tld", "timeout": 60}
}

Response:

{"result": { "stdout": "…", "exit_code": 0 }, "error": null}
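
A sketch of the client side of this exchange, kept to the standard library so it runs without a network. The helper names are hypothetical and the error message wording is an assumption; the request and response shapes follow the bodies shown above.

```python
from typing import Any


def build_execute_request(
    sandbox_url: str, agent_id: str, sandbox_token: str, tool_name: str, kwargs: dict[str, Any]
) -> dict[str, Any]:
    """Assemble the URL, headers, and JSON body for POST /execute."""
    return {
        "url": f"{sandbox_url.rstrip('/')}/execute",
        "headers": {"Authorization": f"Bearer {sandbox_token}"},
        "json": {"agent_id": agent_id, "tool_name": tool_name, "kwargs": kwargs},
    }


def unwrap_execute_response(body: dict[str, Any]) -> Any:
    """Return the result payload, or raise if the server reported an error."""
    if body.get("error"):
        raise RuntimeError(f"Sandbox execution error: {body['error']}")
    return body.get("result")
```

With httpx, the three pieces would plausibly be passed to an AsyncClient's post(url, headers=..., json=...) call, with a timeout such as httpx.Timeout(150.0, connect=10.0) to match the limits listed in section 1.2.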

Per-agent cancellation (runtime/tool_server.py:94-97): if the same agent_id submits a new call while a previous one is in-flight, the server cancels the prior task. This stops long-running tools from bleeding into the next iteration when the user interrupts.


5. Design Observations

Good ideas:

Potential pitfalls: