Strix runs many sub-agents that each test for vulnerabilities. They report findings independently. Two findings might describe the same underlying bug with different payloads, line numbers, and prose — a textual hash would never match them.
Instead, Strix asks an LLM: “Are these two findings the same root cause? Here’s both. Reason about it.” Same-root-cause findings collapse into one report.
```python
def is_same_root_cause(a: Finding, b: Finding) -> bool:
    # Ask the LLM to compare the two findings' full text and answer YES/NO.
    response = llm.complete(
        DEDUPE_PROMPT.format(a=a.full_text(), b=b.full_text())
    )
    return response.strip().startswith("YES")
```
The prompt is essentially: “two findings, both might describe the same vulnerability, decide if the root cause is identical.”
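Strix's actual prompt wording isn't reproduced here; a hypothetical sketch of what such a template might look like (the `DEDUPE_PROMPT` name comes from the snippet above, the text is an assumption):

```python
# Hypothetical sketch only; Strix's real prompt wording is not shown in this post.
DEDUPE_PROMPT = """You are deduplicating security findings.

Finding A:
{a}

Finding B:
{b}

Do both findings describe the same underlying vulnerability (same root
cause), even if the payloads, line numbers, or wording differ?
Answer YES or NO on the first line, then one sentence of justification."""
```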
Why this is non-obvious
Most deduplication is fast and cheap (hash, embedding cosine). LLM-based dedupe is slow and expensive — orders of magnitude more cost per pair. You’d never use this for tweets or log lines.
But for high-stakes, low-volume domains (security findings, customer support tickets, legal contracts), the false-merge cost dwarfs the LLM-call cost. Spending tokens on dedupe is correct.
Pattern beyond Strix
This generalizes anywhere two reports might be the same despite surface differences:
- Customer complaints (different language, same root cause).
- Bug reports (different stack traces, same broken function).
- Search results in citation-heavy domains.
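In all of these cases, the pairwise check drives a simple clustering pass. A minimal sketch of the collapse step, with a toy predicate standing in for the LLM call (function names and the greedy strategy are illustrative assumptions, not Strix's implementation):

```python
from typing import Callable, List

def collapse(findings: List[str],
             same: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedy clustering: each finding joins the first cluster whose
    representative shares its root cause, else starts a new cluster."""
    clusters: List[List[str]] = []
    for f in findings:
        for c in clusters:
            if same(c[0], f):  # compare against the cluster representative
                c.append(f)
                break
        else:
            clusters.append([f])
    return clusters

# Toy predicate standing in for the LLM call: same first word = same cause.
reports = ["sqli in /login id param",
           "sqli in /login via id",
           "xss in /search q"]
groups = collapse(reports, lambda a, b: a.split()[0] == b.split()[0])
# The two sqli reports merge into one cluster; the xss report stays separate.
```

In production the predicate would be `is_same_root_cause`, so an n-finding run costs at most O(n x clusters) LLM calls rather than O(n^2).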
When NOT to use it
- Volume is high (millions of items): too expensive.
- Surface similarity is a strong signal (textual matches): hash first, LLM only on near-misses.
- Latency-sensitive flow: do it offline as a batch job.
Hybrid pattern
A practical optimization is cheap-first: use embedding cosine similarity to find candidate pairs above some threshold, then run the LLM only on those candidates. Most pairs are pruned cheaply; expensive reasoning is spent only where it matters.
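A sketch of that cheap-first filter. Real systems would use embedding vectors from a model; here a toy bag-of-words cosine stands in, and the function names and the 0.5 threshold are assumptions for illustration:

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity standing in for real embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def candidate_pairs(items, threshold=0.5):
    """Cheap prefilter: only index pairs above the threshold go to the LLM."""
    return [(i, j) for i, j in combinations(range(len(items)), 2)
            if cosine(items[i], items[j]) >= threshold]

findings = ["SQL injection in login form",
            "SQL injection via login endpoint",
            "Reflected XSS in search box"]
pairs = candidate_pairs(findings)
# Only the near-miss SQL-injection pair survives the cheap filter; an
# is_same_root_cause-style LLM call would then decide the final merge.
```

Of the three possible pairs, only one reaches the expensive check, which is the whole point: the quadratic part stays cheap.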
Sources
- strix/05_skills_and_prompts.md:20 (unverified)