
Mailbox & message-passing

How agents talk to each other when they share a process — and why most teams reach for Redis when a Python dict would do.

TL;DR

When multiple agents are alive at once and need to coordinate, the choice of how they communicate is narrower than it looks. Three patterns exist: tool-result return (children just return to parents, with no mailbox at all), module-level dicts (Strix’s elegant in-process solution), and external queues (Redis/SQS/Kafka). Most teams reach for the third when the first or second is correct. Build the simplest mechanism that works.

Multi-agent systems make people reach for distributed-systems infrastructure on day one. They are usually wrong.

The default: tool-result is the only message

Most multi-agent systems in the corpus don’t have peer-to-peer messaging. Children don’t message siblings. They return to the parent. The parent decides whether to invoke another child with the result.

This works because the parent’s loop is already doing dispatch — a child is just an unusually long-running tool. Nothing new to wire up. No queues, no race conditions, no order-of-delivery questions.
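A minimal sketch of the tool-result pattern, assuming children can be modelled as plain functions. The names `research_child`, `summarize_child`, `CHILDREN`, and `parent_loop` are hypothetical, and the fixed plan stands in for whatever dispatch logic a real parent uses.

```python
from typing import Callable

# Hypothetical child agents: each is just a function the parent calls like a tool.
def research_child(task: str) -> str:
    return f"notes on {task}"

def summarize_child(task: str) -> str:
    return f"summary of: {task}"

CHILDREN: dict[str, Callable[[str], str]] = {
    "research": research_child,
    "summarize": summarize_child,
}

def parent_loop(goal: str) -> str:
    """The parent is the only dispatcher; children never talk to each other."""
    # Fixed plan for illustration; a real parent would decide the next child at runtime.
    plan = ["research", "summarize"]
    result = goal
    for child_name in plan:
        # The child's return value is the only "message" it ever sends.
        result = CHILDREN[child_name](result)
    return result

final = parent_loop("mailbox patterns")
```

The sibling hand-off (research output feeding the summarizer) happens entirely inside the parent's loop, so there is no shared state to protect.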

Most teams should use this pattern. Until they’re forced into something fancier.

When you actually need a mailbox

You need one when two agents run concurrently and must share state mid-flight. Typical cases:

  • A long-running monitor agent reporting to a coordinator.
  • Worker agents claiming work from a shared queue.
  • Producer/consumer between specialists.

Strix uses a mailbox for inter-agent reporting during long pentests: a sub-agent that finds a credential while testing one endpoint posts it to other agents that might benefit.
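To make "reporting mid-flight" concrete, here is a rough sketch of a monitor agent posting findings to a coordinator while both are running. It uses the standard library's `asyncio.Queue` as a stand-in mailbox; this is an illustration, not Strix's mechanism, and `monitor`, `coordinator`, and the sample findings are hypothetical.

```python
import asyncio

async def monitor(inbox: asyncio.Queue, findings: list[str]) -> None:
    # Post each finding as soon as it is available instead of returning at the end.
    for finding in findings:
        await inbox.put(finding)
        await asyncio.sleep(0)  # yield so the coordinator can run in between
    await inbox.put(None)  # sentinel: the monitor is done

async def coordinator(inbox: asyncio.Queue) -> list[str]:
    received: list[str] = []
    while True:
        item = await inbox.get()
        if item is None:
            break
        received.append(item)
    return received

async def main() -> list[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    sample = ["credential found on /login", "open port 8080"]
    # Both agents run concurrently on one event loop, in one process.
    _, received = await asyncio.gather(monitor(inbox, sample), coordinator(inbox))
    return received

received = asyncio.run(main())
```

Unlike tool-result return, the coordinator sees each finding before the monitor has finished.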

Strix’s module-level approach

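class BaseAgent:
    # Class-level mailbox shared by every agent in the process:
    # maps a target agent id to its list of pending messages.
    # Msg and self.id are assumed to be defined elsewhere.
    _agent_messages: dict[str, list[Msg]] = {}

    def send_to(self, target_id: str, msg: Msg) -> None:
        # Create the target's inbox on first use, then enqueue.
        BaseAgent._agent_messages.setdefault(target_id, []).append(msg)

    def receive(self) -> list[Msg]:
        # Drain this agent's inbox; returns [] when nothing is pending.
        return BaseAgent._agent_messages.pop(self.id, [])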

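Plain dict. In CPython, the GIL makes each individual operation (setdefault, append, pop(key, [])) atomic, so the dict is never corrupted. Note that setdefault().append() is two operations, not one. With all agents on a single asyncio event loop there is no await between the two steps, so no lock is needed. With real threads, a receive() can pop the list between those steps, and the sender then appends to a list the receiver may already have finished reading. Single-process only. The mailbox lives in memory and disappears when the process exits.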
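If agents do run on OS threads, the two-step send can interleave with a drain, and a lock closes that gap. The sketch below is one way to do it under that assumption; it is not Strix's code, and `Mailbox`, the `Msg` fields, and the agent names are hypothetical.

```python
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class Msg:
    # Hypothetical message shape for illustration.
    sender: str
    body: str

class Mailbox:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inboxes: dict[str, list[Msg]] = {}

    def send_to(self, target_id: str, msg: Msg) -> None:
        # Lookup and append happen under one lock, so a drain cannot slip between them.
        with self._lock:
            self._inboxes.setdefault(target_id, []).append(msg)

    def receive(self, agent_id: str) -> list[Msg]:
        # Remove and return everything pending for this agent.
        with self._lock:
            return self._inboxes.pop(agent_id, [])

mailbox = Mailbox()

def worker(name: str) -> None:
    for i in range(100):
        mailbox.send_to("coordinator", Msg(sender=name, body=f"finding {i}"))

threads = [threading.Thread(target=worker, args=(f"agent-{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

drained = mailbox.receive("coordinator")
```

The lock costs little in a single process and keeps the design a plain dict, still with no external service involved.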

Don’t reach for Redis early

External queues have real costs:

  • Latency (every message is a round-trip to a separate service).
  • Operational dependency (now you have a Redis to keep running).
  • Debug complexity (state is in two places).

Reach for them when:

  • Agents run on different machines.
  • You need durable messages that survive a process crash.
  • You want pub/sub across many subscribers.

Pick a mailbox

What does coordination actually need to do?

  • Children just return to parents: no mailbox. Use tool-result return (the default).
  • Concurrent agents share state in one process: module-level dicts à la Strix.
  • Agents on different machines: external queue (Redis / SQS).
  • Need durable messages across crashes: external queue with persistence.

Recommended default: tool-result return. Move up the ladder only when forced.

Pitfalls

  • Mailbox + retry + dedupe is a hard distributed-systems problem. Skip it if you can.
  • Cycles in the agent graph can pass messages forever: infinite chatter, then a runaway token bill. Cap message count or hops.
  • Without back-pressure, a slow consumer falls behind and the queue grows without bound. Cap queue size, then drop or block when it is full.
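The last two pitfalls can be guarded against in a few lines. This is a minimal sketch, assuming a drop-on-full policy and a per-message hop counter; `Envelope`, `BoundedInbox`, `forward`, and both limits are hypothetical names and values chosen for illustration.

```python
import queue
from dataclasses import dataclass

MAX_HOPS = 3        # hypothetical cap on how many times a message may be forwarded
MAX_QUEUE_SIZE = 2  # hypothetical cap on pending messages per inbox

@dataclass
class Envelope:
    body: str
    hops: int = 0

def forward(env: Envelope) -> Envelope:
    # Every agent-to-agent relay increments the hop count.
    return Envelope(body=env.body, hops=env.hops + 1)

class BoundedInbox:
    def __init__(self, maxsize: int = MAX_QUEUE_SIZE) -> None:
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def offer(self, env: Envelope) -> bool:
        """Enqueue if allowed; return False when the message is dropped."""
        if env.hops >= MAX_HOPS:
            self.dropped += 1  # hop cap: breaks message cycles
            return False
        try:
            self._q.put_nowait(env)  # raises queue.Full at capacity
            return True
        except queue.Full:
            self.dropped += 1  # back-pressure policy here: drop the newest message
            return False

    def drain(self) -> list[Envelope]:
        items: list[Envelope] = []
        while True:
            try:
                items.append(self._q.get_nowait())
            except queue.Empty:
                return items

inbox = BoundedInbox()
accepted = [inbox.offer(Envelope(f"m{i}")) for i in range(3)]

looping = Envelope("ping")
for _ in range(MAX_HOPS):
    looping = forward(looping)
loop_accepted = inbox.offer(looping)

pending = inbox.drain()
```

Swapping `put_nowait` for a blocking `put` gives the "block" policy instead of "drop"; which one fits depends on whether losing a message or stalling a producer is worse.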