Core Logic & Data Flow

Message Processing: The Heart of NanoClaw

The entire system exists to convert inbound chat messages into Claude Agent SDK invocations and route responses back. This section traces the complete data flow.

Inbound Message Path

Platform SDK event (e.g., WhatsApp message received)
    │
    ▼
Channel.onMessage(chatJid, msg)                     # src/index.ts:617-643
    │
    ├── Remote control intercept                     # /remote-control commands
    │
    ├── Sender allowlist check                       # Drop if denied sender in drop mode
    │   └── shouldDropMessage() + isSenderAllowed()  # src/sender-allowlist.ts
    │
    └── storeMessage(msg)                            # src/db.ts — INSERT into messages table

Messages are stored immediately unless the sender allowlist drops them. The message loop then discovers them on its next poll.

Message Loop: Polling vs Piping

The message loop (src/index.ts:419-520) has two code paths:

Path A — Pipe to active container (fast path):

// src/index.ts:495-508
if (queue.sendMessage(chatJid, formatted)) {
  // Container already running for this group — pipe message via IPC file
  lastAgentTimestamp[chatJid] = messagesToSend[messagesToSend.length - 1].timestamp;
  saveState();
  channel.setTyping?.(chatJid, true);
}

This writes a JSON file to data/ipc/{groupFolder}/input/, which the running container picks up via drainIpcInput() and feeds into the active SDK query as a follow-up user message.
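
The hand-off through the input directory can be sketched as follows. This is a minimal illustration rather than the project's actual code: the file naming scheme, the `IpcMessage` shape beyond `type` and `text`, and the body of `drainIpcInput()` shown here are assumptions.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Assumed payload shape, mirroring the { type: 'message', text } object the host writes.
interface IpcMessage {
  type: string;
  text: string;
}

let sequence = 0;

// Writer side: write under a temporary name, then rename, so a reader never sees a partial file.
function writeIpcMessage(inputDir: string, text: string): string {
  fs.mkdirSync(inputDir, { recursive: true });
  // Hypothetical naming scheme: timestamp plus a zero-padded counter keeps names sortable.
  const name = `${Date.now()}-${String(sequence++).padStart(6, '0')}.json`;
  const finalPath = path.join(inputDir, name);
  const tempPath = `${finalPath}.tmp`;
  fs.writeFileSync(tempPath, JSON.stringify({ type: 'message', text }));
  fs.renameSync(tempPath, finalPath);
  return finalPath;
}

// Reader side: consume completed .json files in name order and delete each one after reading.
function drainIpcInput(inputDir: string): string[] {
  if (!fs.existsSync(inputDir)) return [];
  const texts: string[] = [];
  const files = fs.readdirSync(inputDir).filter((f) => f.endsWith('.json')).sort();
  for (const file of files) {
    const fullPath = path.join(inputDir, file);
    const parsed = JSON.parse(fs.readFileSync(fullPath, 'utf8')) as IpcMessage;
    fs.unlinkSync(fullPath);
    if (parsed.type === 'message') texts.push(parsed.text);
  }
  return texts;
}

const inputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'nanoclaw-ipc-'));
writeIpcMessage(inputDir, 'first follow-up');
writeIpcMessage(inputDir, 'second follow-up');
console.log(drainIpcInput(inputDir)); // [ 'first follow-up', 'second follow-up' ]
```

Because the reader only looks at names ending in `.json`, a file that is still being written under its `.tmp` name is never picked up.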

Path B — Enqueue for new container (cold start):

// src/index.ts:511
queue.enqueueMessageCheck(chatJid);

This triggers the GroupQueue to spawn a new container when a slot is available.

Trigger Pattern Matching

Not every message triggers the agent. Whether a trigger is required depends on the group type (src/index.ts:240-250, src/config.ts:67-80):

Group Type	Trigger Behavior
Main group (`isMain: true`)	No trigger needed — every message is processed
1-on-1 chats (`requiresTrigger: false`)	No trigger needed
Group chats (default)	Must start with `@Andy` (or configured trigger word)

// src/config.ts:71-73
function buildTriggerPattern(trigger: string): RegExp {
  return new RegExp(`^${escapeRegex(trigger)}\\b`, 'i');
}
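
The excerpt above relies on an `escapeRegex` helper that is not shown. The sketch below supplies a common implementation of such a helper (an assumption, not necessarily the project's version) and runs the pattern against a few sample messages to show what the `^` anchor, the `\b` boundary, and the `i` flag each contribute.

```typescript
// Assumed helper: escapes regex metacharacters so the trigger word is matched literally.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function buildTriggerPattern(trigger: string): RegExp {
  return new RegExp(`^${escapeRegex(trigger)}\\b`, 'i');
}

const pattern = buildTriggerPattern('@Andy');
const samples = [
  '@Andy can you help?', // trigger at the start
  '@andy: status',       // case-insensitive
  'hey @Andy',           // trigger not at the start
  '@Andyman hello',      // no word boundary after the trigger
];
const matches = samples.map((s) => pattern.test(s));
console.log(matches); // [ true, true, false, false ]
```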

Non-trigger messages still accumulate in the database. When a trigger message eventually arrives, getMessagesSince() fetches all messages since the last agent response, giving the agent full conversational context.

Context Accumulation Pattern

This is a subtle but important design: messages that don't trigger the agent are not discarded — they become context for the next triggered invocation.

User A: "The build is broken"           ← stored, no trigger, not processed
User B: "Yeah, the tests fail too"      ← stored, no trigger, not processed
User A: "@Andy can you help debug?"     ← trigger! Fetches ALL THREE messages

The agent sees all three messages formatted as XML, giving it conversational context without being activated for every message.
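
As a rough illustration of that formatting step, the sketch below renders a batch of stored messages as XML. The element and attribute names, the `StoredMessage` fields, and the `formatMessages` name are all hypothetical; the source only states that the messages are formatted as XML.

```typescript
// Hypothetical row shape; the real messages table may carry different columns.
interface StoredMessage {
  sender: string;
  timestamp: string;
  content: string;
}

// Escape characters that would otherwise break the XML structure.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function formatMessages(messages: StoredMessage[]): string {
  const lines = messages.map(
    (m) =>
      `<message sender="${escapeXml(m.sender)}" time="${escapeXml(m.timestamp)}">` +
      `${escapeXml(m.content)}</message>`,
  );
  return `<messages>\n${lines.join('\n')}\n</messages>`;
}

const batch: StoredMessage[] = [
  { sender: 'User A', timestamp: '2024-01-01T10:00:00Z', content: 'The build is broken' },
  { sender: 'User B', timestamp: '2024-01-01T10:01:00Z', content: 'Yeah, the tests fail too' },
  { sender: 'User A', timestamp: '2024-01-01T10:02:00Z', content: '@Andy can you help debug?' },
];
console.log(formatMessages(batch));
```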

Container Lifecycle

Spawning (src/container-runner.ts:277-340)

const container = spawn(CONTAINER_RUNTIME_BIN, containerArgs, {
  stdio: ['pipe', 'pipe', 'pipe'],
});
container.stdin.write(JSON.stringify(input));
container.stdin.end();

The container receives its entire context through stdin as a single JSON payload, then the host streams results from stdout.

Volume Mount Strategy (src/container-runner.ts:61-224)

The mount configuration enforces strict isolation:

Main Group Mounts:
  /workspace/project   ← project root (READ-ONLY)
  /workspace/project/.env ← shadowed with /dev/null (blocks secret access)
  /workspace/group     ← groups/main/ (read-write)
  /home/node/.claude   ← isolated sessions directory
  /workspace/ipc       ← per-group IPC namespace
  /app/src             ← per-group agent-runner source (customizable)
  /workspace/extra/*   ← validated additional mounts

Non-Main Group Mounts:
  /workspace/group     ← groups/{folder}/ (read-write)
  /workspace/global    ← groups/global/ (READ-ONLY)
  /home/node/.claude   ← isolated sessions directory
  /workspace/ipc       ← per-group IPC namespace
  /app/src             ← per-group agent-runner source
  /workspace/extra/*   ← validated additional mounts

Key security decisions:

.env is shadowed with /dev/null so agents can't read secrets even from the read-only project root (src/container-runner.ts:82-91)
Each group gets its own .claude/ directory preventing cross-group session access (src/container-runner.ts:118-166)
Each group gets its own IPC namespace preventing cross-group privilege escalation (src/container-runner.ts:168-178)
Agent-runner source is copied per-group, allowing customization without affecting others (src/container-runner.ts:182-211)
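
The shadowing trick can be sketched as a list of mounts turned into runtime arguments. This assumes a Docker-style `-v host:container[:ro]` flag; the `Mount` type and both function names are hypothetical, and only three of the main-group mounts are shown.

```typescript
interface Mount {
  hostPath: string;
  containerPath: string;
  readonly: boolean;
}

function buildMainGroupMounts(projectRoot: string, groupDir: string): Mount[] {
  return [
    { hostPath: projectRoot, containerPath: '/workspace/project', readonly: true },
    // Listed after the project root so that /dev/null hides the real .env file.
    { hostPath: '/dev/null', containerPath: '/workspace/project/.env', readonly: true },
    { hostPath: groupDir, containerPath: '/workspace/group', readonly: false },
  ];
}

// Convert mounts into "-v" argument pairs, appending ":ro" for read-only mounts.
function toVolumeArgs(mounts: Mount[]): string[] {
  const args: string[] = [];
  for (const m of mounts) {
    args.push('-v', `${m.hostPath}:${m.containerPath}${m.readonly ? ':ro' : ''}`);
  }
  return args;
}

const volumeArgs = toVolumeArgs(
  buildMainGroupMounts('/srv/nanoclaw', '/srv/nanoclaw/groups/main'),
);
console.log(volumeArgs.join(' '));
```

Inside the container, reading `/workspace/project/.env` then yields an empty file, while the rest of the project root stays visible.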

Streaming Output Protocol (src/container-runner.ts:342-397)

The agent produces output wrapped in sentinel markers:

---NANOCLAW_OUTPUT_START---
{"status":"success","result":"The weather is sunny!","newSessionId":"abc123"}
---NANOCLAW_OUTPUT_END---

The host parses these incrementally from the stdout stream. Multiple marker pairs can appear (one per result when agent teams are in use). The parsing is robust against partial reads:

// src/container-runner.ts:369-396
while ((startIdx = parseBuffer.indexOf(OUTPUT_START_MARKER)) !== -1) {
  const endIdx = parseBuffer.indexOf(OUTPUT_END_MARKER, startIdx);
  if (endIdx === -1) break; // Incomplete pair, wait for more data
  // ... parse JSON, call onOutput
}
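
To make the buffering behavior concrete, here is a self-contained sketch built around the same loop. The marker constants match the example above; the `ContainerOutput` shape and the `createOutputParser` wrapper are assumptions, as is the choice to skip payloads that fail to parse.

```typescript
const OUTPUT_START_MARKER = '---NANOCLAW_OUTPUT_START---';
const OUTPUT_END_MARKER = '---NANOCLAW_OUTPUT_END---';

// Assumed payload shape, based on the sample output shown earlier.
interface ContainerOutput {
  status: string;
  result?: string | null;
  newSessionId?: string;
}

// Returns a function to call with each stdout chunk; complete payloads go to onOutput.
function createOutputParser(
  onOutput: (output: ContainerOutput) => void,
): (chunk: string) => void {
  let parseBuffer = '';
  return (chunk: string): void => {
    parseBuffer += chunk;
    let startIdx: number;
    while ((startIdx = parseBuffer.indexOf(OUTPUT_START_MARKER)) !== -1) {
      const endIdx = parseBuffer.indexOf(OUTPUT_END_MARKER, startIdx);
      if (endIdx === -1) break; // Incomplete pair, keep the buffer and wait for more data
      const jsonText = parseBuffer
        .slice(startIdx + OUTPUT_START_MARKER.length, endIdx)
        .trim();
      // Drop everything up to and including the end marker before handling the payload.
      parseBuffer = parseBuffer.slice(endIdx + OUTPUT_END_MARKER.length);
      try {
        onOutput(JSON.parse(jsonText) as ContainerOutput);
      } catch {
        // Malformed payload: skipped here; a real host would likely log it.
      }
    }
  };
}

const results: Array<string | null | undefined> = [];
const feed = createOutputParser((o) => results.push(o.result));
feed('debug noise---NANOCLAW_OUT'); // start marker split across chunks
feed('PUT_START---\n{"status":"success","result":"hi"}\n---NANOCLAW_OUTPUT_END---\n');
feed('---NANOCLAW_OUTPUT_START---{"status":"success",'); // payload split across chunks
feed('"result":"again"}---NANOCLAW_OUTPUT_END---');
console.log(results); // [ 'hi', 'again' ]
```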

Timeout Management (src/container-runner.ts:421-501)

Two timeout concepts:

Hard timeout — kills container after max(CONTAINER_TIMEOUT, IDLE_TIMEOUT + 30s) of no output markers
Idle timeout — host writes _close sentinel when agent hasn't produced output for 30 minutes

Output received → resetTimeout()     # Reset hard timeout
                → resetIdleTimer()   # Reset idle timer (src/index.ts:269-278)

Idle timer fires → queue.closeStdin()  # Write _close sentinel
                 → Agent sees sentinel → exits query loop gracefully

Crucially, stderr does NOT reset the timeout (src/container-runner.ts:405-406). The SDK writes continuous debug logs to stderr, so only actual output markers indicate real progress.
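
The interaction between the two timers can be sketched with plain arithmetic. The function names, the `StreamEvent` type, and the 5-minute `CONTAINER_TIMEOUT` used in the example are hypothetical; the 30-second margin and the 30-minute idle timeout come from the description above. The margin presumably exists so the graceful `_close` path gets a chance to run before the hard kill.

```typescript
// Hard timeout is never shorter than the idle timeout plus a 30 s margin.
function hardTimeoutMs(containerTimeoutMs: number, idleTimeoutMs: number): number {
  return Math.max(containerTimeoutMs, idleTimeoutMs + 30_000);
}

interface StreamEvent {
  source: 'stdout-marker' | 'stderr';
  atMs: number;
}

// Replays observed events (in time order) and returns when the hard timeout would fire.
function hardDeadline(startMs: number, events: StreamEvent[], timeoutMs: number): number {
  let deadline = startMs + timeoutMs;
  for (const event of events) {
    if (event.atMs >= deadline) break; // the container would already have been killed
    if (event.source === 'stdout-marker') deadline = event.atMs + timeoutMs;
    // stderr events are ignored: debug logging does not count as progress
  }
  return deadline;
}

const timeoutMs = hardTimeoutMs(5 * 60_000, 30 * 60_000);
console.log(timeoutMs); // 1830000 (30 min 30 s)

const deadline = hardDeadline(
  0,
  [
    { source: 'stderr', atMs: 1_000 },
    { source: 'stdout-marker', atMs: 2_000 },
    { source: 'stderr', atMs: 3_000 },
  ],
  10_000,
);
console.log(deadline); // 12000
```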

GroupQueue: Concurrency Control

File: src/group-queue.ts

The GroupQueue enforces:

Per-group serialization — only one container per group at a time
Global concurrency limit — MAX_CONCURRENT_CONTAINERS across all groups
Task priority — tasks execute before queued messages (tasks can't be re-discovered from SQLite, messages can)
Retry with exponential backoff — 5 retries, starting at 5s, doubling each time
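
The retry schedule in the last item works out as follows. The constant and function names are hypothetical; the numbers (5 retries, 5 s base, doubling) come from the list above.

```typescript
const MAX_RETRIES = 5;
const BASE_RETRY_MS = 5_000;

// Delay before the given retry attempt (1-based), or null once retries are exhausted.
function retryDelayMs(attempt: number): number | null {
  if (attempt < 1 || attempt > MAX_RETRIES) return null;
  return BASE_RETRY_MS * 2 ** (attempt - 1);
}

const schedule = [1, 2, 3, 4, 5, 6].map((attempt) => retryDelayMs(attempt));
console.log(schedule); // [ 5000, 10000, 20000, 40000, 80000, null ]
```

Under these assumptions a persistently failing group is retried over roughly two and a half minutes (155 s of cumulative delay) before the queue gives up.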

State machine per group:

  IDLE ──enqueueMessageCheck()──► ACTIVE (running container)
    ▲                                │
    │                                ▼
    │                          Container finishes
    │                                │
    │              ┌─────────────────┼─────────────────┐
    │              ▼                 ▼                  ▼
    │        pendingTasks?     pendingMessages?    Nothing?
    │              │                 │                  │
    │              ▼                 ▼                  ▼
    │         runTask()        runForGroup()      drainWaiting()
    │              │                 │            (next group)
    │              └────────┬────────┘
    │                       ▼
    └──────────────── drainGroup()
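
A stripped-down model of the message side of this state machine is sketched below. It is an illustration under simplifying assumptions: tasks, retries, and actual container execution are omitted, and the class and field names are hypothetical.

```typescript
class MiniGroupQueue {
  private readonly maxConcurrent: number;
  private readonly active = new Set<string>();  // groups with a running container
  private readonly pending = new Set<string>(); // active groups that received more messages
  private readonly waiting: string[] = [];      // groups waiting for a global slot
  readonly started: string[] = [];              // start order, recorded for inspection

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  enqueueMessageCheck(groupJid: string): void {
    if (this.active.has(groupJid)) {
      this.pending.add(groupJid); // per-group serialization: run again after the current one
      return;
    }
    if (this.active.size >= this.maxConcurrent) {
      if (!this.waiting.includes(groupJid)) this.waiting.push(groupJid);
      return;
    }
    this.start(groupJid);
  }

  // Called when a group's container finishes.
  finish(groupJid: string): void {
    this.active.delete(groupJid);
    if (this.pending.delete(groupJid)) {
      this.start(groupJid); // the same group drains its own backlog first
      return;
    }
    const next = this.waiting.shift();
    if (next !== undefined) this.start(next); // otherwise hand the slot to a waiting group
  }

  private start(groupJid: string): void {
    this.active.add(groupJid);
    this.started.push(groupJid);
  }
}

const queue = new MiniGroupQueue(2);
for (const jid of ['A', 'B', 'C', 'A']) queue.enqueueMessageCheck(jid);
queue.finish('A'); // A has pending messages, so A restarts
queue.finish('B'); // B has nothing pending, so waiting group C gets the slot
console.log(queue.started); // [ 'A', 'B', 'A', 'C' ]
```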

The queue also supports message piping into active containers:

// src/group-queue.ts:160-178
sendMessage(groupJid: string, text: string): boolean {
  const state = this.getGroup(groupJid);
  if (!state.active || !state.groupFolder || state.isTaskContainer) return false;
  // Write IPC file for the running container to pick up
  fs.writeFileSync(tempPath, JSON.stringify({ type: 'message', text }));
  fs.renameSync(tempPath, filepath);  // Atomic write
  return true;
}

Agent Inside the Container

SDK Integration (container/agent-runner/src/index.ts:394-432)

The agent-runner uses the Claude Agent SDK's query() function with a rich configuration:

for await (const message of query({
  prompt: stream,                    // MessageStream (async iterable)
  options: {
    cwd: '/workspace/group',
    resume: sessionId,               // Resume previous conversation
    resumeSessionAt: resumeAt,       // Resume at specific message UUID
    systemPrompt: globalClaudeMd,    // Append global CLAUDE.md
    allowedTools: [
      'Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep',
      'WebSearch', 'WebFetch',
      'Task', 'TaskOutput', 'TaskStop',
      'TeamCreate', 'TeamDelete', 'SendMessage',  // Agent swarms
      'TodoWrite', 'ToolSearch', 'Skill',
      'NotebookEdit',
      'mcp__nanoclaw__*'             // All NanoClaw MCP tools
    ],
    permissionMode: 'bypassPermissions',  // Safe because container is the sandbox
    mcpServers: {
      nanoclaw: {                    // MCP server for IPC tools
        command: 'node',
        args: [mcpServerPath],
        env: { NANOCLAW_CHAT_JID, NANOCLAW_GROUP_FOLDER, NANOCLAW_IS_MAIN },
      },
    },
    hooks: {
      PreCompact: [{ hooks: [createPreCompactHook()] }],  // Archive before compaction
    },
  }
}))

Notable decisions:

permissionMode: 'bypassPermissions' — The container IS the sandbox, so no need for application-level permission prompts
MessageStream (async iterable) — Keeps isSingleUserTurn=false, allowing agent teams subagents to run to completion
PreCompact hook — Archives full conversation transcripts to /workspace/group/conversations/ before the SDK compacts context
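
The MessageStream idea, an async iterable that stays open so later messages can be pushed into a running query, can be sketched as below. This is a generic illustration, not the agent-runner's class: the real stream yields SDK user-message objects, whereas this version is parameterized over any item type.

```typescript
class MessageStream<T> implements AsyncIterable<T> {
  private readonly queue: T[] = [];
  private waiting: (() => void) | null = null;
  private done = false;

  // Add an item and wake the consumer if it is currently waiting.
  push(item: T): void {
    this.queue.push(item);
    this.wake();
  }

  // Signal that no more items will arrive; the iterator finishes after draining the queue.
  end(): void {
    this.done = true;
    this.wake();
  }

  private wake(): void {
    const resolve = this.waiting;
    this.waiting = null;
    resolve?.();
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      while (this.queue.length > 0) yield this.queue.shift() as T;
      if (this.done) return;
      // Nothing queued yet: sleep until push() or end() wakes us.
      await new Promise<void>((resolve) => {
        this.waiting = resolve;
      });
    }
  }
}

async function demo(): Promise<string[]> {
  const stream = new MessageStream<string>();
  stream.push('initial prompt');
  // Simulates a follow-up message arriving later, e.g. from the IPC input directory.
  setTimeout(() => {
    stream.push('follow-up message');
    stream.end();
  }, 10);
  const seen: string[] = [];
  for await (const message of stream) seen.push(message);
  return seen;
}
```

Calling `demo()` resolves with both messages in order; the consumer simply waits between them instead of treating the first message as the end of the turn.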

MCP Tools: Agent → Host Communication

The agent can't call host functions directly. Instead, it uses MCP tools that write JSON files:

Agent calls mcp__nanoclaw__send_message("Hello!")
    │
    ▼
ipc-mcp-stdio.ts writes to /workspace/ipc/messages/{timestamp}.json
    │
    ▼
Host's IPC watcher (src/ipc.ts) picks up file on next poll (1s)
    │
    ▼
Host routes message to the correct channel.sendMessage()

Apart from the result stream on stdout, this file-based IPC is the only communication channel from agent to host. It's simple, debuggable (you can inspect the files), and doesn't require network setup inside the container.

Task Scheduling

Schedule Types (src/task-scheduler.ts:31-63)

function computeNextRun(task): Date | null {
  switch (task.schedule_type) {
    case 'cron':
      // Parse cron, get next occurrence in local timezone
      return CronExpressionParser.parse(value, { tz: TIMEZONE }).next().toDate();
 
    case 'interval':
      // Anchor to scheduled time to prevent drift
      const ms = parseInt(value, 10);
      const anchor = new Date(task.next_run || task.created_at);
      let next = new Date(anchor.getTime() + ms);
      while (next <= now) next = new Date(next.getTime() + ms);  // Skip missed intervals
      return next;
 
    case 'once':
      return null;  // No recurrence
  }
}

The interval drift prevention is notable: instead of computing now + interval, it anchors to the original scheduled time and adds multiples of the interval. This prevents gradual drift caused by execution time.
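
A small worked example makes the difference visible. The function below restates the interval branch with `now` passed in explicitly so it can be exercised in isolation; the function name and the sample numbers are illustrative.

```typescript
// Next run strictly after nowMs, on the grid anchorMs + k * intervalMs (k >= 1).
function nextIntervalRun(anchorMs: number, intervalMs: number, nowMs: number): number {
  if (intervalMs <= 0) throw new RangeError('intervalMs must be positive');
  let next = anchorMs + intervalMs;
  while (next <= nowMs) next += intervalMs; // skip intervals that were missed
  return next;
}

const intervalMs = 60_000;  // run every minute
const anchorMs = 0;         // scheduled time of the run that just executed
const finishedAt = 187_000; // the run finished late, 3 min 7 s after its anchor

const anchored = nextIntervalRun(anchorMs, intervalMs, finishedAt);
const naive = finishedAt + intervalMs;

console.log(anchored); // 240000, back on the minute grid
console.log(naive);    // 247000, drifted by the 7 s of lateness
```

With the naive computation every late or slow run pushes all later runs back by the same amount; with anchoring, the schedule stays on multiples of the interval and only the missed slots are skipped.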

Task Context Modes

isolated — Fresh session each run, no conversation history (default, safe)
group — Shares the group's session, preserving state across runs

Script Pre-check (container/agent-runner/src/index.ts:476-516)

Scheduled tasks can include a bash script that runs first:

#!/bin/bash
# Check if there's a new release on GitHub
latest=$(curl -s https://api.github.com/repos/owner/repo/releases/latest | jq -r .tag_name)
current="v1.0.0"
if [ "$latest" = "$current" ]; then
  echo '{"wakeAgent": false}'
else
  echo '{"wakeAgent": true, "data": {"newVersion": "'"$latest"'"}}'
fi

If wakeAgent is false, the Claude agent never starts — saving API costs for conditional tasks. The script output data is injected into the agent's prompt when it does wake.
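
On the consuming side, the decision could be read from the script's output roughly as follows. This is a sketch under assumptions: that the JSON object is taken from the last line of stdout, and that unparseable output is reported as `null` for the caller to handle. The `ScriptResult` and `parseScriptResult` names are hypothetical.

```typescript
interface ScriptResult {
  wakeAgent: boolean;
  data?: unknown;
}

// Returns the parsed decision, or null when the output is not a valid result object.
function parseScriptResult(stdout: string): ScriptResult | null {
  const lines = stdout.trim().split('\n');
  const lastLine = lines[lines.length - 1] ?? '';
  try {
    const parsed: unknown = JSON.parse(lastLine);
    if (
      typeof parsed === 'object' &&
      parsed !== null &&
      typeof (parsed as ScriptResult).wakeAgent === 'boolean'
    ) {
      return parsed as ScriptResult;
    }
  } catch {
    // Not JSON: fall through and report null.
  }
  return null;
}

const skip = parseScriptResult('{"wakeAgent": false}\n');
const wake = parseScriptResult(
  'checking releases...\n{"wakeAgent": true, "data": {"newVersion": "v1.1.0"}}\n',
);
console.log(skip?.wakeAgent); // false
console.log(wake?.data);      // { newVersion: 'v1.1.0' }
```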

Cursor Management and Recovery

Two-level cursor system:

lastTimestamp (global) — "I've seen all messages up to this point" — advances in the message loop
lastAgentTimestamp[chatJid] (per-group) — "I've processed messages up to this point for this group" — advances when agent receives messages

Crash recovery (src/index.ts:121-136, 526-542):

function getOrRecoverCursor(chatJid: string): string {
  const existing = lastAgentTimestamp[chatJid];
  if (existing) return existing;
 
  // Cursor missing — recover from last bot reply in DB
  const botTs = getLastBotMessageTimestamp(chatJid, ASSISTANT_NAME);
  if (botTs) {
    lastAgentTimestamp[chatJid] = botTs;
    return botTs;
  }
  return '';  // Process all messages from the beginning
}

Cursor rollback on error (src/index.ts:314-331):

if (output === 'error' || hadError) {
  if (outputSentToUser) {
    // Already sent response — don't roll back (would cause duplicates)
    return true;
  }
  // Roll back cursor so retries can re-process these messages
  lastAgentTimestamp[chatJid] = previousCursor;
  saveState();
}

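The rollback distinguishes between "error before any output" (safe to retry) and "error after output" (can't retry without duplicates). This is a pragmatic compromise: a failed batch is retried only while nothing has reached the user, which avoids duplicate replies at the cost of not retrying a run that failed partway through its response.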
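
The three possible outcomes can be written down as a small pure function. This restates the logic of the excerpt above for illustration; the `RunResult` type, the function name, and the returned `retry` flag are hypothetical.

```typescript
interface RunResult {
  hadError: boolean;
  outputSentToUser: boolean;
}

// Decide where the per-group cursor ends up after a run and whether to retry the batch.
function settleCursor(
  previousCursor: string,
  advancedCursor: string,
  run: RunResult,
): { cursor: string; retry: boolean } {
  if (!run.hadError) return { cursor: advancedCursor, retry: false };
  // Error after output: keep the advanced cursor so the user does not get duplicates.
  if (run.outputSentToUser) return { cursor: advancedCursor, retry: false };
  // Error before any output: roll back so a retry sees the same messages again.
  return { cursor: previousCursor, retry: true };
}

console.log(settleCursor('t1', 't2', { hadError: false, outputSentToUser: true }));
console.log(settleCursor('t1', 't2', { hadError: true, outputSentToUser: true }));
console.log(settleCursor('t1', 't2', { hadError: true, outputSentToUser: false }));
```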

Internal Tag Stripping

Agents can include <internal>...</internal> tags in their output for reasoning that shouldn't be sent to users:

// src/index.ts:292
const text = raw.replace(/<internal>[\s\S]*?<\/internal>/g, '').trim();

This lets agents "think out loud" in their output while keeping the user-facing response clean. The internal content is still logged for debugging.
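
A quick check of the regex on a few inputs shows what it does and does not handle. The `stripInternal` wrapper name is hypothetical; the expression is the one quoted above.

```typescript
function stripInternal(raw: string): string {
  return raw.replace(/<internal>[\s\S]*?<\/internal>/g, '').trim();
}

// Multi-line and repeated blocks are removed; [\s\S] matches newlines and *? stops at the first close tag.
const cleaned = stripInternal(
  '<internal>User wants a forecast.\nCheck the cache first.</internal>' +
    'The weather is sunny!<internal>done</internal>',
);
console.log(cleaned); // The weather is sunny!

// An unclosed tag has no match, so the text passes through unchanged.
const unclosed = stripInternal('<internal>never closed');
console.log(unclosed); // <internal>never closed

// Whitespace around a removed block is kept; only the ends of the string are trimmed.
const spaced = stripInternal('Done. <internal>note</internal> See you.');
console.log(JSON.stringify(spaced)); // "Done.  See you."
```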