Core Logic & Data Flow
Message Processing: The Heart of NanoClaw
The entire system exists to convert inbound chat messages into Claude Agent SDK invocations and route responses back. This section traces the complete data flow.
Inbound Message Path
Platform SDK event (e.g., WhatsApp message received)
│
▼
Channel.onMessage(chatJid, msg) # src/index.ts:617-643
│
├── Remote control intercept # /remote-control commands
│
├── Sender allowlist check # Drop if denied sender in drop mode
│ └── shouldDropMessage() + isSenderAllowed() # src/sender-allowlist.ts
│
└── storeMessage(msg) # src/db.ts — INSERT into messages table
Messages are stored immediately and unconditionally (unless dropped by allowlist). The message loop then discovers them on its next poll.
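The inbound path above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the code in src/index.ts: the `InboundChatMessage` and `SenderPolicy` shapes, the `"passthrough"` mode name, and the in-memory `store` array (standing in for the messages table) are all hypothetical, and the remote-control intercept is omitted.

```typescript
// Hypothetical shapes for illustration; the real NanoClaw types may differ.
interface InboundChatMessage {
  chatJid: string;
  sender: string;
  text: string;
  timestamp: string; // ISO 8601, so string comparison matches time order
}

interface SenderPolicy {
  mode: "drop" | "passthrough"; // "passthrough" is an assumed non-drop mode
  allowed: Set<string>;
}

// Returns true when the message was stored, false when it was dropped.
function handleInbound(
  msg: InboundChatMessage,
  policy: SenderPolicy,
  store: InboundChatMessage[],
): boolean {
  const denied = !policy.allowed.has(msg.sender);
  if (policy.mode === "drop" && denied) {
    return false; // denied sender in drop mode: never reaches storage
  }
  store.push(msg); // stand-in for the INSERT into the messages table
  return true;
}

const demoStore: InboundChatMessage[] = [];
const demoPolicy: SenderPolicy = { mode: "drop", allowed: new Set(["alice"]) };
const fromAlice: InboundChatMessage = {
  chatJid: "group-1",
  sender: "alice",
  text: "hello",
  timestamp: "2024-01-01T10:00:00Z", // illustrative value
};
const fromMallory: InboundChatMessage = { ...fromAlice, sender: "mallory" };

console.log(handleInbound(fromAlice, demoPolicy, demoStore)); // true
console.log(handleInbound(fromMallory, demoPolicy, demoStore)); // false
console.log(demoStore.length); // 1
```

Note that storage does not depend on trigger matching; only the allowlist can keep a message out of the database.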
Message Loop: Polling vs Piping
The message loop (src/index.ts:419-520) has two code paths:
Path A — Pipe to active container (fast path):
// src/index.ts:495-508
if (queue.sendMessage(chatJid, formatted)) {
// Container already running for this group — pipe message via IPC file
lastAgentTimestamp[chatJid] = messagesToSend[messagesToSend.length - 1].timestamp;
saveState();
channel.setTyping?.(chatJid, true);
}
This writes a JSON file to data/ipc/{groupFolder}/input/, which the running container picks up via drainIpcInput() and feeds into the active SDK query as a follow-up user message.
Path B — Enqueue for new container (cold start):
// src/index.ts:511
queue.enqueueMessageCheck(chatJid);
This triggers the GroupQueue to spawn a new container when a slot is available.
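Taken together, the two paths amount to "try to pipe, otherwise enqueue". The sketch below illustrates that decision against a fake queue. `QueueLike`, `dispatchToGroup`, and `makeFakeQueue` are hypothetical names introduced here; only `sendMessage` and `enqueueMessageCheck` come from the excerpts above, and their real signatures may differ.

```typescript
// Hypothetical minimal view of the queue: only the two methods named above.
interface QueueLike {
  sendMessage(chatJid: string, text: string): boolean;
  enqueueMessageCheck(chatJid: string): void;
}

type DispatchPath = "piped" | "enqueued";

// Path A if a container is already running for the group, else Path B.
function dispatchToGroup(queue: QueueLike, chatJid: string, formatted: string): DispatchPath {
  if (queue.sendMessage(chatJid, formatted)) {
    return "piped"; // fast path: message handed to the active container
  }
  queue.enqueueMessageCheck(chatJid); // cold start: wait for a free slot
  return "enqueued";
}

// Fake queue for demonstration: groups in `activeGroups` accept piped messages.
function makeFakeQueue(activeGroups: Set<string>): QueueLike & { enqueued: string[] } {
  const enqueued: string[] = [];
  return {
    enqueued,
    sendMessage: (chatJid: string): boolean => activeGroups.has(chatJid),
    enqueueMessageCheck: (chatJid: string): void => {
      enqueued.push(chatJid);
    },
  };
}

const fakeQueue = makeFakeQueue(new Set(["group-a"]));
console.log(dispatchToGroup(fakeQueue, "group-a", "<messages/>")); // piped
console.log(dispatchToGroup(fakeQueue, "group-b", "<messages/>")); // enqueued
```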
Trigger Pattern Matching
Not every message triggers the agent. The trigger system (src/index.ts:240-250, src/config.ts:67-80):
| Group Type | Trigger Behavior |
|---|---|
| Main group (isMain: true) | No trigger needed; every message is processed |
| 1-on-1 chats (requiresTrigger: false) | No trigger needed |
| Group chats (default) | Must start with @Andy (or the configured trigger word) |
// src/config.ts:71-73
function buildTriggerPattern(trigger: string): RegExp {
return new RegExp(`^${escapeRegex(trigger)}\\b`, 'i');
}
Non-trigger messages still accumulate in the database. When a trigger message eventually arrives, getMessagesSince() fetches all messages since the last agent response, giving the agent full conversational context.
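To see what the anchored, case-insensitive pattern accepts and rejects, here is a small self-contained sketch. `buildTriggerPattern` mirrors the excerpt above; the body of `escapeRegex` is an assumption (a common escaping idiom), since the source does not show it.

```typescript
// ASSUMPTION: the real escapeRegex is not shown in the source; this is a
// common idiom that escapes every regex metacharacter.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Mirrors the excerpt: anchored at the start, word boundary after the trigger.
function buildTriggerPattern(trigger: string): RegExp {
  return new RegExp(`^${escapeRegex(trigger)}\\b`, "i");
}

const andyPattern = buildTriggerPattern("@Andy");

console.log(andyPattern.test("@Andy can you help debug?")); // true
console.log(andyPattern.test("@andy hi")); // true (case-insensitive)
console.log(andyPattern.test("@Andyx hi")); // false (no word boundary after "Andy")
console.log(andyPattern.test("hey @Andy")); // false (trigger must come first)
```

One consequence of the trailing `\b`: a trigger word that ends in a non-word character (for example `@Andy!`) would only match when immediately followed by a letter, digit, or underscore, so trigger words ending in a word character are the safer choice.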
Context Accumulation Pattern
This is a subtle but important design: messages that don't trigger the agent are not discarded — they become context for the next triggered invocation.
User A: "The build is broken" ← stored, no trigger, not processed
User B: "Yeah, the tests fail too" ← stored, no trigger, not processed
User A: "@Andy can you help debug?" ← trigger! Fetches ALL THREE messages
The agent sees all three messages formatted as XML, giving it conversational context without being activated for every message.
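The accumulation step can be illustrated with a small sketch that selects everything after a cursor and renders it as XML. The element and attribute names (`<messages>`, `<message sender=… time=…>`), the `ContextMessage` shape, and the sample timestamps are assumptions made for illustration; the source only states that messages are formatted as XML.

```typescript
// Hypothetical message shape; timestamps are ISO 8601 strings.
interface ContextMessage {
  sender: string;
  timestamp: string;
  text: string;
}

// Escape characters that would otherwise break the XML structure.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Everything strictly newer than the cursor becomes context.
function messagesSince(all: ContextMessage[], cursor: string): ContextMessage[] {
  return all.filter((m) => m.timestamp > cursor);
}

// ASSUMPTION: tag and attribute names are illustrative, not NanoClaw's format.
function formatAsXml(messages: ContextMessage[]): string {
  const rows = messages.map(
    (m) =>
      `  <message sender="${escapeXml(m.sender)}" time="${escapeXml(m.timestamp)}">` +
      `${escapeXml(m.text)}</message>`,
  );
  return ["<messages>", ...rows, "</messages>"].join("\n");
}

const chatLog: ContextMessage[] = [
  { sender: "User A", timestamp: "2024-01-01T10:00:00Z", text: "The build is broken" },
  { sender: "User B", timestamp: "2024-01-01T10:01:00Z", text: "Yeah, the tests fail too" },
  { sender: "User A", timestamp: "2024-01-01T10:02:00Z", text: "@Andy can you help debug?" },
];

// The cursor predates all three messages, so all three are included.
console.log(formatAsXml(messagesSince(chatLog, "2024-01-01T09:00:00Z")));
```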
Container Lifecycle
Spawning (src/container-runner.ts:277-340)
const container = spawn(CONTAINER_RUNTIME_BIN, containerArgs, {
stdio: ['pipe', 'pipe', 'pipe'],
});
container.stdin.write(JSON.stringify(input));
container.stdin.end();
The container receives its entire context through stdin as a single JSON payload, then the host streams results from stdout.
Volume Mount Strategy (src/container-runner.ts:61-224)
The mount configuration enforces strict isolation:
Main Group Mounts:
/workspace/project ← project root (READ-ONLY)
/workspace/project/.env ← shadowed with /dev/null (blocks secret access)
/workspace/group ← groups/main/ (read-write)
/home/node/.claude ← isolated sessions directory
/workspace/ipc ← per-group IPC namespace
/app/src ← per-group agent-runner source (customizable)
/workspace/extra/* ← validated additional mounts
Non-Main Group Mounts:
/workspace/group ← groups/{folder}/ (read-write)
/workspace/global ← groups/global/ (READ-ONLY)
/home/node/.claude ← isolated sessions directory
/workspace/ipc ← per-group IPC namespace
/app/src ← per-group agent-runner source
/workspace/extra/* ← validated additional mounts
Key security decisions:
- .env is shadowed with /dev/null so agents can't read secrets even from the read-only project root (src/container-runner.ts:82-91)
- Each group gets its own .claude/ directory, preventing cross-group session access (src/container-runner.ts:118-166)
- Each group gets its own IPC namespace, preventing cross-group privilege escalation (src/container-runner.ts:168-178)
- Agent-runner source is copied per-group, allowing customization without affecting others (src/container-runner.ts:182-211)
Streaming Output Protocol (src/container-runner.ts:342-397)
The agent produces output wrapped in sentinel markers:
---NANOCLAW_OUTPUT_START---
{"status":"success","result":"The weather is sunny!","newSessionId":"abc123"}
---NANOCLAW_OUTPUT_END---
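A framing like this lends itself to a small buffering parser. The following is a minimal sketch, assuming chunks arrive as strings and that malformed JSON between markers should simply be skipped; the `MarkerParser` class is hypothetical and is not the code in src/container-runner.ts, although the marker strings are the ones shown above.

```typescript
const OUTPUT_START_MARKER = "---NANOCLAW_OUTPUT_START---";
const OUTPUT_END_MARKER = "---NANOCLAW_OUTPUT_END---";

// Hypothetical helper: accumulates chunks and yields each complete payload.
class MarkerParser {
  private buffer = "";

  // Feed one stdout chunk; returns the payloads completed by this chunk.
  feed(chunk: string): unknown[] {
    this.buffer += chunk;
    const results: unknown[] = [];
    let startIdx: number;
    while ((startIdx = this.buffer.indexOf(OUTPUT_START_MARKER)) !== -1) {
      const endIdx = this.buffer.indexOf(OUTPUT_END_MARKER, startIdx);
      if (endIdx === -1) break; // incomplete pair, wait for more data
      const json = this.buffer
        .slice(startIdx + OUTPUT_START_MARKER.length, endIdx)
        .trim();
      // Drop everything up to and including the end marker.
      this.buffer = this.buffer.slice(endIdx + OUTPUT_END_MARKER.length);
      try {
        results.push(JSON.parse(json));
      } catch {
        // ASSUMPTION: malformed payloads are skipped rather than fatal.
      }
    }
    return results;
  }
}

const markerParser = new MarkerParser();
// The payload is split mid-JSON across two chunks.
const firstPass = markerParser.feed('noise\n---NANOCLAW_OUTPUT_START---\n{"status":"succ');
const secondPass = markerParser.feed('ess","result":"ok"}\n---NANOCLAW_OUTPUT_END---\n');
console.log(firstPass.length, secondPass.length); // 0 1
```

This sketch never trims text that precedes an unmatched start marker, so a production version would also want to bound the buffer size.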
The host parses these incrementally from the stdout stream. Multiple marker pairs can appear (one per agent teams result). The parsing is robust against partial reads:
// src/container-runner.ts:369-396
while ((startIdx = parseBuffer.indexOf(OUTPUT_START_MARKER)) !== -1) {
const endIdx = parseBuffer.indexOf(OUTPUT_END_MARKER, startIdx);
if (endIdx === -1) break; // Incomplete pair, wait for more data
// ... parse JSON, call onOutput
}
Timeout Management (src/container-runner.ts:421-501)
Two timeout concepts:
- Hard timeout: kills the container after max(CONTAINER_TIMEOUT, IDLE_TIMEOUT + 30s) of no output markers
- Idle timeout: the host writes the _close sentinel when the agent hasn't produced output for 30 minutes
Output received → resetTimeout() # Reset hard timeout
→ resetIdleTimer() # Reset idle timer (src/index.ts:269-278)
Idle timer fires → queue.closeStdin() # Write _close sentinel
→ Agent sees sentinel → exits query loop gracefully
Crucially, stderr does NOT reset the timeout (src/container-runner.ts:405-406). The SDK writes continuous debug logs to stderr, so only actual output markers indicate real progress.
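The timeout rules above can be sketched with an explicit clock instead of real timers. This is an illustrative model under assumptions: `ProgressDeadline`, `StreamSignal`, and `GRACE_MS` are hypothetical names, and the concrete `CONTAINER_TIMEOUT` values used in the demo are made up; only the `max(CONTAINER_TIMEOUT, IDLE_TIMEOUT + 30s)` formula, the 30-minute idle period, and the "stderr does not reset" rule come from the text.

```typescript
const GRACE_MS = 30 * 1000; // the "+ 30s" from the formula above

// Hard timeout: never shorter than the idle timeout plus the grace period.
function hardTimeoutMs(containerTimeoutMs: number, idleTimeoutMs: number): number {
  return Math.max(containerTimeoutMs, idleTimeoutMs + GRACE_MS);
}

type StreamSignal = "marker" | "stderr";

// Tracks a deadline that only real output markers can push back.
class ProgressDeadline {
  private readonly timeoutMs: number;
  private deadlineMs: number;

  constructor(timeoutMs: number, startMs: number) {
    this.timeoutMs = timeoutMs;
    this.deadlineMs = startMs + timeoutMs;
  }

  observe(signal: StreamSignal, nowMs: number): void {
    if (signal === "marker") {
      this.deadlineMs = nowMs + this.timeoutMs; // real progress: reset
    }
    // stderr is debug noise and deliberately does not reset the deadline
  }

  isExpired(nowMs: number): boolean {
    return nowMs >= this.deadlineMs;
  }
}

const idleMs = 30 * 60 * 1000; // 30 minutes, as stated above
console.log(hardTimeoutMs(idleMs, idleMs)); // 1830000
console.log(hardTimeoutMs(2 * idleMs, idleMs)); // 3600000

const progress = new ProgressDeadline(1000, 0);
progress.observe("stderr", 900);
console.log(progress.isExpired(1000)); // true: stderr did not extend the deadline
```

The `max(...)` ensures the idle timer always gets a chance to close the agent gracefully before the hard kill fires.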
GroupQueue: Concurrency Control
File: src/group-queue.ts
The GroupQueue enforces:
- Per-group serialization: only one container per group at a time
- Global concurrency limit: MAX_CONCURRENT_CONTAINERS across all groups
- Task priority: tasks execute before queued messages (tasks can't be re-discovered from SQLite, messages can)
- Retry with exponential backoff: 5 retries, starting at 5s, doubling each time
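The retry rule in the last item works out to delays of 5s, 10s, 20s, 40s, and 80s. A minimal sketch of that arithmetic follows; the constant and function names are hypothetical, and whether the real queue adds jitter or caps the delay is not stated in the source.

```typescript
const MAX_RETRIES = 5; // "5 retries"
const BASE_RETRY_MS = 5000; // "starting at 5s"

// Delay before the given retry attempt (1-based), or null once retries are exhausted.
function retryDelayMs(attempt: number): number | null {
  if (attempt < 1 || attempt > MAX_RETRIES) return null;
  return BASE_RETRY_MS * 2 ** (attempt - 1); // doubles each time
}

const retrySchedule = [1, 2, 3, 4, 5].map((n) => retryDelayMs(n));
console.log(retrySchedule); // [ 5000, 10000, 20000, 40000, 80000 ]
console.log(retryDelayMs(6)); // null
```

Under these numbers, a group that fails every attempt is abandoned after a cumulative wait of 155 seconds.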
State machine per group:
IDLE ──enqueueMessageCheck()──► ACTIVE (running container)
▲ │
│ ▼
│ Container finishes
│ │
│ ┌─────────────────┼─────────────────┐
│ ▼ ▼ ▼
│ pendingTasks? pendingMessages? Nothing?
│ │ │ │
│ ▼ ▼ ▼
│ runTask() runForGroup() drainWaiting()
│ │ │ (next group)
│ └────────┬────────┘
│ ▼
└──────────────── drainGroup()
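The branch taken when a container finishes can be sketched as a pure decision function. This is a simplified model with hypothetical names (`GroupDrainState`, `NextAction`, `nextActionFor`); it captures only the ordering shown in the diagram, tasks first, then messages, then yielding the slot, and ignores the global concurrency limit.

```typescript
// Hypothetical per-group state: queued task ids plus a "messages waiting" flag.
interface GroupDrainState {
  pendingTasks: string[];
  pendingMessages: boolean;
}

type NextAction =
  | { kind: "task"; taskId: string }
  | { kind: "messages" }
  | { kind: "idle" };

// Decide what a group should run next once its container has finished.
function nextActionFor(state: GroupDrainState): NextAction {
  const taskId = state.pendingTasks.shift();
  if (taskId !== undefined) {
    // Tasks go first: they live only in memory, whereas messages can be
    // re-discovered from the database later.
    return { kind: "task", taskId };
  }
  if (state.pendingMessages) {
    state.pendingMessages = false;
    return { kind: "messages" };
  }
  return { kind: "idle" }; // nothing left: the slot can go to another group
}

const drainState: GroupDrainState = { pendingTasks: ["t1"], pendingMessages: true };
console.log(nextActionFor(drainState).kind); // task
console.log(nextActionFor(drainState).kind); // messages
console.log(nextActionFor(drainState).kind); // idle
```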
The queue also supports message piping into active containers:
// src/group-queue.ts:160-178
sendMessage(groupJid: string, text: string): boolean {
const state = this.getGroup(groupJid);
if (!state.active || !state.groupFolder || state.isTaskContainer) return false;
// Write IPC file for the running container to pick up
fs.writeFileSync(tempPath, JSON.stringify({ type: 'message', text }));
fs.renameSync(tempPath, filepath); // Atomic write
return true;
}
Agent Inside the Container
SDK Integration (container/agent-runner/src/index.ts:394-432)
The agent-runner uses the Claude Agent SDK's query() function with a rich configuration:
for await (const message of query({
prompt: stream, // MessageStream (async iterable)
options: {
cwd: '/workspace/group',
resume: sessionId, // Resume previous conversation
resumeSessionAt: resumeAt, // Resume at specific message UUID
systemPrompt: globalClaudeMd, // Append global CLAUDE.md
allowedTools: [
'Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep',
'WebSearch', 'WebFetch',
'Task', 'TaskOutput', 'TaskStop',
'TeamCreate', 'TeamDelete', 'SendMessage', // Agent swarms
'TodoWrite', 'ToolSearch', 'Skill',
'NotebookEdit',
'mcp__nanoclaw__*' // All NanoClaw MCP tools
],
permissionMode: 'bypassPermissions', // Safe because container is the sandbox
mcpServers: {
nanoclaw: { // MCP server for IPC tools
command: 'node',
args: [mcpServerPath],
env: { NANOCLAW_CHAT_JID, NANOCLAW_GROUP_FOLDER, NANOCLAW_IS_MAIN },
},
},
hooks: {
PreCompact: [{ hooks: [createPreCompactHook()] }], // Archive before compaction
},
}
}))
Notable decisions:
- permissionMode: 'bypassPermissions'. The container IS the sandbox, so there is no need for application-level permission prompts.
- MessageStream (async iterable). Keeps isSingleUserTurn=false, allowing agent teams subagents to run to completion.
- PreCompact hook. Archives full conversation transcripts to /workspace/group/conversations/ before the SDK compacts context.
MCP Tools: Agent → Host Communication
The agent can't call host functions directly. Instead, it uses MCP tools that write JSON files:
Agent calls mcp__nanoclaw__send_message("Hello!")
│
▼
ipc-mcp-stdio.ts writes to /workspace/ipc/messages/{timestamp}.json
│
▼
Host's IPC watcher (src/ipc.ts) picks up file on next poll (1s)
│
▼
Host routes message to the correct channel.sendMessage()
This file-based IPC is the only communication channel from agent to host. It's simple, debuggable (you can inspect the files), and doesn't require network setup inside the container.
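A file-based channel like this needs two halves: a writer that publishes each message atomically, and a poller that drains the directory. The sketch below shows one way to do that with Node's `fs` module, using a temporary directory. The function names, the file-naming scheme, and the `.tmp` suffix are assumptions for illustration, not the code in ipc-mcp-stdio.ts or src/ipc.ts.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface IpcMessage {
  type: "message";
  text: string;
}

// Writer side: write to a temp name, then rename, so a poller never sees a
// half-written JSON file (rename within one directory is atomic on POSIX).
function writeIpcFile(dir: string, payload: IpcMessage): string {
  fs.mkdirSync(dir, { recursive: true });
  const unique = `${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
  const finalPath = path.join(dir, `${unique}.json`);
  const tempPath = `${finalPath}.tmp`;
  fs.writeFileSync(tempPath, JSON.stringify(payload));
  fs.renameSync(tempPath, finalPath);
  return finalPath;
}

// Poller side: read and delete every completed .json file, oldest name first.
function drainIpcDir(dir: string): IpcMessage[] {
  const drained: IpcMessage[] = [];
  for (const entry of fs.readdirSync(dir).sort()) {
    if (!entry.endsWith(".json")) continue; // skips in-flight .tmp files
    const full = path.join(dir, entry);
    drained.push(JSON.parse(fs.readFileSync(full, "utf8")) as IpcMessage);
    fs.unlinkSync(full);
  }
  return drained;
}

const ipcDemoDir = fs.mkdtempSync(path.join(os.tmpdir(), "ipc-demo-"));
writeIpcFile(ipcDemoDir, { type: "message", text: "Hello!" });
console.log(drainIpcDir(ipcDemoDir)); // [ { type: 'message', text: 'Hello!' } ]
console.log(drainIpcDir(ipcDemoDir).length); // 0 (files are consumed once)
```

A host would call something like `drainIpcDir` on a timer, which is where the roughly one-second latency of the polling approach comes from.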
Task Scheduling
Schedule Types (src/task-scheduler.ts:31-63)
function computeNextRun(task): Date | null {
switch (task.schedule_type) {
case 'cron':
// Parse cron, get next occurrence in local timezone
return CronExpressionParser.parse(value, { tz: TIMEZONE }).next().toDate();
case 'interval':
// Anchor to scheduled time to prevent drift
const ms = parseInt(value, 10);
const anchor = new Date(task.next_run || task.created_at);
let next = new Date(anchor.getTime() + ms);
while (next <= now) next = new Date(next.getTime() + ms); // Skip missed intervals
return next;
case 'once':
return null; // No recurrence
}
}
The interval drift prevention is notable: instead of computing now + interval, it anchors to the original scheduled time and adds multiples of the interval. This prevents gradual drift caused by execution time.
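A worked example makes the difference concrete. The helper below is a standalone restatement of the interval branch using plain millisecond numbers; `nextIntervalRun` and its parameters are hypothetical names, and the guard against non-positive intervals is an addition not shown in the source.

```typescript
// Next run time for an interval task, anchored to the previous schedule.
function nextIntervalRun(anchorMs: number, intervalMs: number, nowMs: number): number {
  if (intervalMs <= 0) {
    throw new Error("intervalMs must be positive"); // added guard: avoids an endless loop
  }
  let next = anchorMs + intervalMs;
  while (next <= nowMs) {
    next += intervalMs; // skip intervals that were missed entirely
  }
  return next;
}

// A task scheduled at t=0 with a 60s interval, checked at t=150s
// (for example, because the previous run took a long time).
const anchored = nextIntervalRun(0, 60000, 150000);
const naive = 150000 + 60000;

console.log(anchored); // 180000: stays on the original 60s grid
console.log(naive); // 210000: drifts by however late the check happened
```

Because the result is always the anchor plus a whole number of intervals, run times stay aligned to the original schedule no matter how long each execution takes.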
Task Context Modes
- isolated: fresh session each run, no conversation history (default, safe)
- group: shares the group's session, preserving state across runs
Script Pre-check (container/agent-runner/src/index.ts:476-516)
Scheduled tasks can include a bash script that runs first:
#!/bin/bash
# Check if there's a new release on GitHub
latest=$(curl -s https://api.github.com/repos/owner/repo/releases/latest | jq -r .tag_name)
current="v1.0.0"
if [ "$latest" = "$current" ]; then
echo '{"wakeAgent": false}'
else
echo '{"wakeAgent": true, "data": {"newVersion": "'"$latest"'"}}'
fi
If wakeAgent is false, the Claude agent never starts, which saves API costs for conditional tasks. The script output data is injected into the agent's prompt when it does wake.
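On the consuming side, the runner has to turn the script's stdout into a wake/skip decision. The sketch below shows one plausible shape for that step. The function names, the "Script data:" prompt suffix, and in particular the fail-open behaviour on unparseable output are assumptions, since the source does not describe how malformed output is handled.

```typescript
interface PrecheckResult {
  wakeAgent: boolean;
  data?: unknown;
}

// Parse the pre-check script's stdout into a wake decision.
function parsePrecheckOutput(stdout: string): PrecheckResult {
  try {
    const parsed: unknown = JSON.parse(stdout.trim());
    if (typeof parsed === "object" && parsed !== null) {
      const candidate = parsed as { wakeAgent?: unknown; data?: unknown };
      if (typeof candidate.wakeAgent === "boolean") {
        return { wakeAgent: candidate.wakeAgent, data: candidate.data };
      }
    }
  } catch {
    // Not JSON: fall through to the default below.
  }
  // ASSUMPTION: fail open, so a broken script does not silently disable the task.
  return { wakeAgent: true };
}

// Returns the prompt to run, or null when the agent should stay asleep.
function buildTaskPrompt(basePrompt: string, result: PrecheckResult): string | null {
  if (!result.wakeAgent) return null;
  if (result.data === undefined) return basePrompt;
  // ASSUMPTION: the injection format is illustrative only.
  return `${basePrompt}\n\nScript data: ${JSON.stringify(result.data)}`;
}

const skipped = parsePrecheckOutput('{"wakeAgent": false}');
console.log(buildTaskPrompt("Check releases", skipped)); // null

const woken = parsePrecheckOutput('{"wakeAgent": true, "data": {"newVersion": "v1.1.0"}}');
console.log(buildTaskPrompt("Check releases", woken));
```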
Cursor Management and Recovery
Two-level cursor system:
- lastTimestamp (global): "I've seen all messages up to this point". Advances in the message loop.
- lastAgentTimestamp[chatJid] (per-group): "I've processed messages up to this point for this group". Advances when the agent receives messages.
Crash recovery (src/index.ts:121-136, 526-542):
function getOrRecoverCursor(chatJid: string): string {
const existing = lastAgentTimestamp[chatJid];
if (existing) return existing;
// Cursor missing — recover from last bot reply in DB
const botTs = getLastBotMessageTimestamp(chatJid, ASSISTANT_NAME);
if (botTs) {
lastAgentTimestamp[chatJid] = botTs;
return botTs;
}
return ''; // Process all messages from the beginning
}
Cursor rollback on error (src/index.ts:314-331):
if (output === 'error' || hadError) {
if (outputSentToUser) {
// Already sent response — don't roll back (would cause duplicates)
return true;
}
// Roll back cursor so retries can re-process these messages
lastAgentTimestamp[chatJid] = previousCursor;
saveState();
}
The rollback distinguishes between "error before any output" (safe to retry) and "error after output" (can't retry without duplicates). This is a pragmatic choice over at-most-once delivery.
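The rollback rule reduces to a small truth table, which the following sketch encodes as a pure function. `RunOutcome` and `cursorAfterRun` are hypothetical names; the real code mutates `lastAgentTimestamp` in place rather than returning a value.

```typescript
interface RunOutcome {
  hadError: boolean;
  outputSentToUser: boolean;
}

// Which cursor should be persisted after an agent run?
function cursorAfterRun(
  previousCursor: string,
  advancedCursor: string,
  outcome: RunOutcome,
): string {
  if (outcome.hadError && !outcome.outputSentToUser) {
    return previousCursor; // nothing reached the user: safe to retry
  }
  // Success, or an error after output: keep the advance to avoid duplicate replies.
  return advancedCursor;
}

const before = "2024-01-01T10:00:00Z"; // illustrative timestamps
const after = "2024-01-01T10:05:00Z";

console.log(cursorAfterRun(before, after, { hadError: false, outputSentToUser: true })); // after
console.log(cursorAfterRun(before, after, { hadError: true, outputSentToUser: false })); // before
console.log(cursorAfterRun(before, after, { hadError: true, outputSentToUser: true })); // after
```

The third case is the trade-off: messages in that window may have been handled only partially, but the user never sees the same reply twice.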
Internal Tag Stripping
Agents can include <internal>...</internal> tags in their output for reasoning that shouldn't be sent to users:
// src/index.ts:292
const text = raw.replace(/<internal>[\s\S]*?<\/internal>/g, '').trim();
This lets agents "think out loud" in their output while keeping the user-facing response clean. The internal content is still logged for debugging.
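A few inputs show how the non-greedy, multi-line pattern behaves at the edges. The `stripInternal` wrapper is a hypothetical name around the same regular expression; the sample strings are made up.

```typescript
// Same pattern as above, wrapped in a function for experimentation.
function stripInternal(raw: string): string {
  return raw.replace(/<internal>[\s\S]*?<\/internal>/g, "").trim();
}

// A single block, spanning lines, is removed entirely.
console.log(stripInternal("<internal>User wants weather.\nCall the tool.</internal>It is sunny."));
// → "It is sunny."

// Non-greedy matching removes each block separately and keeps text in between.
console.log(stripInternal("A <internal>x</internal>B<internal>y</internal> C"));
// → "A B C"

// An unclosed tag does not match, so the text passes through unchanged.
console.log(stripInternal("<internal>oops, never closed"));
// → "<internal>oops, never closed"
```

The last case is worth keeping in mind: if an agent's output is truncated mid-block, its internal reasoning would reach the user unless the caller handles unclosed tags separately.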