
Tool calling formats

How the agent and the model agree on 'I want to run this function.' Get this wrong and you're locked into one provider, can't stream cleanly, or pay for trailing hallucination.

TL;DR

Three formats compete in the wild: Anthropic’s native tool_use blocks (cleanest, Anthropic-only), OpenAI-style tool_calls[] (the de facto standard, supported by nearly every provider), and inline XML <function> blocks (model-agnostic, streams cleanly; Strix uses this to striking effect). Each format makes different things easy and different things annoying. The choice cascades into how you stream, how you parse, how you abstract providers, and what happens when args are malformed.

The agent has decided to run a function. It needs to tell you, over the wire, the function’s name and arguments. There are three live formats for that conversation, and one rising standard (MCP) that’s orthogonal.

The three formats, side by side

1. Anthropic’s native tool_use blocks

The model emits a structured content block of type tool_use alongside its text:

{
  "role": "assistant",
  "content": [
    { "type": "text", "text": "I'll read the file." },
    { "type": "tool_use", "id": "toolu_01", "name": "Read", "input": { "path": "src/agent.ts" } }
  ]
}

You reply with a matching tool_result content block:

{ "role": "user", "content": [{ "type": "tool_result", "tool_use_id": "toolu_01", "content": "..." }] }

Why it’s nice. Structured at the wire level. Thinking blocks can sit alongside. Parallel tool calls are first-class. Server-side schema validation runs before the response even reaches you.

Why it bites. Anthropic-only. To use it anywhere else you need a translation layer that converts tool_use ↔ whatever the other provider wants.
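
The translation itself is mechanical but fiddly. One direction, sketched from the message shapes in this section (any field names beyond those shapes are assumptions):

// Sketch: convert an Anthropic assistant turn into an OpenAI-style one.
function anthropicToOpenAI(msg: { content: any[] }) {
  const text = msg.content.filter(b => b.type === "text").map(b => b.text).join("");
  const tool_calls = msg.content
    .filter(b => b.type === "tool_use")
    .map(b => ({
      id: b.id,
      type: "function",
      // The impedance point: input is an object here, a JSON *string* there.
      function: { name: b.name, arguments: JSON.stringify(b.input) },
    }));
  return { role: "assistant", content: text || null, tool_calls };
}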

2. OpenAI-style tool_calls[]

The model emits a sibling tool_calls array:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    { "id": "call_01", "type": "function",
      "function": { "name": "read", "arguments": "{\"path\":\"src/agent.ts\"}" } }
  ]
}

The reply is a follow-up message with role: "tool" and a matching tool_call_id. Note that arguments is a JSON string, not a JSON object. When streaming, the arguments arrive as deltas of partial JSON; you need a forgiving parser if you want to dispatch before the call has fully streamed.
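
One forgiving approach is to retry a strict parse on every delta and dispatch the moment the buffer becomes valid JSON. A naive sketch; a production agent would use a real partial-JSON parser:

// Sketch: accumulate argument deltas, dispatch once the buffer parses.
class ArgAccumulator {
  private buf = "";
  push(delta: string): Record<string, unknown> | null {
    this.buf += delta;
    try {
      return JSON.parse(this.buf); // complete JSON: safe to dispatch
    } catch {
      return null; // still partial; keep buffering
    }
  }
}

// Feed each `delta.tool_calls[i].function.arguments` chunk as it streams in.
const acc = new ArgAccumulator();
for (const delta of ['{"path":"src/', 'agent.ts"}']) {
  const args = acc.push(delta);
  if (args) console.log("dispatch with", args); // { path: "src/agent.ts" }
}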

Why it’s nice. Every major provider speaks this dialect — Mistral, Together, Groq, vLLM, even local-LLM servers. The lowest-friction default for cross-provider work.

Why it bites. The “JSON string of JSON” design is awkward to stream-parse. Parallel tool calls are inconsistently supported across providers. You parse twice (string → JSON → schema-validate), which means more places for things to go wrong.

3. Inline XML <function> blocks

The model emits plain text containing XML-shaped blocks; the agent regex-parses them:

I'll inspect the directory first.

<function=list_dir>
<parameter=path>/workspaces/strix</parameter>
<parameter=hidden>false</parameter>
</function>

The system prompt forbids more than one <function> block per turn. That single constraint unlocks the cleverest cost optimization in the corpus: as soon as the streaming parser sees </function>, you can abort the connection — anything that comes after is hallucinated rambling that you’d otherwise pay for.
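
In code, the trick is a few lines in the stream consumer. A sketch, assuming the token text has already been decoded out of the provider's event stream and that ctrl aborts the underlying HTTP request:

// Sketch: stop reading (and paying) the moment the tool call closes.
async function streamUntilFunctionEnd(
  chunks: AsyncIterable<string>, // decoded token text, provider-agnostic
  ctrl: AbortController,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
    const end = text.indexOf("</function>");
    if (end !== -1) {
      ctrl.abort(); // anything after this is rambling you would otherwise pay for
      return text.slice(0, end + "</function>".length);
    }
  }
  return text; // no tool call this turn
}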

Why it’s nice. Model-agnostic — works on any LLM that follows instructions. Trivially streams. Easy to debug; you can paste it into a chat. The format is the protocol, so adding a new model is just verifying it’s instruction-following enough.

Why it bites. No schema enforcement at the protocol level — the agent must defensively parse, and the model can produce invalid args. The mitigation is strict per-tool examples in the system prompt and a robust XML parser. With current high-quality models, the failure rate is low; with weaker models, this format is risky.

A quick decision flow

One decision tree, from format choice to abstraction pattern:

  • Single provider? Use its native format. Pattern: zero abstraction.
  • Multiple providers, and you want first-class tool semantics? OpenAI-style function calling plus an adapter for Anthropic. Pattern: native + mock-fallback function calling.
  • Multiple providers, and you want streaming purity? XML function blocks. Pattern: the prompt is the protocol.

Three provider-abstraction patterns

The format choice cascades into how you abstract providers.

  • OpenHands uses LiteLLM plus a mock function-calling converter that injects tool examples into the prompt for non-native models and post-processes their output (see the sketch after this list). Every model gets the same structured tool API, even if it doesn’t natively support function calling.
  • Hermes ships separate adapters per provider plus an auxiliary client for side-channel tasks (summarization, dedupe). More code, but explicit per-provider feature gating — if Anthropic ships a new feature on day one, you opt in on day one.
  • Strix sidesteps the question by making the prompt the tool protocol. Any model that follows instructions works. No adapters needed.
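
The first pattern is worth seeing in miniature. A sketch of a mock function-calling converter; every name below is illustrative, not OpenHands' actual API:

// Sketch: for models without native tool support, (1) describe tools in the
// prompt, (2) parse the text reply back into the structured shape native
// function calling would have returned.
type ToolSpec = { name: string; description: string; example: string };

function injectToolPrompt(tools: ToolSpec[]): string {
  return [
    'You can call a tool by replying with <tool name="...">{json args}</tool>.',
    ...tools.map(t => `- ${t.name}: ${t.description}\n  e.g. ${t.example}`),
  ].join("\n");
}

function parseMockToolCall(reply: string) {
  const m = reply.match(/<tool name="([^"]+)">([\s\S]*?)<\/tool>/);
  if (!m) return null; // plain text turn, no tool call
  return { name: m[1], arguments: JSON.parse(m[2]) }; // same shape as native
}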

See provider-abstraction for when each pattern earns its keep.

Common pitfalls

Pitfall | Symptom | Fix
Trusting the LLM’s JSON args without re-validation | runtime crash on bad types | Zod / Pydantic at the dispatch boundary
Not handling partial-JSON streams (OpenAI fn) | tool calls fire late | partial JSON parser; dispatch on first complete arg
Parallel tool calls that depend on each other | second call uses first’s stale result | classify tool side-effects; serialize non-pure tools
XML format on a weak model | invalid XML mid-stream | use stronger models, or fall back to a repair LLM call
Mixing formats per turn | doubled parser surface; double failure | pick one per agent, even if your framework supports many
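
The first row in code: schema validation at the dispatch boundary, sketched with Zod (the tool and its schema are illustrative):

import { z } from "zod";

// Sketch: never trust model-produced args, whatever wire format delivered them.
const ReadArgs = z.object({ path: z.string().min(1) });

function dispatchRead(rawArgs: unknown) {
  const parsed = ReadArgs.safeParse(rawArgs);
  if (!parsed.success) {
    // Feed the validation error back to the model instead of crashing.
    return { error: parsed.error.message };
  }
  return { ok: readFile(parsed.data.path) };
}

declare function readFile(path: string): string; // hypothetical tool impl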

MCP — orthogonal, not an alternative

Model Context Protocol is a way to expose tools and resources to any model over a uniform interface. It’s compatible with all three formats above — MCP gives you a registry of tools, and the format is whatever the model provider speaks.

When MCP earns its keep: you’re shipping tools that agents from many vendors will want to use, or your tools live in a different process / language than the agent. Otherwise it adds a hop you don’t need.

Projects that implement this

  • Claude Code — Anthropic's official agentic CLI. Streaming tool calls, prompt caching, thinking signatures, multi-agent subagents, slash commands.
  • OpenHands (v0) — All Hands AI's autonomous software-engineer agent, v0. Event-sourced state, microagents, controller-level guardrails.
  • Strix — Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.
  • OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All-Hands agent.
  • Mistral Vibe — Mistral-flavored coding agent reference. Middleware-based dispatch, minimal tool set, instructive for understanding agent loop fundamentals.
  • Hermes Agent — 40+ tool, multi-platform agent. Provider adapters per LLM, trajectory compression preserves first/last turns, side-channel auxiliary client.
  • NanoClaw — Tiny Claude-Code-shaped clone. Excellent for studying the irreducible structure of an agent loop without production overhead.
  • OpenClaw — Open-source Claude-Code-style agent reproduction. Bigger than NanoClaw, reveals which patterns scale and which stay minimal.
  • Kimi Code — Moonshot's Kimi-flavored coding agent. Compact reference for an agent loop with OpenAI-compatible tool calling.
  • ML Intern — ML-engineering-flavored agent. Tooling for data exploration, model training, and notebook-style work.
  • Open Design — Open-source design / UI-generation agent. LLM-driven design intent → code, with a design-system-aware tool surface.