LLM Interaction Patterns, Guardrails & Techniques
This document describes how Mistral Vibe shapes LLM behavior: how prompts are assembled, how tools are presented, how the conversation loop operates, and what guardrails constrain the LLM at every level. It provides the cross-cutting "LLM interaction design" perspective that complements the per-component docs (02-agent-loop, 04-middleware, 05-llm-backends).
1. System Prompt Assembly
Entry point: get_universal_system_prompt() at vibe/core/system_prompt.py:415-466
The system prompt is built by concatenating independent sections, joined by "\n\n". Each section is gated by a config flag, allowing agent profiles to strip the prompt down.
Assembly Flow
get_universal_system_prompt()
│
├── config.system_prompt ← base prompt (cli.md, 47 lines)
│
├── [if include_commit_signature]
│ └── _add_commit_signature() ← git commit heredoc template (line 361)
│
├── [if include_model_info]
│ └── f"Your model name is: `{config.active_model}`"
│
├── [if include_prompt_detail]
│ ├── _get_os_system_prompt() ← platform + shell detection (line 339)
│ ├── tool prompts ← per-tool .md files, joined by "\n---\n"
│ ├── skills catalog ← XML <available_skills> block (line 374)
│ └── subagents catalog ← markdown list (line 402)
│
├── [if include_project_context]
│ ├── ProjectContextProvider ← directory tree + git status (line 36)
│ └── _load_project_doc() ← VIBE.md / AGENTS.md content (line 24)
│
└── "\n\n".join(sections)
Section Details
| # | Section | Source | Config Gate |
|---|---|---|---|
| 1 | Base prompt | `cli.md` — behavioral rules, tool usage, code modification style, tone | `config.system_prompt` (always present) |
| 2 | Commit signature | `_add_commit_signature()` (line 361) — heredoc template for git commits | `include_commit_signature` |
| 3 | Model identity | Active model name string injection (line 427) | `include_model_info` |
| 4 | OS/shell info | `_get_os_system_prompt()` (line 339) — platform name + shell path, Windows-specific command rules | `include_prompt_detail` |
| 5 | Tool prompts | Per-tool `.md` files loaded via `BaseTool.get_tool_prompt()` (`tools/base.py:130-149`), joined by `\n---\n` | `include_prompt_detail` |
| 6 | Skills catalog | XML `<available_skills>` with HTML-escaped name, description, path per skill (line 374) | `include_prompt_detail` |
| 7 | Subagents catalog | Markdown bullet list from `AgentManager.get_subagents()` (line 402) | `include_prompt_detail` |
| 8 | Project context | Directory tree (depth/file limited, gitignore-aware) + git branch/status/log via `ProjectContextProvider` (line 36) | `include_project_context` |
| 9 | Project docs | Trusted `VIBE.md`/`AGENTS.md` file content, up to `max_doc_bytes` (line 460) | `include_project_context` |
Base Prompt (cli.md)
The 47-line cli.md sets the LLM's fundamental identity and constraints:
- Identity: "You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI"
- Tool usage rules: always use tools, check required params, use exact user-provided values
- Code modification rules: read before editing, minimal changes, no over-engineering, no backward-compat hacks, match existing style
- Code references: `file_path:line_number` format
- Tone: no emojis unless requested, concise CLI-style output, no praise/validation, professional objectivity
Project Context Provider
ProjectContextProvider (line 36) generates an LLM-friendly directory tree with multiple safety bounds:
- Max depth: `config.max_depth`
- Max files: `config.max_files`
- Max chars: `config.max_chars` with `truncation_buffer`
- Timeout: `config.timeout_seconds`
- Gitignore-aware: loads `.gitignore` patterns plus ~30 hardcoded defaults (`.git`, `node_modules`, `__pycache__`, etc.)
The git status section includes current branch, main branch detection (checks for origin/master vs origin/main), porcelain status summary, and recent commit log with decoration stripping.
2. Tool Definition Formatting
File: vibe/core/llm/format.py — APIToolFormatHandler at line 58
How Tools Are Presented to the LLM
The get_available_tools() method (line 63) builds the tool list sent with each API call:
```python
AvailableTool(
    function=AvailableFunction(
        name=tool_class.get_name(),             # CamelCase → snake_case
        description=tool_class.description,     # Class-level docstring
        parameters=tool_class.get_parameters()  # Cleaned JSON schema
    )
)
```

Name conversion (`tools/base.py:314-317`): `re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()` converts class names like `SearchReplace` to `search_replace`.
Schema generation (`tools/base.py:291-311`): Pydantic `model_json_schema()` with cleanup (sketched below together with the name conversion):
- Strips `title` from the top level and all properties
- Strips `title` from `$defs` entries and their nested properties
- Removes the top-level `description`

Tool choice: always `"auto"` (lines 75-76).
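A minimal sketch of the name conversion and schema cleanup, assuming Pydantic v2; the helper names are illustrative rather than the actual `tools/base.py` code:

```python
import re
from pydantic import BaseModel

def get_name(cls: type) -> str:
    # CamelCase → snake_case: insert "_" before each capital except the first.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()

def get_parameters(model: type[BaseModel]) -> dict:
    schema = model.model_json_schema()
    schema.pop("title", None)        # strip top-level title
    schema.pop("description", None)  # strip top-level description
    for prop in schema.get("properties", {}).values():
        prop.pop("title", None)      # strip per-property titles
    for defn in schema.get("$defs", {}).values():
        defn.pop("title", None)
        for prop in defn.get("properties", {}).values():
            prop.pop("title", None)
    return schema

class SearchReplace(BaseModel):
    path: str

print(get_name(SearchReplace))  # -> "search_replace"
```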
Full Tool Call Pipeline
AvailableTool definitions
→ sent with API request
→ LLM returns tool_calls in response
→ APIToolFormatHandler.parse_message() → ParsedToolCall(s)
→ resolve_tool_calls() validates against ToolManager
→ ResolvedToolCall (success)
→ FailedToolCall (unknown tool or validation error)
Key types in the pipeline:
| Type | Fields | Purpose |
|---|---|---|
| `ParsedToolCall` | `tool_name`, `raw_args`, `call_id` | Raw extraction from API response |
| `ResolvedToolCall` | `tool_name`, `tool_class`, `validated_args`, `call_id` | Validated against Pydantic model |
| `FailedToolCall` | `tool_name`, `call_id`, `error` | Unknown tool or validation failure |
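To make the resolve step concrete, here is a hedged sketch that validates a `ParsedToolCall` against a registered Pydantic model; the dataclasses are simplified (the real `ResolvedToolCall` also carries `tool_class`) and the registry mapping is an assumption:

```python
import json
from dataclasses import dataclass
from pydantic import BaseModel, ValidationError

@dataclass
class ParsedToolCall:
    tool_name: str
    raw_args: str   # JSON string as returned by the API
    call_id: str

@dataclass
class ResolvedToolCall:
    tool_name: str
    validated_args: BaseModel
    call_id: str

@dataclass
class FailedToolCall:
    tool_name: str
    call_id: str
    error: str

def resolve(call: ParsedToolCall,
            registry: dict[str, type[BaseModel]]) -> ResolvedToolCall | FailedToolCall:
    tool_model = registry.get(call.tool_name)
    if tool_model is None:
        return FailedToolCall(call.tool_name, call.call_id, "unknown tool")
    try:
        args = tool_model.model_validate(json.loads(call.raw_args))
    except (json.JSONDecodeError, ValidationError) as exc:
        return FailedToolCall(call.tool_name, call.call_id, str(exc))
    return ResolvedToolCall(call.tool_name, args, call.call_id)
```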
3. Conversation Loop Pattern
File: vibe/core/agent_loop.py — see 02-agent-loop for full detail
Loop Structure
act(msg)
├── _clean_message_history() ← repair missing tool responses
└── _conversation_loop(msg)
├── append user message
├── yield UserMessageEvent
└── while not should_break_loop:
├── middleware_pipeline.run_before_turn()
│ └── handle: STOP → return, COMPACT → compact(), INJECT_MESSAGE → append
├── _perform_llm_turn()
│ ├── _chat() or _chat_streaming()
│ ├── format_handler.parse_message() → ParsedMessage
│ ├── format_handler.resolve_tool_calls() → ResolvedMessage
│ └── _handle_tool_calls() → yield events, append results
├── should_break = (last_message.role != "tool")
└── middleware_pipeline.run_after_turn()
Termination conditions:
- LLM response has no tool calls (last message is not `role: tool`)
- Middleware returns `STOP`
- User cancellation detected
Message History
- Flat `list[LLMMessage]` with the system message always at index 0
- History repair via `_clean_message_history()` (line 730), sketched below:
  - `_fill_missing_tool_responses()` (line 737): inserts placeholder tool responses for any assistant tool_calls that lack corresponding tool messages
  - `_ensure_assistant_after_tools()` (line 779): appends an `"Understood."` assistant message if the conversation ends with a tool message
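A minimal sketch of the two repair passes, assuming messages are plain dicts with `role`, optional `tool_calls`, and `tool_call_id` keys; the placeholder content and insertion order are illustrative:

```python
def fill_missing_tool_responses(history: list[dict]) -> None:
    # Collect the call IDs that already have a tool response.
    answered = {m.get("tool_call_id") for m in history if m["role"] == "tool"}
    repaired: list[dict] = []
    for msg in history:
        repaired.append(msg)
        # Insert a placeholder response for each unanswered tool call.
        for call in msg.get("tool_calls") or []:
            if call["id"] not in answered:
                repaired.append({
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": "<tool_error>interrupted</tool_error>",  # assumed placeholder
                })
    history[:] = repaired

def ensure_assistant_after_tools(history: list[dict]) -> None:
    # APIs expect an assistant turn after tool results; close the exchange.
    if history and history[-1]["role"] == "tool":
        history.append({"role": "assistant", "content": "Understood."})
```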
4. Streaming Architecture
Two Code Paths
| Path | Method | Behavior |
|---|---|---|
| Non-streaming | `_chat()` (line 569) | Single request → full `LLMChunk` → `_update_stats()` |
| Streaming | `_chat_streaming()` (line 613) | Chunked SSE → `LLMChunk.__add__` aggregation → batched UI events |
Streaming Details
_stream_assistant_events() (line 369) batches chunks for UI efficiency:
- Batch size: 5 chunks before yielding an `AssistantEvent` or `ReasoningEvent`
- Reasoning/content separation: the `reasoning_content` field is routed to `ReasoningEvent`, main content to `AssistantEvent`. The two never overlap in the same yield — when switching from reasoning to content (or vice versa), the current buffer is flushed first.
Chunk Merging (LLMMessage.__add__, types.py:217-267)
The __add__ operator on LLMMessage handles incremental assembly:
- Content: string concatenation of `content` and `reasoning_content`
- Tool calls: merged by `index` using an `OrderedDict` — function names validated for consistency, arguments concatenated
- Guards: raises `ValueError` if `role`, `name`, or `tool_call_id` differ between chunks
LLMChunk.__add__ (line 287) combines both the message (via LLMMessage.__add__) and usage (via LLMUsage.__add__).
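A simplified sketch of the merge semantics, with a `Delta` class standing in for `LLMMessage` and tool-call deltas represented as plain dicts:

```python
from collections import OrderedDict

class Delta:
    def __init__(self, role=None, content="", reasoning_content="", tool_calls=None):
        self.role = role
        self.content = content
        self.reasoning_content = reasoning_content
        self.tool_calls = tool_calls or []  # dicts with index/name/arguments

    def __add__(self, other: "Delta") -> "Delta":
        if self.role and other.role and self.role != other.role:
            raise ValueError("role mismatch between chunks")
        merged = OrderedDict((tc["index"], dict(tc)) for tc in self.tool_calls)
        for tc in other.tool_calls:
            prev = merged.get(tc["index"])
            if prev is None:
                merged[tc["index"]] = dict(tc)
                continue
            if tc.get("name") and prev.get("name") and tc["name"] != prev["name"]:
                raise ValueError("tool name mismatch at same index")
            prev["name"] = prev.get("name") or tc.get("name")
            # Arguments arrive as JSON fragments; concatenate them.
            prev["arguments"] = prev.get("arguments", "") + tc.get("arguments", "")
        return Delta(
            role=self.role or other.role,
            content=self.content + other.content,
            reasoning_content=self.reasoning_content + other.reasoning_content,
            tool_calls=list(merged.values()),
        )
```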
5. Guardrails — Middleware Pipeline
File: vibe/core/middleware.py — see 04-middleware for full detail
The MiddlewarePipeline runs a chain of middleware before each LLM call. Four actions are available:
| Action | Effect |
|---|---|
| `CONTINUE` | Proceed normally |
| `STOP` | Halt the conversation loop, yield stop event |
| `COMPACT` | Trigger context compaction, then continue |
| `INJECT_MESSAGE` | Append text to the last message (before-turn only) |
Middleware Registry
| Middleware | Trigger Condition | Action | Purpose |
|---|---|---|---|
| `TurnLimitMiddleware` | `steps - 1 >= max_turns` | STOP | Prevent infinite loops |
| `PriceLimitMiddleware` | `session_cost > max_price` | STOP | Budget enforcement |
| `AutoCompactMiddleware` | `context_tokens >= threshold` | COMPACT | Prevent context overflow |
| `ContextWarningMiddleware` | tokens >= 50% of threshold (fires once) | INJECT_MESSAGE | User awareness |
| `PlanAgentMiddleware` | active agent is `"plan"` | INJECT_MESSAGE | Read-only enforcement |
Key Design Rules
- `INJECT_MESSAGE` is forbidden in `run_after_turn()` — raises `ValueError` if attempted (lines 213-216). This prevents modifying the conversation after the LLM has already responded.
- `STOP` and `COMPACT` short-circuit — if any middleware returns these, the remaining middleware is skipped.
- Multiple `INJECT_MESSAGE` results are combined with a `"\n\n"` join (line 203).
- Reset on compaction: `ContextWarningMiddleware.has_warned` resets to `False` so it can fire again after compaction.
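A minimal sketch of the before-turn semantics these rules imply; the enum values mirror the doc, while the class shapes and the `state.steps` attribute are assumptions:

```python
from enum import Enum

class Action(Enum):
    CONTINUE = "continue"
    STOP = "stop"
    COMPACT = "compact"
    INJECT_MESSAGE = "inject_message"

class TurnLimitMiddleware:
    def __init__(self, max_turns: int):
        self.max_turns = max_turns

    def before_turn(self, state) -> tuple[Action, str | None]:
        if state.steps - 1 >= self.max_turns:
            return Action.STOP, None
        return Action.CONTINUE, None

def run_before_turn(middlewares, state) -> tuple[Action, str | None]:
    injections: list[str] = []
    for mw in middlewares:
        action, payload = mw.before_turn(state)
        if action in (Action.STOP, Action.COMPACT):
            return action, None          # short-circuit: skip remaining middleware
        if action == Action.INJECT_MESSAGE:
            injections.append(payload)
    if injections:
        return Action.INJECT_MESSAGE, "\n\n".join(injections)
    return Action.CONTINUE, None
```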
6. Guardrails — Tool Permission System
File: vibe/core/agent_loop.py:675-728
Tool execution goes through a four-step permission check:
_should_execute_tool(tool, args, tool_call_id)
│
├── 1. Global auto_approve? ─── yes ──→ EXECUTE
│
├── 2. tool.check_allowlist_denylist(args)
│ ├── ALWAYS ──→ EXECUTE
│ ├── NEVER ──→ SKIP (with denylist patterns in feedback)
│ └── None ──→ continue to step 3
│
├── 3. Tool config permission
│ ├── ALWAYS ──→ EXECUTE
│ ├── NEVER ──→ SKIP ("permanently disabled")
│ └── ASK ──→ continue to step 4
│
└── 4. User approval callback
├── async or sync callback
├── YES ──→ EXECUTE
└── NO ──→ SKIP
Permission Levels
| Level | Value | Behavior |
|---|---|---|
| `ToolPermission.ALWAYS` | `"always"` | Auto-approve without user prompt |
| `ToolPermission.ASK` | `"ask"` | Prompt user for approval (default) |
| `ToolPermission.NEVER` | `"never"` | Always reject |
Allowlist/Denylist
The base check_allowlist_denylist() (tools/base.py:326-336) returns None by default. Tool subclasses (like Bash) override this to implement pattern matching against their arguments. The patterns use fnmatch-style globbing.
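A hedged sketch of such an override for a shell command, using `fnmatch` as described; the exact signature and return contract in `tools/base.py` may differ:

```python
from enum import Enum
from fnmatch import fnmatch

class ToolPermission(Enum):
    ALWAYS = "always"
    ASK = "ask"
    NEVER = "never"

def check_allowlist_denylist(command: str,
                             allowlist: list[str],
                             denylist: list[str]) -> ToolPermission | None:
    if any(fnmatch(command, pat) for pat in denylist):
        return ToolPermission.NEVER   # skip; feedback reports matching patterns
    if any(fnmatch(command, pat) for pat in allowlist):
        return ToolPermission.ALWAYS  # auto-approve
    return None                       # fall through to tool config / user prompt

# Usage: a denylist of ["rm *", "git push*"] would block "rm -rf /" outright,
# while an allowlist of ["git status*"] auto-approves read-only git commands.
```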
Stats Tracking
Permission outcomes update AgentStats:
- Approved: `tool_calls_agreed += 1`
- Rejected: `tool_calls_rejected += 1`
- `ToolPermissionError` during execution: corrects agreed back to rejected (`agreed -= 1`, `rejected += 1`)
7. Guardrails — Agent Safety Levels
File: vibe/core/agents/models.py
AgentSafety Enum
SAFE → Read-only operations only
NEUTRAL → Standard permission model
DESTRUCTIVE → Some tools auto-approved
YOLO → Everything auto-approved
Builtin Agent Profiles
| Agent | Safety | Tool Restrictions | Extra Enforcement |
|---|---|---|---|
| `default` | NEUTRAL | All tools available, each needs approval | None |
| `plan` | SAFE | Only `grep`, `read_file`, `todo`, `ask_user_question`, `task` | `PlanAgentMiddleware` injects a read-only reminder every turn |
| `accept-edits` | DESTRUCTIVE | `write_file` and `search_replace` set to `always` | None |
| `auto-approve` | YOLO | All tools, `auto_approve: true` | None |
| `explore` | SAFE | Only `grep`, `read_file` | Subagent type (cannot be selected by user directly) |
Plan Agent — Dual Enforcement
The plan agent is the most restricted and uses two complementary mechanisms:
- Tool enablelist (`overrides.enabled_tools`): only 5 read-only tools are available
- Middleware prompt injection (`PlanAgentMiddleware`, line 152): every turn, a warning is appended to the last message reminding the LLM it "MUST NOT make any edits, run any non-readonly tools, or otherwise make any changes to the system"
This belt-and-suspenders approach means even if the LLM ignores the prompt injection, the restricted tool list prevents destructive actions.
8. Context Management
Token Tracking
AgentStats (types.py:26) tracks tokens at multiple granularities:
| Field | Updated | Purpose |
|---|---|---|
| `context_tokens` | Each LLM call | Current context window usage (prompt + completion) |
| `session_prompt_tokens` | Cumulative | Total input tokens across session |
| `session_completion_tokens` | Cumulative | Total output tokens across session |
| `session_cost` | Computed property | `(prompt_tokens / 1M) * input_price + (completion_tokens / 1M) * output_price` |
Token Counting
Both backends implement count_tokens() using a "minimal completion" trick:
- MistralBackend (`mistral.py:356-378`): calls `complete()` with `max_tokens=1`, reads back `usage.prompt_tokens`
- GenericBackend (`generic.py:418-444`): calls `complete()` with `max_tokens=16` (the minimum for OpenRouter compatibility), reads back `usage.prompt_tokens`
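A sketch of the trick against an assumed OpenAI-compatible `/chat/completions` endpoint via `httpx`; the URL, model, and key wiring are placeholders:

```python
import httpx

async def count_tokens(messages: list[dict], *, base_url: str,
                       api_key: str, model: str) -> int:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{base_url}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "messages": messages,
                  "max_tokens": 16},  # minimal completion; we only want usage
        )
        resp.raise_for_status()
        # The server reports how many tokens the prompt itself consumed.
        return resp.json()["usage"]["prompt_tokens"]
```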
Compaction Process
When AutoCompactMiddleware triggers (context_tokens >= threshold):
compact() [agent_loop.py:824]
├── _clean_message_history()
├── save current session
├── append compact.md template as user message
├── _chat() → LLM generates summary
├── replace history with [system_message, summary_message]
├── count_tokens() on new minimal history
├── reset session ID (new affinity)
└── middleware_pipeline.reset(COMPACT)
The compact.md template (vibe/core/prompts/compact.md) requests a 7-section summary:
- User's primary goals and intent
- Conversation timeline and progress
- Technical context and decisions
- Files and code changes
- Active work and last actions
- Unresolved issues and pending tasks
- Immediate next step
Session Affinity
An x-affinity header set to session_id is sent with every LLM call (agent_loop.py:587,634). This enables server-side routing optimizations (e.g., prompt caching). The session ID is reset on compaction (_reset_session(), line 789) since the conversation history changes substantially.
9. Prompt Engineering Techniques in Tool Prompts
Each tool can ship a .md file in a prompts/ subdirectory (discovered via BaseTool.get_tool_prompt(), tools/base.py:130-149). These prompts shape how the LLM uses tools. Key techniques:
Anti-Pattern Tables
bash.md includes a "DO NOT USE" section mapping bash commands to proper tool equivalents:
cat filename → Use read_file(path="filename")
grep -r "pattern" . → Use grep(pattern="pattern", path=".")
sed -i 's/old/new/' → Use search_replace tool
This prevents the LLM from falling back to shell commands when dedicated tools exist.
RIGHT/WRONG Examples
Tool prompts use labeled code blocks showing correct vs incorrect usage:
WRONG:
bash("cat large_file.txt")
RIGHT:
read_file(path="large_file.txt", limit=1000)
Structured Output Guidance
ask_user_question.md specifies JSON structure with precise constraints (max 12-character headers, 2-4 options). search_replace.md defines exact block syntax with separator requirements.
Delegation Guidance
task.md defines when the LLM should delegate to subagents vs use tools directly, with criteria for choosing between agent types.
Tool-Specific Behavioral Rules
- bash.md: stateless execution model, timeout guidance, appropriate vs inappropriate use cases
- write_file.md: project directory boundary enforcement
- grep.md: regex pattern guidance, path scoping
10. @File Reference Expansion
File: vibe/core/autocompletion/path_prompt.py
The @ prefix allows users to reference files in their messages. The system expands these into structured context.
Resolution Pipeline
build_path_prompt_payload(message, base_dir)
│
├── scan for @ anchors
│ └── _is_path_anchor(): @ not preceded by alphanumeric or underscore
│
├── extract candidate path
│ ├── quoted: @"path/to/file" or @'path/to/file'
│ └── unquoted: @path/to/file (stops at non-path chars)
│
├── resolve to PathResource
│ ├── absolute paths used directly
│ ├── relative paths resolved against base_dir
│ └── must exist on disk
│
└── PathPromptPayload(display_text, prompt_text, resources)
- Path characters (`_is_path_char`, line 79): alphanumeric plus `._/\-()[]{}`
- Deduplication: multiple references to the same resolved path produce a single `PathResource`
- Resource types: `"file"` or `"directory"`, determined by `Path.is_dir()`
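A hedged sketch of the extraction rules above; the regex and helper are illustrative rather than the actual `path_prompt.py` implementation:

```python
import re
from pathlib import Path

# @ not preceded by alphanumeric/underscore, then a quoted or unquoted path.
ANCHOR = re.compile(
    r"""(?<![A-Za-z0-9_])@(?:"([^"]+)"|'([^']+)'|([\w./\\\-()\[\]{}]+))"""
)

def extract_path_references(message: str, base_dir: Path) -> list[Path]:
    seen: dict[Path, None] = {}   # dedupe while preserving order
    for match in ANCHOR.finditer(message):
        raw = next(g for g in match.groups() if g)
        path = Path(raw)
        if not path.is_absolute():
            path = base_dir / path   # relative paths resolve against base_dir
        path = path.resolve()
        if path.exists():            # must exist on disk
            seen.setdefault(path, None)
    return list(seen)
```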
11. Error Handling Around LLM Calls
Error Classification
File: vibe/core/llm/exceptions.py
BackendError (line 28) classifies errors by HTTP status:
| Status | Behavior |
|---|---|
| 401 Unauthorized | "Invalid API key" message |
| 429 Too Many Requests | "Rate limit exceeded" message |
| Other | Full diagnostic with status, request_id, endpoint, model, body excerpt |
PayloadSummary
Every BackendError includes a PayloadSummary (line 19) for debugging:
- `model`: model name
- `message_count`: number of messages in the request
- `approx_chars`: total character count across messages
- `temperature`: sampling temperature
- `has_tools`: whether tools were included
- `tool_choice`: tool selection mode
Error Builder Pattern
BackendErrorBuilder (line 107) provides two factory methods:
- `build_http_error()`: for `httpx.HTTPStatusError` — extracts status, headers, body, and parses the provider error message from JSON
- `build_request_error()`: for `httpx.RequestError` — network-level failures with a "Network error" label
Rate Limit Handling
_should_raise_rate_limit_error() (agent_loop.py:101-102) checks for 429 status and converts to RateLimitError (types.py:387) with provider and model info.
Retry Logic
Both backend implementations use retry decorators from vibe/core/utils.py:
- `@async_retry(tries=3)` on `GenericBackend._make_request()` (line 376)
- `@async_generator_retry(tries=3)` on `GenericBackend._make_streaming_request()` (line 388)
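A minimal sketch of what a decorator like `async_retry` could look like; the backoff policy and the blanket exception filter are assumptions:

```python
import asyncio
import functools

def async_retry(tries: int = 3, delay: float = 1.0):
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return await fn(*args, **kwargs)
                except Exception:  # the real code likely filters exception types
                    if attempt == tries:
                        raise      # out of retries: propagate the last error
                    await asyncio.sleep(delay * attempt)  # linear backoff (assumed)
        return wrapper
    return decorator
```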
Tool Execution Errors
Tool failures produce <tool_error> tagged messages:
f"<{TOOL_ERROR_TAG}>{tool_instance.get_name()} failed: {exc}</{TOOL_ERROR_TAG}>"ToolPermissionError gets special stats treatment: agreed is decremented and rejected incremented to accurately reflect that the tool was approved but then blocked at a deeper permission layer.
12. Multi-Provider Support
Architecture
The adapter pattern separates protocol concerns from transport:
AgentLoop
│
├── BackendLike protocol (vibe/core/llm/types.py:13)
│ ├── complete()
│ ├── complete_streaming()
│ └── count_tokens()
│
├── MistralBackend (backend/mistral.py:152)
│ └── uses native mistralai SDK
│
└── GenericBackend (backend/generic.py:197)
└── uses httpx + APIAdapter protocol
└── OpenAIAdapter (generic.py:71) — OpenAI-compatible API format
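A sketch of the `BackendLike` protocol shape, with simplified signatures assumed from the method names above:

```python
from typing import AsyncIterator, Protocol

class LLMChunk:
    """Stand-in for the real chunk type."""

class BackendLike(Protocol):
    async def complete(self, messages: list[dict], **kwargs) -> LLMChunk: ...
    def complete_streaming(self, messages: list[dict],
                           **kwargs) -> AsyncIterator[LLMChunk]: ...
    async def count_tokens(self, messages: list[dict]) -> int: ...
```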
Backend Factory
BACKEND_FACTORY (backend/factory.py:7) maps Backend enum values to backend classes:
```python
{Backend.MISTRAL: MistralBackend, Backend.GENERIC: GenericBackend}
```

Selection happens in `AgentLoop._select_backend()` (line 194) based on the provider config for the active model.
GenericBackend + APIAdapter
GenericBackend handles HTTP transport with httpx.AsyncClient:
- Connection pooling: 5 keepalive, 10 max connections (`httpx.Limits`)
- Timeout: configurable, default 720s
- Adapter protocol (`APIAdapter`): `prepare_request()` builds endpoint/headers/body, `parse_response()` converts JSON to `LLMChunk`
- Reasoning field mapping: the `OpenAIAdapter` maps between the internal `reasoning_content` field and provider-specific names (e.g., `thinking` for some providers) via `_reasoning_to_api()` / `_reasoning_from_api()`
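The transport settings map directly onto `httpx`; a sketch of the assumed client construction:

```python
import httpx

# Connection pooling and timeout as described above; the exact wiring
# inside GenericBackend is an assumption.
client = httpx.AsyncClient(
    limits=httpx.Limits(max_keepalive_connections=5, max_connections=10),
    timeout=httpx.Timeout(720.0),  # configurable; 720s is the default
)
```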
MistralBackend
Uses the native mistralai SDK directly:
- ThinkChunk mapping: `MistralMapper.parse_content()` (line 116) extracts reasoning from `ThinkChunk` objects and maps it to the internal `reasoning_content` field
- URL parsing: strips the API version path from the base URL (the SDK takes a server URL only)
- Token counting: `max_tokens=1` trick (line 371)
- No custom reasoning field: raises `ValueError` if `reasoning_field_name` is not `"reasoning_content"` — Mistral uses `ThinkChunk` natively
Streaming Differences
| Aspect | GenericBackend | MistralBackend |
|---|---|---|
| Protocol | SSE over httpx | `mistralai.chat.stream_async()` |
| Parsing | Manual `data:` line parsing | SDK handles framing |
| Usage | Per-chunk via `stream_options.include_usage` | Per-chunk via `chunk.data.usage` |
| Tool streaming | `stream_options.stream_tool_calls` (Mistral provider only) | SDK native |
Cross-References
- Agent loop internals: 02-agent-loop.md
- Tool system architecture: 03-tool-system.md
- Middleware details: 04-middleware.md
- Backend implementation: 05-llm-backends.md
- Agent profiles: 08-agents-skills.md