LLM Interaction Patterns, Guardrails & Techniques

This document describes how Mistral Vibe shapes LLM behavior: how prompts are assembled, how tools are presented, how the conversation loop operates, and what guardrails constrain the LLM at every level. It provides the cross-cutting "LLM interaction design" perspective that complements the per-component docs (02-agent-loop, 04-middleware, 05-llm-backends).


1. System Prompt Assembly

Entry point: get_universal_system_prompt() at vibe/core/system_prompt.py:415-466

The system prompt is built by concatenating independent sections, joined by "\n\n". Each section is gated by a config flag, allowing agent profiles to strip the prompt down.

Assembly Flow

get_universal_system_prompt()
    │
    ├── config.system_prompt          ← base prompt (cli.md, 47 lines)
    │
    ├── [if include_commit_signature]
    │   └── _add_commit_signature()   ← git commit heredoc template (line 361)
    │
    ├── [if include_model_info]
    │   └── f"Your model name is: `{config.active_model}`"
    │
    ├── [if include_prompt_detail]
    │   ├── _get_os_system_prompt()   ← platform + shell detection (line 339)
    │   ├── tool prompts              ← per-tool .md files, joined by "\n---\n"
    │   ├── skills catalog            ← XML <available_skills> block (line 374)
    │   └── subagents catalog         ← markdown list (line 402)
    │
    ├── [if include_project_context]
    │   ├── ProjectContextProvider    ← directory tree + git status (line 36)
    │   └── _load_project_doc()       ← VIBE.md / AGENTS.md content (line 24)
    │
    └── "\n\n".join(sections)
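The pattern in miniature — gated sections appended in order, then joined (a runnable sketch with simplified stand-ins for the real config and helpers; only the flag names come from the code above):

from dataclasses import dataclass

@dataclass
class PromptConfig:
    # Trimmed stand-in for the real config; flag names match the docs above.
    system_prompt: str = "You are a careful coding assistant."
    active_model: str = "example-model"
    include_model_info: bool = True
    include_prompt_detail: bool = True

def build_system_prompt(cfg: PromptConfig, tool_prompts: list[str]) -> str:
    sections = [cfg.system_prompt]                     # base prompt (cli.md), always present
    if cfg.include_model_info:
        sections.append(f"Your model name is: `{cfg.active_model}`")
    if cfg.include_prompt_detail and tool_prompts:
        sections.append("\n---\n".join(tool_prompts))  # per-tool .md prompts
    # ...OS info, skills, subagents, and project context follow the same
    # gate-then-append pattern before the final join.
    return "\n\n".join(sections)

print(build_system_prompt(PromptConfig(), ["## bash\nPrefer dedicated tools over raw shell."]))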

Section Details

| # | Section | Source | Config Gate |
|---|---------|--------|-------------|
| 1 | Base prompt | cli.md — behavioral rules, tool usage, code modification style, tone | config.system_prompt (always present) |
| 2 | Commit signature | _add_commit_signature() (line 361) — heredoc template for git commits | include_commit_signature |
| 3 | Model identity | Active model name string injection (line 427) | include_model_info |
| 4 | OS/shell info | _get_os_system_prompt() (line 339) — platform name + shell path, Windows-specific command rules | include_prompt_detail |
| 5 | Tool prompts | Per-tool .md files loaded via BaseTool.get_tool_prompt() (tools/base.py:130-149), joined by \n---\n | include_prompt_detail |
| 6 | Skills catalog | XML <available_skills> with HTML-escaped name, description, path per skill (line 374) | include_prompt_detail |
| 7 | Subagents catalog | Markdown bullet list from AgentManager.get_subagents() (line 402) | include_prompt_detail |
| 8 | Project context | Directory tree (depth/file limited, gitignore-aware) + git branch/status/log via ProjectContextProvider (line 36) | include_project_context |
| 9 | Project docs | Trusted VIBE.md/AGENTS.md file content, up to max_doc_bytes (line 460) | include_project_context |

Base Prompt (cli.md)

The 47-line cli.md sets the LLM's fundamental identity and constraints.

Project Context Provider

ProjectContextProvider (line 36) generates an LLM-friendly directory tree with multiple safety bounds (depth, file count, gitignore awareness).

The git status section includes current branch, main branch detection (checks for origin/master vs origin/main), porcelain status summary, and recent commit log with decoration stripping.
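Main-branch detection can be sketched with two git rev-parse probes (the exact commands and probe order inside ProjectContextProvider are assumptions here):

import subprocess

def detect_main_branch(cwd: str = ".") -> str:
    # Probe the common default-branch refs; the first that resolves wins.
    for candidate in ("origin/master", "origin/main"):
        probe = subprocess.run(
            ["git", "rev-parse", "--verify", "--quiet", candidate],
            cwd=cwd, capture_output=True, text=True,
        )
        if probe.returncode == 0:
            return candidate
    return "origin/main"  # fallback when neither ref exists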


2. Tool Definition Formatting

File: vibe/core/llm/format.py — APIToolFormatHandler at line 58

How Tools Are Presented to the LLM

The get_available_tools() method (line 63) builds the tool list sent with each API call:

AvailableTool(
    function=AvailableFunction(
        name=tool_class.get_name(),          # CamelCase → snake_case
        description=tool_class.description,   # Class-level docstring
        parameters=tool_class.get_parameters() # Cleaned JSON schema
    )
)

Name conversion (tools/base.py:314-317): re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower() converts class names like SearchReplace to search_replace.
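The same regex, runnable standalone:

import re

def to_snake_case(name: str) -> str:
    # "(?<!^)(?=[A-Z])" matches the empty position before each uppercase
    # letter except at the start, so an underscore is inserted there.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

assert to_snake_case("SearchReplace") == "search_replace"
assert to_snake_case("Bash") == "bash"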

Schema generation (tools/base.py:291-311): the argument model's Pydantic model_json_schema() output, cleaned up before being sent.
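A sketch of the shape, assuming the cleanup strips Pydantic's auto-generated title keys (the argument model and the exact keys removed are illustrative; the real cleanup lives in tools/base.py):

from pydantic import BaseModel, Field

class GrepArgs(BaseModel):
    # Hypothetical argument model for illustration.
    pattern: str = Field(description="Regex to search for")
    path: str = Field(default=".", description="Directory to search in")

def clean_schema(schema: dict) -> dict:
    # Assumed cleanup: drop "title" entries, which cost tokens without
    # helping the LLM interpret the parameters.
    schema.pop("title", None)
    for prop in schema.get("properties", {}).values():
        prop.pop("title", None)
    return schema

parameters = clean_schema(GrepArgs.model_json_schema())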

Tool choice: always "auto" (lines 75-76).

Full Tool Call Pipeline

AvailableTool definitions
    → sent with API request
        → LLM returns tool_calls in response
            → APIToolFormatHandler.parse_message() → ParsedToolCall(s)
                → resolve_tool_calls() validates against ToolManager
                    → ResolvedToolCall (success)
                    → FailedToolCall (unknown tool or validation error)

Key types in the pipeline:

| Type | Fields | Purpose |
|------|--------|---------|
| ParsedToolCall | tool_name, raw_args, call_id | Raw extraction from API response |
| ResolvedToolCall | tool_name, tool_class, validated_args, call_id | Validated against Pydantic model |
| FailedToolCall | tool_name, call_id, error | Unknown tool or validation failure |
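Sketched end-to-end, with the three types as dataclasses and a minimal resolution step (field names come from the table; the registry dict and nested Args model are illustrative assumptions):

from dataclasses import dataclass
from typing import Any
from pydantic import ValidationError

@dataclass
class ParsedToolCall:
    tool_name: str
    raw_args: str        # unvalidated JSON string from the API
    call_id: str

@dataclass
class ResolvedToolCall:
    tool_name: str
    tool_class: type
    validated_args: Any  # instance of the tool's Pydantic args model
    call_id: str

@dataclass
class FailedToolCall:
    tool_name: str
    call_id: str
    error: str

def resolve(parsed: ParsedToolCall, registry: dict[str, type]):
    tool_class = registry.get(parsed.tool_name)
    if tool_class is None:                       # unknown tool name
        return FailedToolCall(parsed.tool_name, parsed.call_id, "unknown tool")
    try:                                         # validate the raw JSON args
        args = tool_class.Args.model_validate_json(parsed.raw_args)
    except ValidationError as exc:
        return FailedToolCall(parsed.tool_name, parsed.call_id, str(exc))
    return ResolvedToolCall(parsed.tool_name, tool_class, args, parsed.call_id)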

3. Conversation Loop Pattern

File: vibe/core/agent_loop.py — see 02-agent-loop for full detail

Loop Structure

act(msg)
  ├── _clean_message_history()      ← repair missing tool responses
  └── _conversation_loop(msg)
        ├── append user message
        ├── yield UserMessageEvent
        └── while not should_break_loop:
              ├── middleware_pipeline.run_before_turn()
              │     └── handle: STOP → return, COMPACT → compact(), INJECT_MESSAGE → append
              ├── _perform_llm_turn()
              │     ├── _chat() or _chat_streaming()
              │     ├── format_handler.parse_message() → ParsedMessage
              │     ├── format_handler.resolve_tool_calls() → ResolvedMessage
              │     └── _handle_tool_calls() → yield events, append results
              ├── should_break = (last_message.role != "tool")
              └── middleware_pipeline.run_after_turn()

Termination conditions:

  1. LLM response has no tool calls (last message is not role: tool)
  2. Middleware returns STOP
  3. User cancellation detected

Message History


4. Streaming Architecture

Two Code Paths

| Path | Method | Behavior |
|------|--------|----------|
| Non-streaming | _chat() (line 569) | Single request → full LLMChunk → _update_stats() |
| Streaming | _chat_streaming() (line 613) | Chunked SSE → LLMChunk.__add__ aggregation → batched UI events |

Streaming Details

_stream_assistant_events() (line 369) batches chunks for UI efficiency.

Chunk Merging (LLMMessage.__add__, types.py:217-267)

The __add__ operator on LLMMessage handles incremental assembly.

LLMChunk.__add__ (line 287) combines both the message (via LLMMessage.__add__) and usage (via LLMUsage.__add__).
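A toy version of the merge — the real LLMChunk also merges tool-call deltas and other fields:

from dataclasses import dataclass

@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def __add__(self, other: "Usage") -> "Usage":
        return Usage(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
        )

@dataclass
class Chunk:
    content: str = ""
    usage: Usage | None = None

    def __add__(self, other: "Chunk") -> "Chunk":
        # Concatenate streamed text; keep whichever usage is present,
        # summing when both sides carry one.
        if self.usage and other.usage:
            usage = self.usage + other.usage
        else:
            usage = other.usage or self.usage
        return Chunk(self.content + other.content, usage)

assert (Chunk("Hel") + Chunk("lo", Usage(12, 3))).content == "Hello"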


5. Guardrails — Middleware Pipeline

File: vibe/core/middleware.py — see 04-middleware for full detail

The MiddlewarePipeline runs a chain of middleware before each LLM call. Four actions are available:

| Action | Effect |
|--------|--------|
| CONTINUE | Proceed normally |
| STOP | Halt the conversation loop, yield stop event |
| COMPACT | Trigger context compaction, then continue |
| INJECT_MESSAGE | Append text to the last message (before-turn only) |

Middleware Registry

| Middleware | Trigger Condition | Action | Purpose |
|------------|-------------------|--------|---------|
| TurnLimitMiddleware | steps - 1 >= max_turns | STOP | Prevent infinite loops |
| PriceLimitMiddleware | session_cost > max_price | STOP | Budget enforcement |
| AutoCompactMiddleware | context_tokens >= threshold | COMPACT | Prevent context overflow |
| ContextWarningMiddleware | tokens >= 50% of threshold (fires once) | INJECT_MESSAGE | User awareness |
| PlanAgentMiddleware | active agent is "plan" | INJECT_MESSAGE | Read-only enforcement |
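One guardrail in this shape, as a sketch (the hook name and signature are assumptions):

from enum import Enum

class MiddlewareAction(Enum):
    CONTINUE = "continue"
    STOP = "stop"
    COMPACT = "compact"
    INJECT_MESSAGE = "inject_message"

class TurnLimitMiddleware:
    def __init__(self, max_turns: int) -> None:
        self.max_turns = max_turns

    def before_turn(self, steps: int) -> MiddlewareAction:
        # Same trigger condition as in the registry table above.
        if steps - 1 >= self.max_turns:
            return MiddlewareAction.STOP
        return MiddlewareAction.CONTINUE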

Key Design Rules


6. Guardrails — Tool Permission System

File: vibe/core/agent_loop.py:675-728

Tool execution goes through a layered permission check:

_should_execute_tool(tool, args, tool_call_id)
    │
    ├── 1. Global auto_approve?  ─── yes ──→ EXECUTE
    │
    ├── 2. tool.check_allowlist_denylist(args)
    │       ├── ALWAYS ──→ EXECUTE
    │       ├── NEVER  ──→ SKIP (with denylist patterns in feedback)
    │       └── None   ──→ continue to step 3
    │
    ├── 3. Tool config permission
    │       ├── ALWAYS ──→ EXECUTE
    │       ├── NEVER  ──→ SKIP ("permanently disabled")
    │       └── ASK    ──→ continue to step 4
    │
    └── 4. User approval callback
            ├── async or sync callback
            ├── YES ──→ EXECUTE
            └── NO  ──→ SKIP
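The decision chain as straight-line code (names are paraphrased from the diagram, not copied from agent_loop.py):

def should_execute(tool, args, config, ask_user) -> bool:
    if config.auto_approve:                          # 1. global override
        return True
    verdict = tool.check_allowlist_denylist(args)    # 2. per-tool patterns
    if verdict is not None:
        return verdict == "always"                   # "never" → skip
    permission = config.permission_for(tool)         # 3. configured level
    if permission == "always":
        return True
    if permission == "never":
        return False
    return ask_user(tool, args)                      # 4. interactive approval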

Permission Levels

| Level | Value | Behavior |
|-------|-------|----------|
| ToolPermission.ALWAYS | "always" | Auto-approve without user prompt |
| ToolPermission.ASK | "ask" | Prompt user for approval (default) |
| ToolPermission.NEVER | "never" | Always reject |

Allowlist/Denylist

The base check_allowlist_denylist() (tools/base.py:326-336) returns None by default. Tool subclasses (like Bash) override this to implement pattern matching against their arguments. The patterns use fnmatch-style globbing.
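A sketch of such an override for a Bash-like tool — the patterns shown are illustrative, not the shipped defaults:

from fnmatch import fnmatch

class BashLike:
    denylist = ["rm -rf*", "sudo *"]       # hypothetical patterns
    allowlist = ["git status", "ls*"]

    def check_allowlist_denylist(self, command: str) -> str | None:
        # Deny wins over allow; no match falls through to the configured
        # permission level (the base-class behavior of returning None).
        if any(fnmatch(command, pat) for pat in self.denylist):
            return "never"
        if any(fnmatch(command, pat) for pat in self.allowlist):
            return "always"
        return None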

Stats Tracking

Permission outcomes update counters on AgentStats (such as agreed and rejected).


7. Guardrails — Agent Safety Levels

File: vibe/core/agents/models.py

AgentSafety Enum

SAFE        → Read-only operations only
NEUTRAL     → Standard permission model
DESTRUCTIVE → Some tools auto-approved
YOLO        → Everything auto-approved

Builtin Agent Profiles

| Agent | Safety | Tool Restrictions | Extra Enforcement |
|-------|--------|-------------------|-------------------|
| default | NEUTRAL | All tools available, each needs approval | None |
| plan | SAFE | Only grep, read_file, todo, ask_user_question, task | PlanAgentMiddleware injects read-only reminder every turn |
| accept-edits | DESTRUCTIVE | write_file and search_replace set to always | None |
| auto-approve | YOLO | All tools, auto_approve: true | None |
| explore | SAFE | Only grep, read_file | Subagent type (cannot be selected by user directly) |

Plan Agent — Dual Enforcement

The plan agent is the most restricted and uses two complementary mechanisms:

  1. Restricted tool list (overrides.enabled_tools): only 5 read-only tools are available
  2. Middleware prompt injection (PlanAgentMiddleware, line 152): every turn, a warning is appended to the last message reminding the LLM it "MUST NOT make any edits, run any non-readonly tools, or otherwise make any changes to the system"

This belt-and-suspenders approach means even if the LLM ignores the prompt injection, the restricted tool list prevents destructive actions.


8. Context Management

Token Tracking

AgentStats (types.py:26) tracks tokens at multiple granularities:

| Field | Updated | Purpose |
|-------|---------|---------|
| context_tokens | Each LLM call | Current context window usage (prompt + completion) |
| session_prompt_tokens | Cumulative | Total input tokens across session |
| session_completion_tokens | Cumulative | Total output tokens across session |
| session_cost | Computed property | (prompt_tokens / 1M) * input_price + (completion_tokens / 1M) * output_price |
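The cost formula restated as runnable code (field subset and prices are illustrative):

from dataclasses import dataclass

@dataclass
class AgentStats:  # trimmed to the cost-relevant fields
    session_prompt_tokens: int = 0
    session_completion_tokens: int = 0
    input_price: float = 0.4    # $ per 1M input tokens (example rate)
    output_price: float = 2.0   # $ per 1M output tokens (example rate)

    @property
    def session_cost(self) -> float:
        return (self.session_prompt_tokens / 1_000_000) * self.input_price + (
            self.session_completion_tokens / 1_000_000
        ) * self.output_price

print(AgentStats(session_prompt_tokens=500_000, session_completion_tokens=50_000).session_cost)
# 0.5 * 0.4 + 0.05 * 2.0 = 0.3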

Token Counting

Both backends implement count_tokens() using a "minimal completion" trick rather than a local tokenizer.
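Presumably this means sending the prompt with a one-token completion budget and reading the prompt size back from the usage block; a sketch against an OpenAI-compatible endpoint:

import httpx

async def count_tokens(base_url: str, api_key: str, model: str, messages: list[dict]) -> int:
    # The server tokenizes the prompt either way, so a near-empty completion
    # is a cheap way to get an exact prompt_tokens count.
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{base_url}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "messages": messages, "max_tokens": 1},
        )
        resp.raise_for_status()
    return resp.json()["usage"]["prompt_tokens"]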

Compaction Process

When AutoCompactMiddleware triggers (context_tokens >= threshold):

compact() [agent_loop.py:824]
    ├── _clean_message_history()
    ├── save current session
    ├── append compact.md template as user message
    ├── _chat() → LLM generates summary
    ├── replace history with [system_message, summary_message]
    ├── count_tokens() on new minimal history
    ├── reset session ID (new affinity)
    └── middleware_pipeline.reset(COMPACT)

The compact.md template (vibe/core/prompts/compact.md) requests a 7-section summary:

  1. User's primary goals and intent
  2. Conversation timeline and progress
  3. Technical context and decisions
  4. Files and code changes
  5. Active work and last actions
  6. Unresolved issues and pending tasks
  7. Immediate next step

Session Affinity

An x-affinity header set to session_id is sent with every LLM call (agent_loop.py:587,634). This enables server-side routing optimizations (e.g., prompt caching). The session ID is reset on compaction (_reset_session(), line 789) since the conversation history changes substantially.
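In header form (a sketch; the real call sites are the agent_loop.py lines cited above):

def build_headers(api_key: str, session_id: str) -> dict[str, str]:
    return {
        "Authorization": f"Bearer {api_key}",
        # Stable per session, so the provider can route repeat calls to the
        # same cache-warm worker; regenerated when compaction rewrites history.
        "x-affinity": session_id,
    }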


9. Prompt Engineering Techniques in Tool Prompts

Each tool can ship a .md file in a prompts/ subdirectory (discovered via BaseTool.get_tool_prompt(), tools/base.py:130-149). These prompts shape how the LLM uses tools. Key techniques:

Anti-Pattern Tables

bash.md includes a "DO NOT USE" section mapping bash commands to proper tool equivalents:

cat filename        → Use read_file(path="filename")
grep -r "pattern" . → Use grep(pattern="pattern", path=".")
sed -i 's/old/new/' → Use search_replace tool

This prevents the LLM from falling back to shell commands when dedicated tools exist.

RIGHT/WRONG Examples

Tool prompts use labeled code blocks showing correct vs incorrect usage:

WRONG:
  bash("cat large_file.txt")

RIGHT:
  read_file(path="large_file.txt", limit=1000)

Structured Output Guidance

ask_user_question.md specifies JSON structure with precise constraints (max 12-character headers, 2-4 options). search_replace.md defines exact block syntax with separator requirements.

Delegation Guidance

task.md defines when the LLM should delegate to subagents vs use tools directly, with criteria for choosing between agent types.

Tool-Specific Behavioral Rules


10. @File Reference Expansion

File: vibe/core/autocompletion/path_prompt.py

The @ prefix allows users to reference files in their messages. The system expands these into structured context.

Resolution Pipeline

build_path_prompt_payload(message, base_dir)
    │
    ├── scan for @ anchors
    │   └── _is_path_anchor(): @ not preceded by alphanumeric or underscore
    │
    ├── extract candidate path
    │   ├── quoted: @"path/to/file" or @'path/to/file'
    │   └── unquoted: @path/to/file (stops at non-path chars)
    │
    ├── resolve to PathResource
    │   ├── absolute paths used directly
    │   ├── relative paths resolved against base_dir
    │   └── must exist on disk
    │
    └── PathPromptPayload(display_text, prompt_text, resources)

Path characters (_is_path_char, line 79): alphanumeric plus ._/\-()[]{}.

Deduplication: multiple references to the same resolved path produce a single PathResource.

Resource types: "file" or "directory", determined by Path.is_dir().
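Anchor detection plus unquoted-path extraction can be sketched with two regexes built from the rules above (quoted forms and the existence check are omitted):

import re

ANCHOR = re.compile(r"(?<![A-Za-z0-9_])@")            # @ not preceded by \w
PATH = re.compile(r"[A-Za-z0-9._/\\\-()\[\]{}]+")     # the _is_path_char set

def find_references(message: str) -> list[str]:
    refs = []
    for anchor in ANCHOR.finditer(message):
        path = PATH.match(message, anchor.end())
        if path:
            refs.append(path.group())
    return refs

print(find_references("see @src/main.py, mail me@example.com"))
# ['src/main.py'] — the email's @ is preceded by a word character, so skipped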


11. Error Handling Around LLM Calls

Error Classification

File: vibe/core/llm/exceptions.py

BackendError (line 28) classifies errors by HTTP status:

| Status | Behavior |
|--------|----------|
| 401 Unauthorized | "Invalid API key" message |
| 429 Too Many Requests | "Rate limit exceeded" message |
| Other | Full diagnostic with status, request_id, endpoint, model, body excerpt |

PayloadSummary

Every BackendError includes a PayloadSummary (line 19) for debugging.

Error Builder Pattern

BackendErrorBuilder (line 107) provides two factory methods for constructing classified errors.

Rate Limit Handling

_should_raise_rate_limit_error() (agent_loop.py:101-102) checks for 429 status and converts to RateLimitError (types.py:387) with provider and model info.

Retry Logic

Both backend implementations use retry decorators from vibe/core/utils.py.
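A generic shape for such a decorator (the backoff policy and exception filtering are assumptions; the real helpers live in vibe/core/utils.py):

import asyncio
import functools
import random

def retry(attempts: int = 3, base_delay: float = 1.0):
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return await fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise          # out of attempts: surface the error
                    # Exponential backoff with jitter before the next try.
                    await asyncio.sleep(base_delay * 2**attempt + random.random())
        return wrapper
    return decorator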

Tool Execution Errors

Tool failures produce <tool_error> tagged messages:

f"<{TOOL_ERROR_TAG}>{tool_instance.get_name()} failed: {exc}</{TOOL_ERROR_TAG}>"

ToolPermissionError gets special stats treatment: agreed is decremented and rejected is incremented, reflecting that the tool was approved but then blocked at a deeper permission layer.


12. Multi-Provider Support

Architecture

The adapter pattern separates protocol concerns from transport:

AgentLoop
    │
    ├── BackendLike protocol (vibe/core/llm/types.py:13)
    │   ├── complete()
    │   ├── complete_streaming()
    │   └── count_tokens()
    │
    ├── MistralBackend (backend/mistral.py:152)
    │   └── uses native mistralai SDK
    │
    └── GenericBackend (backend/generic.py:197)
        └── uses httpx + APIAdapter protocol
            └── OpenAIAdapter (generic.py:71) — OpenAI-compatible API format
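The protocol itself, sketched with simplified signatures:

from typing import Any, AsyncIterator, Protocol

class BackendLike(Protocol):
    # Simplified from vibe/core/llm/types.py; real parameter lists differ.
    async def complete(self, messages: list[Any], tools: list[Any]) -> Any: ...

    def complete_streaming(
        self, messages: list[Any], tools: list[Any]
    ) -> AsyncIterator[Any]: ...

    async def count_tokens(self, messages: list[Any]) -> int: ...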

Backend Factory

BACKEND_FACTORY (backend/factory.py:7) maps Backend enum values to backend classes:

{Backend.MISTRAL: MistralBackend, Backend.GENERIC: GenericBackend}

Selection happens in AgentLoop._select_backend() (line 194) based on the provider config for the active model.

GenericBackend + APIAdapter

GenericBackend handles HTTP transport with httpx.AsyncClient.

MistralBackend

Uses the native mistralai SDK directly.

Streaming Differences

| Aspect | GenericBackend | MistralBackend |
|--------|----------------|----------------|
| Protocol | SSE over httpx | mistralai.chat.stream_async() |
| Parsing | Manual data: line parsing | SDK handles framing |
| Usage | Per-chunk via stream_options.include_usage | Per-chunk via chunk.data.usage |
| Tool streaming | stream_options.stream_tool_calls (Mistral provider only) | SDK native |
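On the GenericBackend side, the manual SSE framing reduces to a few lines (a sketch over httpx's line iterator; the [DONE] sentinel is the OpenAI-compatible convention):

import json

async def iter_sse_events(response):  # response: an httpx streaming response
    async for line in response.aiter_lines():
        if not line.startswith("data:"):
            continue                       # skip keep-alives and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":            # end-of-stream sentinel
            break
        yield json.loads(payload)          # one decoded chunk per event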

Cross-References