LLM Backend Abstraction
The LLM backend system provides a pluggable abstraction for connecting to different LLM providers while maintaining a consistent interface for the Agent.
Overview
Directory: vibe/core/llm/
vibe/core/llm/
├── backend/
│   ├── factory.py     # Backend factory mapping
│   ├── mistral.py     # Native Mistral SDK backend
│   └── generic.py     # OpenAI-compatible backend
├── types.py           # BackendLike protocol
├── format.py          # Message/tool formatting
└── exceptions.py      # Backend exceptions
BackendLike Protocol
File: vibe/core/llm/types.py
Protocol: BackendLike at line 13
# types.py:13-120
class BackendLike(Protocol):
"""Port protocol for dependency-injectable LLM backends.
Any backend used by Agent should implement this async context manager
interface with `complete`, `complete_streaming` and `count_tokens` methods.
"""
async def __aenter__(self) -> BackendLike: ...
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: ...
async def complete(
self,
*,
model: ModelConfig,
messages: list[LLMMessage],
temperature: float,
tools: list[AvailableTool] | None,
max_tokens: int | None,
tool_choice: StrToolChoice | AvailableTool | None,
extra_headers: dict[str, str] | None,
) -> LLMChunk: ...
def complete_streaming(
self,
*,
model: ModelConfig,
messages: list[LLMMessage],
temperature: float,
tools: list[AvailableTool] | None,
max_tokens: int | None,
tool_choice: StrToolChoice | AvailableTool | None,
extra_headers: dict[str, str] | None,
) -> AsyncGenerator[LLMChunk, None]: ...
async def count_tokens(
self,
*,
model: ModelConfig,
messages: list[LLMMessage],
temperature: float = 0.0,
tools: list[AvailableTool] | None,
tool_choice: StrToolChoice | AvailableTool | None = None,
extra_headers: dict[str, str] | None,
) -> int: ...

Method Purposes
| Method | Purpose |
|---|---|
| complete() | Non-streaming completion; returns a single LLMChunk |
| complete_streaming() | Streaming completion; yields LLMChunk objects |
| count_tokens() | Counts prompt tokens without generating a response |
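As an illustration, the sketch below shows how a caller might drive any BackendLike implementation. It is hypothetical: the helper name run_once, the model_config value, and the message contents are stand-ins, not code from the repository.

# Hypothetical usage sketch -- not taken from the codebase.
async def run_once(backend: BackendLike, model_config: ModelConfig) -> None:
    messages = [
        LLMMessage(role=Role.system, content="You are a coding assistant."),
        LLMMessage(role=Role.user, content="Summarize this repository."),
    ]

    async with backend:
        # Non-streaming: a single LLMChunk with the full message and usage stats.
        chunk = await backend.complete(
            model=model_config,
            messages=messages,
            temperature=model_config.temperature,
            tools=None,
            max_tokens=None,
            tool_choice=None,
            extra_headers=None,
        )
        print(chunk.message.content, chunk.usage)

        # Prompt token count, without generating a response.
        prompt_tokens = await backend.count_tokens(
            model=model_config,
            messages=messages,
            tools=None,
            extra_headers=None,
        )
        print(prompt_tokens)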
Backend Factory
File: vibe/core/llm/backend/factory.py
# factory.py:7
BACKEND_FACTORY = {
Backend.MISTRAL: MistralBackend,
Backend.GENERIC: GenericBackend
}

Backend Selection
Method: _select_backend() at agent_loop.py:141
# agent_loop.py:141-145
def _select_backend(self) -> BackendLike:
active_model = self.config.get_active_model()
provider = self.config.get_provider_for_model(active_model)
timeout = self.config.api_timeout
return BACKEND_FACTORY[provider.backend](provider=provider, timeout=timeout)

Mistral Backend
File: vibe/core/llm/backend/mistral.py
Class: MistralBackend at line 114
Uses the native Mistral AI SDK (mistralai package).
Constructor
# mistral.py:115-134
def __init__(self, provider: ProviderConfig, timeout: float = 720.0) -> None:
self._client: mistralai.Mistral | None = None
self._provider = provider
self._mapper = MistralMapper()
self._api_key = os.getenv(provider.api_key_env_var) if provider.api_key_env_var else None
# Parse API base URL
self._server_url = extract_server_url(provider.api_base)
self._timeout = timeout

Async Context Manager
# mistral.py:136-150
async def __aenter__(self) -> MistralBackend:
self._client = mistralai.Mistral(
api_key=self._api_key,
server_url=self._server_url,
timeout_ms=int(self._timeout * 1000),
)
await self._client.__aenter__()
return self
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
if self._client:
await self._client.__aexit__(exc_type, exc_val, exc_tb)

Message Mapping
Class: MistralMapper at mistral.py:30
Converts between internal LLMMessage and Mistral SDK types:
# mistral.py:31-60
def prepare_message(self, msg: LLMMessage) -> mistralai.Messages:
match msg.role:
case Role.system:
return mistralai.SystemMessage(role="system", content=msg.content or "")
case Role.user:
return mistralai.UserMessage(role="user", content=msg.content)
case Role.assistant:
return mistralai.AssistantMessage(
role="assistant",
content=msg.content,
tool_calls=[...], # Convert tool calls
)
case Role.tool:
return mistralai.ToolMessage(
role="tool",
content=msg.content,
tool_call_id=msg.tool_call_id,
name=msg.name,
)

Generic Backend
File: vibe/core/llm/backend/generic.py
Class: GenericBackend
OpenAI-compatible backend using httpx for HTTP requests.
Works with:
- Local models (llama.cpp, Ollama)
- OpenAI API
- Other OpenAI-compatible APIs
Key Differences from Mistral
| Aspect | Mistral Backend | Generic Backend |
|---|---|---|
| SDK | Native mistralai SDK | Raw httpx HTTP client |
| Authentication | SDK handles auth | Manual header injection |
| Message format | SDK types | JSON dictionaries |
| Error handling | SDK exceptions | HTTP status codes |
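To make the difference concrete, a stripped-down, hypothetical version of the non-streaming request the generic backend issues might look like the sketch below. The endpoint path, payload keys, and bearer-token header follow the OpenAI-compatible convention; they are not the exact code in generic.py.

import os

import httpx

# Hypothetical sketch of an OpenAI-compatible chat completion request made
# with httpx; generic.py's real implementation differs in details.
async def openai_compatible_complete(provider: ProviderConfig, payload: dict) -> dict:
    headers = {"Content-Type": "application/json"}
    api_key = os.getenv(provider.api_key_env_var) if provider.api_key_env_var else None
    if api_key:
        # Manual header injection -- no SDK handles authentication here.
        headers["Authorization"] = f"Bearer {api_key}"

    async with httpx.AsyncClient(base_url=provider.api_base, timeout=720.0) as client:
        response = await client.post("/chat/completions", json=payload, headers=headers)
        response.raise_for_status()  # errors surface as plain HTTP status codes
        return response.json()

The payload here is a plain JSON dictionary ({"model": ..., "messages": [...], "temperature": ...}), matching the "JSON dictionaries" row in the table above.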
Message Types
File: vibe/core/types.py
LLMMessage
# types.py:176-210
class LLMMessage(BaseModel):
role: Role # system, user, assistant, tool
content: Content | None = None # Message text
tool_calls: list[ToolCall] | None = None # For assistant messages
name: str | None = None # Tool name (tool messages)
tool_call_id: str | None = None # Tool call reference
reasoning_content: str | None = None # Reasoning/thinking tokens (v2.0+)
message_id: str | None = None # Auto-generated UUID for non-tool messages (v2.0+)

The reasoning_content field stores reasoning/thinking tokens from models that support them. During streaming, reasoning content is accumulated via LLMMessage.__add__() and yields ReasoningEvent objects.
The message_id field is auto-assigned (UUID) for non-tool messages via the _from_any() validator.
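As a hedged sketch of that accumulation (assuming __add__ merges the content, reasoning_content, and tool_calls of successive deltas, and with backend, model_config, and messages as hypothetical stand-ins):

# Sketch only: folding streamed deltas into one message via LLMMessage.__add__.
accumulated: LLMMessage | None = None

async for chunk in backend.complete_streaming(
    model=model_config,
    messages=messages,
    temperature=model_config.temperature,
    tools=None,
    max_tokens=None,
    tool_choice=None,
    extra_headers=None,
):
    delta = chunk.message
    accumulated = delta if accumulated is None else accumulated + delta

# `accumulated` now holds the full assistant message, including any
# reasoning_content streamed as thinking tokens.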
LLMChunk
# types.py:207-211
class LLMChunk(BaseModel):
message: LLMMessage
finish_reason: str | None = None
usage: LLMUsage | None = None

LLMUsage
# types.py:201-204
class LLMUsage(BaseModel):
prompt_tokens: int = 0
completion_tokens: int = 0

Tool Format Handling
File: vibe/core/llm/format.py
Class: APIToolFormatHandler
Available Tools Formatting
get_available_tools(tool_manager, config)
│
├─► Get all available tool classes
│
├─► Filter by enabled_tools/disabled_tools patterns
│
└─► Convert to AvailableTool format:
AvailableTool(
type="function",
function=AvailableFunction(
name=tool.get_name(),
description=tool.description,
parameters=tool.get_parameters()
)
)
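Serialized into a request payload, each AvailableTool takes the standard function-tool shape. The read_file tool below is purely illustrative; its name and parameter schema are invented for the example.

# Illustrative serialized AvailableTool; not an actual tool from the codebase.
example_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}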
Tool Call Resolution
resolve_tool_calls(parsed_message, tool_manager, config)
│
├─► For each tool call in message:
│ │
│ ├── Parse JSON arguments
│ │
│ ├── Look up tool class
│ │
│ ├── Validate arguments against schema
│ │
│ └── Create ResolvedToolCall or FailedToolCall
│
└─► Return ResolvedMessage with lists of resolved and failed calls
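In outline, the resolution step behaves like the sketch below. The helper names (get_tool, validate_arguments) and the ToolCall field access are assumptions, and the real code in format.py builds ResolvedToolCall/FailedToolCall objects rather than tuples.

import json

# Schematic sketch of resolve_tool_calls(); helper names are hypothetical.
def resolve_tool_calls_sketch(parsed_message, tool_manager, config):
    resolved, failed = [], []
    for call in parsed_message.tool_calls or []:
        try:
            arguments = json.loads(call.function.arguments)       # parse JSON arguments
            tool_cls = tool_manager.get_tool(call.function.name)  # look up the tool class
            tool_cls.validate_arguments(arguments)                # validate against the schema
            resolved.append((call, tool_cls, arguments))          # stands in for ResolvedToolCall
        except Exception as exc:
            failed.append((call, str(exc)))                       # stands in for FailedToolCall
    return resolved, failed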
Provider Configuration
File: vibe/core/config.py
ProviderConfig
# config.py:151-156
class ProviderConfig(BaseModel):
name: str # Provider identifier
api_base: str # API base URL
api_key_env_var: str = "" # Environment variable for API key
api_style: str = "openai" # API style
backend: Backend = Backend.GENERIC # Backend type

ModelConfig
# config.py:241-247
class ModelConfig(BaseModel):
name: str # Model name sent to API
provider: str # Provider name reference
alias: str # User-friendly alias
temperature: float = 0.2 # Default temperature
input_price: float = 0.0 # Price per million input tokens
output_price: float = 0.0 # Price per million output tokens

Default Providers
# config.py:258-270
DEFAULT_PROVIDERS = [
ProviderConfig(
name="mistral",
api_base="https://api.mistral.ai/v1",
api_key_env_var="MISTRAL_API_KEY",
backend=Backend.MISTRAL,
),
ProviderConfig(
name="llamacpp",
api_base="http://127.0.0.1:8080/v1",
api_key_env_var="",
),
]

Backend Usage in Agent
Non-Streaming Call
# agent_loop.py:608-662
async def _chat(self, max_tokens: int | None = None) -> LLMChunk:
active_model = self.config.get_active_model()
provider = self.config.get_provider_for_model(active_model)
available_tools = self.format_handler.get_available_tools(...)
tool_choice = self.format_handler.get_tool_choice()
async with self.backend as backend:
result = await backend.complete(
model=active_model,
messages=self.messages,
temperature=active_model.temperature,
tools=available_tools,
tool_choice=tool_choice,
extra_headers={
"user-agent": get_user_agent(provider.backend),
"x-affinity": self.session_id,
},
max_tokens=max_tokens,
)
# Update stats
self.stats.last_turn_duration = ...
self.stats.session_prompt_tokens += result.usage.prompt_tokens
...
return processed_chunk

Streaming Call
# agent_loop.py:664-720
async def _chat_streaming(self, max_tokens: int | None = None) -> AsyncGenerator[LLMChunk]:
...
async with self.backend as backend:
async for chunk in backend.complete_streaming(
model=active_model,
messages=self.messages,
temperature=active_model.temperature,
tools=available_tools,
tool_choice=tool_choice,
extra_headers={...},
max_tokens=max_tokens,
):
yield processed_chunk
...

Error Handling
File: vibe/core/llm/exceptions.py
Backend errors are wrapped with context about the provider and model:
# agent_loop.py:659-662
except Exception as e:
raise RuntimeError(
f"API error from {provider.name} (model: {active_model.name}): {e}"
) from e

Adding a New Backend
- Create a backend class implementing the BackendLike protocol
- Implement the async context manager (__aenter__, __aexit__)
- Implement complete(), complete_streaming(), and count_tokens()
- Add the class to BACKEND_FACTORY in factory.py
- Add a corresponding Backend enum value in config.py
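A minimal skeleton of such a backend might look like the sketch below. The EchoBackend name, its echo behaviour, and the Backend.ECHO enum value are hypothetical placeholders, not part of the codebase.

from collections.abc import AsyncGenerator

# Hypothetical backend skeleton; a real implementation would call a provider
# API instead of echoing the last message back.
class EchoBackend:
    def __init__(self, provider: ProviderConfig, timeout: float = 720.0) -> None:
        self._provider = provider
        self._timeout = timeout

    async def __aenter__(self) -> "EchoBackend":
        return self  # open SDK sessions or HTTP clients here

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return None  # close them here

    async def complete(self, *, model, messages, temperature, tools,
                       max_tokens, tool_choice, extra_headers) -> LLMChunk:
        last = messages[-1].content if messages else ""
        return LLMChunk(
            message=LLMMessage(role=Role.assistant, content=f"echo: {last}"),
            finish_reason="stop",
            usage=LLMUsage(prompt_tokens=0, completion_tokens=0),
        )

    async def complete_streaming(self, *, model, messages, temperature, tools,
                                 max_tokens, tool_choice, extra_headers,
                                 ) -> AsyncGenerator[LLMChunk, None]:
        # A single-chunk stream is enough to satisfy the protocol shape.
        yield await self.complete(model=model, messages=messages,
                                  temperature=temperature, tools=tools,
                                  max_tokens=max_tokens, tool_choice=tool_choice,
                                  extra_headers=extra_headers)

    async def count_tokens(self, *, model, messages, temperature=0.0, tools,
                           tool_choice=None, extra_headers) -> int:
        return 0  # placeholder; estimate or query the provider in practice

# Registration (assumes a Backend.ECHO value was added in config.py):
# BACKEND_FACTORY[Backend.ECHO] = EchoBackend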
Source File References
| File | Symbol | Description |
|---|---|---|
| llm/types.py:13-120 | BackendLike | Backend protocol definition |
| llm/backend/factory.py:7 | BACKEND_FACTORY | Backend factory mapping |
| llm/backend/mistral.py:30-111 | MistralMapper | Message type conversion |
| llm/backend/mistral.py:114-134 | MistralBackend.__init__() | Constructor |
| llm/backend/mistral.py:136-150 | __aenter__ / __aexit__ | Async context manager |
| config.py:146-149 | Backend | Backend enum |
| config.py:151-156 | ProviderConfig | Provider configuration |
| config.py:241-247 | ModelConfig | Model configuration |
| agent_loop.py:141-145 | _select_backend() | Backend selection |
| agent_loop.py:608-662 | _chat() | Non-streaming API call |
| agent_loop.py:664-720 | _chat_streaming() | Streaming API call |