LLM Backend Abstraction

The LLM backend system provides a pluggable abstraction for connecting to different LLM providers while maintaining a consistent interface for the Agent.

Overview

Directory: vibe/core/llm/

vibe/core/llm/
├── backend/
│   ├── factory.py      # Backend factory mapping
│   ├── mistral.py      # Native Mistral SDK backend
│   └── generic.py      # OpenAI-compatible backend
├── types.py            # BackendLike protocol
├── format.py           # Message/tool formatting
└── exceptions.py       # Backend exceptions

BackendLike Protocol

File: vibe/core/llm/types.py Protocol: BackendLike at line 13

# types.py:13-120
class BackendLike(Protocol):
    """Port protocol for dependency-injectable LLM backends.
 
    Any backend used by Agent should implement this async context manager
    interface with `complete`, `complete_streaming` and `count_tokens` methods.
    """
 
    async def __aenter__(self) -> BackendLike: ...
    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: ...
 
    async def complete(
        self,
        *,
        model: ModelConfig,
        messages: list[LLMMessage],
        temperature: float,
        tools: list[AvailableTool] | None,
        max_tokens: int | None,
        tool_choice: StrToolChoice | AvailableTool | None,
        extra_headers: dict[str, str] | None,
    ) -> LLMChunk: ...
 
    def complete_streaming(
        self,
        *,
        model: ModelConfig,
        messages: list[LLMMessage],
        temperature: float,
        tools: list[AvailableTool] | None,
        max_tokens: int | None,
        tool_choice: StrToolChoice | AvailableTool | None,
        extra_headers: dict[str, str] | None,
    ) -> AsyncGenerator[LLMChunk, None]: ...
 
    async def count_tokens(
        self,
        *,
        model: ModelConfig,
        messages: list[LLMMessage],
        temperature: float = 0.0,
        tools: list[AvailableTool] | None,
        tool_choice: StrToolChoice | AvailableTool | None = None,
        extra_headers: dict[str, str] | None,
    ) -> int: ...

Method Purposes

Method                 Purpose
complete()             Non-streaming completion; returns a single LLMChunk
complete_streaming()   Streaming completion; yields LLMChunk objects
count_tokens()         Counts prompt tokens without generating a response
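To make the contract concrete, here is a minimal, hypothetical EchoBackend that satisfies the protocol structurally by echoing the last message back; everything except the protocol's method signatures is illustrative, and the import assumes the documented location of the types.

# Hypothetical EchoBackend: structurally satisfies BackendLike, for illustration only.
from vibe.core.types import LLMChunk, LLMMessage, LLMUsage, Role

class EchoBackend:
    async def __aenter__(self) -> "EchoBackend":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return None

    async def complete(self, *, model, messages, temperature, tools,
                       max_tokens, tool_choice, extra_headers) -> LLMChunk:
        # Echo the last message back as the assistant reply.
        last = messages[-1].content if messages else ""
        return LLMChunk(
            message=LLMMessage(role=Role.assistant, content=last),
            finish_reason="stop",
            usage=LLMUsage(prompt_tokens=0, completion_tokens=0),
        )

    async def complete_streaming(self, *, model, messages, temperature, tools,
                                 max_tokens, tool_choice, extra_headers):
        # A single-chunk "stream"; real backends yield deltas as they arrive.
        yield await self.complete(
            model=model, messages=messages, temperature=temperature, tools=tools,
            max_tokens=max_tokens, tool_choice=tool_choice, extra_headers=extra_headers,
        )

    async def count_tokens(self, *, model, messages, temperature=0.0,
                           tools=None, tool_choice=None, extra_headers=None) -> int:
        # Crude whitespace count; a real backend would use the provider's tokenizer.
        return sum(len(str(m.content or "").split()) for m in messages)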

Backend Factory

File: vibe/core/llm/backend/factory.py

# factory.py:7
BACKEND_FACTORY = {
    Backend.MISTRAL: MistralBackend,
    Backend.GENERIC: GenericBackend
}

Backend Selection

Method: _select_backend() at agent_loop.py:141

# agent_loop.py:141-145
def _select_backend(self) -> BackendLike:
    active_model = self.config.get_active_model()
    provider = self.config.get_provider_for_model(active_model)
    timeout = self.config.api_timeout
    return BACKEND_FACTORY[provider.backend](provider=provider, timeout=timeout)

Mistral Backend

File: vibe/core/llm/backend/mistral.py Class: MistralBackend at line 114

Uses the native Mistral AI SDK (mistralai package).

Constructor

# mistral.py:115-134
def __init__(self, provider: ProviderConfig, timeout: float = 720.0) -> None:
    self._client: mistralai.Mistral | None = None
    self._provider = provider
    self._mapper = MistralMapper()
    self._api_key = os.getenv(provider.api_key_env_var) if provider.api_key_env_var else None
    # Parse API base URL
    self._server_url = extract_server_url(provider.api_base)
    self._timeout = timeout

Async Context Manager

# mistral.py:136-150
async def __aenter__(self) -> MistralBackend:
    self._client = mistralai.Mistral(
        api_key=self._api_key,
        server_url=self._server_url,
        timeout_ms=int(self._timeout * 1000),
    )
    await self._client.__aenter__()
    return self
 
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
    if self._client:
        await self._client.__aexit__(exc_type, exc_val, exc_tb)

Message Mapping

Class: MistralMapper at mistral.py:30

Converts between internal LLMMessage and Mistral SDK types:

# mistral.py:31-60
def prepare_message(self, msg: LLMMessage) -> mistralai.Messages:
    match msg.role:
        case Role.system:
            return mistralai.SystemMessage(role="system", content=msg.content or "")
        case Role.user:
            return mistralai.UserMessage(role="user", content=msg.content)
        case Role.assistant:
            return mistralai.AssistantMessage(
                role="assistant",
                content=msg.content,
                tool_calls=[...],  # Convert tool calls
            )
        case Role.tool:
            return mistralai.ToolMessage(
                role="tool",
                content=msg.content,
                tool_call_id=msg.tool_call_id,
                name=msg.name,
            )
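For example, a tool-result message would map to the SDK's ToolMessage; the tool name, payload, and call id below are made up.

# Illustrative mapping of a tool-result message to the corresponding SDK type.
mapper = MistralMapper()
tool_msg = LLMMessage(
    role=Role.tool,
    content='{"result": 42}',
    name="calculator",        # hypothetical tool name
    tool_call_id="call_1",    # hypothetical tool call id
)
sdk_msg = mapper.prepare_message(tool_msg)   # -> mistralai.ToolMessage(...)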

Generic Backend

File: vibe/core/llm/backend/generic.py Class: GenericBackend

OpenAI-compatible backend using httpx for HTTP requests.

Works with any OpenAI-compatible chat completions API, such as the default llamacpp provider (http://127.0.0.1:8080/v1) defined in DEFAULT_PROVIDERS below.

Key Differences from Mistral

Aspect            Mistral Backend         Generic Backend
SDK               Native mistralai SDK    Raw httpx HTTP client
Authentication    SDK handles auth        Manual header injection
Message format    SDK types               JSON dictionaries
Error handling    SDK exceptions          HTTP status codes
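For orientation, here is a minimal sketch of the kind of request the generic backend issues, assuming a standard OpenAI-style /chat/completions endpoint; the function and payload are illustrative, not the actual GenericBackend internals.

# Illustrative OpenAI-compatible call with httpx; not the real GenericBackend code.
import httpx

async def openai_compatible_complete(api_base: str, api_key: str | None, payload: dict) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    async with httpx.AsyncClient(timeout=720.0) as client:
        resp = await client.post(f"{api_base}/chat/completions", json=payload, headers=headers)
        resp.raise_for_status()   # surfaces non-2xx responses as HTTP status errors
        return resp.json()

payload = {
    "model": "my-local-model",    # hypothetical model name
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "stream": False,
}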

Message Types

File: vibe/core/types.py

LLMMessage

# types.py:176-210
class LLMMessage(BaseModel):
    role: Role                              # system, user, assistant, tool
    content: Content | None = None          # Message text
    tool_calls: list[ToolCall] | None = None  # For assistant messages
    name: str | None = None                 # Tool name (tool messages)
    tool_call_id: str | None = None         # Tool call reference
    reasoning_content: str | None = None    # Reasoning/thinking tokens (v2.0+)
    message_id: str | None = None           # Auto-generated UUID for non-tool messages (v2.0+)

The reasoning_content field stores reasoning/thinking tokens from models that support them. During streaming, reasoning content is accumulated via LLMMessage.__add__() and surfaced as ReasoningEvent objects.

The message_id field is auto-assigned (UUID) for non-tool messages via the _from_any() validator.
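An illustrative view of that accumulation; the exact merge rules live in LLMMessage.__add__() and are not reproduced here.

# Illustrative: streamed assistant deltas are merged chunk by chunk.
partial = LLMMessage(role=Role.assistant, content="Hel", reasoning_content="step 1")
delta = LLMMessage(role=Role.assistant, content="lo", reasoning_content=", step 2")
merged = partial + delta   # content and reasoning_content accumulate across chunks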

LLMChunk

# types.py:207-211
class LLMChunk(BaseModel):
    message: LLMMessage
    finish_reason: str | None = None
    usage: LLMUsage | None = None

LLMUsage

# types.py:201-204
class LLMUsage(BaseModel):
    prompt_tokens: int = 0
    completion_tokens: int = 0

Tool Format Handling

File: vibe/core/llm/format.py Class: APIToolFormatHandler

Available Tools Formatting

get_available_tools(tool_manager, config)
    │
    ├─► Get all available tool classes
    │
    ├─► Filter by enabled_tools/disabled_tools patterns
    │
    └─► Convert to AvailableTool format:
        AvailableTool(
            type="function",
            function=AvailableFunction(
                name=tool.get_name(),
                description=tool.description,
                parameters=tool.get_parameters()
            )
        )
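Serialized into the request body, one such entry looks roughly like this; the tool name and parameter schema are hypothetical.

# Hypothetical serialized tool entry as sent to the API.
{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}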

Tool Call Resolution

resolve_tool_calls(parsed_message, tool_manager, config)
    │
    ├─► For each tool call in message:
    │   │
    │   ├── Parse JSON arguments
    │   │
    │   ├── Look up tool class
    │   │
    │   ├── Validate arguments against schema
    │   │
    │   └── Create ResolvedToolCall or FailedToolCall
    │
    └─► Return ResolvedMessage with lists of resolved and failed calls
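A sketch of the per-call step described above; the ToolCall field layout, the ToolManager lookup helper, and the ResolvedToolCall/FailedToolCall constructors shown here are assumptions, not verified signatures.

# Sketch only: the real implementation is resolve_tool_calls() in format.py.
import json
from pydantic import ValidationError

resolved, failed = [], []
for call in parsed_message.tool_calls or []:
    try:
        args = json.loads(call.function.arguments)                   # parse JSON arguments
        tool_cls = tool_manager.get_tool_class(call.function.name)   # assumed lookup helper
        tool_cls.model_validate(args)                                # validate against schema
        resolved.append(ResolvedToolCall(call=call, args=args))      # assumed constructor
    except (json.JSONDecodeError, ValidationError, KeyError) as exc:
        failed.append(FailedToolCall(call=call, error=str(exc)))     # assumed constructor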

Provider Configuration

File: vibe/core/config.py

ProviderConfig

# config.py:151-156
class ProviderConfig(BaseModel):
    name: str                               # Provider identifier
    api_base: str                           # API base URL
    api_key_env_var: str = ""              # Environment variable for API key
    api_style: str = "openai"              # API style
    backend: Backend = Backend.GENERIC      # Backend type

ModelConfig

# config.py:241-247
class ModelConfig(BaseModel):
    name: str                    # Model name sent to API
    provider: str                # Provider name reference
    alias: str                   # User-friendly alias
    temperature: float = 0.2    # Default temperature
    input_price: float = 0.0    # Price per million input tokens
    output_price: float = 0.0   # Price per million output tokens
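Since prices are expressed per million tokens, a turn's cost can be estimated directly from LLMUsage; this helper is illustrative arithmetic, not a function from the codebase.

# Illustrative cost estimate from usage counts and ModelConfig prices (per million tokens).
def estimate_cost(usage: LLMUsage, model: ModelConfig) -> float:
    return (
        usage.prompt_tokens * model.input_price
        + usage.completion_tokens * model.output_price
    ) / 1_000_000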

Default Providers

# config.py:258-270
DEFAULT_PROVIDERS = [
    ProviderConfig(
        name="mistral",
        api_base="https://api.mistral.ai/v1",
        api_key_env_var="MISTRAL_API_KEY",
        backend=Backend.MISTRAL,
    ),
    ProviderConfig(
        name="llamacpp",
        api_base="http://127.0.0.1:8080/v1",
        api_key_env_var="",
    ),
]
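A hypothetical additional entry for another locally hosted OpenAI-compatible server follows the same shape.

# Hypothetical extra provider; any OpenAI-compatible server can be described this way.
ProviderConfig(
    name="vllm-local",                      # illustrative name
    api_base="http://127.0.0.1:8000/v1",    # illustrative address
    api_key_env_var="",                     # empty: no API key required locally
    backend=Backend.GENERIC,
)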

Backend Usage in Agent

Non-Streaming Call

# agent_loop.py:608-662
async def _chat(self, max_tokens: int | None = None) -> LLMChunk:
    active_model = self.config.get_active_model()
    provider = self.config.get_provider_for_model(active_model)
    available_tools = self.format_handler.get_available_tools(...)
    tool_choice = self.format_handler.get_tool_choice()
 
    async with self.backend as backend:
        result = await backend.complete(
            model=active_model,
            messages=self.messages,
            temperature=active_model.temperature,
            tools=available_tools,
            tool_choice=tool_choice,
            extra_headers={
                "user-agent": get_user_agent(provider.backend),
                "x-affinity": self.session_id,
            },
            max_tokens=max_tokens,
        )
 
    # Update stats
    self.stats.last_turn_duration = ...
    self.stats.session_prompt_tokens += result.usage.prompt_tokens
    ...
 
    return processed_chunk

Streaming Call

# agent_loop.py:664-720
async def _chat_streaming(self, max_tokens: int | None = None) -> AsyncGenerator[LLMChunk]:
    ...
    async with self.backend as backend:
        async for chunk in backend.complete_streaming(
            model=active_model,
            messages=self.messages,
            temperature=active_model.temperature,
            tools=available_tools,
            tool_choice=tool_choice,
            extra_headers={...},
            max_tokens=max_tokens,
        ):
            yield processed_chunk
    ...

Error Handling

File: vibe/core/llm/exceptions.py

Backend errors are wrapped with context about the provider and model:

# agent_loop.py:659-662
except Exception as e:
    raise RuntimeError(
        f"API error from {provider.name} (model: {active_model.name}): {e}"
    ) from e

Adding a New Backend

  1. Create a backend class implementing the BackendLike protocol (a skeleton is sketched after this list)
  2. Implement async context manager (__aenter__, __aexit__)
  3. Implement complete(), complete_streaming(), count_tokens()
  4. Add to BACKEND_FACTORY in factory.py
  5. Add corresponding Backend enum value in config.py
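A skeleton covering steps 1-3, with the provider-specific work left as TODOs; the class name, module placement, and the Backend.MYPROVIDER value are placeholders.

# Hypothetical skeleton for a new backend; fill in the provider-specific SDK/HTTP calls.
from vibe.core.config import ProviderConfig
from vibe.core.types import LLMChunk

class MyProviderBackend:
    def __init__(self, provider: ProviderConfig, timeout: float = 720.0) -> None:
        self._provider = provider
        self._timeout = timeout

    async def __aenter__(self) -> "MyProviderBackend":
        # TODO: open the provider client / HTTP session here
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        # TODO: close the client / session here
        return None

    async def complete(self, *, model, messages, temperature, tools,
                       max_tokens, tool_choice, extra_headers) -> LLMChunk:
        raise NotImplementedError  # TODO: map messages/tools and call the provider

    async def complete_streaming(self, *, model, messages, temperature, tools,
                                 max_tokens, tool_choice, extra_headers):
        raise NotImplementedError  # TODO: yield LLMChunk objects as deltas arrive
        yield  # unreachable, but marks this method as an async generator per the protocol

    async def count_tokens(self, *, model, messages, temperature=0.0,
                           tools=None, tool_choice=None, extra_headers=None) -> int:
        raise NotImplementedError  # TODO: return the prompt token count

# Steps 4-5 then register it, e.g.:
# BACKEND_FACTORY[Backend.MYPROVIDER] = MyProviderBackend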

Source File References

File & Key Lines                 Item                        Description
llm/types.py:13-120              BackendLike                 Backend protocol definition
llm/backend/factory.py:7         BACKEND_FACTORY             Backend factory mapping
llm/backend/mistral.py:30-111    MistralMapper               Message type conversion
llm/backend/mistral.py:114-134   MistralBackend.__init__()   Constructor
llm/backend/mistral.py:136-150   Context manager             Async enter/exit
config.py:146-149                Backend                     Backend enum
config.py:151-156                ProviderConfig              Provider configuration
config.py:241-247                ModelConfig                 Model configuration
agent_loop.py:141-145            _select_backend()           Backend selection
agent_loop.py:608-662            _chat()                     Non-streaming API call
agent_loop.py:664-720            _chat_streaming()           Streaming API call