LLM Backend Abstraction
The LLM backend system provides a pluggable abstraction for connecting to different LLM providers while maintaining a consistent interface for the Agent.
Overview
Directory: vibe/core/llm/
vibe/core/llm/
├── backend/
│   ├── factory.py     # Backend factory mapping
│   ├── mistral.py     # Native Mistral SDK backend
│   └── generic.py     # OpenAI-compatible backend
├── types.py           # BackendLike protocol
├── format.py          # Message/tool formatting
└── exceptions.py      # Backend exceptions
BackendLike Protocol
File: vibe/core/llm/types.py
Protocol: BackendLike at line 13
# types.py:13-120
class BackendLike(Protocol):
"""Port protocol for dependency-injectable LLM backends.
Any backend used by Agent should implement this async context manager
interface with `complete`, `complete_streaming` and `count_tokens` methods.
"""
async def __aenter__(self) -> BackendLike: ...
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: ...
async def complete(
self,
*,
model: ModelConfig,
messages: list[LLMMessage],
temperature: float,
tools: list[AvailableTool] | None,
max_tokens: int | None,
tool_choice: StrToolChoice | AvailableTool | None,
extra_headers: dict[str, str] | None,
) -> LLMChunk: ...
def complete_streaming(
self,
*,
model: ModelConfig,
messages: list[LLMMessage],
temperature: float,
tools: list[AvailableTool] | None,
max_tokens: int | None,
tool_choice: StrToolChoice | AvailableTool | None,
extra_headers: dict[str, str] | None,
) -> AsyncGenerator[LLMChunk, None]: ...
async def count_tokens(
self,
*,
model: ModelConfig,
messages: list[LLMMessage],
temperature: float = 0.0,
tools: list[AvailableTool] | None,
tool_choice: StrToolChoice | AvailableTool | None = None,
extra_headers: dict[str, str] | None,
) -> int: ...

Method Purposes
| Method | Purpose |
|---|---|
| complete() | Non-streaming completion; returns a single LLMChunk |
| complete_streaming() | Streaming completion; yields LLMChunk objects |
| count_tokens() | Counts prompt tokens without generating a response |
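As an illustration, the sketch below shows how a caller might drive any BackendLike implementation. It is hypothetical: the helper name run_once, the model_config value, and the message contents are stand-ins, not code from the repository.

# Hypothetical usage sketch -- not taken from the codebase.
async def run_once(backend: BackendLike, model_config: ModelConfig) -> None:
    messages = [
        LLMMessage(role=Role.system, content="You are a coding assistant."),
        LLMMessage(role=Role.user, content="Summarize this repository."),
    ]

    async with backend:
        # Non-streaming: a single LLMChunk with the full message and usage stats.
        chunk = await backend.complete(
            model=model_config,
            messages=messages,
            temperature=model_config.temperature,
            tools=None,
            max_tokens=None,
            tool_choice=None,
            extra_headers=None,
        )
        print(chunk.message.content, chunk.usage)

        # Prompt token count, without generating a response.
        prompt_tokens = await backend.count_tokens(
            model=model_config,
            messages=messages,
            tools=None,
            extra_headers=None,
        )
        print(prompt_tokens)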
Backend Factory
File: vibe/core/llm/backend/factory.py
# factory.py:7
BACKEND_FACTORY = {
Backend.MISTRAL: MistralBackend,
Backend.GENERIC: GenericBackend
}

Backend Selection
Method: _select_backend() at agent_loop.py:141
# agent_loop.py:141-145
def _select_backend(self) -> BackendLike:
active_model = self.config.get_active_model()
provider = self.config.get_provider_for_model(active_model)
timeout = self.config.api_timeout
return BACKEND_FACTORY[provider.backend](provider=provider, timeout=timeout)

Mistral Backend
File: vibe/core/llm/backend/mistral.py
Class: MistralBackend at line 114
Uses the native Mistral AI SDK (mistralai package).
Constructor
# mistral.py:115-134
def __init__(self, provider: ProviderConfig, timeout: float = 720.0) -> None:
self._client: mistralai.Mistral | None = None
self._provider = provider
self._mapper = MistralMapper()
self._api_key = os.getenv(provider.api_key_env_var) if provider.api_key_env_var else None
# Parse API base URL
self._server_url = extract_server_url(provider.api_base)
self._timeout = timeout

Async Context Manager
# mistral.py:136-150
async def __aenter__(self) -> MistralBackend:
self._client = mistralai.Mistral(
api_key=self._api_key,
server_url=self._server_url,
timeout_ms=int(self._timeout * 1000),
)
await self._client.__aenter__()
return self
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
if self._client:
await self._client.__aexit__(exc_type, exc_val, exc_tb)

Message Mapping
Class: MistralMapper at mistral.py:30
Converts between internal LLMMessage and Mistral SDK types:
# mistral.py:31-60
def prepare_message(self, msg: LLMMessage) -> mistralai.Messages:
match msg.role:
case Role.system:
return mistralai.SystemMessage(role="system", content=msg.content or "")
case Role.user:
return mistralai.UserMessage(role="user", content=msg.content)
case Role.assistant:
return mistralai.AssistantMessage(
role="assistant",
content=msg.content,
tool_calls=[...], # Convert tool calls
)
case Role.tool:
return mistralai.ToolMessage(
role="tool",
content=msg.content,
tool_call_id=msg.tool_call_id,
name=msg.name,
)

Generic Backend
File: vibe/core/llm/backend/generic.py
Class: GenericBackend
OpenAI-compatible backend using httpx for HTTP requests.
Works with:
- Local models (llama.cpp, Ollama)
- OpenAI API
- Other OpenAI-compatible APIs
Key Differences from Mistral
| Aspect | Mistral Backend | Generic Backend |
|---|---|---|
| SDK | Native mistralai SDK | Raw httpx HTTP client |
| Authentication | SDK handles auth | Manual header injection |
| Message format | SDK types | JSON dictionaries |
| Error handling | SDK exceptions | HTTP status codes |
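To make the difference concrete, a stripped-down, hypothetical version of the non-streaming request the generic backend issues might look like the sketch below. The endpoint path, payload keys, and bearer-token header follow the OpenAI-compatible convention; they are not the exact code in generic.py.

import os

import httpx

# Hypothetical sketch of an OpenAI-compatible chat completion request made
# with httpx; generic.py's real implementation differs in details.
async def openai_compatible_complete(provider: ProviderConfig, payload: dict) -> dict:
    headers = {"Content-Type": "application/json"}
    api_key = os.getenv(provider.api_key_env_var) if provider.api_key_env_var else None
    if api_key:
        # Manual header injection -- no SDK handles authentication here.
        headers["Authorization"] = f"Bearer {api_key}"

    async with httpx.AsyncClient(base_url=provider.api_base, timeout=720.0) as client:
        response = await client.post("/chat/completions", json=payload, headers=headers)
        response.raise_for_status()  # errors surface as plain HTTP status codes
        return response.json()

The payload here is a plain JSON dictionary ({"model": ..., "messages": [...], "temperature": ...}), matching the "JSON dictionaries" row in the table above.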
Message Types
File: vibe/core/types.py
LLMMessage
# types.py:176-210
class LLMMessage(BaseModel):
role: Role # system, user, assistant, tool
content: Content | None = None # Message text
tool_calls: list[ToolCall] | None = None # For assistant messages
name: str | None = None # Tool name (tool messages)
tool_call_id: str | None = None # Tool call reference
reasoning_content: str | None = None # Reasoning/thinking tokens (v2.0+)
message_id: str | None = None # Auto-generated UUID for non-tool messages (v2.0+)

The reasoning_content field stores reasoning/thinking tokens from models that support them. During streaming, reasoning content is accumulated via LLMMessage.__add__() and yields ReasoningEvent objects.
The message_id field is auto-assigned (UUID) for non-tool messages via the _from_any() validator.
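As a hedged sketch of that accumulation (assuming __add__ merges the content, reasoning_content, and tool_calls of successive deltas, and with backend, model_config, and messages as hypothetical stand-ins):

# Sketch only: folding streamed deltas into one message via LLMMessage.__add__.
accumulated: LLMMessage | None = None

async for chunk in backend.complete_streaming(
    model=model_config,
    messages=messages,
    temperature=model_config.temperature,
    tools=None,
    max_tokens=None,
    tool_choice=None,
    extra_headers=None,
):
    delta = chunk.message
    accumulated = delta if accumulated is None else accumulated + delta

# `accumulated` now holds the full assistant message, including any
# reasoning_content streamed as thinking tokens.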
LLMChunk
# types.py:207-211
class LLMChunk(BaseModel):
message: LLMMessage
finish_reason: str | None = None
usage: LLMUsage | None = None

LLMUsage
# types.py:201-204
class LLMUsage(BaseModel):
prompt_tokens: int = 0
completion_tokens: int = 0

Tool Format Handling
File: vibe/core/llm/format.py
Class: APIToolFormatHandler
Available Tools Formatting
get_available_tools(tool_manager, config)
│
├─► Get all available tool classes
│
├─► Filter by enabled_tools/disabled_tools patterns
│
└─► Convert to AvailableTool format:
AvailableTool(
type="function",
function=AvailableFunction(
name=tool.get_name(),
description=tool.description,
parameters=tool.get_parameters()
)
)
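Serialized into a request payload, each AvailableTool takes the standard function-tool shape. The read_file tool below is purely illustrative; its name and parameter schema are invented for the example.

# Illustrative serialized AvailableTool; not an actual tool from the codebase.
example_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}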
Tool Call Resolution
resolve_tool_calls(parsed_message, tool_manager, config)
│
├─► For each tool call in message:
│ │
│ ├── Parse JSON arguments
│ │
│ ├── Look up tool class
│ │
│ ├── Validate arguments against schema
│ │
│ └── Create ResolvedToolCall or FailedToolCall
│
└─► Return ResolvedMessage with lists of resolved and failed calls
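In outline, the resolution step behaves like the sketch below. The helper names (get_tool, validate_arguments) and the ToolCall field access are assumptions, and the real code in format.py builds ResolvedToolCall/FailedToolCall objects rather than tuples.

import json

# Schematic sketch of resolve_tool_calls(); helper names are hypothetical.
def resolve_tool_calls_sketch(parsed_message, tool_manager, config):
    resolved, failed = [], []
    for call in parsed_message.tool_calls or []:
        try:
            arguments = json.loads(call.function.arguments)       # parse JSON arguments
            tool_cls = tool_manager.get_tool(call.function.name)  # look up the tool class
            tool_cls.validate_arguments(arguments)                # validate against the schema
            resolved.append((call, tool_cls, arguments))          # stands in for ResolvedToolCall
        except Exception as exc:
            failed.append((call, str(exc)))                       # stands in for FailedToolCall
    return resolved, failed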
Provider Configuration
File: vibe/core/config.py
ProviderConfig
# config.py:151-156
class ProviderConfig(BaseModel):
name: str # Provider identifier
api_base: str # API base URL
api_key_env_var: str = "" # Environment variable for API key
api_style: str = "openai" # API style
backend: Backend = Backend.GENERIC # Backend type

ModelConfig
# config.py:241-247
class ModelConfig(BaseModel):
name: str # Model name sent to API
provider: str # Provider name reference
alias: str # User-friendly alias
temperature: float = 0.2 # Default temperature
input_price: float = 0.0 # Price per million input tokens
output_price: float = 0.0 # Price per million output tokens

Default Providers
# config.py:258-270
DEFAULT_PROVIDERS = [
ProviderConfig(
name="mistral",
api_base="https://api.mistral.ai/v1",
api_key_env_var="MISTRAL_API_KEY",
backend=Backend.MISTRAL,
),
ProviderConfig(
name="llamacpp",
api_base="http://127.0.0.1:8080/v1",
api_key_env_var="",
),
]

Backend Usage in Agent
Non-Streaming Call
# agent_loop.py:608-662
async def _chat(self, max_tokens: int | None = None) -> LLMChunk:
active_model = self.config.get_active_model()
provider = self.config.get_provider_for_model(active_model)
available_tools = self.format_handler.get_available_tools(...)
tool_choice = self.format_handler.get_tool_choice()
async with self.backend as backend:
result = await backend.complete(
model=active_model,
messages=self.messages,
temperature=active_model.temperature,
tools=available_tools,
tool_choice=tool_choice,
extra_headers={
"user-agent": get_user_agent(provider.backend),
"x-affinity": self.session_id,
},
max_tokens=max_tokens,
)
# Update stats
self.stats.last_turn_duration = ...
self.stats.session_prompt_tokens += result.usage.prompt_tokens
...
return processed_chunk

Streaming Call
# agent_loop.py:664-720
async def _chat_streaming(self, max_tokens: int | None = None) -> AsyncGenerator[LLMChunk]:
...
async with self.backend as backend:
async for chunk in backend.complete_streaming(
model=active_model,
messages=self.messages,
temperature=active_model.temperature,
tools=available_tools,
tool_choice=tool_choice,
extra_headers={...},
max_tokens=max_tokens,
):
yield processed_chunk
...

Error Handling
File: vibe/core/llm/exceptions.py
Backend errors are wrapped with context about the provider and model:
# agent_loop.py:659-662
except Exception as e:
raise RuntimeError(
f"API error from {provider.name} (model: {active_model.name}): {e}"
) from e

Adding a New Backend
- Create a backend class implementing the BackendLike protocol
- Implement the async context manager (__aenter__, __aexit__)
- Implement complete(), complete_streaming(), and count_tokens()
- Add the class to BACKEND_FACTORY in factory.py
- Add a corresponding Backend enum value in config.py
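A minimal skeleton of such a backend might look like the sketch below. The EchoBackend name, its echo behaviour, and the Backend.ECHO enum value are hypothetical placeholders, not part of the codebase.

from collections.abc import AsyncGenerator

# Hypothetical backend skeleton; a real implementation would call a provider
# API instead of echoing the last message back.
class EchoBackend:
    def __init__(self, provider: ProviderConfig, timeout: float = 720.0) -> None:
        self._provider = provider
        self._timeout = timeout

    async def __aenter__(self) -> "EchoBackend":
        return self  # open SDK sessions or HTTP clients here

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return None  # close them here

    async def complete(self, *, model, messages, temperature, tools,
                       max_tokens, tool_choice, extra_headers) -> LLMChunk:
        last = messages[-1].content if messages else ""
        return LLMChunk(
            message=LLMMessage(role=Role.assistant, content=f"echo: {last}"),
            finish_reason="stop",
            usage=LLMUsage(prompt_tokens=0, completion_tokens=0),
        )

    async def complete_streaming(self, *, model, messages, temperature, tools,
                                 max_tokens, tool_choice, extra_headers,
                                 ) -> AsyncGenerator[LLMChunk, None]:
        # A single-chunk stream is enough to satisfy the protocol shape.
        yield await self.complete(model=model, messages=messages,
                                  temperature=temperature, tools=tools,
                                  max_tokens=max_tokens, tool_choice=tool_choice,
                                  extra_headers=extra_headers)

    async def count_tokens(self, *, model, messages, temperature=0.0, tools,
                           tool_choice=None, extra_headers) -> int:
        return 0  # placeholder; estimate or query the provider in practice

# Registration (assumes a Backend.ECHO value was added in config.py):
# BACKEND_FACTORY[Backend.ECHO] = EchoBackend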
Source File References
| File | Symbol | Description |
|---|---|---|
| llm/types.py:13-120 | BackendLike | Backend protocol definition |
| llm/backend/factory.py:7 | BACKEND_FACTORY | Backend factory mapping |
| llm/backend/mistral.py:30-111 | MistralMapper | Message type conversion |
| llm/backend/mistral.py:114-134 | MistralBackend.__init__() | Constructor |
| llm/backend/mistral.py:136-150 | __aenter__ / __aexit__ | Async context manager |
| config.py:146-149 | Backend | Backend enum |
| config.py:151-156 | ProviderConfig | Provider configuration |
| config.py:241-247 | ModelConfig | Model configuration |
| agent_loop.py:141-145 | _select_backend() | Backend selection |
| agent_loop.py:608-662 | _chat() | Non-streaming API call |
| agent_loop.py:664-720 | _chat_streaming() | Streaming API call |