If you might support multiple LLM vendors — or even multiple models from one vendor — provider abstraction earns its keep on the second model. Three live patterns: a unifying SDK like LiteLLM (cheapest to onboard, lags new features), per-provider adapters (most code, fastest to opt into new features), and zero abstraction (no tax, hard ceiling at one provider). Pick by where you are and where you’ll be.
Provider abstraction
Day one, you’re using one model from one provider. Day ninety, your customer asks if you support Bedrock. Now what?
Pattern 1 — LiteLLM (the unifying SDK)
A library presents one API surface and translates per-provider underneath.
from litellm import completion

response = completion(
    model="anthropic/claude-sonnet-4-6",  # or "openai/gpt-4o", etc.
    messages=messages,
    tools=tools,
)
Wins. ~3 lines to add a new provider. Consistent feature flags. Prompt-caching auto-enabled where supported. Used by OpenHands and Strix.
Loses. The library maintainer’s feature detection lags real provider features by days or weeks. New thinking modes, new caching variants, new structured-output formats — you wait for the wrapper to catch up.
Pattern 2 — per-provider adapters (the explicit one)
Hand-rolled classes, one per vendor. Explicit feature detection per model family.
class AnthropicAdapter(BaseAdapter):
    def supports_thinking(self, model: str) -> bool:
        return 'opus' in model or 'sonnet' in model

class BedrockAdapter(BaseAdapter):
    ...
A model_features registry maps model name → capabilities.
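A minimal sketch of what such a registry can look like, assuming a plain dict keyed by model name; the entries and capability flags below are illustrative, not an authoritative matrix:

MODEL_FEATURES = {
    # Illustrative entries only; maintain real flags per model family you support.
    "claude-sonnet-4-6": {"thinking": True, "prompt_caching": True},
    "gpt-4o": {"thinking": False, "prompt_caching": True},
    "mistral-large-latest": {"thinking": False, "prompt_caching": False},
}

def supports(model: str, feature: str) -> bool:
    # Unknown models degrade gracefully: default to "not supported".
    return MODEL_FEATURES.get(model, {}).get(feature, False)

Adapters can then consult supports() instead of hard-coding substring checks like the 'opus'/'sonnet' test above.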
Wins. You control the feature matrix. You can opt into a new feature the day it ships. You can split traffic across adapters per task type.
Loses. More code, more tests, more discipline. Onboarding a new provider is a sprint, not a PR. Used by Hermes — the most mature example in the corpus.
Pattern 3 — provider-native, no abstraction
Just use Anthropic’s SDK. Or just OpenAI’s. Full feature surface, no compromises.
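For concreteness, the Anthropic-native path looks roughly like this; the model string and prompt are placeholders, and the calls follow the SDK's standard messages API:

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this diff."}],
)
print(response.content[0].text)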
Wins. Zero porting tax. Every Anthropic feature works the day it ships. Easy to debug; the stack trace lines up with the SDK’s own docs.
Loses. The day someone says “let’s also support Bedrock,” you’re rewriting non-trivial code paths. Used by Claude Code (Anthropic-only by design), NanoClaw, claude-financial-services.
What features actually differ across providers
| Feature | Anthropic | OpenAI | Mistral | Bedrock-Anthropic |
|---|---|---|---|---|
| Tool calling | tool_use blocks | tool_calls[] | tool_calls[] | tool_use blocks |
| Streaming | SSE, content blocks | SSE, deltas | SSE, deltas | SSE |
| Caching | explicit cache_control | auto on prefix ≥ 1024 tokens | none | cache_control |
| Thinking | structured thinking blocks | hidden reasoning_tokens | n/a | structured |
| Vision | per-message images | per-message images | partial | per-message |
| JSON mode | tool_use is the JSON path | response_format | partial | tool_use |
This matrix is the underlying complexity. Every abstraction is a way of papering over it.
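As a taste of that papering-over, here is a hedged sketch that normalizes the first row of the matrix (tool calling) into one internal shape. The ToolCall dataclass is our own invention; the response attributes follow the Anthropic and OpenAI SDK response objects:

import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    arguments: dict

def tool_calls_from_anthropic(response) -> list[ToolCall]:
    # Anthropic: tool calls arrive as content blocks with type == "tool_use".
    return [ToolCall(b.name, b.input) for b in response.content if b.type == "tool_use"]

def tool_calls_from_openai(response) -> list[ToolCall]:
    # OpenAI: tool calls hang off the message; arguments arrive as a JSON string.
    calls = response.choices[0].message.tool_calls or []
    return [ToolCall(c.function.name, json.loads(c.function.arguments)) for c in calls]

Every other row of the matrix needs a similar pair (or quartet) of translations; that is exactly the code LiteLLM or a per-provider adapter layer carries for you.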
The auxiliary client — split cost from capability
Hermes (and others) keep a separate LLM client for side-channel tasks — summarization, deduplication, tool-arg validation — using a cheaper/faster model than the main loop. The auxiliary client can use a different provider entirely.
class Agent:
    main = AnthropicClient(model="claude-sonnet-4-6")
    aux = OpenAIClient(model="gpt-4o-mini")  # cheap, fast

    def compress(self, msgs):
        return self.aux.complete(SUMMARIZE_PROMPT, msgs)
Pick an abstraction
- One provider, ever → zero abstraction. Save the lines. (Claude Code)
- Multi-provider, ship-it speed over feature ceiling → LiteLLM. (OpenHands, Strix)
- Multi-provider with enterprise feature parity → per-provider adapters. (Hermes)
- Research or monthly model swaps → LiteLLM, with native escape hatches when needed.
Recommended default: Start native. Migrate to LiteLLM when you onboard a second provider. Migrate to per-adapter only when LiteLLM's feature lag becomes a real cost.
Migration path
A pragmatic sequence:
- Start native. One provider, full features, no tax.
- Switch to LiteLLM when you onboard a second provider.
- Switch to per-adapter when LiteLLM’s feature lag costs more than the lines-of-code do.
Three architectures, paid for in sequence as needs grow. Not all at once on day one.
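One way to keep that sequence cheap, as a hedged sketch: route every completion through a single internal seam from day one, so each migration step rewrites one function body instead of every call site. The complete() signature here is our own convention, not any SDK's:

# llm.py: the only module that knows which provider pattern is in use.
from anthropic import Anthropic

_client = Anthropic()

def complete(messages: list[dict], tools: list[dict] | None = None,
             model: str = "claude-sonnet-4-6"):
    # Step 1: provider-native body. Later, swap this for litellm.completion(...)
    # or an adapter registry without touching call sites.
    kwargs = {"model": model, "max_tokens": 4096, "messages": messages}
    if tools:
        kwargs["tools"] = tools
    return _client.messages.create(**kwargs)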
Projects that implement this
- OpenHands (v0) — All Hands AI v0 — autonomous software engineer agent. Event-sourced state, microagents, controller-level guardrails.
- OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All Hands agent.
- Mistral Vibe — Mistral-flavored coding agent reference. Middleware-based dispatch, minimal tool set, instructive for understanding agent loop fundamentals.
- Hermes Agent — 40+ tool, multi-platform agent. Provider adapters per LLM, trajectory compression preserves first/last turns, side-channel auxiliary client.
- Comp AI (v2) — Comp AI re-architected. Cleaner data model, refined RBAC, structured AI integrations. Useful diff target vs v1.
- Claude Financial Services — Reference architecture for finance-vertical Claude integrations. Patterns for compliant LLM use in regulated domains.
- Comp AI (v1) — Compliance-as-a-service vertical SaaS. RBAC, tenant isolation, AI policy generation. v1 architecture.
- Multica — Multi-cloud / multi-agent orchestration. Architecture patterns for spanning providers and clouds in one agent.