
Provider abstraction

Decouple agent code from a single LLM vendor. Three patterns observed; each pays a different tax.

TL;DR

If you might support multiple LLM vendors — or even multiple models from one vendor — provider abstraction earns its keep on the second model. Three live patterns: a unifying SDK like LiteLLM (cheapest to onboard, lags new features), per-provider adapters (most code, fastest to opt into new features), and zero abstraction (no tax, hard ceiling at one provider). Pick by where you are and where you’ll be.


Day one, you’re using one model from one provider. Day ninety, your customer asks if you support Bedrock. Now what?

Pattern 1 — LiteLLM (the unifying SDK)

A library presents one API surface and translates per-provider underneath.

from litellm import completion

response = completion(
    model="anthropic/claude-sonnet-4-6",  # or "openai/gpt-4o", etc.
    messages=messages,
    tools=tools,
)

Wins. ~3 lines to add a new provider. Consistent feature flags. Prompt-caching auto-enabled where supported. Used by OpenHands and Strix.

Loses. The library maintainer’s feature detection lags real provider features by days or weeks. New thinking modes, new caching variants, new structured-output formats — you wait for the wrapper to catch up.

Pattern 2 — per-provider adapters (the explicit one)

Hand-rolled classes, one per vendor. Explicit feature detection per model family.

class BaseAdapter:
    """Common interface; one subclass per vendor."""

class AnthropicAdapter(BaseAdapter):
    def supports_thinking(self, model: str) -> bool:
        return "opus" in model or "sonnet" in model

class BedrockAdapter(BaseAdapter):
    ...  # same interface, Bedrock wire format underneath

A model_features registry maps model name → capabilities.
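A minimal sketch of such a registry (all names here are hypothetical, not lifted from any project): substring keys map a model name onto a capability record, and unknown models get no features.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelFeatures:
    thinking: bool = False
    prompt_caching: bool = False
    vision: bool = False

# Hypothetical registry: model-name substring -> capabilities.
MODEL_FEATURES = {
    "claude-sonnet": ModelFeatures(thinking=True, prompt_caching=True, vision=True),
    "claude-haiku": ModelFeatures(prompt_caching=True, vision=True),
    "gpt-4o": ModelFeatures(vision=True),
}

def features_for(model: str) -> ModelFeatures:
    """First matching substring wins; unknown models get the empty record."""
    for key, feats in MODEL_FEATURES.items():
        if key in model:
            return feats
    return ModelFeatures()
```

The payoff of the explicit registry: adding a capability the day it ships is a one-line edit here, not a wait on an upstream wrapper.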

Wins. You control the feature matrix. You can opt into a new feature the day it ships. You can split traffic across adapters per task type.

Loses. More code, more tests, more discipline. Onboarding a new provider is a sprint, not a PR. Used by Hermes — the most mature example in the corpus.

Pattern 3 — provider-native, no abstraction

Just use Anthropic’s SDK. Or just OpenAI’s. Full feature surface, no compromises.

Wins. Zero porting tax. Every Anthropic feature works the day it ships. Easy to debug; the stack trace lines up with the SDK’s own docs.

Loses. The day someone says “let’s also support Bedrock,” you’re rewriting non-trivial code paths. Used by Claude Code (Anthropic-only by design), NanoClaw, claude-financial-services.

What features actually differ across providers

Feature      | Anthropic                  | OpenAI                       | Mistral      | Bedrock-Anthropic
------------ | -------------------------- | ---------------------------- | ------------ | -----------------
Tool calling | tool_use blocks            | tool_calls[]                 | tool_calls[] | tool_use blocks
Streaming    | SSE, content blocks        | SSE, deltas                  | SSE, deltas  | SSE
Caching      | explicit cache_control     | auto on prefix ≥ 1024 tokens | none         | cache_control
Thinking     | structured thinking blocks | hidden reasoning_tokens      | n/a          | structured
Vision       | per-message images         | per-message images           | partial      | per-message
JSON mode    | tool_use is the JSON path  | response_format              | partial      | tool_use

This matrix is the underlying complexity. Every abstraction is a way of papering over it.
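The tool-calling row alone is enough to need glue code. A hedged sketch, assuming dict-shaped responses for brevity (real SDKs return typed objects carrying the same fields): Anthropic hands back tool calls as tool_use content blocks with arguments already parsed, while OpenAI hands back a tool_calls[] array whose arguments are still a JSON string.

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    id: str
    name: str
    args: dict

def normalize_anthropic(content_blocks: list[dict]) -> list[ToolCall]:
    # Anthropic: tool calls are content blocks of type "tool_use";
    # "input" is already a parsed dict.
    return [
        ToolCall(b["id"], b["name"], b["input"])
        for b in content_blocks
        if b.get("type") == "tool_use"
    ]

def normalize_openai(tool_calls: list[dict]) -> list[ToolCall]:
    # OpenAI: tool calls live in a tool_calls[] array;
    # "arguments" is a JSON string that still needs parsing.
    return [
        ToolCall(c["id"], c["function"]["name"],
                 json.loads(c["function"]["arguments"]))
        for c in tool_calls
    ]
```

Multiply this by every row in the matrix and you have the real cost of pattern 2, and the real work LiteLLM does for you in pattern 1.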

The auxiliary client — split cost from capability

Hermes (and others) keep a separate LLM client for side-channel tasks — summarization, deduplication, tool-arg validation — using a cheaper/faster model than the main loop. The auxiliary client can use a different provider entirely.

class Agent:
    main = AnthropicClient(model="claude-sonnet-4-6")
    aux = OpenAIClient(model="gpt-4o-mini")  # cheap, fast

    def compress(self, msgs):
        return self.aux.complete(SUMMARIZE_PROMPT, msgs)

Pick an abstraction

How many providers will you support, and how fast must you opt into new features?

  • One provider, ever → zero abstraction. Save the lines. (Claude Code)
  • Multi-provider, ship-it speed > feature ceiling → LiteLLM. (OpenHands, Strix)
  • Multi-provider with enterprise feature parity → per-provider adapters. (Hermes)
  • Research / monthly model swaps → LiteLLM, with native escape hatches when needed.

Recommended default: Start native. Migrate to LiteLLM when you onboard a second provider. Migrate to per-adapter only when LiteLLM's feature lag becomes a real cost.

Migration path

A pragmatic sequence:

  1. Start native. One provider, full features, no tax.
  2. Switch to LiteLLM when you onboard a second provider.
  3. Switch to per-adapter when LiteLLM’s feature lag costs more than the lines-of-code do.

Three architectures, paid for in sequence as needs grow. Not all at once on day one.
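One cheap move that keeps the whole sequence painless: route every LLM call through a single internal seam from day one. A hypothetical sketch (names are illustrative, not from any project) where each migration step rewires one function and touches zero call sites.

```python
from typing import Any, Callable, Optional

# Hypothetical seam: call sites import complete(), never a vendor SDK,
# so migrating native -> LiteLLM -> per-provider adapters only ever
# rewires _backend.
_backend: Optional[Callable[..., Any]] = None

def use_backend(fn: Callable[..., Any]) -> None:
    """Step 1: a thin wrapper over the native SDK.
    Step 2: litellm.completion.
    Step 3: dispatch into a per-provider adapter registry."""
    global _backend
    _backend = fn

def complete(model: str, messages: list[dict], **kwargs: Any) -> Any:
    assert _backend is not None, "call use_backend() first"
    return _backend(model=model, messages=messages, **kwargs)
```

The seam costs a dozen lines on day one and makes day ninety's "let's also support Bedrock" a config change instead of a rewrite.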

Projects that implement this

  • OpenHands (v0) — All Hands AI's v0 autonomous software-engineer agent. Event-sourced state, microagents, controller-level guardrails.
  • OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All-Hands agent.
  • Mistral Vibe — Mistral-flavored coding agent reference. Middleware-based dispatch, minimal tool set, instructive for understanding agent loop fundamentals.
  • Hermes Agent — 40+ tool, multi-platform agent. Provider adapters per LLM, trajectory compression preserves first/last turns, side-channel auxiliary client.
  • Comp AI (v2) — Comp AI re-architected. Cleaner data model, refined RBAC, structured AI integrations. Useful diff target vs v1.
  • Claude Financial Services — Reference architecture for finance-vertical Claude integrations. Patterns for compliant LLM use in regulated domains.
  • Comp AI (v1) — Compliance-as-a-service vertical SaaS. RBAC, tenant isolation, AI policy generation. v1 architecture.
  • Multica — Multi-cloud / multi-agent orchestration. Architecture patterns for spanning providers and clouds in one agent.