# ADR-006: Proprietary Rust AI Gateway (replacing LiteLLM)
## Status
Accepted — 2026-03-21
## Context
Milestone 2 requires a Layer 2 AI Gateway with multi-model routing, semantic caching, PII redaction, token budget enforcement, and SSE streaming. The original plan was to build it in Python on top of LiteLLM.
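Of those requirements, token budget enforcement is the easiest to pin down concretely. A minimal Rust sketch of the check-then-deduct logic follows; the names (`BudgetStore`, `try_consume`) are hypothetical, and the in-memory map stands in for the per-tenant Redis keys the gateway would actually use (where check and deduct should be a single atomic operation).

```rust
use std::collections::HashMap;

/// In-memory stand-in for the per-tenant budget keys kept in Redis.
struct BudgetStore {
    remaining: HashMap<String, i64>,
}

impl BudgetStore {
    fn new() -> Self {
        BudgetStore { remaining: HashMap::new() }
    }

    fn set_budget(&mut self, tenant: &str, tokens: i64) {
        self.remaining.insert(tenant.to_string(), tokens);
    }

    /// Reject up front if the estimated cost exceeds the balance;
    /// otherwise deduct and return the new balance.
    fn try_consume(&mut self, tenant: &str, estimated: i64) -> Result<i64, String> {
        let balance = self
            .remaining
            .get_mut(tenant)
            .ok_or_else(|| format!("unknown tenant: {tenant}"))?;
        if *balance < estimated {
            return Err(format!("budget exhausted: {} tokens left", balance));
        }
        *balance -= estimated;
        Ok(*balance)
    }
}

fn main() {
    let mut store = BudgetStore::new();
    store.set_budget("tenant-a", 1_000);
    assert_eq!(store.try_consume("tenant-a", 400), Ok(600));
    assert!(store.try_consume("tenant-a", 700).is_err());
    println!("budget checks passed");
}
```

In production the estimate is checked before the provider call and reconciled against the provider's actual usage afterwards; the sketch shows only the gate itself.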
## Options evaluated
- Python + LiteLLM — Mature library, fast to build, but GIL-limited, high release churn, 2-5k req/s ceiling
- Go — Fast, good concurrency, but no mature LLM routing library
- Rust — Fastest, zero-cost abstractions, async-native, aligns with L3 stack
- Rust hot path + Python sidecar — Hybrid, but adds operational complexity
## Decision
Build a proprietary AI Gateway entirely in Rust. No Python. No LiteLLM dependency.
## Rationale
- Performance: Sub-millisecond cache hits (Redis + Qdrant). 20k+ req/s for cache-hit path. SSE streaming with zero-copy forwarding.
- No dependency risk: LiteLLM has high release churn and breaking changes. Owning the routing logic gives full control.
- Stack alignment: L3 (agentgateway.dev) is also Rust. Single language for L2+L3 reduces cognitive overhead and enables shared libraries.
- Proprietary moat: Custom PII redaction, semantic cache, and budget enforcement tightly integrated — not a wrapper around someone else's library.
- Cache-hit ratio: At 40%+ cache hit rate, 40% of traffic never touches an LLM provider. The gateway itself is the bottleneck for those requests — Rust makes that path sub-millisecond.
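The two-tier lookup behind those cache-hit numbers (Redis exact match first, Qdrant similarity second) can be sketched as follows. This is an illustrative sketch, not the gateway's implementation: the types are hypothetical, in-memory structures stand in for Redis and Qdrant, and the 0.9 similarity threshold is an assumed value, not a tuned one.

```rust
use std::collections::HashMap;

/// Two-tier cache: `exact` plays the role of Redis (exact key match),
/// `vectors` plays the role of Qdrant (nearest-neighbour over embeddings).
struct SemanticCache {
    exact: HashMap<String, String>,   // normalized prompt -> response
    vectors: Vec<(Vec<f32>, String)>, // (embedding, response)
    threshold: f32,                   // assumed illustrative value
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl SemanticCache {
    fn lookup(&self, prompt: &str, embedding: &[f32]) -> Option<&str> {
        // Tier 1: exact match (a Redis GET in the real gateway).
        if let Some(hit) = self.exact.get(prompt) {
            return Some(hit.as_str());
        }
        // Tier 2: best match above the similarity threshold
        // (a Qdrant search in the real gateway).
        self.vectors
            .iter()
            .map(|(v, resp)| (cosine(embedding, v), resp))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, resp)| resp.as_str())
    }
}

fn main() {
    let mut cache = SemanticCache {
        exact: HashMap::new(),
        vectors: vec![(vec![1.0, 0.0], "cached answer".to_string())],
        threshold: 0.9,
    };
    cache.exact.insert("what is rust?".into(), "a language".into());

    // Exact hit short-circuits before any similarity search.
    assert_eq!(cache.lookup("what is rust?", &[0.0, 1.0]), Some("a language"));
    // Near-duplicate embedding clears the similarity threshold.
    assert_eq!(cache.lookup("new prompt", &[0.99, 0.01]), Some("cached answer"));
    // Orthogonal embedding misses both tiers.
    assert_eq!(cache.lookup("unrelated", &[0.0, 1.0]), None);
    println!("cache lookups behaved as expected");
}
```

The exact tier is what keeps the hit path sub-millisecond; the similarity tier is what pushes the hit ratio toward the 40%+ figure.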
## Architecture

```
Request → Rust Gateway
 ├─ Extract tenant_id (from header or JWT)
 ├─ Token budget check (Redis)
 ├─ PII scan (regex + entity detection)
 ├─ Semantic cache lookup (Redis exact → Qdrant similarity)
 ├─ [CACHE HIT]  → Return cached response (<1ms)
 ├─ [CACHE MISS] → Forward to LLM provider
 │   ├─ OpenAI    (reqwest + SSE)
 │   ├─ Anthropic (reqwest + SSE)
 │   ├─ Gemini    (reqwest + SSE)
 │   └─ Ollama    (reqwest + SSE)
 ├─ Cache response (Redis + Qdrant)
 ├─ Deduct token budget (Redis)
 └─ Log to ClickHouse (async batch)
```

## Consequences
- More upfront development time vs LiteLLM wrapper
- Must implement provider-specific API formats (OpenAI, Anthropic, Gemini, Ollama)
- Must handle provider quirks (token counting, error mapping, streaming formats)
- Team needs Rust proficiency
- Significantly faster and more reliable in production: the cache-hit path targets 20k+ req/s, versus the 2-5k req/s LiteLLM ceiling noted above
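On the streaming-format quirks: providers frame SSE differently (OpenAI-compatible APIs, including Ollama's OpenAI endpoint, terminate with a `data: [DONE]` sentinel, while Anthropic signals completion through typed events), so the forwarding path needs per-provider normalization. A minimal sketch of the line-level classification, with hypothetical names and only the OpenAI-style framing handled:

```rust
/// One normalized event from a provider's SSE stream.
#[derive(Debug, PartialEq)]
enum SseEvent {
    Data(String), // JSON payload to forward downstream
    Done,         // end-of-stream sentinel
    Ignore,       // `event:` lines, `:` keep-alive comments, blanks
}

/// Classify a single raw SSE line (OpenAI-style framing). An Anthropic
/// adapter would instead track `event:` lines to detect completion.
fn parse_sse_line(line: &str) -> SseEvent {
    let line = line.trim_end();
    if let Some(payload) = line.strip_prefix("data:") {
        let payload = payload.trim_start();
        if payload == "[DONE]" {
            SseEvent::Done
        } else {
            SseEvent::Data(payload.to_string())
        }
    } else {
        SseEvent::Ignore
    }
}

fn main() {
    assert_eq!(
        parse_sse_line("data: {\"delta\":\"hi\"}"),
        SseEvent::Data("{\"delta\":\"hi\"}".to_string())
    );
    assert_eq!(parse_sse_line("data: [DONE]"), SseEvent::Done);
    assert_eq!(parse_sse_line("event: message_stop"), SseEvent::Ignore);
    assert_eq!(parse_sse_line(": keep-alive"), SseEvent::Ignore);
    println!("sse parsing checks passed");
}
```

Zero-copy forwarding in the real gateway would pass byte slices through rather than allocating a `String` per chunk; the sketch favors clarity over that optimization.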