
ADR-006: Proprietary Rust AI Gateway (replacing LiteLLM)

Status

Accepted — 2026-03-21

Context

Milestone 2 requires a Layer 2 AI Gateway with multi-model routing, semantic caching, PII redaction, token budget enforcement, and SSE streaming. The original plan called for Python with LiteLLM.

Options evaluated

  1. Python + LiteLLM — Mature library, fast to build, but GIL-limited, high release churn, 2-5k req/s ceiling
  2. Go — Fast, good concurrency, but no mature LLM routing library
  3. Rust — Fastest, zero-cost abstractions, async-native, aligns with L3 stack
  4. Rust hot path + Python sidecar — Hybrid, but adds operational complexity

Decision

Build a proprietary AI Gateway entirely in Rust. No Python. No LiteLLM dependency.

Rationale

  • Performance: Sub-millisecond cache hits (Redis + Qdrant). 20k+ req/s for cache-hit path. SSE streaming with zero-copy forwarding.
  • No dependency risk: LiteLLM has high release churn and breaking changes. Owning the routing logic gives full control.
  • Stack alignment: L3 (agentgateway.dev) is also Rust. Single language for L2+L3 reduces cognitive overhead and enables shared libraries.
  • Proprietary moat: Custom PII redaction, semantic cache, and budget enforcement tightly integrated — not a wrapper around someone else's library.
  • Cache-hit ratio: At a 40%+ cache-hit rate, 40% of traffic never touches an LLM provider. For those requests the gateway itself is the bottleneck, and Rust keeps that path sub-millisecond.
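The semantic cache behind these numbers is a two-tier lookup: an exact match first, then a similarity search. A minimal sketch of that shape, with in-memory structures standing in for Redis and Qdrant and a naive cosine scan standing in for a vector query (all names here are illustrative, not the gateway's actual API):

```rust
use std::collections::HashMap;

/// Illustrative two-tier cache: exact match first, similarity fallback second.
struct SemanticCache {
    exact: HashMap<String, String>,    // stands in for Redis
    embedded: Vec<(Vec<f32>, String)>, // stands in for Qdrant points
    threshold: f32,                    // minimum cosine similarity to count as a hit
}

impl SemanticCache {
    fn lookup(&self, prompt: &str, embedding: &[f32]) -> Option<&String> {
        // Tier 1: exact string match (a Redis GET in the real gateway).
        if let Some(hit) = self.exact.get(prompt) {
            return Some(hit);
        }
        // Tier 2: nearest-neighbour search above a similarity threshold
        // (a Qdrant query in the real gateway).
        self.embedded
            .iter()
            .filter(|(vec, _)| cosine(vec, embedding) >= self.threshold)
            .map(|(_, resp)| resp)
            .next()
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    let mut cache = SemanticCache {
        exact: HashMap::new(),
        embedded: Vec::new(),
        threshold: 0.95,
    };
    cache.exact.insert("What is Rust?".to_string(), "A systems language.".to_string());
    cache.embedded.push((vec![1.0, 0.0], "Cached similar answer.".to_string()));

    // Exact hit bypasses the vector tier entirely.
    assert_eq!(
        cache.lookup("What is Rust?", &[0.0, 1.0]).map(|s| s.as_str()),
        Some("A systems language.")
    );
    // Near-identical embedding falls through to the similarity tier.
    assert_eq!(
        cache.lookup("Tell me about Rust", &[0.99, 0.01]).map(|s| s.as_str()),
        Some("Cached similar answer.")
    );
    // Dissimilar embedding is a miss.
    assert_eq!(cache.lookup("Unrelated", &[0.0, 1.0]), None);
}
```

The ordering matters for the latency claim: the exact tier answers in one key lookup, so only prompts that miss it pay for an embedding and vector search.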

Architecture

Request → Rust Gateway
  ├─ Extract tenant_id (from header or JWT)
  ├─ Token budget check (Redis)
  ├─ PII scan (regex + entity detection)
  ├─ Semantic cache lookup (Redis exact → Qdrant similarity)
  ├─ [CACHE HIT] → Return cached response (<1ms)
  ├─ [CACHE MISS] → Forward to LLM provider
  │   ├─ OpenAI (reqwest + SSE)
  │   ├─ Anthropic (reqwest + SSE)
  │   ├─ Gemini (reqwest + SSE)
  │   └─ Ollama (reqwest + SSE)
  ├─ Cache response (Redis + Qdrant)
  ├─ Deduct token budget (Redis)
  └─ Log to ClickHouse (async batch)
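The flow above can be sketched as a sequence of fallible stages. The types and helper functions below are hypothetical stand-ins (an in-memory map in place of Redis, stubs in place of the PII pipeline and the reqwest/SSE provider calls); only the ordering of the stages is taken from the diagram:

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the gateway's real stages (names are hypothetical).
struct Gateway {
    budgets: HashMap<String, u64>, // tenant_id -> remaining tokens (Redis in production)
    cache: HashMap<String, String>, // exact-match tier (Redis in production)
}

#[derive(Debug, PartialEq)]
enum GatewayError {
    BudgetExhausted,
    PiiDetected,
}

impl Gateway {
    fn handle(&mut self, tenant_id: &str, prompt: &str) -> Result<String, GatewayError> {
        // 1. Token budget check before any other work.
        let budget = self.budgets.get(tenant_id).copied().unwrap_or(0);
        let cost = estimate_tokens(prompt);
        if budget < cost {
            return Err(GatewayError::BudgetExhausted);
        }
        // 2. PII scan (regex + entity detection in the real gateway).
        if looks_like_pii(prompt) {
            return Err(GatewayError::PiiDetected);
        }
        // 3. Cache lookup: a hit returns without touching a provider.
        if let Some(hit) = self.cache.get(prompt) {
            return Ok(hit.clone());
        }
        // 4. Cache miss: forward to a provider, then cache and deduct budget.
        let response = forward_to_provider(prompt);
        self.cache.insert(prompt.to_string(), response.clone());
        *self.budgets.entry(tenant_id.to_string()).or_insert(0) -= cost;
        Ok(response)
    }
}

fn estimate_tokens(prompt: &str) -> u64 {
    // Crude heuristic: roughly 4 characters per token.
    (prompt.len() as u64 / 4).max(1)
}

fn looks_like_pii(prompt: &str) -> bool {
    // Placeholder for the regex/entity pipeline: flag anything resembling an email.
    prompt.contains('@')
}

fn forward_to_provider(prompt: &str) -> String {
    // Stub for the reqwest + SSE call to OpenAI/Anthropic/Gemini/Ollama.
    format!("response to: {prompt}")
}

fn main() {
    let mut gw = Gateway {
        budgets: HashMap::from([("acme".to_string(), 100)]),
        cache: HashMap::new(),
    };
    let first = gw.handle("acme", "hello world").unwrap();
    let second = gw.handle("acme", "hello world").unwrap(); // served from cache
    assert_eq!(first, second);
}
```

Note that, per the diagram, the budget is deducted only after a provider call: cache hits cost the tenant nothing, which is what makes the 40%+ hit rate directly reduce spend as well as latency.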

Consequences

  • More upfront development time than a LiteLLM wrapper
  • Must implement provider-specific API formats (OpenAI, Anthropic, Gemini, Ollama)
  • Must handle provider quirks (token counting, error mapping, streaming formats)
  • Team needs Rust proficiency
  • Significantly faster and more reliable in production
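One of the quirks above, error mapping, illustrates the extra work: each provider reports failures differently, and the gateway must normalize them into a single taxonomy before retrying or surfacing them. A sketch of that normalization (the enum, the body substrings, and the per-provider conventions here are illustrative assumptions, not the shipped taxonomy):

```rust
// Illustrative unified error taxonomy; the real gateway's variants may differ.
#[derive(Debug, PartialEq)]
enum ProviderError {
    RateLimited,
    ContextTooLong,
    AuthFailed,
    Upstream(u16),
}

// Map a provider's HTTP status and error body onto the unified taxonomy.
// (The status codes and body substrings are a sketch, not an exhaustive mapping.)
fn map_provider_error(provider: &str, status: u16, body: &str) -> ProviderError {
    match (provider, status) {
        (_, 429) => ProviderError::RateLimited,
        (_, 401) | (_, 403) => ProviderError::AuthFailed,
        // Providers signal over-long prompts differently; these matchers are
        // assumed examples of per-provider handling, not verified API contracts.
        ("openai", 400) if body.contains("context_length_exceeded") => {
            ProviderError::ContextTooLong
        }
        ("anthropic", 400) if body.contains("prompt is too long") => {
            ProviderError::ContextTooLong
        }
        (_, s) => ProviderError::Upstream(s),
    }
}

fn main() {
    assert_eq!(map_provider_error("openai", 429, ""), ProviderError::RateLimited);
    assert_eq!(
        map_provider_error("openai", 400, r#"{"error":{"code":"context_length_exceeded"}}"#),
        ProviderError::ContextTooLong
    );
    assert_eq!(map_provider_error("gemini", 500, ""), ProviderError::Upstream(500));
}
```

With LiteLLM this mapping comes for free; owning it means maintaining it per provider, which is exactly the trade the Decision section accepts.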