ADR-004: Layer 3 Agent Gateway — Build Custom in Rust

Status

Superseded — 2026-03-21. The original decision (agentgateway.dev integration) has been replaced with a custom Rust implementation.

Context

Layer 3 needs to manage AI agent sessions with MCP server registry, A2A protocol support, session state management, and blast radius controls.

Options evaluated

  1. agentgateway.dev — Linux Foundation project, Rust-based, MCP + A2A native
  2. Custom Rust service — Build from scratch with full control (axum + tokio, same stack as L2)
  3. Arcade.dev — Hosted agent tool platform
  4. Toolhouse — Agent tool management

Decision

Original decision (superseded): Use agentgateway.dev as the Layer 3 data plane.

Revised decision: Build L3 as a custom Rust agent gateway (axum + tokio), implementing MCP and A2A protocols directly from the public specs. Same architecture pattern as the L2 rewrite.
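Implementing from the public specs is tractable because MCP is built on JSON-RPC 2.0, so the core of the protocol work is framing and dispatching JSON-RPC messages. A minimal std-only sketch of the request framing (the hand-rolled JSON and the `protocolVersion` value are illustrative; a real implementation would use serde against the full MCP schema):

```rust
// Sketch of JSON-RPC 2.0 request framing as used by MCP.
// Hand-rolls the JSON for illustration only; production code would
// use serde/serde_json and the complete MCP message types.

fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params_json
    )
}

fn main() {
    // MCP sessions begin with an `initialize` request.
    let req = jsonrpc_request(1, "initialize", r#"{"protocolVersion":"2025-03-26"}"#);
    assert!(req.contains(r#""jsonrpc":"2.0""#));
    assert!(req.contains(r#""method":"initialize""#));
    println!("{}", req);
}
```

From here, dispatch is a match on the `method` field (`initialize`, `tools/list`, `tools/call`, ...), which slots naturally into an axum handler.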

Rationale for change

Why not agentgateway.dev

Competitive research (2026-03-21) revealed:

  • Essentially a Solo.io project with LF branding — ~11-12 Solo.io PRs to every 4 external. No AWS, Microsoft, Red Hat, or IBM engineers authoring core PRs despite press release claims.
  • Zero verified enterprise production deployments — no named customers, no public case studies with metrics.
  • v1.0.0 released 5 days ago (Mar 16, 2026) with breaking changes (CEL refactor, Helm paths, CRD renames). API not yet battle-tested.
  • Solo.io is Silver tier in AAIF — implements protocols it doesn't control. Platinum members (AWS, Google, Microsoft) could steer specs to favor their own products.
  • Commercial version diverges — Solo Enterprise at v2.2.x vs OSS at v1.0.1. Meaningful feature gap (tool poisoning protection, cryptographic audit trails, sandboxing).
  • AWS is a competitor, not a partner — Bedrock AgentCore Gateway is a direct substitute. "Contributing org" claim is misleading.

Why custom Rust

  • Proven pattern: L2 rewrite from LiteLLM to custom Rust (axum + tokio) already shipped successfully.
  • Full control: No upstream dependency risk, no version tracking, no fork management.
  • Protocol specs are public: MCP and A2A are open specifications — implementation doesn't require agentgateway.dev.
  • Unified stack: L2 and L3 share the same Rust runtime (axum + tokio + redis + clickhouse), simplifying deployment, debugging, and cross-layer trace propagation.
  • Feature ownership: Tool poisoning protection, HITL gates, per-session budgets — we build exactly what we need without waiting for upstream.
  • On-prem differentiator intact: Custom build is fully air-gappable by definition.
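One concrete payoff of the unified stack is cross-layer trace propagation: L2 forwards a W3C Trace Context `traceparent` header and L3 continues the same trace. A std-only sketch of parsing that header (struct and field names are illustrative, not our actual types):

```rust
// Parse a W3C Trace Context `traceparent` header so L3 can continue
// a trace started in L2. Format: version-traceid-parentid-flags.
#[derive(Debug, PartialEq)]
struct TraceContext {
    trace_id: String,  // 32 hex chars
    parent_id: String, // 16 hex chars
    sampled: bool,     // bit 0 of the flags byte
}

fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    Some(TraceContext {
        trace_id: parts[1].to_string(),
        parent_id: parts[2].to_string(),
        sampled: flags & 0x01 == 1,
    })
}

fn main() {
    let ctx = parse_traceparent(
        "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    )
    .expect("valid header");
    assert!(ctx.sampled);
    assert_eq!(ctx.trace_id.len(), 32);
}
```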

Blast radius controls (unchanged)

  • Every agent session gets an explicit tool allowlist (deny by default)
  • Network policy: agents can only reach approved endpoints
  • Resource limits: concurrent operations, memory, execution time
  • All actions logged to ClickHouse audit trail
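The deny-by-default allowlist and resource limits above can be sketched as a per-session check (names and limit fields are illustrative, not the real session type):

```rust
use std::collections::HashSet;

// Deny-by-default tool authorization: a session may only invoke tools
// on its explicit allowlist, and only within its concurrency budget.
struct AgentSession {
    allowed_tools: HashSet<String>,
    max_concurrent_ops: usize,
    active_ops: usize,
}

impl AgentSession {
    fn may_invoke(&self, tool: &str) -> bool {
        // Unknown tools are denied; there is no wildcard.
        self.allowed_tools.contains(tool) && self.active_ops < self.max_concurrent_ops
    }
}

fn main() {
    let session = AgentSession {
        allowed_tools: ["search_docs".to_string()].into_iter().collect(),
        max_concurrent_ops: 4,
        active_ops: 0,
    };
    assert!(session.may_invoke("search_docs"));
    assert!(!session.may_invoke("delete_repo")); // denied by default
}
```

Every `may_invoke` outcome, allowed or denied, would also be appended to the ClickHouse audit trail.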

What we take from agentgateway.dev (inspiration, not code)

  • CEL-based RBAC pattern for tool access control
  • SessionManager concept for per-client tool federation
  • OTel GenAI semantic conventions for trace emission
  • OpenAPI-to-MCP translation approach
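The SessionManager idea, for example, reduces to one session aggregating tools from several upstream MCP servers under namespaced names. A minimal sketch of that shape (our own illustrative types, not agentgateway.dev code):

```rust
use std::collections::HashMap;

// Per-client tool federation: one session aggregates tools from
// several upstream MCP servers, namespaced by server name.
struct SessionManager {
    // session id -> (server name -> tool names exposed by that server)
    sessions: HashMap<String, HashMap<String, Vec<String>>>,
}

impl SessionManager {
    fn federated_tools(&self, session_id: &str) -> Vec<String> {
        let mut tools = Vec::new();
        if let Some(servers) = self.sessions.get(session_id) {
            for (server, names) in servers {
                for name in names {
                    tools.push(format!("{server}/{name}"));
                }
            }
        }
        tools.sort();
        tools
    }
}

fn main() {
    let mut servers = HashMap::new();
    servers.insert("git".to_string(), vec!["clone".to_string()]);
    servers.insert("jira".to_string(), vec!["create_issue".to_string()]);
    let mgr = SessionManager {
        sessions: HashMap::from([("s1".to_string(), servers)]),
    };
    assert_eq!(mgr.federated_tools("s1"), vec!["git/clone", "jira/create_issue"]);
}
```

The namespacing also gives the allowlist a natural key: access is granted per `server/tool`, never per bare tool name.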

Consequences

  • More upfront implementation work (~2 weeks vs ~1 week for integration)
  • No upstream community contributions or bug fixes — we own everything
  • MCP and A2A protocol spec changes must be tracked independently
  • Simpler operational model — one Rust binary for L3, no external project dependency
