ADR-004: Layer 3 Agent Gateway — Build Custom in Rust

Status

Superseded — 2026-03-21. The original decision (agentgateway.dev integration) has been replaced with a custom Rust implementation.

Context

Layer 3 needs to manage AI agent sessions with MCP server registry, A2A protocol support, session state management, and blast radius controls.

Options evaluated

  1. agentgateway.dev — Linux Foundation project, Rust-based, MCP + A2A native
  2. Custom Rust service — Build from scratch with full control (axum + tokio, same stack as L2)
  3. Arcade.dev — Hosted agent tool platform
  4. Toolhouse — Agent tool management

Decision

Original decision (superseded): Use agentgateway.dev as the Layer 3 data plane.

Revised decision: Build L3 as a custom Rust agent gateway (axum + tokio), implementing MCP and A2A protocols directly from the public specs. Same architecture pattern as the L2 rewrite.
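Implementing from the public specs is tractable because MCP is built on JSON-RPC 2.0, so the core of the protocol work is framing and dispatching JSON-RPC messages. A minimal std-only sketch of the request framing (the hand-rolled JSON and the `protocolVersion` value are illustrative; a real implementation would use serde against the full MCP schema):

```rust
// Sketch of JSON-RPC 2.0 request framing as used by MCP.
// Hand-rolls the JSON for illustration only; production code would
// use serde/serde_json and the complete MCP message types.

fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params_json
    )
}

fn main() {
    // MCP sessions begin with an `initialize` request.
    let req = jsonrpc_request(1, "initialize", r#"{"protocolVersion":"2025-03-26"}"#);
    assert!(req.contains(r#""jsonrpc":"2.0""#));
    assert!(req.contains(r#""method":"initialize""#));
    println!("{}", req);
}
```

From here, dispatch is a match on the `method` field (`initialize`, `tools/list`, `tools/call`, ...), which slots naturally into an axum handler.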

Rationale for change

Why not agentgateway.dev

Competitive research (2026-03-21) revealed:

  • Essentially a Solo.io project with LF branding — ~11-12 Solo.io PRs to every 4 external. No AWS, Microsoft, Red Hat, or IBM engineers authoring core PRs despite press release claims.
  • Zero verified enterprise production deployments — no named customers, no public case studies with metrics.
  • v1.0.0 released 5 days ago (Mar 16, 2026) with breaking changes (CEL refactor, Helm paths, CRD renames). API not yet battle-tested.
  • Solo.io is Silver tier in AAIF — implements protocols it doesn't control. Platinum members (AWS, Google, Microsoft) could steer specs to favor their own products.
  • Commercial version diverges — Solo Enterprise at v2.2.x vs OSS at v1.0.1. Meaningful feature gap (tool poisoning protection, cryptographic audit trails, sandboxing).
  • AWS is a competitor, not a partner — Bedrock AgentCore Gateway is a direct substitute. "Contributing org" claim is misleading.

Why custom Rust

  • Proven pattern: L2 rewrite from LiteLLM to custom Rust (axum + tokio) already shipped successfully.
  • Full control: No upstream dependency risk, no version tracking, no fork management.
  • Protocol specs are public: MCP and A2A are open specifications — implementation doesn't require agentgateway.dev.
  • Unified stack: L2 and L3 share the same Rust runtime (axum + tokio + redis + clickhouse), simplifying deployment, debugging, and cross-layer trace propagation.
  • Feature ownership: Tool poisoning protection, HITL gates, per-session budgets — we build exactly what we need without waiting for upstream.
  • On-prem differentiator intact: Custom build is fully air-gappable by definition.
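One concrete payoff of the unified stack is cross-layer trace propagation: L2 forwards a W3C Trace Context `traceparent` header and L3 continues the same trace. A std-only sketch of parsing that header (struct and field names are illustrative, not our actual types):

```rust
// Parse a W3C Trace Context `traceparent` header so L3 can continue
// a trace started in L2. Format: version-traceid-parentid-flags.
#[derive(Debug, PartialEq)]
struct TraceContext {
    trace_id: String,  // 32 hex chars
    parent_id: String, // 16 hex chars
    sampled: bool,     // bit 0 of the flags byte
}

fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    Some(TraceContext {
        trace_id: parts[1].to_string(),
        parent_id: parts[2].to_string(),
        sampled: flags & 0x01 == 1,
    })
}

fn main() {
    let ctx = parse_traceparent(
        "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    )
    .expect("valid header");
    assert!(ctx.sampled);
    assert_eq!(ctx.trace_id.len(), 32);
}
```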

Blast radius controls (unchanged)

  • Every agent session gets an explicit tool allowlist (deny by default)
  • Network policy: agents can only reach approved endpoints
  • Resource limits: concurrent operations, memory, execution time
  • All actions logged to ClickHouse audit trail
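The deny-by-default allowlist and resource limits above can be sketched as a per-session check (names and limit fields are illustrative, not the real session type):

```rust
use std::collections::HashSet;

// Deny-by-default tool authorization: a session may only invoke tools
// on its explicit allowlist, and only within its concurrency budget.
struct AgentSession {
    allowed_tools: HashSet<String>,
    max_concurrent_ops: usize,
    active_ops: usize,
}

impl AgentSession {
    fn may_invoke(&self, tool: &str) -> bool {
        // Unknown tools are denied; there is no wildcard.
        self.allowed_tools.contains(tool) && self.active_ops < self.max_concurrent_ops
    }
}

fn main() {
    let session = AgentSession {
        allowed_tools: ["search_docs".to_string()].into_iter().collect(),
        max_concurrent_ops: 4,
        active_ops: 0,
    };
    assert!(session.may_invoke("search_docs"));
    assert!(!session.may_invoke("delete_repo")); // denied by default
}
```

Every `may_invoke` outcome, allowed or denied, would also be appended to the ClickHouse audit trail.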

What we take from agentgateway.dev (inspiration, not code)

  • CEL-based RBAC pattern for tool access control
  • SessionManager concept for per-client tool federation
  • OTel GenAI semantic conventions for trace emission
  • OpenAPI-to-MCP translation approach
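The SessionManager idea, for example, reduces to one session aggregating tools from several upstream MCP servers under namespaced names. A minimal sketch of that shape (our own illustrative types, not agentgateway.dev code):

```rust
use std::collections::HashMap;

// Per-client tool federation: one session aggregates tools from
// several upstream MCP servers, namespaced by server name.
struct SessionManager {
    // session id -> (server name -> tool names exposed by that server)
    sessions: HashMap<String, HashMap<String, Vec<String>>>,
}

impl SessionManager {
    fn federated_tools(&self, session_id: &str) -> Vec<String> {
        let mut tools = Vec::new();
        if let Some(servers) = self.sessions.get(session_id) {
            for (server, names) in servers {
                for name in names {
                    tools.push(format!("{server}/{name}"));
                }
            }
        }
        tools.sort();
        tools
    }
}

fn main() {
    let mut servers = HashMap::new();
    servers.insert("git".to_string(), vec!["clone".to_string()]);
    servers.insert("jira".to_string(), vec!["create_issue".to_string()]);
    let mgr = SessionManager {
        sessions: HashMap::from([("s1".to_string(), servers)]),
    };
    assert_eq!(mgr.federated_tools("s1"), vec!["git/clone", "jira/create_issue"]);
}
```

The namespacing also gives the allowlist a natural key: access is granted per `server/tool`, never per bare tool name.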

Consequences

  • More upfront implementation work (~2 weeks vs ~1 week for integration)
  • No upstream community contributions or bug fixes — we own everything
  • MCP and A2A protocol spec changes must be tracked independently
  • Simpler operational model — one Rust binary for L3, no external project dependency
