ADR-004: Layer 3 Agent Gateway — Build Custom in Rust
Status
Superseded (2026-03-21). The original decision, integrating agentgateway.dev, has been replaced with a custom Rust implementation.
Context
Layer 3 needs to manage AI agent sessions with MCP server registry, A2A protocol support, session state management, and blast radius controls.
Options evaluated
- agentgateway.dev — Linux Foundation project, Rust-based, MCP + A2A native
- Custom Rust service — Build from scratch with full control (axum + tokio, same stack as L2)
- Arcade.dev — Hosted agent tool platform
- Toolhouse — Agent tool management
Decision
Original decision (superseded): use agentgateway.dev as the Layer 3 data plane.
Revised decision: Build L3 as a custom Rust agent gateway (axum + tokio), implementing MCP and A2A protocols directly from the public specs. Same architecture pattern as the L2 rewrite.
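Implementing MCP directly means handling its JSON-RPC method names ourselves. A minimal sketch of dispatching the core method names from the public MCP spec (`initialize`, `tools/list`, `tools/call`) to internal handler variants; the enum and function names here are illustrative, not actual L3 code:

```rust
/// Illustrative subset of MCP JSON-RPC methods (names from the public spec).
#[derive(Debug, PartialEq)]
enum McpMethod {
    Initialize,
    ToolsList,
    ToolsCall,
    Unknown(String), // anything we don't implement yet
}

/// Map a JSON-RPC `method` field to a handler variant.
fn parse_method(method: &str) -> McpMethod {
    match method {
        "initialize" => McpMethod::Initialize,
        "tools/list" => McpMethod::ToolsList,
        "tools/call" => McpMethod::ToolsCall,
        other => McpMethod::Unknown(other.to_string()),
    }
}
```

An exhaustive `match` like this makes spec coverage auditable: unimplemented methods fall through to `Unknown` and can be rejected with a JSON-RPC "method not found" error rather than silently dropped.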
Rationale for change
Why not agentgateway.dev
Competitive research (2026-03-21) revealed:
- Essentially a Solo.io project with Linux Foundation branding: roughly 11-12 Solo.io PRs for every 4 external PRs, and no AWS, Microsoft, Red Hat, or IBM engineers authoring core PRs despite press-release claims.
- Zero verified enterprise production deployments — no named customers, no public case studies with metrics.
- v1.0.0 was released only five days before this review (Mar 16, 2026), with breaking changes (CEL refactor, Helm paths, CRD renames); the API is not yet battle-tested.
- Solo.io is Silver tier in AAIF — implements protocols it doesn't control. Platinum members (AWS, Google, Microsoft) could steer specs to favor their own products.
- Commercial version diverges — Solo Enterprise at v2.2.x vs OSS at v1.0.1. Meaningful feature gap (tool poisoning protection, cryptographic audit trails, sandboxing).
- AWS is a competitor, not a partner — Bedrock AgentCore Gateway is a direct substitute. "Contributing org" claim is misleading.
Why custom Rust
- Proven pattern: L2 rewrite from LiteLLM to custom Rust (axum + tokio) already shipped successfully.
- Full control: No upstream dependency risk, no version tracking, no fork management.
- Protocol specs are public: MCP and A2A are open specifications — implementation doesn't require agentgateway.dev.
- Unified stack: L2 and L3 share the same Rust runtime (axum + tokio + redis + clickhouse), simplifying deployment, debugging, and cross-layer trace propagation.
- Feature ownership: Tool poisoning protection, HITL gates, per-session budgets — we build exactly what we need without waiting for upstream.
- On-prem differentiator intact: Custom build is fully air-gappable by definition.
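One concrete payoff of the unified stack is cross-layer trace propagation: L2 and L3 can pass the same W3C Trace Context `traceparent` header between them. A minimal sketch of building that header (the function name is hypothetical; the `00-{trace-id}-{parent-id}-{flags}` layout is from the Trace Context spec):

```rust
/// Build a W3C Trace Context `traceparent` header value for L2 -> L3 calls.
/// Format: version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - flags(2 hex).
fn traceparent(trace_id: u128, span_id: u64, sampled: bool) -> String {
    let flags = if sampled { 1 } else { 0 };
    format!("00-{trace_id:032x}-{span_id:016x}-{flags:02x}")
}
```

Because both layers are Rust services on the same runtime, a shared crate can own this logic once instead of reconciling two implementations.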
Blast radius controls (unchanged)
- Every agent session gets an explicit tool allowlist (deny by default)
- Network policy: agents can only reach approved endpoints
- Resource limits: concurrent operations, memory, execution time
- All actions logged to ClickHouse audit trail
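The deny-by-default allowlist and resource limits above can be enforced in one per-session policy check before any tool call is forwarded. A sketch under assumed names (`SessionPolicy`, `authorize` are illustrative, not actual L3 types):

```rust
use std::collections::HashSet;

/// Hypothetical per-session blast radius policy.
struct SessionPolicy {
    allowed_tools: HashSet<String>, // explicit allowlist; deny by default
    max_concurrent_ops: usize,
    active_ops: usize,
}

impl SessionPolicy {
    /// Gate a tool call: unknown tools and over-limit sessions are rejected.
    fn authorize(&mut self, tool: &str) -> Result<(), &'static str> {
        if !self.allowed_tools.contains(tool) {
            return Err("tool not in session allowlist"); // deny by default
        }
        if self.active_ops >= self.max_concurrent_ops {
            return Err("concurrent operation limit reached");
        }
        self.active_ops += 1;
        Ok(())
    }
}
```

In the real service the same choke point would also emit the ClickHouse audit record, so every allow and deny decision is logged.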
What we take from agentgateway.dev (inspiration, not code)
- CEL-based RBAC pattern for tool access control
- SessionManager concept for per-client tool federation
- OTel GenAI semantic conventions for trace emission
- OpenAPI-to-MCP translation approach
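The SessionManager idea borrowed above amounts to federating tool lists from several upstream MCP servers into one namespaced per-client view. A sketch under assumed names (`SessionManager`, `federated_tools`, and the `server/tool` naming convention are illustrative):

```rust
use std::collections::HashMap;

/// Hypothetical per-session federation of upstream MCP servers.
struct SessionManager {
    /// upstream server name -> tool names it exposes
    upstreams: HashMap<String, Vec<String>>,
}

impl SessionManager {
    /// Present one flat, namespaced tool list ("server/tool") to the client,
    /// so tools with the same name on different upstreams cannot collide.
    fn federated_tools(&self) -> Vec<String> {
        let mut tools: Vec<String> = self
            .upstreams
            .iter()
            .flat_map(|(srv, ts)| ts.iter().map(move |t| format!("{srv}/{t}")))
            .collect();
        tools.sort();
        tools
    }
}
```

The namespace prefix also gives the blast radius allowlist an unambiguous key for each tool.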
Consequences
- More upfront implementation work (~2 weeks vs ~1 week for integration)
- No upstream community contributions or bug fixes — we own everything
- MCP and A2A protocol spec changes must be tracked independently
- Simpler operational model — one Rust binary for L3, no external project dependency