# Introduction

## What is Gatez?
Gatez is a self-hosted gateway platform that unifies API management, AI model routing, and agent orchestration into a single, observable stack. It is built for platform engineering teams who need to ship AI-powered products without stitching together a dozen fragmented tools.
The platform is organized into three layers, each handling a distinct class of traffic:
| Layer | Name | Technology | Responsibility |
|---|---|---|---|
| L1 | API Gateway | Apache APISIX (Lua) | Authentication, rate limiting, request logging, traffic shaping |
| L2 | AI Gateway | Custom Rust (axum) | Multi-model routing, PII redaction, token budgets, semantic caching |
| L3 | Agent Gateway | Custom Rust (axum) | MCP/A2A protocols, session management, tool allowlists, HITL gates |
A shared Control Plane (React + Rust API) ties the three layers together with dashboards for operators and self-service portals for tenants.
## The Problem
Most organizations running AI workloads end up with:
- A general-purpose API gateway that knows nothing about LLMs
- A separate AI proxy with no rate limiting or auth integration
- No agent governance at all -- tools run with no allowlists, no audit trail, no human-in-the-loop
- Separate observability stacks for each layer, with no cross-layer traces
- Multi-tenancy bolted on as an afterthought, with shared rate limit buckets and no data isolation
Gatez solves this by providing a single platform where every request -- whether it is a REST call, an LLM completion, or an agent tool invocation -- flows through consistent authentication, rate limiting, PII protection, and audit logging, all isolated per tenant.
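To make the per-tenant isolation concrete, here is a minimal token-bucket sketch in Rust. This is purely illustrative: Gatez's actual L1 rate limiting runs as APISIX Lua plugins, and the type and method names below are hypothetical. The point it demonstrates is that buckets are keyed by `tenant_id`, so one tenant exhausting its quota never affects another.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// One token bucket: refills continuously, capped at `capacity`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: Instant::now() }
    }

    /// Refill based on elapsed time, then try to spend one token.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Buckets are keyed by tenant_id: no shared buckets, no cross-tenant impact.
struct RateLimiter {
    buckets: HashMap<String, TokenBucket>,
}

impl RateLimiter {
    fn new() -> Self { Self { buckets: HashMap::new() } }

    fn allow(&mut self, tenant_id: &str) -> bool {
        self.buckets
            .entry(tenant_id.to_string())
            .or_insert_with(|| TokenBucket::new(2.0, 1.0)) // burst 2, 1 req/s refill
            .try_acquire()
    }
}

fn main() {
    let mut limiter = RateLimiter::new();
    // tenant-a burns through its burst of 2; tenant-b is unaffected.
    assert!(limiter.allow("tenant-a"));
    assert!(limiter.allow("tenant-a"));
    assert!(!limiter.allow("tenant-a"));
    assert!(limiter.allow("tenant-b"));
}
```

In production the bucket state would live in Redis rather than in-process memory, so all gateway replicas share the same per-tenant counters.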
## Architecture Overview
```
Client Request
      |
      v
┌──────────────────────────┐
│ L1 - API Gateway         │  Apache APISIX (:9080)
│ JWT auth (Keycloak)      │  Lua plugins
│ Per-tenant rate limit    │  50k TPS target, P99 < 50ms
│ Request logging (CH)     │
└───────────┬──────────────┘
            |
            v
┌──────────────────────────┐
│ L2 - AI Gateway          │  Custom Rust (:4000)
│ Multi-model routing      │  OpenAI, Anthropic, Gemini, Ollama
│ PII redaction            │  Presidio-style scan before LLM call
│ Token budget enforce     │  Per-tenant budget in Redis
│ Semantic cache           │  Redis (exact) + Qdrant (similarity)
└───────────┬──────────────┘
            |
            v
┌──────────────────────────┐
│ L3 - Agent Gateway       │  Custom Rust (:5001)
│ MCP + A2A protocols      │  Session state in Redis
│ Tool allowlists          │  Deny-by-default
│ HITL approval gates      │  Operator approve/deny
│ Blast radius controls    │  Limit concurrency, network, files
└──────────────────────────┘

Infrastructure:
┌────────┐ ┌───────────┐ ┌────────┐ ┌──────────┐ ┌────────┐
│ Redis  │ │ ClickHouse│ │ Qdrant │ │ Keycloak │ │ etcd   │
│ :6380  │ │ :8123     │ │ :6333  │ │ :8081    │ │ :2379  │
└────────┘ └───────────┘ └────────┘ └──────────┘ └────────┘

Observability:
┌────────────┐ ┌────────┐ ┌─────────┐
│ Prometheus │ │Grafana │ │ Jaeger  │
│ :9090      │ │ :3002  │ │ :16686  │
└────────────┘ └────────┘ └─────────┘
```

## Key Differentiators
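The L2 semantic cache has two tiers: an exact-match tier in Redis and a similarity tier in Qdrant. The exact tier can be sketched in a few lines of Rust; here an in-process `HashMap` stands in for Redis, and all type and method names are illustrative rather than Gatez's actual API. The essential property shown is that `tenant_id` is part of every cache key, so completions are never shared across tenants.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative exact-match cache tier (Redis stand-in).
struct ExactCache {
    entries: HashMap<u64, String>,
}

impl ExactCache {
    fn new() -> Self { Self { entries: HashMap::new() } }

    /// The key always includes tenant_id: same prompt, different tenant
    /// produces a different key, so there is no cross-tenant cache sharing.
    fn key(tenant_id: &str, model: &str, prompt: &str) -> u64 {
        let mut h = DefaultHasher::new();
        (tenant_id, model, prompt).hash(&mut h);
        h.finish()
    }

    fn get(&self, tenant_id: &str, model: &str, prompt: &str) -> Option<&String> {
        self.entries.get(&Self::key(tenant_id, model, prompt))
    }

    fn put(&mut self, tenant_id: &str, model: &str, prompt: &str, completion: String) {
        self.entries.insert(Self::key(tenant_id, model, prompt), completion);
    }
}

fn main() {
    let mut cache = ExactCache::new();
    cache.put("acme", "gpt-4o", "hello", "Hi there!".into());
    // Exact hit for the same tenant, model, and prompt.
    assert!(cache.get("acme", "gpt-4o", "hello").is_some());
    // Same prompt from a different tenant: miss, by design.
    assert!(cache.get("globex", "gpt-4o", "hello").is_none());
}
```

On an exact-tier miss, the similarity tier would embed the prompt and query Qdrant for near-duplicate prompts before falling through to the LLM provider.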
**Unified platform.** One gateway handles REST APIs, LLM completions, and agent tool calls. No need to operate three separate products.

**Self-hosted / on-premises.** Every component runs from container images. No SaaS dependencies. Your data never leaves your infrastructure.

**Multi-tenant from day one.** `tenant_id` is mandatory on every API call, every log entry, every cache key, every rate limit bucket. Tenants never see each other's data.

**Rust performance.** L2 and L3 are built with axum and tokio. Sub-millisecond overhead, zero-copy SSE streaming, async all the way down. L1 targets 50k TPS at P99 < 50ms.
**Cross-layer tracing.** OpenTelemetry spans flow from L1 through L2 into L3. A single Jaeger trace shows the full journey of a request from API gateway to LLM provider to agent tool execution.
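The mechanism that ties spans from all three layers into one trace is W3C `traceparent` header propagation. Gatez uses the OpenTelemetry SDKs for this; the stdlib-only sketch below only illustrates the header format (`version-trace_id-parent_id-flags`) and the invariant that every hop keeps the same `trace_id` while substituting its own span id as the new parent. Function names here are hypothetical.

```rust
/// The two fields each hop cares about.
#[derive(Debug, PartialEq)]
struct TraceContext {
    trace_id: String,  // 32 hex chars, shared by every span in the trace
    parent_id: String, // 16 hex chars, the calling span's id
}

/// Parse an inbound `traceparent` header, e.g. from L1 arriving at L2.
fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    Some(TraceContext {
        trace_id: parts[1].to_string(),
        parent_id: parts[2].to_string(),
    })
}

/// Build the outbound header for the next hop: same trace_id, own span id
/// as the new parent. This is what keeps L1 -> L2 -> L3 in one Jaeger trace.
fn propagate(ctx: &TraceContext, own_span_id: &str) -> String {
    format!("00-{}-{}-01", ctx.trace_id, own_span_id)
}

fn main() {
    let inbound = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";
    let ctx = parse_traceparent(inbound).unwrap();
    let outbound = propagate(&ctx, "00f067aa0ba902b7");
    // The trace_id survives the hop; only the parent span id changes.
    assert!(outbound.contains(&ctx.trace_id));
}
```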
**Agent security built in.** Tool allowlists (deny-by-default), human-in-the-loop approval gates, blast radius controls, tool poisoning protection, and full audit logging to ClickHouse.
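The deny-by-default gating in L3 can be sketched as a three-way decision: a tool call is rejected unless the tenant's allowlist explicitly names the tool, and sensitive tools can additionally be parked behind a HITL approval gate. The types and field names below are illustrative, not Gatez's actual API.

```rust
use std::collections::HashSet;

/// Per-tenant tool policy: what may run, and what needs a human first.
struct ToolPolicy {
    allowed: HashSet<String>,        // explicit allowlist
    needs_approval: HashSet<String>, // subset requiring HITL sign-off
}

enum Decision {
    Deny,            // tool not listed: never runs
    Allow,           // listed and unrestricted: runs immediately
    PendingApproval, // listed but gated: waits for an operator
}

impl ToolPolicy {
    fn check(&self, tool: &str) -> Decision {
        if !self.allowed.contains(tool) {
            Decision::Deny // deny-by-default: unknown tools never execute
        } else if self.needs_approval.contains(tool) {
            Decision::PendingApproval // held until an operator approves or denies
        } else {
            Decision::Allow
        }
    }
}

fn main() {
    let policy = ToolPolicy {
        allowed: ["web_search", "send_email"].iter().map(|s| s.to_string()).collect(),
        needs_approval: ["send_email"].iter().map(|s| s.to_string()).collect(),
    };
    assert!(matches!(policy.check("web_search"), Decision::Allow));
    assert!(matches!(policy.check("send_email"), Decision::PendingApproval));
    // Anything not on the allowlist is denied, no configuration needed.
    assert!(matches!(policy.check("delete_database"), Decision::Deny));
}
```

In the real gateway every one of these decisions, including the operator's approve/deny, would also be written to the ClickHouse audit log.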
## Who Is It For?
- Platform engineers who need to expose AI capabilities to internal teams with proper governance
- DevOps / SRE teams who want a single observability stack across API, AI, and agent traffic
- AI/ML teams who need multi-model routing with fallback chains, token budgets, and semantic caching
- Security teams who require PII redaction before LLM calls, full audit trails, and tenant isolation
## Next Steps
- Quickstart -- get the full stack running locally in 5 minutes
- Architecture Overview -- detailed diagrams and data flow
- Environment Variables -- complete configuration reference