# Introduction

## What is Gatez?
Gatez is a self-hosted gateway platform that unifies API management, AI model routing, and agent orchestration into a single, observable stack. It is built for platform engineering teams who need to ship AI-powered products without stitching together a dozen fragmented tools.
The platform is organized into three layers, each handling a distinct class of traffic:
| Layer | Name | Technology | Responsibility |
|---|---|---|---|
| L1 | API Gateway | Apache APISIX (Lua) | Authentication, rate limiting, request logging, traffic shaping |
| L2 | AI Gateway | Custom Rust (axum) | Multi-model routing, PII redaction, token budgets, semantic caching |
| L3 | Agent Gateway | Custom Rust (axum) | MCP/A2A protocols, session management, tool allowlists, HITL gates |
A shared Control Plane (React + Rust API) ties the three layers together with dashboards for operators and self-service portals for tenants.
## The Problem
Most organizations running AI workloads end up with:
- A general-purpose API gateway that knows nothing about LLMs
- A separate AI proxy with no rate limiting or auth integration
- No agent governance at all -- tools run with no allowlists, no audit trail, no human-in-the-loop
- Separate observability stacks for each layer, with no cross-layer traces
- Multi-tenancy bolted on as an afterthought, with shared rate limit buckets and no data isolation
Gatez solves this by providing a single platform where every request -- whether it is a REST call, an LLM completion, or an agent tool invocation -- flows through consistent authentication, rate limiting, PII protection, and audit logging, all isolated per tenant.
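To make the per-tenant isolation concrete, here is a minimal token-bucket sketch in Rust. This is purely illustrative: Gatez's actual L1 rate limiting runs as APISIX Lua plugins, and the type and method names below are hypothetical. The point it demonstrates is that buckets are keyed by `tenant_id`, so one tenant exhausting its quota never affects another.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// One token bucket: refills continuously, capped at `capacity`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: Instant::now() }
    }

    /// Refill based on elapsed time, then try to spend one token.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Buckets are keyed by tenant_id: no shared buckets, no cross-tenant impact.
struct RateLimiter {
    buckets: HashMap<String, TokenBucket>,
}

impl RateLimiter {
    fn new() -> Self { Self { buckets: HashMap::new() } }

    fn allow(&mut self, tenant_id: &str) -> bool {
        self.buckets
            .entry(tenant_id.to_string())
            .or_insert_with(|| TokenBucket::new(2.0, 1.0)) // burst 2, 1 req/s refill
            .try_acquire()
    }
}

fn main() {
    let mut limiter = RateLimiter::new();
    // tenant-a burns through its burst of 2; tenant-b is unaffected.
    assert!(limiter.allow("tenant-a"));
    assert!(limiter.allow("tenant-a"));
    assert!(!limiter.allow("tenant-a"));
    assert!(limiter.allow("tenant-b"));
}
```

In production the bucket state would live in Redis rather than in-process memory, so all gateway replicas share the same per-tenant counters.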
## Architecture Overview
```
Client Request
      |
      v
┌──────────────────────────┐
│ L1 - API Gateway         │  Apache APISIX (:9080)
│ JWT auth (Keycloak)      │  Lua plugins
│ Per-tenant rate limit    │  50k TPS target, P99 < 50ms
│ Request logging (CH)     │
└───────────┬──────────────┘
            |
            v
┌──────────────────────────┐
│ L2 - AI Gateway          │  Custom Rust (:4000)
│ Multi-model routing      │  OpenAI, Anthropic, Gemini, Ollama
│ PII redaction            │  Presidio-style scan before LLM call
│ Token budget enforce     │  Per-tenant budget in Redis
│ Semantic cache           │  Redis (exact) + Qdrant (similarity)
└───────────┬──────────────┘
            |
            v
┌──────────────────────────┐
│ L3 - Agent Gateway       │  Custom Rust (:5001)
│ MCP + A2A protocols      │  Session state in Redis
│ Tool allowlists          │  Deny-by-default
│ HITL approval gates      │  Operator approve/deny
│ Blast radius controls    │  Limit concurrency, network, files
└──────────────────────────┘

Infrastructure:
┌────────┐ ┌───────────┐ ┌────────┐ ┌──────────┐ ┌────────┐
│ Redis  │ │ ClickHouse│ │ Qdrant │ │ Keycloak │ │ etcd   │
│ :6380  │ │ :8123     │ │ :6333  │ │ :8081    │ │ :2379  │
└────────┘ └───────────┘ └────────┘ └──────────┘ └────────┘

Observability:
┌────────────┐ ┌────────┐ ┌─────────┐
│ Prometheus │ │Grafana │ │ Jaeger  │
│ :9090      │ │ :3002  │ │ :16686  │
└────────────┘ └────────┘ └─────────┘
```

## Key Differentiators
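The L2 semantic cache has two tiers: an exact-match tier in Redis and a similarity tier in Qdrant. The exact tier can be sketched in a few lines of Rust; here an in-process `HashMap` stands in for Redis, and all type and method names are illustrative rather than Gatez's actual API. The essential property shown is that `tenant_id` is part of every cache key, so completions are never shared across tenants.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative exact-match cache tier (Redis stand-in).
struct ExactCache {
    entries: HashMap<u64, String>,
}

impl ExactCache {
    fn new() -> Self { Self { entries: HashMap::new() } }

    /// The key always includes tenant_id: same prompt, different tenant
    /// produces a different key, so there is no cross-tenant cache sharing.
    fn key(tenant_id: &str, model: &str, prompt: &str) -> u64 {
        let mut h = DefaultHasher::new();
        (tenant_id, model, prompt).hash(&mut h);
        h.finish()
    }

    fn get(&self, tenant_id: &str, model: &str, prompt: &str) -> Option<&String> {
        self.entries.get(&Self::key(tenant_id, model, prompt))
    }

    fn put(&mut self, tenant_id: &str, model: &str, prompt: &str, completion: String) {
        self.entries.insert(Self::key(tenant_id, model, prompt), completion);
    }
}

fn main() {
    let mut cache = ExactCache::new();
    cache.put("acme", "gpt-4o", "hello", "Hi there!".into());
    // Exact hit for the same tenant, model, and prompt.
    assert!(cache.get("acme", "gpt-4o", "hello").is_some());
    // Same prompt from a different tenant: miss, by design.
    assert!(cache.get("globex", "gpt-4o", "hello").is_none());
}
```

On an exact-tier miss, the similarity tier would embed the prompt and query Qdrant for near-duplicate prompts before falling through to the LLM provider.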
**Unified platform.** One gateway handles REST APIs, LLM completions, and agent tool calls. No need to operate three separate products.

**Self-hosted / on-premises.** Every component runs from container images. No SaaS dependencies. Your data never leaves your infrastructure.

**Multi-tenant from day one.** `tenant_id` is mandatory on every API call, every log entry, every cache key, every rate limit bucket. Tenants never see each other's data.

**Rust performance.** L2 and L3 are built with axum and tokio. Sub-millisecond overhead, zero-copy SSE streaming, async all the way down. L1 targets 50k TPS at P99 < 50ms.
**Cross-layer tracing.** OpenTelemetry spans flow from L1 through L2 into L3. A single Jaeger trace shows the full journey of a request from API gateway to LLM provider to agent tool execution.
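The mechanism that ties spans from all three layers into one trace is W3C `traceparent` header propagation. Gatez uses the OpenTelemetry SDKs for this; the stdlib-only sketch below only illustrates the header format (`version-trace_id-parent_id-flags`) and the invariant that every hop keeps the same `trace_id` while substituting its own span id as the new parent. Function names here are hypothetical.

```rust
/// The two fields each hop cares about.
#[derive(Debug, PartialEq)]
struct TraceContext {
    trace_id: String,  // 32 hex chars, shared by every span in the trace
    parent_id: String, // 16 hex chars, the calling span's id
}

/// Parse an inbound `traceparent` header, e.g. from L1 arriving at L2.
fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    Some(TraceContext {
        trace_id: parts[1].to_string(),
        parent_id: parts[2].to_string(),
    })
}

/// Build the outbound header for the next hop: same trace_id, own span id
/// as the new parent. This is what keeps L1 -> L2 -> L3 in one Jaeger trace.
fn propagate(ctx: &TraceContext, own_span_id: &str) -> String {
    format!("00-{}-{}-01", ctx.trace_id, own_span_id)
}

fn main() {
    let inbound = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";
    let ctx = parse_traceparent(inbound).unwrap();
    let outbound = propagate(&ctx, "00f067aa0ba902b7");
    // The trace_id survives the hop; only the parent span id changes.
    assert!(outbound.contains(&ctx.trace_id));
}
```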
**Agent security built in.** Tool allowlists (deny-by-default), human-in-the-loop approval gates, blast radius controls, tool poisoning protection, and full audit logging to ClickHouse.
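The deny-by-default gating in L3 can be sketched as a three-way decision: a tool call is rejected unless the tenant's allowlist explicitly names the tool, and sensitive tools can additionally be parked behind a HITL approval gate. The types and field names below are illustrative, not Gatez's actual API.

```rust
use std::collections::HashSet;

/// Per-tenant tool policy: what may run, and what needs a human first.
struct ToolPolicy {
    allowed: HashSet<String>,        // explicit allowlist
    needs_approval: HashSet<String>, // subset requiring HITL sign-off
}

enum Decision {
    Deny,            // tool not listed: never runs
    Allow,           // listed and unrestricted: runs immediately
    PendingApproval, // listed but gated: waits for an operator
}

impl ToolPolicy {
    fn check(&self, tool: &str) -> Decision {
        if !self.allowed.contains(tool) {
            Decision::Deny // deny-by-default: unknown tools never execute
        } else if self.needs_approval.contains(tool) {
            Decision::PendingApproval // held until an operator approves or denies
        } else {
            Decision::Allow
        }
    }
}

fn main() {
    let policy = ToolPolicy {
        allowed: ["web_search", "send_email"].iter().map(|s| s.to_string()).collect(),
        needs_approval: ["send_email"].iter().map(|s| s.to_string()).collect(),
    };
    assert!(matches!(policy.check("web_search"), Decision::Allow));
    assert!(matches!(policy.check("send_email"), Decision::PendingApproval));
    // Anything not on the allowlist is denied, no configuration needed.
    assert!(matches!(policy.check("delete_database"), Decision::Deny));
}
```

In the real gateway every one of these decisions, including the operator's approve/deny, would also be written to the ClickHouse audit log.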
## Who Is It For?
- Platform engineers who need to expose AI capabilities to internal teams with proper governance
- DevOps / SRE teams who want a single observability stack across API, AI, and agent traffic
- AI/ML teams who need multi-model routing with fallback chains, token budgets, and semantic caching
- Security teams who require PII redaction before LLM calls, full audit trails, and tenant isolation
## Next Steps
- Quickstart -- get the full stack running locally in 5 minutes
- Architecture Overview -- detailed diagrams and data flow
- Environment Variables -- complete configuration reference