Gatez Feature Matrix

Complete feature inventory across all dimensions. Status: Built = shipped and tested.

API Gateway (Layer 1 — APISIX + Lua)

Feature	Status	Details
HTTP/HTTPS proxying	Built	APISIX core, TLS termination via Caddy/cert-manager
Per-tenant rate limiting	Built	Redis-backed Lua plugin, sliding window, `rl:{tenant_id}:{route}:{window}`
Rate limit hierarchy	Built	Global → tenant → route overrides, visual editor in Operator Portal
JWT authentication	Built	Zitadel OIDC integration, tenant_id extracted from JWT claims
API key authentication	Built	key-auth plugin, request → approve → issue workflow
Request logging	Built	ClickHouse via http-logger plugin, async (never blocks request path)
Route management	Built	APISIX Admin API CRUD, lifecycle states (draft → published → deprecated → retired)
gRPC proxying	Built	grpc-transcode + grpc-web plugins, `/grpc/` and `/grpc-web/` routes
WebSocket proxying	Built	`/ws/*` route, 60s keepalive, `enable_websocket` flag
Circuit breaker	Built	api-breaker plugin, configurable thresholds, failure injection for testing
IP restriction per tenant	Built	APISIX ip-restriction plugin, `PUT /api/tenants/:id/ip-allowlist`
Active + passive health checks	Built	All upstreams, exposed in Operator Portal health page
Canary deployments	Built	Traffic splitting via upstream weighting (%), blue-green pattern documented
Service discovery	Built	DNS SRV + Consul + K8s, documented in `docs/deployment/service-discovery.md`
Cross-layer trace propagation	Built	W3C traceparent/tracestate forwarding L1 → L2 → L3
Grafana dashboard	Built	Request rate, error rate, P99 latency per tenant
Load tested	Built	479 TPS local dev (target 50k TPS on prod hardware), P99 56ms
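
The rate limiting rows above describe a sliding window keyed as `rl:{tenant_id}:{route}:{window}` with global → tenant → route overrides. The shipped plugin is Lua on Redis; the following is only a minimal Python sketch of the idea, with an in-memory dict standing in for Redis. The function and class names, the use of the window start time as the `{window}` component, and the weighted previous-window estimate are all assumptions made for illustration.

```python
import time

def rate_limit_key(tenant_id, route, window_start):
    # Mirrors the documented key shape rl:{tenant_id}:{route}:{window}.
    # Using the window start (epoch seconds) as {window} is an assumption.
    return f"rl:{tenant_id}:{route}:{window_start}"

def resolve_limit(global_limit, tenant_limits, route_limits, tenant_id, route):
    # Most specific override wins: route, then tenant, then global.
    if (tenant_id, route) in route_limits:
        return route_limits[(tenant_id, route)]
    if tenant_id in tenant_limits:
        return tenant_limits[tenant_id]
    return global_limit

class SlidingWindowLimiter:
    """Sliding-window counter; the dict is a stand-in for Redis."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.counters = {}

    def allow(self, tenant_id, route, limit, now=None):
        now = time.time() if now is None else now
        current_start = int(now // self.window) * self.window
        previous_start = current_start - self.window
        cur_key = rate_limit_key(tenant_id, route, current_start)
        prev_key = rate_limit_key(tenant_id, route, previous_start)
        # Weight the previous window by how much of it still overlaps
        # the sliding interval that ends at `now`.
        elapsed = (now - current_start) / self.window
        estimated = (self.counters.get(prev_key, 0) * (1 - elapsed)
                     + self.counters.get(cur_key, 0))
        if estimated >= limit:
            return False
        self.counters[cur_key] = self.counters.get(cur_key, 0) + 1
        return True
```

Because the tenant ID is part of every key, two tenants hitting the same route never share a counter, which matches the "never shared buckets" note in the Multi-Tenancy table.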

AI Gateway (Layer 2 — Custom Rust, axum + tokio)

Feature	Status	Details
Multi-model routing	Built	13 providers: OpenAI, Anthropic, Gemini, Ollama, Azure, Mistral, Cohere, DeepSeek, Together, Groq, Fireworks, vLLM, Bedrock (stub)
Model passthrough	Built	Prefix-pattern routing — any model ID routed to correct provider, no whitelist
Model aliasing	Built	`MODEL_ALIASES=fast=gpt-4o-mini,smart=claude-sonnet` env var
P2C load balancing	Built	Power of Two Choices across providers, health scoring (error rate + latency + pending)
Circuit breaker	Built	3 failures → open, 30s recovery, auto half-open
Retry with backoff	Built	2 retries, 100ms initial, max 10s, exponential
Fallback chains	Built	Circuit breaker open → auto-route to next available provider
Redis exact-match cache	Built	Tenant-scoped keys, cache-hit path: 599 req/s, 18ms avg
Semantic cache (Qdrant)	Built	Two-tier: Redis exact → Qdrant similarity, hash-based vectors
PII redaction	Built	Regex: SSN, email, credit card, phone, IP — runs BEFORE LLM call
Multi-layer prompt guards	Built	Pipeline: regex (<1ms) → webhook → action (Pass/Reject/Mask)
Token budget enforcement	Built	Per-tenant, pre-request check in Redis, deduct after response
Streaming SSE	Built	Zero-copy pass-through, no buffering
ClickHouse logging	Built	Async fire-and-forget: model, tokens, latency, cache_hit, pii_detected
Prometheus metrics	Built	Requests, cache, latency, tokens, PII, budget, active requests
Hot config reload	Built	`POST /admin/reload`, RwLock config swap, validates before applying
Provider health API	Built	`GET /v1/providers/health` — all provider stats, error rates, latency
OpenAI-compatible API	Built	Drop-in replacement: `/v1/chat/completions`, `/v1/models`
JWT signature validation	Built	Independent JWKS validation (doesn't trust L1 blindly)
Auth header scrubbing	Built	Strips Authorization/x-api-key before ClickHouse writes
Observability webhooks	Built	Batched LlmEvent export (metadata only, prompts opt-in)
Grafana dashboard	Built	Request rate, cache hit rate, latency P50/P95/P99, tokens, PII, budget
Load tested	Built	599 req/s cache-hit, 18ms avg latency (local Docker)
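
The circuit breaker and fallback rows above (3 failures → open, 30s recovery, auto half-open, then route to the next available provider) can be sketched as a small state machine. This is an illustrative Python sketch, not the gateway's Rust code: the class and function names are hypothetical, and two behaviours are assumptions the table does not state, namely that failures are counted consecutively and that a failure during half-open reopens the circuit immediately.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None
        self._state = "closed"

    def state(self):
        # An open circuit moves to half-open by itself once the
        # recovery period has elapsed.
        if (self._state == "open"
                and self.clock() - self.opened_at >= self.recovery_seconds):
            self._state = "half_open"
        return self._state

    def allow_request(self):
        return self.state() != "open"

    def record_success(self):
        self.failures = 0
        self._state = "closed"

    def record_failure(self):
        if self.state() == "half_open":
            self._open()  # assumption: a failed probe reopens at once
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self._state = "open"
        self.opened_at = self.clock()
        self.failures = 0

def pick_provider(chain, breakers):
    # Fallback chain: first provider whose breaker is not open.
    for name in chain:
        if breakers[name].allow_request():
            return name
    return None
```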

Agent Gateway (Layer 3 — Custom Rust, axum + tokio)

Feature	Status	Details
MCP protocol	Built	Server registry CRUD, tool discovery, JSON-RPC forwarding
A2A protocol	Built	Agent registry, send message, task tracking, HTTP forwarding
Session lifecycle	Built	Create, list, inspect, terminate — Redis-backed with TTL
Tool allowlists	Built	Deny by default, per-session, tenant-scoped
CEL expression engine	Built	776-line built-in evaluator, 30 tests, jwt/mcp/tenant/session vars
HITL approval gates	Built	Per-tenant configurable, pending queue, approve/deny API
Session token budgets	Built	Per-session limits, budget check before every tool call
Tool poisoning protection	Built	Server fingerprinting, naming collision detection (409 on conflict)
A2A delegation policies	Built	Cross-tenant block, chain depth limit (max 5), loop detection
MCP elicitation	Built	`/v1/elicit` + `/v1/elicit/:id/respond` — structured input via HITL
OpenAPI-to-MCP translation	Built	Auto-convert OpenAPI 3.x specs into MCP tool definitions
Virtual MCP endpoint	Built	`GET /v1/mcp` federates all servers, tool name prefixing
MCP health checks	Built	Background task, 30s interval, Healthy/Degraded/Unhealthy per server
stdio transport	Built	Process lifecycle via StdioManager, JSON-RPC over stdin/stdout
SSE transport	Built	HTTP POST fallback for MCP SSE servers
MCP OAuth	Built	RFC 9728 + RFC 8414, gateway proxy pattern, 3 validation modes
JSON schema validation	Built	Validate tool args against MCP input_schema (types, required fields)
Agent registry persistence	Built	Redis-backed, survives restarts
Cross-layer tracing	Built	OTel + Jaeger, L1 → L2 → L3 span tree
ClickHouse audit trail	Built	Tool calls, A2A hops, session events with tenant_id
Prometheus metrics	Built	Sessions, tool calls, denied, A2A, HITL, latency, poisoning
Hot config reload	Built	`POST /admin/reload`, health check interval + session TTL
Grafana dashboard	Built	7 panels: sessions, tool calls, A2A, HITL, latency, poisoning
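
The A2A delegation policies row lists three checks: cross-tenant block, a chain depth limit of 5, and loop detection. A minimal Python sketch of how such a check could be ordered is shown here. The `Agent` type, the `check_delegation` name, the reason strings, and the choice to count depth as the number of agents in the chain including the target are assumptions for illustration only.

```python
from dataclasses import dataclass

MAX_CHAIN_DEPTH = 5  # "max 5" from the table; counting rule is assumed

@dataclass(frozen=True)
class Agent:
    agent_id: str
    tenant_id: str

def check_delegation(chain, target):
    """Decide whether the last agent in `chain` may delegate to `target`.

    `chain` is the ordered list of agents that have handled the task so far.
    Returns (allowed, reason).
    """
    if not chain:
        raise ValueError("delegation chain must contain the calling agent")
    caller = chain[-1]
    # Cross-tenant block: caller and target must share a tenant.
    if caller.tenant_id != target.tenant_id:
        return False, "cross_tenant"
    # Loop detection: the target must not already appear in the chain.
    if any(a.agent_id == target.agent_id for a in chain):
        return False, "loop"
    # Depth limit: the chain including the target may hold at most 5 agents.
    if len(chain) + 1 > MAX_CHAIN_DEPTH:
        return False, "depth_exceeded"
    return True, "ok"
```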

Multi-Tenancy

Feature	Status	Details
tenant_id on every call	Built	JWT claim extraction, propagated L1 → L2 → L3
Per-tenant rate limiting	Built	Independent quotas, Redis sliding window, never shared buckets
Per-tenant token budgets	Built	Pre-request check, post-response deduct, alert at 80%
Per-tenant API keys	Built	Namespace-scoped, request → approve → issue workflow
Per-tenant tool allowlists	Built	Deny by default, CEL rules per tenant
Per-tenant HITL policies	Built	Configurable per-tool, per-tenant
Per-tenant branding	Built	Logo (base64, 100KB), portal title, primary color
Tenant-scoped cache	Built	Redis: `{tenant_id}:cache:*`, no cross-tenant sharing
Tenant-scoped analytics	Built	ClickHouse row-level filter, per-tenant dashboards
Tenant-scoped audit trail	Built	Every log entry includes tenant_id
IP restriction per tenant	Built	CIDR allowlist, 403 on violation
Cross-tenant isolation	Built	Session isolation, key isolation, analytics isolation — tested
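
The per-tenant token budget row describes a pre-request check, a post-response deduction, and an alert at 80%. The sketch below illustrates that sequence in Python with an in-memory counter; the gateway itself keeps this state in Redis. The `TenantBudget` class and its method names are hypothetical, and firing the alert only once per budget period is an assumption.

```python
class TenantBudget:
    def __init__(self, limit_tokens, alert_ratio=0.8):
        self.limit = limit_tokens
        self.alert_ratio = alert_ratio
        self.used = 0
        self.alerted = False

    def has_budget(self):
        # Pre-request check: reject once the budget is exhausted.
        return self.used < self.limit

    def deduct(self, tokens):
        """Post-response deduction of the tokens actually consumed.

        Returns True exactly once, when usage first reaches the alert
        threshold, so the caller can emit a notification.
        """
        self.used += tokens
        if not self.alerted and self.used >= self.limit * self.alert_ratio:
            self.alerted = True
            return True
        return False
```

Deducting after the response means a single request can overshoot the limit, since the real token count is only known once the provider replies; the next pre-request check then rejects.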

Control Plane — Operator Portal

Feature	Status	Details
Tenant management	Built	List, create (3-step wizard), edit, suspend, delete
Rate limit editor	Built	Visual hierarchy: global → tenant → route overrides
API catalogue	Built	Route/service browser, search, filter, plugin badges
OAS 3 Swagger UI	Built	Upload spec, inline try-it console, curl generator
API lifecycle	Built	Draft → published → deprecated → retired
API key management	Built	Create, show-once-then-mask, revoke, audit log
Key approval queue	Built	Review tenant requests, approve/deny
Usage analytics	Built	ClickHouse-backed KPI cards, time-series, drill-downs
LLM token analytics	Built	Prompt vs completion, cost per provider, per-model bars
Health monitoring	Built	Upstream status, dependency map, alert config
Session browser	Built	List, filter, terminate from UI
MCP tool registry	Built	Catalog, enable/disable per tenant
MCP tool playground	Built	Auto-generated form, execute, history, curl generator
Trace explorer	Built	Cross-layer L1 → L2 → L3 span tree
A2A topology graph	Built	Agent delegation chains, loop detection
HITL approval queue	Built	Pending tool calls, approve/modify/deny
Policy editor	Built	Visual tool allowlist + RBAC per tenant
CEL playground	Built	Expression editor, context builder, examples, history
Audit log	Built	ClickHouse-backed, filters, CSV export
Settings	Built	Platform config, notification config, data retention
LLM provider management	Built	Add/test/delete providers, secret references, UI tab
User management	Built	Context-aware: SCIM/SSO/Zitadel/Bootstrap adaptive UI
Service accounts	Built	`gtz_sa_` prefixed keys, SHA-256 hashed, show-once modal
Webhook management	Built	Register URL + event types, delivery log
IP allowlist editor	Built	Per-tenant CIDR management
Canary deployment slider	Built	0-100% traffic split per route
Notifications	Built	Bell icon, type-specific icons, polling
Custom branding	Built	Logo upload, portal title, color per tenant
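
The service accounts row mentions `gtz_sa_` prefixed keys that are SHA-256 hashed and shown once. One way to read that is sketched below in Python: the plaintext key is returned to the caller a single time and only its digest is stored. The function names and the key length are hypothetical, and the portal's actual storage format is not described in this document.

```python
import hashlib
import hmac
import secrets

PREFIX = "gtz_sa_"

def issue_key():
    """Return (plaintext, digest). Show the plaintext once; store the digest."""
    plaintext = PREFIX + secrets.token_urlsafe(32)  # length is an assumption
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest

def verify_key(presented, stored_digest):
    # Reject anything without the documented prefix before hashing.
    if not presented.startswith(PREFIX):
        return False
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking digest prefixes via timing.
    return hmac.compare_digest(candidate, stored_digest)
```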

Control Plane — Developer Portal

Feature	Status	Details
API discovery	Built	Browse published APIs, tenant-scoped
Swagger try-it console	Built	Inline method selector, headers, body, live response, curl
Key management	Built	Request → approval → secure issuance (show-once modal)
My keys dashboard	Built	Masked keys, last-used, request count, revoke
Usage dashboard	Built	Request volume, error rate, latency, token consumption
Token budget visibility	Built	Remaining, burn rate, projected exhaustion
Agent session viewer	Built	Sessions, tool call timeline, budget gauges
HITL approval	Built	Approve own sessions, amber banner, countdown timer
Usage drill-down	Built	LLM tokens by model, cache hit rate, cost estimate, error breakdown
Session drill-down	Built	Tool call timeline, duration, tokens, status per call
Audit log export	Built	Date range, action filter, CSV export
Notifications	Built	Bell with unread count, type-specific icons, filter tabs
Settings	Built	Profile, notification prefs, branding (tenant-admin)
Custom branding	Built	Tenant logo, title, color
Tenant-locked	Built	Cannot see other tenants' data, ever

Security

Feature	Status	Details
JWT authentication	Built	Zitadel OIDC, validated at L1 + L2 independently
JWKS caching	Built	Cache TTL 5 min at L2, 30 min at L3; offline validation
API key auth	Built	key-auth plugin, scoped per tenant
Master key fallback	Built	Service-to-service calls bypass JWT when needed
PII redaction	Built	Pre-LLM: SSN, email, credit card, phone, IP
Prompt guards	Built	Regex + webhook pipeline, Reject/Mask actions
Auth header scrubbing	Built	Authorization/x-api-key stripped before ClickHouse
Tool allowlists	Built	Deny by default, CEL expressions, per-tenant
Tool poisoning detection	Built	Fingerprinting, naming collision (409)
HITL gates	Built	Human approval for high-risk tool calls
Blast radius controls	Built	Session budgets, depth limits, loop detection
IP restriction	Built	Per-tenant CIDR allowlist
SQL injection protection	Built	Parameterized queries, pre-merge scan
XSS protection	Built	React escaping, pre-merge scan
Secret management	Built	KeySource enum: EnvVar; Vault, K8s, AWS SM, and Azure KV variants are stubs
License key system	Built	JWT-signed, offline validation, tier gating
TLS everywhere	Built	cert-manager, inter-service TLS, self-signed CA for dev
MCP OAuth	Built	RFC 9728/8414, PKCE, 3 validation modes
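
The PII redaction row states that regexes for SSN, email, credit card, phone, and IP run before the LLM call. The Python sketch below shows the general shape of such a pass. The patterns are simplified assumptions (US-style SSN and phone formats, IPv4 only) and are not the gateway's actual expressions; the placeholder tokens are also hypothetical.

```python
import re

# Order matters: longer digit runs (cards) are replaced before shorter ones.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("CREDIT_CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]

def redact(text):
    """Return (redacted_text, detected_types) for a prompt.

    detected_types can feed a pii_detected flag in request logs.
    """
    detected = []
    for label, pattern in PII_PATTERNS:
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            detected.append(label)
    return text, detected
```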

Observability

Feature	Status	Details
Prometheus metrics	Built	All 3 layers export metrics: requests, latency, errors, cache, tokens
Grafana dashboards	Built	L1 (4 panels), L2 (10 panels), L3 (7 panels)
Jaeger distributed tracing	Built	Cross-layer L1 → L2 → L3 span tree, OTel export
ClickHouse analytics	Built	request_log, ai_request_log, agent_audit_log — partitioned by month
ClickHouse TTL	Built	Request logs 90 days, AI usage logs 365 days, audit logs kept with no TTL
Buffer engine	Built	Buffer → MergeTree for high-write tables
Real-time health monitoring	Built	Upstream status, dependency map, alert config
LLM observability webhooks	Built	Batched metadata export (Langfuse/LangSmith compatible)
Audit trail	Built	Every tool call, A2A hop, session event logged with tenant_id
CSV export	Built	request_log, ai_request_log, agent_audit_log — ClickHouse FORMAT CSVWithNames
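
Cross-layer tracing relies on forwarding the W3C `traceparent` header from L1 to L2 to L3. As a rough illustration of what each hop does, the Python sketch below parses a version `00` header and builds the header for a child span that keeps the trace ID and gets a new span ID. The function names are hypothetical, the parser only accepts the four-field version `00` layout, and starting a new sampled trace when the header is missing or invalid is an assumption.

```python
import re
import secrets

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

def parse_traceparent(header):
    """Parse a version-00 traceparent header; return None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}

def child_traceparent(incoming):
    """Build the header to send downstream for a new child span."""
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        # Assumption: start a fresh, sampled trace when nothing usable arrives.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, flags = parsed["trace_id"], parsed["flags"]
    span_id = secrets.token_hex(8)  # new 16-hex-char span ID for this hop
    return f"00-{trace_id}-{span_id}-{flags}"
```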

Enterprise & Compliance

Feature	Status	Details
License key & feature gates	Built	Community / Pro / Enterprise / Trial tiers
SSO federation	Built	Okta, Microsoft Entra ID, Google Workspace via Zitadel OIDC broker
SCIM provisioning	Built	Identity source detection, role assignment API
Multi-org Zitadel	Built	Dedicated organization per enterprise tenant
HIPAA compliance mapping	Built	`docs/compliance/hipaa-mapping.md`, controls → features
Air-gap deployment	Built	All services from container images, zero internet dependency
Performance benchmarks	Built	Documented methodology, L1/L2/L3 numbers
Dependency audit	Built	cargo audit + npm audit clean
Backup/restore runbook	Built	etcd, ClickHouse, Redis — RTO/RPO documented
Disaster recovery	Built	`docs/operations/disaster-recovery.md`
Horizontal autoscaling	Built	HPA for APISIX, AI Gateway, Agent Gateway
Canary deployments	Built	Traffic splitting, blue-green documented
Webhook system	Built	6 event types, retry with backoff, delivery log
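
The webhook row mentions retry with backoff but does not give its parameters. Purely as an illustration, the Python sketch below borrows the numbers listed for L2 provider retries (2 retries, 100ms initial delay, 10s cap, exponential) and applies them to a delivery loop. The function names, the doubling factor, and reuse of the L2 values for webhooks are all assumptions.

```python
import time

def backoff_delays(retries=2, initial=0.1, cap=10.0, factor=2.0):
    # Delay before each retry, in seconds: initial * factor**attempt, capped.
    return [min(cap, initial * factor ** attempt) for attempt in range(retries)]

def deliver_with_retry(send, retries=2, sleep=time.sleep):
    """Call send() once, then retry up to `retries` times on any exception.

    Returns True on the first success and False when every attempt failed.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            send()
            return True
        except Exception:
            if attempt == retries:
                return False
            sleep(delays[attempt])
    return False
```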

Infrastructure & Deployment

Feature	Status	Details
Docker Compose (local)	Built	All 15 services, single `docker compose up -d`
Kubernetes manifests	Built	Namespace, Deployments, Services, Secrets, Ingress
Helm chart	Built	`infra/helm/gatez/` with configurable values.yaml
Terraform	Built	`infra/terraform/` for cloud provisioning
Caddy reverse proxy	Built	Auto-TLS, subdomain routing template
Environment templates	Built	`.env.local`, `.env.staging`, `.env.production`
Environment detection	Built	`GATEZ_ENV` — refuses default passwords in production
Hot config reload	Built	L2 + L3 `POST /admin/reload`, no restart needed
Kong migration tools	Built	CLI translator, Python parser, plugin map, migration guide
gRPC + WebSocket	Built	APISIX plugins enabled, routes configured
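
The environment detection row says that `GATEZ_ENV` causes the platform to refuse default passwords in production. A startup guard along those lines might look like the Python sketch below. Only `GATEZ_ENV` comes from this document; the value `production`, the list of secret variable names, and the set of default passwords are hypothetical placeholders.

```python
# Hypothetical names and values, for illustration only.
SECRET_VARS = ("REDIS_PASSWORD", "CLICKHOUSE_PASSWORD")
DEFAULT_PASSWORDS = {"", "changeme", "password", "admin"}

def check_environment(env):
    """Validate secrets at startup; `env` is a mapping such as os.environ.

    Returns the detected mode, or raises RuntimeError when a production
    deployment still uses a default or empty password.
    """
    mode = env.get("GATEZ_ENV", "local")
    if mode != "production":
        return mode
    offenders = [name for name in SECRET_VARS
                 if env.get(name, "") in DEFAULT_PASSWORDS]
    if offenders:
        raise RuntimeError(
            "refusing to start in production with default passwords: "
            + ", ".join(offenders)
        )
    return mode
```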

Testing

Feature	Status	Details
L2 Rust unit tests	Built	78 tests (PII, cache, semantic cache, config, providers, logging)
L3 Rust unit tests	Built	106 tests (sessions, security, A2A, MCP, audit, CEL)
Cross-layer E2E	Built	16 tests (L1 → L2 → L3 health, sessions, tools, metrics)
Enterprise test suite	Built	213 scenarios (isolation, auth, boundary, concurrency)
Playwright UI E2E	Built	208 specs across both portals
Pre-merge gate	Built	10-section security/quality scan (secrets, SQL injection, auth, tenant isolation)
Smoke test	Built	`scripts/smoke-test.sh` — all services healthy
Full test runner	Built	`scripts/test-all.sh` — runs all suites in sequence
Performance benchmarks	Built	wrk/k6 based, documented methodology
Chaos engineering	Built	Service stop/start resilience tests

Not Yet Built (Planned)

Feature	Priority	Details
Kubernetes Gateway API	Medium	GatewayClass, Gateway, HTTPRoute CRDs
Vault secret resolver	Medium	HTTP API with token auth + cache
K8s Secret resolver	Medium	kube crate, service account auth
AWS Secrets Manager resolver	Medium	aws-sdk-secretsmanager crate
Usage metering & billing	Medium	Stripe integration, materialized views
SOC 2 Type II	Low	3-6 month audit process
L1 response PII scrubbing	Low	APISIX body_filter plugin, opt-in per route
Per-tenant provider preferences	Low	Tenant-specific model routing
Per-tenant guard configuration	Low	Custom prompt guard rules per tenant
Full stdio bidirectional JSON-RPC	Low	Background stdout reader with reconnection
Full SSE streaming with reconnection	Low	Session pinning for stateful MCP servers
File watcher config reload	Low	`notify` crate for automatic reload
