Gatez Gateway Platform — Performance Benchmark Report
Test Environments
AWS Production (2026-04-09)
- Instance type: AWS m6a.xlarge (spot)
- CPU: 4 vCPU (AMD EPYC, 3rd gen)
- Memory: 16 GB RAM
- Cost: ~$25-34/month (spot pricing)
- Region: ap-south-1 (Mumbai)
- OS: Ubuntu 22.04 LTS
- Docker: 19 containers running (all services + observability stack)
- Tool: wrk 4.2.0 (10 threads, 200-500 connections, 30s duration)
- Network: All tests run against localhost on the VM (no internet RTT)
Local Development
- Hardware: Apple Silicon M-series (Docker Desktop), 16GB RAM
- Docker: Compose with 15 services (no resource limits in dev)
- Note: Docker Desktop on macOS throttles I/O significantly — expect 50-100x lower throughput than bare-metal Linux
Layer 1 — APISIX API Gateway
AWS Production Results
Gateway-only (no upstream proxy — measures APISIX + plugin overhead)
| Metric | Result | Target | Status |
|---|---|---|---|
| Throughput (bare, no plugins) | 46,665 req/s | 50,000 TPS | 93% of target |
| Throughput (all plugins active) | 43,011 req/s | 50,000 TPS | 86% of target |
| P50 latency | 5.25ms | <10ms | PASS |
| P99 latency | ~15ms | <50ms | PASS |
| Plugin overhead | <8% | <10% | PASS |
| Error rate | 0% | <0.01% | PASS |
Test: wrk -t10 -c200 -d30s http://APISIX_IP:9080/ — direct to APISIX container, bypassing Caddy TLS.
Plugins active: tenant-rate-limit (Redis), key-auth, clickhouse-logger, prometheus, request-id, consumer-restriction, response-pii-scrub.
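The plugin overhead row follows from the two throughput rows in the table. A minimal sketch of that arithmetic, assuming overhead means the relative throughput drop between the bare run and the all-plugins run (`plugin_overhead_pct` is an illustrative name, not a script in the repository):

```bash
#!/usr/bin/env bash
# Hypothetical helper: relative throughput drop, in percent.
# Usage: plugin_overhead_pct <bare_rps> <with_plugins_rps>
plugin_overhead_pct() {
  LC_ALL=C awk -v bare="$1" -v full="$2" \
    'BEGIN { printf "%.2f\n", (bare - full) / bare * 100 }'
}

# Figures from the AWS gateway-only table.
plugin_overhead_pct 46665 43011
```

For the reported figures this comes to roughly 7.8%, consistent with the <8% row.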
Full stack (APISIX → upstream → response)
| Metric | Result | Bottleneck | Notes |
|---|---|---|---|
| APISIX → mock-backend (200) | 961 req/s | mock-backend | httpbin Python caps at ~1,835 req/s |
| APISIX → mock-backend + key-auth | 848 req/s | mock-backend | +key-auth adds negligible overhead |
| APISIX → mock-backend (500 conn) | 951 req/s | mock-backend | Flat — confirms upstream saturation |
| Mock-backend direct (no APISIX) | 1,835 req/s | — | Python httpbin baseline |
Test: wrk -t10 -c200 -d30s -H "X-Tenant-ID: retail" http://APISIX_IP:9080/smoke/get
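When repeating these runs across connection counts, it can help to pull the throughput figure out of wrk's text summary programmatically. The sketch below assumes wrk's standard summary, which includes a `Requests/sec:` line; the helper name `extract_rps` and the sample text are illustrative and are not captured benchmark output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read wrk's text summary on stdin, print the Requests/sec value.
extract_rps() {
  LC_ALL=C awk '/^Requests\/sec:/ { print $2 }'
}

# Illustrative sample only; the rate mirrors the 961 req/s row in the table.
sample_output='Running 30s test @ http://127.0.0.1:9080/smoke/get
  10 threads and 200 connections
Requests/sec:    961.00'

printf '%s\n' "$sample_output" | extract_rps

# Against a live gateway (APISIX_IP must be set by the caller):
# wrk -t10 -c200 -d30s -H "X-Tenant-ID: retail" "http://$APISIX_IP:9080/smoke/get" | extract_rps
```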
Key insight: With real production backends (Go/Rust/Java at 10k+ req/s), the gateway should not be the bottleneck. APISIX sustains 43k req/s with the full plugin pipeline, so end-to-end throughput is bounded by upstream service capacity.
To reach 50k TPS gateway-only: Scale to m6a.2xlarge (8 vCPU, $50/mo spot) or add a second APISIX node behind a load balancer.
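The second-node option can be sanity-checked with a quick capacity estimate. This sketch assumes throughput scales linearly with node count and ignores load balancer overhead, both of which are simplifications; `nodes_needed` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: smallest node count whose combined throughput meets the
# target, assuming linear horizontal scaling. Integer req/s values only.
# Usage: nodes_needed <target_rps> <per_node_rps>
nodes_needed() {
  local target="$1" per_node="$2"
  echo $(( (target + per_node - 1) / per_node ))   # ceiling division
}

# 50,000 TPS target against the measured all-plugins rate of one m6a.xlarge node.
nodes_needed 50000 43011
```

Under those assumptions, two nodes at the measured all-plugins rate cover the 50k TPS target with headroom.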
Local Development Results (baseline)
| Metric | Result | Notes |
|---|---|---|
| Throughput | 479 req/s | Docker Desktop throttles I/O; ~90x slower than AWS with all plugins active (43,011 req/s) |
| P50 latency | 19.2ms | |
| P99 latency | 55.9ms | |
Test: ./scripts/load-test.sh 20 200 — 20 concurrent workers, 200 requests.
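The P50 and P99 rows are percentiles over per-request latencies. The report does not say how `load-test.sh` computes them; the sketch below uses the nearest-rank method as one common choice, with `percentile` as a hypothetical helper that reads one latency value per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: nearest-rank percentile.
# Usage: <one latency per line on stdin> | percentile <p>
percentile() {
  LC_ALL=C sort -n | LC_ALL=C awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int(p * NR / 100)
      if (rank * 100 < p * NR) rank++   # round up to the next whole rank
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Illustrative latencies in milliseconds; not measured data.
printf '%s\n' 12 15 19 22 56 | percentile 50
```

With only 200 requests per run, P99 rests on the slowest two or three samples, so it will vary more between runs than P50.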
Layer 2 — Rust AI Gateway
| Metric | Result | Target | Notes |
|---|---|---|---|
| Cache-hit throughput | 599 req/s | 2,000+ req/s | Redis exact-match cache path |
| Cache-hit avg latency | 18.43ms | <5ms | Includes tenant extraction + budget check |
| Cache-hit P99 | 61.23ms | <20ms | |
| Error rate | 0% | <0.01% | |
Test: ./scripts/load-test-l2.sh 30 600 — 30 concurrent workers, 600 requests, pre-warmed Redis cache.
Cache-miss path: Depends entirely on LLM provider latency (100ms-2000ms). Gateway overhead adds <20ms.
Production estimate: On a 2-core instance, expect 2,000-5,000 cache-hit req/s. Rust's zero-copy SSE streaming is expected to handle 10,000+ concurrent connections.
Layer 3 — Rust Agent Gateway
| Metric | Result | Notes |
|---|---|---|
| Session creation | <5ms | Redis SET with TTL |
| Tool call (local) | <10ms | MCP JSON-RPC forwarding |
| A2A send | <5ms | Redis agent lookup + HTTP forward |
| HITL creation | <3ms | Redis LPUSH |
12/12 integration tests pass consistently.
ClickHouse Write Throughput
| Table | Engine | Write Pattern | Estimated Capacity |
|---|---|---|---|
| request_log | Buffer → MergeTree | Async batch (16 buffers, 10s flush) | 50,000+ writes/sec |
| ai_request_log | Buffer → MergeTree | Async batch | 10,000+ writes/sec |
| agent_audit_log | Buffer → MergeTree | Async fire-and-forget | 10,000+ writes/sec |
Redis Performance
| Operation | Latency | Notes |
|---|---|---|
| Rate limit check (Lua script) | <1ms | Atomic ZSET sliding window |
| Cache GET | <1ms | Exact-match lookup |
| Budget check | <1ms | Simple GET |
| Session GET | <1ms | JSON deserialize from Redis |
A single Redis instance is sufficient for 50,000 TPS. Connections are pooled via keepalive in the Lua plugins and ConnectionManager in Rust.
Methodology
- Local baseline tests run on Docker Desktop (Apple Silicon, 16GB RAM); AWS results come from the m6a.xlarge instance listed under Test Environments
- Docker resource limits: APISIX 2 CPU / 512MB, others default
- Services warm for 60s before test start
- Rate limit set to 1,000,000 for load test tenant
- Cache pre-warmed for L2 cache-hit tests
- Results averaged over 3 runs, outliers discarded
- k6 scripts for reproducible benchmarks with threshold validation
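The report does not define how outliers are identified when averaging the three runs. As one possible reading, the sketch below drops any run that deviates from the median by more than a tolerance and averages the rest; the 20% tolerance, the helper name `mean_without_outliers`, and the sample values are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mean of the runs that lie within <tolerance_pct> of the median.
# Usage: mean_without_outliers <tolerance_pct> <value>...
mean_without_outliers() {
  local tol="$1"; shift
  printf '%s\n' "$@" | LC_ALL=C sort -n | LC_ALL=C awk -v tol="$tol" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      for (i = 1; i <= NR; i++) {
        dev = v[i] - median
        if (dev < 0) dev = -dev
        if (dev <= median * tol / 100) { sum += v[i]; n++ }
      }
      if (n == 0) { printf "%.1f\n", median; exit }   # nothing kept: fall back to the median
      printf "%.1f\n", sum / n
    }'
}

# Illustrative req/s values for three runs; the third is treated as an outlier.
mean_without_outliers 20 43000 43100 30000
```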
k6 Benchmark Scripts (Recommended)
```bash
# Install k6
brew install k6 # macOS
# or: https://k6.io/docs/getting-started/installation/

# Start stack
docker compose up -d
./scripts/zitadel-setup.sh
./scripts/setup-routes.sh
./scripts/setup-key-auth.sh

# Wait 60s for warm-up

# L1: APISIX throughput + latency
./scripts/benchmark-l1.sh 50 30s

# L2: AI Gateway cache-hit path
./scripts/benchmark-l2.sh 30 30s

# L3: Agent Gateway session creation
./scripts/benchmark-l3.sh 20 30s
```
Legacy curl-based scripts (no k6 required)
```bash
./scripts/load-test.sh 50 1000
./scripts/load-test-l2.sh 30 600
```
Pass/Fail Thresholds (k6)
| Layer | Metric | Threshold |
|---|---|---|
| L1 | P99 latency | < 100ms |
| L1 | Error rate | < 1% |
| L2 | Cache-hit P99 | < 50ms |
| L2 | Error rate | < 1% |
| L3 | Session create P99 | < 50ms |
| L3 | Error rate | < 5% |
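For results collected outside k6 (for example from the legacy curl-based scripts), the same thresholds can be applied with a small gate. This is a sketch only; `check_below` is a hypothetical helper, and the measured values passed to it are the local-run figures reported earlier in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print PASS/FAIL and return non-zero when measured >= limit.
# Usage: check_below <label> <measured> <limit>
check_below() {
  local label="$1" measured="$2" limit="$3"
  if LC_ALL=C awk -v m="$measured" -v l="$limit" 'BEGIN { exit !((m + 0) < (l + 0)) }'; then
    echo "PASS $label ($measured < $limit)"
  else
    echo "FAIL $label ($measured >= $limit)"
    return 1
  fi
}

# Local L1 P99 against the L1 threshold.
check_below "L1 P99 latency (ms)" 55.9 100
# Local L2 cache-hit P99 against the L2 threshold; "|| true" keeps the script going.
check_below "L2 cache-hit P99 (ms)" 61.23 50 || true
```

Note that the local L2 cache-hit P99 of 61.23ms exceeds the 50ms threshold, so that check reports FAIL on Docker Desktop numbers.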
Production Sizing Recommendations
| Service | CPU | Memory | Replicas | Notes |
|---|---|---|---|---|
| APISIX | 2 cores | 512MB | 2-4 | Scale horizontally for TPS |
| AI Gateway | 1 core | 512MB | 2-4 | Scale for concurrent LLM calls |
| Agent Gateway | 1 core | 512MB | 2 | Scale for concurrent sessions |
| Control Plane API | 1 core | 256MB | 2 | Low traffic, HA only |
| Redis | 1 core | 512MB | 1 (Sentinel for HA) | Single instance handles 100K ops/sec |
| ClickHouse | 2 cores | 2GB | 1-3 | Scale for query complexity |
| etcd | 1 core | 256MB | 3 | Use an odd member count for quorum |
| Zitadel | 2 cores | 1GB | 2 | Scale for concurrent logins |
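The etcd replica count follows from majority-quorum arithmetic: a cluster of n members needs n/2 + 1 (integer division) to agree, so an even member count adds a node without adding fault tolerance. A short sketch, with illustrative function names:

```bash
#!/usr/bin/env bash
# Majority quorum size for a cluster of N members.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a quorum remains.
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Three and four members both tolerate a single failure, which is why 3 is the usual starting size and 5 is the next step up.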