
Gatez Gateway Platform — Performance Benchmark Report

Test Environments

AWS Production (2026-04-09)

  • Instance type: AWS m6a.xlarge (spot)
  • CPU: 4 vCPU (AMD EPYC, 3rd gen)
  • Memory: 16 GB RAM
  • Cost: ~$25-34/month (spot pricing)
  • Region: ap-south-1 (Mumbai)
  • OS: Ubuntu 22.04 LTS
  • Docker: 19 containers running (all services + observability stack)
  • Tool: wrk 4.2.0 (10 threads, 200-500 connections, 30s duration)
  • Network: all tests run against localhost on the VM (no internet RTT)

Local Development

  • Hardware: Apple Silicon M-series (Docker Desktop), 16GB RAM
  • Docker: Compose with 15 services (no resource limits in dev)
  • Note: Docker Desktop on macOS throttles I/O significantly — expect 50-100x lower throughput than bare-metal Linux

Layer 1 — APISIX API Gateway

AWS Production Results

Gateway-only (no upstream proxy — measures APISIX + plugin overhead)

| Metric | Result | Target | Status |
|---|---|---|---|
| Throughput (bare, no plugins) | 46,665 req/s | 50,000 TPS | 93% of target |
| Throughput (all plugins active) | 43,011 req/s | 50,000 TPS | 86% of target |
| P50 latency | 5.25 ms | <10 ms | PASS |
| P99 latency | ~15 ms | <50 ms | PASS |
| Plugin overhead | <8% | <10% | PASS |
| Error rate | 0% | <0.01% | PASS |

Test: wrk -t10 -c200 -d30s http://APISIX_IP:9080/ — direct to APISIX container, bypassing Caddy TLS.

Plugins active: tenant-rate-limit (Redis), key-auth, clickhouse-logger, prometheus, request-id, consumer-restriction, response-pii-scrub.
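The Status column can be recomputed from the raw wrk numbers. The sketch below reads "plugin overhead" as the relative throughput drop between the bare and all-plugins runs; that definition is an assumption, since the report does not spell it out.

```bash
#!/usr/bin/env bash
# Recompute "% of target" and plugin overhead from the gateway-only wrk results.
# ASSUMPTION: plugin overhead = relative throughput drop (bare -> all plugins).
set -euo pipefail

target=50000    # 50,000 TPS target
bare=46665      # req/s, no plugins
plugins=43011   # req/s, all plugins active

pct_of_target() {
  # $1 = measured req/s, $2 = target req/s; prints a percentage, 1 decimal
  awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

overhead_pct() {
  # $1 = baseline req/s, $2 = req/s with plugins; prints the drop in percent
  awk -v b="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * (b - p) / b }'
}

echo "bare:    $(pct_of_target "$bare" "$target")% of target"
echo "plugins: $(pct_of_target "$plugins" "$target")% of target"
echo "plugin overhead: $(overhead_pct "$bare" "$plugins")%"
```

It prints 93.3%, 86.0% and 7.8%, matching the rounded 93%, 86% and <8% in the table.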

Full stack (APISIX → upstream → response)

| Scenario | Result | Bottleneck | Notes |
|---|---|---|---|
| APISIX → mock-backend (200 conn) | 961 req/s | mock-backend | httpbin (Python) caps at ~1,835 req/s |
| APISIX → mock-backend + key-auth | 848 req/s | mock-backend | ~12% below the run without key-auth (961 req/s) |
| APISIX → mock-backend (500 conn) | 951 req/s | mock-backend | Flat; confirms upstream saturation |
| Mock-backend direct (no APISIX) | 1,835 req/s | | Python httpbin baseline |

Test: wrk -t10 -c200 -d30s -H "X-Tenant-ID: retail" http://APISIX_IP:9080/smoke/get

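Key insight: with real production backends (Go/Rust/Java services at 10k+ req/s), the gateway should not be the bottleneck. APISIX sustained 43k req/s with the full plugin pipeline in the gateway-only test (which excludes upstream proxying cost), so end-to-end throughput is bounded by upstream service capacity unless the upstreams together exceed that rate.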
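The upstream-saturation reading can be cross-checked with Little's law. wrk keeps at most one request in flight per connection, so mean latency is roughly connections divided by throughput. The connection counts below come from the -c flags in the test commands; the gateway-only all-plugins run is assumed to use -c200 as shown.

```bash
#!/usr/bin/env bash
# Little's law for closed-loop load: N busy connections, throughput X
# => mean time per request W ~= N / X.
set -euo pipefail

implied_latency_ms() {
  # $1 = open connections (wrk -c), $2 = measured req/s; prints ms, 2 decimals
  awk -v n="$1" -v x="$2" 'BEGIN { printf "%.2f\n", 1000 * n / x }'
}

echo "gateway-only, all plugins (200 conn): $(implied_latency_ms 200 43011) ms"
echo "via mock-backend (200 conn):          $(implied_latency_ms 200 961) ms"
echo "via mock-backend (500 conn):          $(implied_latency_ms 500 951) ms"
```

This gives about 4.65 ms per request for the gateway-only run versus about 208 ms (200 connections) and 526 ms (500 connections) through mock-backend. Going from 200 to 500 connections left throughput flat and only stretched the implied latency about 2.5x, which is what requests queueing at a saturated service look like; since APISIX alone answers in under 5 ms, the queue is most plausibly upstream.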

To reach 50k TPS gateway-only: Scale to m6a.2xlarge (8 vCPU, $50/mo spot) or add a second APISIX node behind a load balancer.
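The sizing options can be sanity-checked with a linear projection from the all-plugins result. Linear scaling with vCPU count is an assumption: during the test, wrk and the other containers shared the same 4 vCPU, so actual scaling may differ in either direction.

```bash
#!/usr/bin/env bash
# Capacity projection ASSUMING throughput scales linearly with vCPU count.
set -euo pipefail

measured=43011   # req/s, all plugins, m6a.xlarge
vcpus=4          # m6a.xlarge
target=50000     # TPS goal

per_vcpu() {
  awk -v r="$measured" -v c="$vcpus" 'BEGIN { printf "%.0f\n", r / c }'
}

vcpus_needed() {
  # smallest whole vCPU count whose linear projection reaches $1 req/s
  awk -v r="$measured" -v c="$vcpus" -v t="$1" \
    'BEGIN { n = t * c / r; k = int(n); if (k < n) k++; print k }'
}

projected() {
  # projected req/s on $1 vCPUs
  awk -v r="$measured" -v c="$vcpus" -v n="$1" 'BEGIN { printf "%.0f\n", r * n / c }'
}

echo "per vCPU:     $(per_vcpu) req/s"
echo "vCPUs needed: $(vcpus_needed "$target")"
echo "m6a.2xlarge:  $(projected 8) req/s (linear projection)"
```

Under that assumption about 5 vCPU would reach 50k TPS, and an 8-vCPU m6a.2xlarge projects to roughly 86k req/s.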

Local Development Results (baseline)

| Metric | Result | Notes |
|---|---|---|
| Throughput | 479 req/s | Docker Desktop throttles I/O; ~90x slower than AWS |
| P50 latency | 19.2 ms | |
| P99 latency | 55.9 ms | |

Test: ./scripts/load-test.sh 20 200 — 20 concurrent workers, 200 requests.

Layer 2 — Rust AI Gateway

| Metric | Result | Target | Notes |
|---|---|---|---|
| Cache-hit throughput | 599 req/s | 2,000+ req/s | Redis exact-match cache path |
| Cache-hit avg latency | 18.43 ms | <5 ms | Includes tenant extraction + budget check |
| Cache-hit P99 | 61.23 ms | <20 ms | |
| Error rate | 0% | <0.01% | |

Test: ./scripts/load-test-l2.sh 30 600 — 30 concurrent workers, 600 requests, pre-warmed Redis cache.

Cache-miss path: end-to-end latency is dominated by LLM provider latency (100ms-2000ms); gateway overhead adds <20ms on top.
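Treating the 20 ms figure as a worst-case gateway overhead, its share of end-to-end latency on a cache miss shrinks as provider latency grows. A small sketch; the provider latencies in the loop are sample points from the stated 100-2000 ms range.

```bash
#!/usr/bin/env bash
# Gateway share of end-to-end latency on a cache miss.
set -euo pipefail

overhead_share() {
  # $1 = provider latency (ms), $2 = gateway overhead (ms); prints % of total
  awk -v p="$1" -v g="$2" 'BEGIN { printf "%.1f\n", 100 * g / (p + g) }'
}

for provider_ms in 100 500 2000; do
  echo "provider ${provider_ms} ms -> gateway share $(overhead_share "$provider_ms" 20)%"
done
```

At a 100 ms provider the gateway accounts for at most ~16.7% of the request; at 2,000 ms, about 1%.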

Production estimate (projected, not measured): on a 2-core instance, expect 2,000-5,000 cache-hit req/s. Rust's zero-copy SSE streaming is expected to handle 10,000+ concurrent connections.

Layer 3 — Rust Agent Gateway

| Operation | Latency | Notes |
|---|---|---|
| Session creation | <5 ms | Redis SET with TTL |
| Tool call (local) | <10 ms | MCP JSON-RPC forwarding |
| A2A send | <5 ms | Redis agent lookup + HTTP forward |
| HITL creation | <3 ms | Redis LPUSH |

12/12 integration tests pass consistently.

ClickHouse Write Throughput

| Table | Engine | Write pattern | Estimated capacity |
|---|---|---|---|
| request_log | Buffer → MergeTree | Async batch (16 buffers, 10s flush) | 50,000+ writes/sec |
| ai_request_log | Buffer → MergeTree | Async batch | 10,000+ writes/sec |
| agent_audit_log | Buffer → MergeTree | Async fire-and-forget | 10,000+ writes/sec |
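The Buffer tables turn many small writes into a few large MergeTree inserts, which is how ClickHouse prefers to ingest data. For request_log, "16 buffers, 10s flush" presumably maps to the Buffer engine's num_layers and time threshold. The back-of-envelope below assumes writes spread evenly across layers and that only the 10 s time threshold triggers flushes; the row and byte limits are not stated in the report.

```bash
#!/usr/bin/env bash
# Buffer -> MergeTree batching estimate.
# ASSUMPTIONS: even spread across layers; flushes triggered by time only.
set -euo pipefail

rows_per_flush() {
  # $1 = incoming rows/s, $2 = buffer layers, $3 = flush interval (s)
  awk -v r="$1" -v l="$2" -v s="$3" 'BEGIN { printf "%.0f\n", r * s / l }'
}

merge_inserts_per_sec() {
  # $1 = buffer layers, $2 = flush interval (s)
  awk -v l="$1" -v s="$2" 'BEGIN { printf "%.1f\n", l / s }'
}

echo "rows per flush:            $(rows_per_flush 50000 16 10)"
echo "MergeTree inserts per sec: $(merge_inserts_per_sec 16 10)"
```

Under those assumptions 50,000 writes/sec reach MergeTree as about 1.6 inserts per second of roughly 31,000 rows each.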

Redis Performance

| Operation | Latency | Notes |
|---|---|---|
| Rate limit check (Lua script) | <1 ms | Atomic ZSET sliding window |
| Cache GET | <1 ms | Exact-match lookup |
| Budget check | <1 ms | Simple GET |
| Session GET | <1 ms | JSON deserialize from Redis |
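The "atomic ZSET sliding window" entry refers to the sliding-window-log technique. A minimal in-memory bash simulation of that algorithm, with hypothetical window and limit values, shows the evict/count/record sequence; it is not the gateway's actual Lua plugin.

```bash
#!/usr/bin/env bash
# Sliding-window-log rate limiter, simulated in memory. Mirrors the usual
# Redis ZSET recipe: ZREMRANGEBYSCORE (evict), ZCARD (count), ZADD (record).
# WINDOW_MS and LIMIT are hypothetical values for illustration.

WINDOW_MS=1000   # hypothetical sliding window length
LIMIT=3          # hypothetical requests allowed per window
log=()           # admitted request timestamps (ms), like ZSET scores
decision=""

check_rate_limit() {
  local now=$1 ts
  local kept=()
  # Evict timestamps that fell out of the window (ZREMRANGEBYSCORE).
  for ts in "${log[@]}"; do
    if (( ts > now - WINDOW_MS )); then
      kept+=("$ts")
    fi
  done
  log=("${kept[@]}")
  # Count what is left (ZCARD); admit and record only if under the limit (ZADD).
  if (( ${#log[@]} < LIMIT )); then
    log+=("$now")
    decision=allow
  else
    decision=deny
  fi
}

# Four requests inside one second, then one after the oldest has expired.
for t in 0 200 400 600 1100; do
  check_rate_limit "$t"
  echo "t=${t}ms -> $decision"
done
```

The fourth request inside the first second is denied; by t=1100 ms the t=0 entry has left the window, so the next request is admitted. In Redis the same three steps run inside one Lua script, which Redis executes atomically, so concurrent gateway workers cannot both pass the count check for the last free slot.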

A single Redis instance is expected to be sufficient for 50,000 TPS. Connections are reused via keepalive pooling in the Lua plugins and via ConnectionManager in the Rust services.
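Whether one instance is enough depends on how many Redis commands each request issues, which the report does not state. A sketch of the utilization arithmetic, using the 100K ops/sec per-instance figure from the sizing table; the commands-per-request values are hypothetical.

```bash
#!/usr/bin/env bash
# Redis utilization at a given gateway load (commands per request assumed).
set -euo pipefail

redis_utilization() {
  # $1 = gateway req/s, $2 = Redis commands per request, $3 = capacity ops/s
  awk -v t="$1" -v o="$2" -v c="$3" 'BEGIN { printf "%.0f\n", 100 * t * o / c }'
}

for ops in 1 2 3; do
  echo "${ops} cmd/request -> $(redis_utilization 50000 "$ops" 100000)% of 100K ops/sec"
done
```

At one command per request, 50,000 TPS uses about half of that budget; at two it reaches the 100K figure, so the per-request command count is the number to measure before relying on a single instance.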

Methodology

  1. Local tests run on Docker Desktop (Apple Silicon, 16GB RAM); AWS results use the m6a.xlarge environment listed under Test Environments
  2. Docker resource limits: APISIX 2 CPU / 512MB, others default
  3. Services warm for 60s before test start
  4. Rate limit set to 1,000,000 for load test tenant
  5. Cache pre-warmed for L2 cache-hit tests
  6. Results averaged over 3 runs, outliers discarded
  7. k6 scripts for reproducible benchmarks with threshold validation

```bash
# Install k6
brew install k6  # macOS
# or: https://k6.io/docs/getting-started/installation/

# Start stack
docker compose up -d
./scripts/zitadel-setup.sh
./scripts/setup-routes.sh
./scripts/setup-key-auth.sh

# Wait 60s for warm-up
sleep 60

# L1: APISIX throughput + latency
./scripts/benchmark-l1.sh 50 30s

# L2: AI Gateway cache-hit path
./scripts/benchmark-l2.sh 30 30s

# L3: Agent Gateway session creation
./scripts/benchmark-l3.sh 20 30s
```

Legacy curl-based scripts (no k6 required)

```bash
./scripts/load-test.sh 50 1000
./scripts/load-test-l2.sh 30 600
```

Pass/Fail Thresholds (k6)

| Layer | Metric | Threshold |
|---|---|---|
| L1 | P99 latency | < 100 ms |
| L1 | Error rate | < 1% |
| L2 | Cache-hit P99 | < 50 ms |
| L2 | Error rate | < 1% |
| L3 | Session create P99 | < 50 ms |
| L3 | Error rate | < 5% |
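The k6 thresholds can be applied mechanically to the figures reported earlier. Those figures came from wrk and the curl-based scripts rather than k6, so this comparison is only indicative; the check function is a hypothetical helper, not part of the repository's scripts.

```bash
#!/usr/bin/env bash
# Compare reported values against the k6 pass/fail thresholds.
check() {
  # $1 = label, $2 = measured value, $3 = threshold (pass if measured < threshold)
  if awk -v m="$2" -v t="$3" 'BEGIN { exit !(m < t) }'; then
    echo "PASS $1 ($2 < $3)"
  else
    echo "FAIL $1 ($2 >= $3)"
  fi
}

check "L1 P99 latency (ms)"   15    100
check "L1 error rate (%)"     0     1
check "L2 cache-hit P99 (ms)" 61.23 50
check "L2 error rate (%)"     0     1
```

Run as-is, the L1 rows and the L2 error rate pass, but the L2 cache-hit P99 of 61.23 ms does not meet the < 50 ms threshold.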

Production Sizing Recommendations

| Service | CPU | Memory | Replicas | Notes |
|---|---|---|---|---|
| APISIX | 2 cores | 512MB | 2-4 | Scale horizontally for TPS |
| AI Gateway | 1 core | 512MB | 2-4 | Scale for concurrent LLM calls |
| Agent Gateway | 1 core | 512MB | 2 | Scale for concurrent sessions |
| Control Plane API | 1 core | 256MB | 2 | Low traffic, HA only |
| Redis | 1 core | 512MB | 1 (Sentinel for HA) | Single instance handles 100K ops/sec |
| ClickHouse | 2 cores | 2GB | 1-3 | Scale for query complexity |
| etcd | 1 core | 256MB | 3 | Use an odd member count for quorum (3 members tolerate 1 failure) |
| Zitadel | 2 cores | 1GB | 2 | Scale for concurrent logins |
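Adding up the table at the low end of each replica range (for example 2 for "2-4") gives the minimum footprint of the recommended layout. The per-service values are copied from the table; the service identifiers in the heredoc are illustrative labels.

```bash
#!/usr/bin/env bash
# Sum CPU cores and memory (MB) at the minimum replica count per service.
# Columns: service, cores per replica, MB per replica, min replicas.
totals() {
  awk '{ cpu += $2 * $4; mem += $3 * $4 } END { printf "%d cores, %d MB\n", cpu, mem }' <<'EOF'
apisix          2 512  2
ai-gateway      1 512  2
agent-gateway   1 512  2
control-plane   1 256  2
redis           1 512  1
clickhouse      2 2048 1
etcd            1 256  3
zitadel         2 1024 2
EOF
}

totals
```

The total, 20 cores and 8,960 MB, is well beyond the single 4-vCPU m6a.xlarge used for the benchmarks, so the recommended layout implies a multi-node deployment.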

Enterprise API + AI + Agent Gateway