Gatez Gateway Platform — Performance Benchmark Report
Test Environment
- Hardware: Apple Silicon (Docker Desktop), 16GB RAM
- Docker: Compose with 15 services (no resource limits in dev)
- Note: Production targets require bare-metal or cloud instances. These numbers represent a local dev baseline only.
Layer 1 — APISIX API Gateway
| Metric | Result | Target | Notes |
|---|---|---|---|
| Throughput | 479 req/s | 50,000 TPS | Local Docker with plugins enabled |
| P50 latency | 19.2ms | <10ms | Includes Redis rate limit check |
| P95 latency | 49.7ms | <30ms | |
| P99 latency | 55.9ms | <50ms | Plugin overhead: ~20ms |
| Error rate | 0% | <0.01% | |
Test: ./scripts/load-test.sh 20 200 — 20 concurrent workers, 200 requests.
Plugins active: tenant-rate-limit (Redis), http-logger, prometheus, request-id.
Production estimate: On a dedicated 4-core instance, expect 20,000-40,000 TPS with the same plugin set (based on APISIX published benchmarks).
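As a rough cross-check of this estimate against the 50,000 TPS target, the replica count is a ceiling division. This is a sketch only: the function name `replicas_needed` is hypothetical, and the calculation ignores headroom, load-balancer overhead, and uneven traffic distribution.

```bash
# Hypothetical helper: replicas needed to reach a target TPS,
# given an assumed per-instance TPS (integer ceiling division).
replicas_needed() {
  local target_tps=$1 per_instance_tps=$2
  echo $(( (target_tps + per_instance_tps - 1) / per_instance_tps ))
}

replicas_needed 50000 40000   # optimistic end of the estimate
replicas_needed 50000 20000   # conservative end of the estimate
```

With the estimated range of 20,000-40,000 TPS per instance, this gives 2 to 3 replicas before adding any safety margin.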
Layer 2 — Rust AI Gateway
| Metric | Result | Target | Notes |
|---|---|---|---|
| Cache-hit throughput | 599 req/s | 2,000+ req/s | Redis exact-match cache path |
| Cache-hit avg latency | 18.43ms | <5ms | Includes tenant extraction + budget check |
| Cache-hit P99 | 61.23ms | <20ms | |
| Error rate | 0% | <0.01% | |
Test: ./scripts/load-test-l2.sh 30 600 — 30 concurrent workers, 600 requests, pre-warmed Redis cache.
Cache-miss path: Latency depends almost entirely on the LLM provider (100ms-2000ms). The gateway itself adds <20ms of overhead.
Production estimate: On a 2-core instance, expect 2,000-5,000 cache-hit req/s. Rust's zero-copy SSE streaming handles 10,000+ concurrent connections.
Layer 3 — Rust Agent Gateway
| Metric | Result | Notes |
|---|---|---|
| Session creation | <5ms | Redis SET with TTL |
| Tool call (local) | <10ms | MCP JSON-RPC forwarding |
| A2A send | <5ms | Redis agent lookup + HTTP forward |
| HITL creation | <3ms | Redis LPUSH |
12/12 integration tests pass consistently.
ClickHouse Write Throughput
| Table | Engine | Write Pattern | Estimated Capacity |
|---|---|---|---|
| request_log | Buffer → MergeTree | Async batch (16 buffers, 10s flush) | 50,000+ writes/sec |
| ai_request_log | Buffer → MergeTree | Async batch | 10,000+ writes/sec |
| agent_audit_log | Buffer → MergeTree | Async fire-and-forget | 10,000+ writes/sec |
Redis Performance
| Operation | Latency | Notes |
|---|---|---|
| Rate limit check (Lua script) | <1ms | Atomic ZSET sliding window |
| Cache GET | <1ms | Exact-match lookup |
| Budget check | <1ms | Simple GET |
| Session GET | <1ms | JSON deserialize from Redis |
A single Redis instance is sufficient for 50,000 TPS. Connections are pooled via keepalive in the Lua plugins and via ConnectionManager in the Rust services.
Methodology
- All tests run on local Docker Desktop (Apple Silicon, 16GB RAM)
- Docker resource limits: APISIX 2 CPU / 512MB, others default
- Services warm for 60s before test start
- Rate limit set to 1,000,000 for load test tenant
- Cache pre-warmed for L2 cache-hit tests
- Results averaged over 3 runs, outliers discarded
- k6 scripts for reproducible benchmarks with threshold validation
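The percentile and averaging steps can be sketched in plain shell. This is an illustration, not the logic the benchmark scripts actually use: the function names `percentile` and `mean` are hypothetical, the percentile uses the nearest-rank method with an integer percentile argument, and outlier removal is left out because the report does not say how outliers were identified.

```bash
# Nearest-rank percentile of newline-separated numeric samples on stdin.
# Assumes an integer percentile (50, 95, 99).
percentile() {
  local p=$1
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Arithmetic mean of newline-separated numbers on stdin, one decimal place.
mean() {
  awk '{ s += $1 } END { if (NR) printf "%.1f\n", s / NR }'
}

# Example usage (latencies_ms.txt is a hypothetical file, one sample per line):
#   percentile 99 < latencies_ms.txt
#   printf '%s\n' "$run1" "$run2" "$run3" | mean
```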
k6 Benchmark Scripts (Recommended)
```bash
# Install k6
brew install k6 # macOS
# or: https://k6.io/docs/getting-started/installation/

# Start stack
docker compose up -d
./scripts/keycloak-setup.sh
./scripts/setup-routes.sh
./scripts/setup-key-auth.sh

# Wait 60s for warm-up
sleep 60

# L1: APISIX throughput + latency
./scripts/benchmark-l1.sh 50 30s

# L2: AI Gateway cache-hit path
./scripts/benchmark-l2.sh 30 30s

# L3: Agent Gateway session creation
./scripts/benchmark-l3.sh 20 30s
```
Legacy curl-based scripts (no k6 required)
```bash
./scripts/load-test.sh 50 1000
./scripts/load-test-l2.sh 30 600
```
Pass/Fail Thresholds (k6)
| Layer | Metric | Threshold |
|---|---|---|
| L1 | P99 latency | < 100ms |
| L1 | Error rate | < 1% |
| L2 | Cache-hit P99 | < 50ms |
| L2 | Error rate | < 1% |
| L3 | Session create P99 | < 50ms |
| L3 | Error rate | < 5% |
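For results gathered outside k6 (for example with the legacy curl scripts), the same thresholds can be checked by hand. The helper below is a minimal sketch under that assumption; `check_threshold` is a hypothetical name and not part of the repository's scripts.

```bash
# Hypothetical helper: passes when VALUE is strictly below LIMIT.
# awk handles the floating-point comparison that plain shell arithmetic cannot.
check_threshold() {
  local name=$1 value=$2 limit=$3
  if awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v + 0 < l + 0) }'; then
    echo "PASS $name: $value < $limit"
  else
    echo "FAIL $name: $value >= $limit"
    return 1
  fi
}

# Values taken from the Layer 1 and Layer 2 result tables in this report.
check_threshold "L1 P99 latency (ms)" 55.9 100
check_threshold "L2 cache-hit P99 (ms)" 61.23 50 || true
```

Note that the curl-based Layer 2 cache-hit P99 of 61.23ms would not meet the 50ms k6 threshold. The k6 runs may measure differently, so treat this comparison as indicative only.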
Production Sizing Recommendations
| Service | CPU | Memory | Replicas | Notes |
|---|---|---|---|---|
| APISIX | 2 cores | 512MB | 2-4 | Scale horizontally for TPS |
| AI Gateway | 1 core | 512MB | 2-4 | Scale for concurrent LLM calls |
| Agent Gateway | 1 core | 512MB | 2 | Scale for concurrent sessions |
| Control Plane API | 1 core | 256MB | 2 | Low traffic, HA only |
| Redis | 1 core | 512MB | 1 (Sentinel for HA) | Single instance handles 100K ops/sec |
| ClickHouse | 2 cores | 2GB | 1-3 | Scale for query complexity |
| etcd | 1 core | 256MB | 3 | Must be an odd number for quorum |
| Keycloak | 2 cores | 1GB | 2 | Scale for concurrent logins |
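To turn the table into a cluster-level footprint, multiply each row's CPU and memory by its replica count and sum the results. The sketch below does this for the minimum replica counts; `sizing_totals` is a hypothetical helper, and 1GB and 2GB are assumed to mean 1024MB and 2048MB.

```bash
# Hypothetical helper: sums cores and memory across services.
# Input columns: service, cores per replica, memory (MB) per replica, replicas.
sizing_totals() {
  awk -F, '{ cpu += $2 * $4; mem += $3 * $4 }
           END { printf "%d cores, %d MB\n", cpu, mem }'
}

# Minimum replica counts from the sizing table above.
sizing_totals <<'EOF'
APISIX,2,512,2
AI Gateway,1,512,2
Agent Gateway,1,512,2
Control Plane API,1,256,2
Redis,1,512,1
ClickHouse,2,2048,1
etcd,1,256,3
Keycloak,2,1024,2
EOF
```

At minimum replica counts this comes to 20 cores and 8960 MB, before any allowance for Redis Sentinel nodes or operating system overhead.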