Metrics Reference

All Gatez services expose Prometheus metrics, which Prometheus (port 9090) scrapes every 15 seconds.

Scrape Targets

Job	Endpoint	Port	Metrics Path
`apisix`	`apisix:9091`	9091	`/apisix/prometheus/metrics`
`ai-gateway`	`ai-gateway:4000`	4000	`/metrics`
`otel-collector`	`otel-collector:8889`	8889	`/metrics`
`clickhouse`	`clickhouse:9363`	9363	`/metrics`

:::note
The Agent Gateway (L3) metrics are exported via the OTel Collector. Metrics from the L2 and L3 Rust services flow through OTel and are exposed by the collector on port 8889 with the `gateway_` namespace prefix.
:::
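Each scrape target serves the Prometheus text exposition format. As a minimal sketch of what a scrape returns, the Python below parses that format into `(name, labels, value)` tuples. The `parse_metrics` helper and the sample payload (including its values) are hypothetical and only for illustration; only the metric names come from this page.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$'
)
# One label pair inside the braces, allowing escaped characters in the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition lines into (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload; the values are made up.
SAMPLE = """\
# TYPE ai_gateway_requests_total counter
ai_gateway_requests_total 1027
apisix_http_status{code="200",route="1"} 42
"""

if __name__ == "__main__":
    for sample in parse_metrics(SAMPLE):
        print(sample)
```

In practice the text would come from an HTTP GET against one of the endpoints in the table, for example `http://ai-gateway:4000/metrics`.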

L1 -- APISIX Metrics (Standard Prometheus Plugin)

APISIX exposes metrics via its built-in `prometheus` plugin at `/apisix/prometheus/metrics` on port 9091.

Metric	Type	Labels	Description
`apisix_http_status`	Counter	`code`, `route`, `matched_uri`, `matched_host`, `service`, `consumer`, `node`	HTTP response status code counts
`apisix_http_latency_bucket`	Histogram	`type` (`apisix`, `upstream`, `request`), `route`, `service`, `consumer`, `node`	Request latency distribution (milliseconds)
`apisix_http_latency_sum`	Histogram	(same as above)	Sum of request latencies (milliseconds)
`apisix_http_latency_count`	Histogram	(same as above)	Count of requests
`apisix_bandwidth`	Counter	`type` (`ingress`, `egress`), `route`, `service`, `consumer`, `node`	Bytes transferred
`apisix_upstream_status`	Counter	`code`, `route`, `service`, `consumer`, `node`	Upstream response status codes
`apisix_http_requests_total`	Counter	`route`, `service`, `consumer`	Total HTTP requests processed
`apisix_nginx_http_current_connections`	Gauge	`state` (`active`, `reading`, `writing`, `waiting`)	Current Nginx connection states
`apisix_etcd_modify_indexes`	Gauge	`key`	etcd modification index by config key
`apisix_node_info`	Gauge	`hostname`	Node information
`apisix_shared_dict_capacity_bytes`	Gauge	`name`	Shared dictionary capacity
`apisix_shared_dict_free_space_bytes`	Gauge	`name`	Shared dictionary free space

L2 -- AI Gateway Metrics

Defined in `layers/ai-gateway/src/middleware.rs`. Exposed at `http://ai-gateway:4000/metrics`.

Metric	Type	Labels	Description
`ai_gateway_requests_total`	Counter	--	Total AI gateway requests processed
`ai_gateway_cache_hits_total`	Counter	--	Total exact-match cache hits (Redis)
`ai_gateway_cache_misses_total`	Counter	--	Total cache misses
`ai_gateway_latency_seconds`	Histogram	--	Request latency distribution. Buckets: 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s
`ai_gateway_tokens_total`	Counter	--	Total tokens consumed across all tenants
`ai_gateway_active_requests`	Gauge	--	Currently in-flight requests
`ai_gateway_pii_detected_total`	Counter	--	Total PII detection events (Presidio)
`ai_gateway_budget_exceeded_total`	Counter	--	Total requests rejected due to token budget exhaustion

L3 -- Agent Gateway Metrics

Defined in `layers/agent-gateway/src/metrics.rs`. Exposed at `http://agent-gateway:5001/metrics`.

Metric	Type	Labels	Description
`agent_gw_sessions_total`	Counter	--	Total agent sessions created
`agent_gw_sessions_active`	Gauge	--	Currently active agent sessions
`agent_gw_tool_calls_total`	Counter	--	Total MCP tool calls executed
`agent_gw_tool_calls_denied`	Counter	--	Tool calls denied by policy (allowlist/denylist)
`agent_gw_a2a_hops_total`	Counter	--	Total A2A delegation hops
`agent_gw_hitl_requests_total`	Counter	--	Total HITL approval requests created
`agent_gw_hitl_approved_total`	Counter	--	HITL requests approved
`agent_gw_hitl_denied_total`	Counter	--	HITL requests denied
`agent_gw_tool_latency_seconds`	Histogram	--	Tool call latency distribution. Buckets: 1ms, 5ms, 10ms, 50ms, 100ms, 500ms, 1s, 5s, 10s, 30s
`agent_gw_budget_remaining`	Gauge	--	Current remaining budget
`agent_gw_tool_poisoning_detected`	Counter	--	Tool poisoning detection events (fingerprint mismatch)
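Prometheus histograms are cumulative: an observation is counted in every bucket whose upper bound (`le`) is greater than or equal to the observed value. As a rough sketch, the Python below finds the smallest bucket that a tool call latency falls into, using the `agent_gw_tool_latency_seconds` bounds listed above converted to seconds. The `smallest_bucket` helper is hypothetical and not part of the gateway code.

```python
import bisect

# Bucket upper bounds from the table above, expressed in seconds.
TOOL_LATENCY_BUCKETS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 30]


def smallest_bucket(seconds, bounds=TOOL_LATENCY_BUCKETS):
    """Return the smallest upper bound that is >= the observed latency.

    Values above the largest bound land only in the implicit +Inf bucket.
    """
    # bisect_left gives the index of the first bound >= seconds,
    # which matches the "less than or equal" meaning of `le`.
    index = bisect.bisect_left(bounds, seconds)
    return bounds[index] if index < len(bounds) else float("inf")


if __name__ == "__main__":
    print(smallest_bucket(0.03))  # 0.05
    print(smallest_bucket(45))    # inf
```

A 30 ms call therefore increments the 50 ms bucket and every larger bucket, including `+Inf`.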

OTel Collector Metrics

The OTel Collector re-exports metrics received via OTLP under the `gateway_` namespace (configured in `infra/otel/config.yaml`).

Setting	Value
Namespace	`gateway`
Endpoint	`0.0.0.0:8889`
Resource-to-telemetry	Enabled

PromQL Examples

L1: Request rate by route (last 5 minutes)

```promql
sum(rate(apisix_http_status[5m])) by (route, code)
```

L1: P99 latency per route

```promql
histogram_quantile(0.99,
  sum(rate(apisix_http_latency_bucket{type="request"}[5m])) by (le, route)
)
```

L1: Error rate (4xx + 5xx)

```promql
sum(rate(apisix_http_status{code=~"4..|5.."}[5m]))
/
sum(rate(apisix_http_status[5m]))
```
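PromQL regex matchers are fully anchored, so `code=~"4..|5.."` matches only three-character codes that start with 4 or 5. The Python sketch below mirrors that calculation on a dictionary of per-code request counts. The `error_ratio` helper and the counts are hypothetical, and `re.fullmatch` stands in for the anchored matcher.

```python
import re

# Same pattern as the PromQL matcher; fullmatch reproduces its anchoring.
ERROR_CODE = re.compile(r"4..|5..")


def error_ratio(requests_by_code):
    """Return the share of requests with a 4xx or 5xx status code.

    requests_by_code maps a status code string to a request count.
    Returns None when there were no requests at all.
    """
    total = sum(requests_by_code.values())
    if total == 0:
        return None
    errors = sum(
        count
        for code, count in requests_by_code.items()
        if ERROR_CODE.fullmatch(code)
    )
    return errors / total


if __name__ == "__main__":
    # Made-up counts for illustration.
    print(error_ratio({"200": 90, "404": 6, "500": 4}))  # 0.1
```

With no traffic in the window the PromQL expression divides zero by zero and yields `NaN`; the sketch returns `None` for that case instead.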

L1: Current active connections

```promql
apisix_nginx_http_current_connections{state="active"}
```

L2: AI Gateway request rate

```promql
rate(ai_gateway_requests_total[5m])
```
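`rate()` returns the per-second average increase of a counter over the window and treats any drop in value as a counter reset (for example after a service restart). The Python below is a simplified sketch of that idea over a list of scraped samples. It is an approximation: Prometheus also extrapolates to the window boundaries, so real results can differ slightly. The `simple_rate` helper and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate a per-second rate from time-ordered counter samples.

    samples is a list of (timestamp_seconds, counter_value) pairs with
    strictly increasing timestamps. Returns None with fewer than two samples.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # A lower value means the counter restarted from zero,
        # so the whole current value counts as new increase.
        increase += current - previous if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Made-up samples at the 15-second scrape interval, with one reset.
    scraped = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
    print(simple_rate(scraped))
```

In the sample data the counter increases by 30, 30, 10 (after the reset), and 30, for a total of 100 over 60 seconds.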

L2: Cache hit ratio

```promql
rate(ai_gateway_cache_hits_total[5m])
/
(rate(ai_gateway_cache_hits_total[5m]) + rate(ai_gateway_cache_misses_total[5m]))
```

L2: P99 AI Gateway latency

```promql
histogram_quantile(0.99,
  sum(rate(ai_gateway_latency_seconds_bucket[5m])) by (le)
)
```
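`histogram_quantile` estimates a quantile by locating the bucket that contains the target rank and interpolating linearly inside it, so the result is only as precise as the bucket boundaries. The Python below is a simplified sketch of that estimate, assuming cumulative bucket counts, at least one finite bucket, and `0 < q <= 1`. The function and the counts in the example are illustrative, not taken from the gateway.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs that
    includes a final +Inf bucket holding the total observation count.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for index, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    # If the rank falls in the +Inf bucket, report the highest finite bound.
    if math.isinf(upper):
        return buckets[-2][0]
    # The first bucket is assumed to start at zero.
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)


if __name__ == "__main__":
    # Made-up cumulative counts for a few of the latency buckets (seconds).
    counts = [(0.1, 90), (0.25, 98), (0.5, 100), (math.inf, 100)]
    print(histogram_quantile(0.99, counts))  # approximately 0.375
```

In the example the 99th observation falls halfway through the 250 ms to 500 ms bucket, so the estimate is about 375 ms even though no request necessarily took that long.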

L2: Token consumption rate

```promql
rate(ai_gateway_tokens_total[1h])
```

L2: Budget exceeded events

```promql
rate(ai_gateway_budget_exceeded_total[5m])
```

L3: Active sessions

```promql
agent_gw_sessions_active
```

L3: Tool call throughput

```promql
rate(agent_gw_tool_calls_total[5m])
```

L3: Tool call denial rate

```promql
rate(agent_gw_tool_calls_denied[5m])
/
rate(agent_gw_tool_calls_total[5m])
```

L3: HITL approval rate

```promql
rate(agent_gw_hitl_approved_total[5m])
/
(rate(agent_gw_hitl_approved_total[5m]) + rate(agent_gw_hitl_denied_total[5m]))
```

L3: P99 tool call latency

```promql
histogram_quantile(0.99,
  sum(rate(agent_gw_tool_latency_seconds_bucket[5m])) by (le)
)
```

L3: Tool poisoning alerts

```promql
increase(agent_gw_tool_poisoning_detected[1h]) > 0
```

Grafana Datasources

Configured in `infra/grafana/provisioning/datasources/datasources.yaml`:

Datasource	Type	URL	Default
Prometheus	`prometheus`	`http://prometheus:9090`	Yes
Jaeger	`jaeger`	`http://jaeger:16686`	No
ClickHouse	`grafana-clickhouse-datasource`	`clickhouse:9000` (native)	No

Access Grafana at http://localhost:3002 (default credentials: admin / admin).
