# ClickHouse Schema Reference

All tables live in the `gateway` database. The schema is initialized by `infra/clickhouse/init.sql` on first startup.

ClickHouse is accessible at `http://localhost:8123` (HTTP interface) and `clickhouse:9000` (native protocol, used by Grafana).
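For quick ad-hoc checks from the host, the HTTP interface accepts queries directly. A sketch using `curl` (assumes the stack is running and the default user with no password; adjust credentials to your deployment):

```shell
# Liveness check -- returns "Ok." when the server is up.
curl 'http://localhost:8123/ping'

# Run an ad-hoc query over the HTTP interface.
curl 'http://localhost:8123/' \
  --data-binary 'SELECT count() FROM gateway.request_log_raw'
```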
## Tables
### gateway.request_log_raw

L1 request log storage table. It receives data from the Buffer table; queries for historical analysis should read from here.
```sql
CREATE TABLE IF NOT EXISTS gateway.request_log_raw (
    timestamp DateTime64(3),
    tenant_id String,
    route_id String,
    method String,
    path String,
    status_code UInt16,
    latency_ms Float32,
    request_size UInt32,
    response_size UInt32,
    client_ip String,
    user_agent String,
    trace_id String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, timestamp)
TTL toDateTime(timestamp) + INTERVAL 90 DAY;
```

| Column | Type | Description |
|---|---|---|
| `timestamp` | DateTime64(3) | Request timestamp (millisecond precision) |
| `tenant_id` | String | Tenant identifier |
| `route_id` | String | APISIX route ID |
| `method` | String | HTTP method (GET, POST, etc.) |
| `path` | String | Request URI path |
| `status_code` | UInt16 | HTTP response status code |
| `latency_ms` | Float32 | End-to-end latency in milliseconds |
| `request_size` | UInt32 | Request body size in bytes |
| `response_size` | UInt32 | Response body size in bytes |
| `client_ip` | String | Client IP address |
| `user_agent` | String | User-Agent header |
| `trace_id` | String | Distributed trace ID (from X-Request-ID) |
- Engine: MergeTree
- Partition: `toYYYYMM(timestamp)` (monthly)
- Order: `(tenant_id, timestamp)` -- tenant-first for efficient filtered queries
- TTL: 90 days
### gateway.request_log

Buffer table in front of `request_log_raw` for high-throughput async writes. The clickhouse-logger plugin writes to this table; buffers are flushed to the underlying MergeTree table based on the configured thresholds.
```sql
CREATE TABLE IF NOT EXISTS gateway.request_log (
    timestamp DateTime64(3),
    tenant_id String,
    route_id String,
    method String,
    path String,
    status_code UInt16,
    latency_ms Float32,
    request_size UInt32,
    response_size UInt32,
    client_ip String,
    user_agent String,
    trace_id String
) ENGINE = Buffer(gateway, request_log_raw, 16, 10, 100, 10000, 100000, 1000000, 10000000);
```

- Engine: Buffer (16 buffer layers; flush thresholds: 10-100 s time, 10K-100K rows, 1M-10M bytes)
- Target: `gateway.request_log_raw`
:::note
The Buffer engine is required to handle 50K TPS write throughput. Never write directly to `request_log_raw` from the hot path.
:::
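The Buffer parameters above follow ClickHouse's documented semantics: a buffer layer is flushed once *all* of the min thresholds are met, or as soon as *any* max threshold is reached. A simplified Python model of that decision (illustrative only, not the engine's actual code):

```python
# Thresholds from Buffer(gateway, request_log_raw, 16, 10, 100, 10000, 100000, 1000000, 10000000):
# 16 layers, min/max time 10/100 s, min/max rows 10K/100K, min/max bytes 1M/10M.
MIN_TIME, MAX_TIME = 10, 100            # seconds since the first row entered the buffer
MIN_ROWS, MAX_ROWS = 10_000, 100_000
MIN_BYTES, MAX_BYTES = 1_000_000, 10_000_000

def should_flush(age_s: float, rows: int, nbytes: int) -> bool:
    """Flush when ALL min thresholds are met, or ANY max threshold is hit."""
    all_min = age_s >= MIN_TIME and rows >= MIN_ROWS and nbytes >= MIN_BYTES
    any_max = age_s >= MAX_TIME or rows >= MAX_ROWS or nbytes >= MAX_BYTES
    return all_min or any_max

# A young buffer below the min-time threshold is held...
assert should_flush(5, 20_000, 2_000_000) is False
# ...but exceeding any max threshold (here, rows) forces a flush regardless of age.
assert should_flush(5, 150_000, 2_000_000) is True
```

This is why the thresholds are stated as ranges: flushes happen somewhere between the min and max bounds depending on write pressure.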
### gateway.ai_request_log

L2 AI Gateway request log. Tracks LLM provider calls, token usage, cache performance, and PII detection events.
```sql
CREATE TABLE IF NOT EXISTS gateway.ai_request_log (
    timestamp DateTime64(3),
    tenant_id String,
    model String,
    provider String,
    prompt_tokens UInt32,
    completion_tokens UInt32,
    total_tokens UInt32,
    latency_ms Float32,
    cache_hit Bool,
    pii_detected Bool,
    trace_id String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, timestamp)
TTL toDateTime(timestamp) + INTERVAL 365 DAY;
```

| Column | Type | Description |
|---|---|---|
| `timestamp` | DateTime64(3) | Request timestamp |
| `tenant_id` | String | Tenant identifier |
| `model` | String | LLM model name (e.g., gpt-4o, claude-sonnet) |
| `provider` | String | LLM provider (e.g., openai, anthropic) |
| `prompt_tokens` | UInt32 | Input token count |
| `completion_tokens` | UInt32 | Output token count |
| `total_tokens` | UInt32 | Total tokens consumed |
| `latency_ms` | Float32 | Provider response latency |
| `cache_hit` | Bool | Whether the response came from cache |
| `pii_detected` | Bool | Whether PII was detected in the request |
| `trace_id` | String | Distributed trace ID |
- Engine: MergeTree
- Partition: `toYYYYMM(timestamp)` (monthly)
- Order: `(tenant_id, timestamp)`
- TTL: 365 days
### gateway.agent_audit_log

L3 Agent Gateway audit log. Records all agent tool calls, A2A hops, and session actions. No TTL -- retained indefinitely for compliance.
```sql
CREATE TABLE IF NOT EXISTS gateway.agent_audit_log (
    timestamp DateTime64(3),
    tenant_id String,
    session_id String,
    agent_id String,
    tool_name String,
    action String,
    status String,
    input_hash String,
    output_hash String,
    trace_id String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, timestamp);
```

| Column | Type | Description |
|---|---|---|
| `timestamp` | DateTime64(3) | Event timestamp |
| `tenant_id` | String | Tenant identifier |
| `session_id` | String | Agent session ID |
| `agent_id` | String | Agent identifier |
| `tool_name` | String | MCP tool name invoked |
| `action` | String | Action type (e.g., tool_call, a2a_send) |
| `status` | String | Outcome (e.g., success, denied, error) |
| `input_hash` | String | SHA-256 hash of tool input (content not stored) |
| `output_hash` | String | SHA-256 hash of tool output (content not stored) |
| `trace_id` | String | Distributed trace ID |
- Engine: MergeTree
- Partition: `toYYYYMM(timestamp)` (monthly)
- Order: `(tenant_id, timestamp)`
- TTL: None (permanent retention for audit compliance)
:::danger
Never add a TTL to the agent audit log. Audit records must be retained indefinitely per security policy.
:::
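Because `input_hash` and `output_hash` store digests rather than payloads, verifying an audit record against a known input reduces to recomputing the hash. A sketch, assuming the gateway hashes canonical JSON (sorted keys, compact separators) -- the exact serialization used upstream is an assumption and must match whatever the gateway actually hashes:

```python
import hashlib
import json

def audit_hash(payload: dict) -> str:
    """SHA-256 hex digest of a tool input/output, in the content-free
    form stored in input_hash / output_hash."""
    # Canonical form: sorted keys, no whitespace -- an assumed convention.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order does not affect the digest, so logically equal inputs match.
assert audit_hash({"a": 1, "b": 2}) == audit_hash({"b": 2, "a": 1})
assert len(audit_hash({"query": "SELECT 1"})) == 64  # hex-encoded SHA-256
```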
## Materialized Views
### gateway.request_rate_per_minute

Pre-aggregated request rates per tenant per minute. Used by dashboards to avoid expensive query-time aggregation.
```sql
CREATE MATERIALIZED VIEW IF NOT EXISTS gateway.request_rate_per_minute
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (tenant_id, minute)
AS SELECT
    tenant_id,
    toStartOfMinute(timestamp) AS minute,
    count() AS request_count,
    countIf(status_code >= 400) AS error_count,
    avg(latency_ms) AS avg_latency_ms,
    quantile(0.99)(latency_ms) AS p99_latency_ms
FROM gateway.request_log_raw
GROUP BY tenant_id, minute;
```

| Column | Type | Description |
|---|---|---|
| `tenant_id` | String | Tenant identifier |
| `minute` | DateTime | Start of the minute bucket |
| `request_count` | UInt64 | Total requests in this minute |
| `error_count` | UInt64 | Requests with status >= 400 |
| `avg_latency_ms` | Float64 | Average latency |
| `p99_latency_ms` | Float64 | 99th percentile latency |
- Engine: SummingMergeTree
- Source: `gateway.request_log_raw`
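SummingMergeTree collapses rows that share the same `ORDER BY` key at merge time, summing the numeric columns -- which is why the query examples in this document aggregate with `sum()` rather than reading rows as-is. A simplified Python model of that documented merge behavior (illustrative only):

```python
from collections import defaultdict

def merge_parts(rows):
    """Model of a SummingMergeTree merge: rows sharing the same key
    (tenant_id, minute) collapse into one row with numeric columns summed.
    rows: iterable of (tenant_id, minute, request_count, error_count)."""
    merged = defaultdict(lambda: [0, 0])
    for tenant_id, minute, requests, errors in rows:
        merged[(tenant_id, minute)][0] += requests
        merged[(tenant_id, minute)][1] += errors
    return {key: tuple(vals) for key, vals in merged.items()}

parts = [
    ("tenant-a", "2024-05-01 12:00", 100, 2),
    ("tenant-a", "2024-05-01 12:00", 50, 1),   # same key: summed on merge
    ("tenant-b", "2024-05-01 12:00", 10, 0),
]
assert merge_parts(parts)[("tenant-a", "2024-05-01 12:00")] == (150, 3)
```

Since merges are asynchronous, unmerged duplicate key rows can exist at any time, so `sum()` at query time is required for correct counts.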
### gateway.ai_token_usage_hourly

Pre-aggregated AI token usage per tenant per hour, broken down by model and provider. Powers the AI usage dashboards.
```sql
CREATE MATERIALIZED VIEW IF NOT EXISTS gateway.ai_token_usage_hourly
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (tenant_id, hour)
AS SELECT
    tenant_id,
    toStartOfHour(timestamp) AS hour,
    model,
    provider,
    sum(prompt_tokens) AS total_prompt_tokens,
    sum(completion_tokens) AS total_completion_tokens,
    sum(total_tokens) AS total_tokens,
    count() AS request_count,
    countIf(cache_hit = true) AS cache_hits
FROM gateway.ai_request_log
GROUP BY tenant_id, hour, model, provider;
```

| Column | Type | Description |
|---|---|---|
| `tenant_id` | String | Tenant identifier |
| `hour` | DateTime | Start of the hour bucket |
| `model` | String | LLM model name |
| `provider` | String | LLM provider |
| `total_prompt_tokens` | UInt64 | Sum of prompt tokens |
| `total_completion_tokens` | UInt64 | Sum of completion tokens |
| `total_tokens` | UInt64 | Sum of total tokens |
| `request_count` | UInt64 | Number of requests |
| `cache_hits` | UInt64 | Number of cache hits |
- Engine: SummingMergeTree
- Source: `gateway.ai_request_log`
## Query Examples
### Request volume per tenant (last 24h)

```sql
SELECT
    tenant_id,
    sum(request_count) AS total_requests,
    sum(error_count) AS total_errors,
    avg(avg_latency_ms) AS avg_latency,
    max(p99_latency_ms) AS max_p99
FROM gateway.request_rate_per_minute
WHERE minute >= now() - INTERVAL 24 HOUR
GROUP BY tenant_id
ORDER BY total_requests DESC;
```

### Token usage by model (last 7 days)
```sql
SELECT
    model,
    provider,
    sum(total_tokens) AS tokens,
    sum(request_count) AS requests,
    sum(cache_hits) AS cache_hits,
    round(sum(cache_hits) / sum(request_count) * 100, 1) AS cache_hit_rate_pct
FROM gateway.ai_token_usage_hourly
WHERE hour >= now() - INTERVAL 7 DAY
GROUP BY model, provider
ORDER BY tokens DESC;
```

### Error rate spike detection for a tenant
```sql
SELECT
    minute,
    request_count,
    error_count,
    round(error_count / request_count * 100, 2) AS error_rate_pct,
    p99_latency_ms
FROM gateway.request_rate_per_minute
WHERE tenant_id = 'tenant-alpha'
  AND minute >= now() - INTERVAL 1 HOUR
ORDER BY minute;
```

### Agent audit trail for a session
```sql
SELECT
    timestamp,
    tool_name,
    action,
    status,
    agent_id
FROM gateway.agent_audit_log
WHERE session_id = 'your-session-id'
ORDER BY timestamp;
```

### Top consumers by token usage (current month)
```sql
SELECT
    tenant_id,
    sum(total_tokens) AS total_tokens,
    sum(total_prompt_tokens) AS prompt_tokens,
    sum(total_completion_tokens) AS completion_tokens,
    sum(request_count) AS requests
FROM gateway.ai_token_usage_hourly
WHERE hour >= toStartOfMonth(now())
GROUP BY tenant_id
ORDER BY total_tokens DESC
LIMIT 20;
```

## Schema Rules
:::warning
- Always use `ALTER TABLE ... ADD COLUMN` for schema evolution -- never recreate tables.
- New columns must have defaults.
- Never use `SELECT *` -- always name columns explicitly.
- Tenant isolation is enforced via row-level `WHERE tenant_id = ...` filters, not separate tables.
:::
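An additive change that follows these rules might look like this (illustrative only; `upstream_id` is a hypothetical new column, not part of the current schema):

```sql
-- Additive, with a default, so existing rows and in-flight writers keep working.
ALTER TABLE gateway.request_log_raw
    ADD COLUMN IF NOT EXISTS upstream_id String DEFAULT '';

-- Keep the Buffer table structurally identical to its target table:
-- columns missing from the destination would otherwise be discarded on flush.
ALTER TABLE gateway.request_log
    ADD COLUMN IF NOT EXISTS upstream_id String DEFAULT '';
```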