ADR-002: ClickHouse for Request Logging and Analytics

Status

Accepted — 2026-03-21

Context

The platform needs to ingest and query request logs at 50,000 writes/second with per-tenant analytics, retention policies, and dashboard-friendly aggregation.

Options evaluated

  1. ClickHouse — Column-oriented OLAP, MergeTree engine, excellent write throughput
  2. Elasticsearch — Full-text search, Kibana dashboards, widely adopted
  3. Grafana Loki — Log aggregation, label-based queries, lower resource usage

Decision

Use ClickHouse for all structured request logging and analytics.

Rationale

  • Write throughput: ClickHouse sustains 50k+ inserts/sec on modest hardware via its Buffer engine; Elasticsearch typically needs 5-10x more resources for comparable write throughput.
  • Storage efficiency: Columnar compression achieves 10-20x compression ratios on structured log data, whereas Elasticsearch carries the storage overhead of its inverted indexes.
  • SQL interface: Native SQL is more accessible than the Elasticsearch Query DSL for ad hoc analytics.
  • Materialized views: Pre-aggregated dashboards without query-time overhead — critical for real-time tenant metrics.
  • TTL support: Native per-table TTL with automatic data expiration.
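As a sketch of the materialized-view approach, the per-minute tenant rollup below pre-aggregates at insert time so dashboards never scan raw rows. The view name and the column set (`latency_ms`, `status`, etc.) are illustrative assumptions; only `gateway.request_log_raw` comes from this ADR.

```sql
-- Illustrative rollup: per-tenant request counts and latency per minute.
-- SummingMergeTree sums numeric columns for rows sharing the same ORDER BY key.
CREATE MATERIALIZED VIEW gateway.tenant_requests_per_minute
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(minute)
ORDER BY (tenant_id, minute)
AS SELECT
    tenant_id,
    toStartOfMinute(timestamp) AS minute,
    count()         AS requests,
    sum(latency_ms) AS total_latency_ms
FROM gateway.request_log_raw
GROUP BY tenant_id, minute;
```

Dashboards then query the view directly (e.g. `SELECT minute, requests FROM gateway.tenant_requests_per_minute WHERE tenant_id = ?`), paying only for the pre-aggregated rows.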

Ingestion strategy

  • Buffer engine: Buffer(gateway, request_log_raw, 16, ...) absorbs bursts and flushes in batches
  • Partition by month: PARTITION BY toYYYYMM(timestamp) for efficient TTL and query pruning
  • Order by tenant: ORDER BY (tenant_id, timestamp) for fast per-tenant queries
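Putting the three bullets together, a minimal DDL sketch might look as follows. The schema columns are assumptions for illustration; the `gateway` database, `request_log_raw` table, and the 16-layer Buffer come from this ADR (the remaining Buffer thresholds are example values, not decided here).

```sql
-- Underlying MergeTree table: monthly partitions, tenant-first sort key.
CREATE TABLE gateway.request_log_raw
(
    tenant_id  LowCardinality(String),
    timestamp  DateTime,
    method     LowCardinality(String),
    path       String,
    status     UInt16,
    latency_ms UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, timestamp);

-- Buffer front table: absorbs write bursts in 16 in-memory layers and
-- flushes to request_log_raw when time/row/byte thresholds are hit.
-- Buffer(db, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
CREATE TABLE gateway.request_log AS gateway.request_log_raw
ENGINE = Buffer(gateway, request_log_raw, 16,
                10, 100, 10000, 1000000, 10485760, 104857600);
```

The application writes to `gateway.request_log`; queries hit `gateway.request_log_raw` (or the Buffer table, which transparently merges buffered and flushed rows).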

Retention

  • Request logs: 90 days
  • AI usage logs: 365 days
  • Audit logs: no TTL (regulatory compliance)
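The retention tiers above map directly onto per-table TTL clauses. A sketch, assuming the table names `request_log_raw` and `ai_usage_log` (the latter is illustrative; only the retention periods are decided in this ADR):

```sql
-- 90-day retention: expired monthly partitions are dropped automatically.
ALTER TABLE gateway.request_log_raw
    MODIFY TTL timestamp + INTERVAL 90 DAY;

-- 365-day retention for AI usage logs.
ALTER TABLE gateway.ai_usage_log
    MODIFY TTL timestamp + INTERVAL 365 DAY;

-- Audit logs: no TTL clause at all, so rows are retained indefinitely.
```

Because the tables are partitioned by `toYYYYMM(timestamp)`, expiration is cheap: ClickHouse drops whole month partitions rather than rewriting parts row by row.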

Consequences

  • Buffer engine thresholds need tuning under load to prevent memory pressure
  • Grafana requires ClickHouse datasource plugin
  • No full-text search capability (not needed for structured logs)