16 Jul 2025
ClickHouse for analytics workloads
A third contender emerges
I looked at the performance of SQLite and DuckDB for analytics workloads in a previous note. DuckDB was clearly superior. In the meantime I’ve built some sort of metrics collection system with DuckDB (a project report will follow). However, I’ve been itching to create something new that can also handle tracing data well, and in that context I naturally stumbled across ClickHouse. In this note, I want to apply the same benchmark to ClickHouse and see how it compares.
Methodology
Read the previous note to learn how the data is generated and what the different use cases are. For ClickHouse I’ve gone a bit off the rails though, so I’d better explain what’s going on here: instead of crafting a specific table for the data I’m ingesting, I’m using the default otel_traces table as generated by the OpenTelemetry Collector. And instead of ingesting data into ClickHouse via the Rust client, I simply emit a tracing span with the relevant data as span attributes and let the tracing machinery batch-send it to the collector, which in turn writes it to ClickHouse.
I know, I know, this sounds very hacky. But it would be quite nice to just be able to add some instrumentation to any system and have its data make it to a DB that is efficient at pulling out relevant data. So that’s what I want to test here.
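For completeness, the tracing side is wired up roughly like this. This is only a sketch, assuming the tracing-opentelemetry and opentelemetry-otlp crates (around the 0.20-era pipeline API; newer releases have moved these builders around) and a collector listening on the default OTLP/gRPC port:
use opentelemetry_otlp::WithExportConfig;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;

fn init_tracing() -> Result<(), Box<dyn std::error::Error>> {
    // Batch-export spans over OTLP/gRPC to the local collector, which in turn
    // writes them into otel_traces via its ClickHouse exporter.
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"), // assumed collector endpoint
        )
        .install_batch(opentelemetry::runtime::Tokio)?;

    // Bridge `tracing` spans into OpenTelemetry and install the global subscriber.
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    Ok(())
}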
For reference, the traces table in ClickHouse looks like this:
CREATE TABLE default.otel_traces
(
    `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TraceId` String CODEC(ZSTD(1)),
    `SpanId` String CODEC(ZSTD(1)),
    `ParentSpanId` String CODEC(ZSTD(1)),
    `TraceState` String CODEC(ZSTD(1)),
    `SpanName` LowCardinality(String) CODEC(ZSTD(1)),
    `SpanKind` LowCardinality(String) CODEC(ZSTD(1)),
    `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ScopeName` String CODEC(ZSTD(1)),
    `ScopeVersion` String CODEC(ZSTD(1)),
    `SpanAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `Duration` Int64 CODEC(ZSTD(1)),
    `StatusCode` LowCardinality(String) CODEC(ZSTD(1)),
    `StatusMessage` String CODEC(ZSTD(1)),
    `Events.Timestamp` Array(DateTime64(9)) CODEC(ZSTD(1)),
    `Events.Name` Array(LowCardinality(String)) CODEC(ZSTD(1)),
    `Events.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    `Links.TraceId` Array(String) CODEC(ZSTD(1)),
    `Links.SpanId` Array(String) CODEC(ZSTD(1)),
    `Links.TraceState` Array(String) CODEC(ZSTD(1)),
    `Links.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_duration Duration TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
TTL toDateTime(Timestamp) + toIntervalDay(3)
SETTINGS ttl_only_drop_parts = 1
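With this setup, everything the application records as span attributes lands in the SpanAttributes map, keyed by attribute name. A quick sanity check after ingesting might look like this (just a sketch, not part of the benchmark):
SELECT
    SpanName,
    SpanAttributes['bucket'] AS bucket,
    SpanAttributes['payload'] AS payload
FROM otel_traces
LIMIT 5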
My application under test exposes an endpoint for ingesting data. As mentioned, it does not use the ClickHouse client; it simply emits tracing data that ends up in the DB.
async fn upload_data(Json(payload): Json<Data>) -> StatusCode {
    // Record the payload as span attributes; the tracing/OTLP machinery batches
    // the span and ships it to the collector, which writes it to otel_traces.
    let span = span!(
        tracing::Level::INFO,
        "json-benchmark",
        timestamp = payload.timestamp,
        bucket = payload.bucket,
        payload = serde_json::to_string(&payload.data).unwrap_or_default()
    );
    let _enter = span.enter();
    StatusCode::OK
}
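The Data payload type isn’t spelled out here, but it is essentially just the fields the handler touches; a minimal sketch (the exact field types may differ):
// Sketch of the ingest payload, inferred from the handler above; the exact
// types (e.g. whether timestamp is a string or a unix timestamp) may differ.
#[derive(serde::Deserialize)]
struct Data {
    timestamp: String,       // stored as a span attribute, later parsed with parseDateTimeBestEffort
    bucket: String,          // "location", "co2" or "logs"
    data: serde_json::Value, // use-case-specific payload, re-serialized to JSON for the span
}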
To see how many queries per second can be served, I expose endpoints that use the ClickHouse client to run the queries. For instance, for the GPS use case I have this endpoint:
async fn get_gps_coords(State(conn): State<Client>) -> (StatusCode, Json<Vec<GPSResponse>>) {
    // Extract the coordinates back out of the JSON payload stored in the span attributes.
    let mut cursor = conn
        .query(
            "SELECT
                JSONExtractString(SpanAttributes['payload'], 'longitude') AS longitude,
                JSONExtractString(SpanAttributes['payload'], 'latitude') AS latitude
            FROM otel_traces
            WHERE SpanAttributes['bucket'] = 'location'
                AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'longitude')) > 6
                AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'longitude')) < 10
                AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'latitude')) > 45
                AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'latitude')) < 50",
        )
        .fetch::<GPSResponse>()
        .unwrap();

    // Drain the cursor into a Vec and return it as JSON.
    let mut results = Vec::new();
    while let Some(row) = cursor.next().await.unwrap() {
        results.push(row);
    }
    (StatusCode::OK, Json(results))
}
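The GPSResponse row type is not shown either; with the clickhouse crate, fetching rows needs a type that derives Row and Deserialize, with fields in the same order as the SELECT columns. Roughly (and since JSONExtractString returns strings, both coordinates come back as String):
use clickhouse::Row;
use serde::{Deserialize, Serialize};

// Sketch of the row type for the GPS query above; Serialize is only needed
// because the handler returns the rows as JSON.
#[derive(Row, Deserialize, Serialize)]
struct GPSResponse {
    longitude: String,
    latitude: String,
}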
These endpoints are then load tested via k6 to get the numbers below.
Queries
For reference, here are the three queries for the different use cases. These should be ClickHouse equivalents to the DuckDB ones in the previous note.
GPS coordinates:
SELECT
    JSONExtractString(SpanAttributes['payload'], 'longitude') AS longitude,
    JSONExtractString(SpanAttributes['payload'], 'latitude') AS latitude
FROM otel_traces
WHERE SpanAttributes['bucket'] = 'location'
    AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'longitude')) > 6
    AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'longitude')) < 10
    AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'latitude')) > 45
    AND toFloat64(JSONExtractString(SpanAttributes['payload'], 'latitude')) < 50
This pulls out the location entries that lie within a certain longitude/latitude rectangle.
CO2:
SELECT
    toMonth(parseDateTimeBestEffort(SpanAttributes['timestamp'])) AS timestamp,
    avg(toFloat64(JSONExtractString(SpanAttributes['payload'], 'co2'))) AS avg
FROM otel_traces
WHERE SpanAttributes['bucket'] = 'co2'
GROUP BY toMonth(parseDateTimeBestEffort(SpanAttributes['timestamp']))
Possibly the most intensive query: this aggregates the CO2 measurements by month and computes the respective averages.
Structured logs:
SELECT
    count(*) AS count,
    JSONExtractString(SpanAttributes['payload'], 'endpoint') AS endpoint
FROM otel_traces
WHERE SpanAttributes['bucket'] = 'logs'
    AND JSONExtractString(SpanAttributes['payload'], 'level') = 'error'
    AND parseDateTimeBestEffort(SpanAttributes['timestamp']) > toDate('2023-07-01') - INTERVAL 90 DAY
GROUP BY JSONExtractString(SpanAttributes['payload'], 'endpoint')
ORDER BY count(*) DESC
This groups the entries indicating an error by endpoint and checks which endpoint has the highest error count, over the data of the last 90 days. (I could not use today minus 90 days here, as the data was older at this point and I was too lazy to re-ingest everything with up-to-date timestamps, so I just chose a reference date somewhere within the data’s range.)
Results
ClickHouse just blows the other databases out of the water. We are comparing apples to oranges here as both SQLite and DuckDB are embedded DBs while ClickHouse follows a client-server model. But either way, the gains are undeniable.
And this is with a rather unfavorable data model: I’m really misusing the tracing table here, but ClickHouse is just extremely fast at extracting nested JSON attributes, undoubtedly thanks to the Map data type.
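If I wanted to check whether the bloom filter skip indexes on the Map columns even come into play for these queries, ClickHouse’s EXPLAIN with indexes = 1 shows which indexes a query uses; a quick sketch with the bucket filter from the GPS query:
EXPLAIN indexes = 1
SELECT count(*)
FROM otel_traces
WHERE SpanAttributes['bucket'] = 'location'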
The steady performance across all use cases even suggests that I may be hitting some other bottleneck. But as it’s plenty good enough, I’ll stop here.