Add Quickwit entry #865

Open
alexey-milovidov wants to merge 6 commits into main from add-quickwit-entry

Conversation

@alexey-milovidov
Member

Summary

Adds a benchmark entry for Quickwit, a Rust-based search engine for log analytics built on Tantivy. Modeled on the existing Elasticsearch entry.

Quickwit exposes an Elasticsearch-compatible REST API but no SQL endpoint, so each ClickBench query is hand-translated to ES DSL in `queries.json`. Loading goes through `/api/v1/_elastic/hits/_bulk`; querying through `/_search`.
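The PR's `queries.json` is not shown in this thread, but translating the simplest ClickBench filter query gives the flavor. A minimal sketch in Python — the field name is from the ClickBench schema, and the exact DSL shape the PR uses may differ:

```python
import json

# Hedged sketch: a hand-translation of
#   SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0
# into the Elasticsearch query DSL accepted on /_search.
# The actual queries.json entries may be shaped differently.
query = {
    "size": 0,                 # only the count is needed, not the hits
    "track_total_hits": True,  # ask for an exact total, not the 10k cap
    "query": {
        "bool": {"must_not": [{"term": {"AdvEngineID": 0}}]}
    },
}

body = json.dumps(query)
```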

19 of the 43 queries are not expressible in Quickwit's ES API and are recorded as `null`:

| Reason | Queries |
| --- | --- |
| `COUNT(DISTINCT)` — no `cardinality` aggregation | 5, 6, 9, 10, 11, 12, 14, 23 |
| Substring `LIKE '%…%'` — leading wildcards rejected, no `wildcard`/`regexp` query | 21, 22, 24 |
| `ORDER BY` on a text field — sorting by text not supported | 26, 27 |
| Scripted/runtime fields, `REGEXP_REPLACE`, `CASE WHEN`, integer arithmetic in aggregations | 19, 28, 29, 30, 36, 40 |

The remaining 24 queries were validated against a 1M-row sample on a single c6a-class node (x86_64). The full 100M-row run was not executed in this PR; `results/` is empty pending a full benchmark run.

Test plan

  • `benchmark.sh` installs Quickwit 0.8.2 and starts the server (verified, aarch64 + x86_64 via `uname -m`)
  • Index creation from `index_config.yaml` succeeds (105 fields)
  • `load.py` ingests via the ES bulk API (~17K docs/s for the 1M sample)
  • All 43 entries in `queries.json` parse as valid JSON; `run.sh` emits 24 timings + 19 `null`s
  • Full 100M-row load and `results/c6a.4xlarge.json` (follow-up)
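The bulk format `load.py` presumably emits can be sketched as follows; the function name and batching are illustrative assumptions, and only the NDJSON action/source pairing is fixed by the ES bulk protocol:

```python
import json

def bulk_payload(docs, index="hits"):
    """Build an Elasticsearch-style _bulk body: one action line plus
    one source line per document, newline-delimited JSON (NDJSON).
    Sketch only -- the PR's load.py is not shown in this thread."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires the trailing newline

payload = bulk_payload([{"WatchID": 1}, {"WatchID": 2}])
```

Each batch would then be POSTed to `/api/v1/_elastic/hits/_bulk`, typically with `Content-Type: application/x-ndjson`.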

alexey-milovidov and others added 2 commits May 7, 2026 20:03
Quickwit (Rust, Tantivy-based) exposes an Elasticsearch-compatible REST API
but no SQL endpoint, so each ClickBench query is hand-translated to ES DSL
in queries.json. Loading goes through /api/v1/_elastic/hits/_bulk; querying
through /_search.

19 of the 43 queries are not expressible in Quickwit's ES API
(COUNT(DISTINCT), substring LIKE, scripted/runtime fields, REGEXP_REPLACE,
ORDER BY on text fields) and are recorded as null. The remaining 24 queries
were validated against a 1M-row sample on a single node.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ubuntu 24.04 (the noble image used by run-benchmark.sh) refuses
"pip3 install --user requests" under PEP 668's externally-managed
environment, which aborted benchmark.sh after ~28s on c7a.metal-48xl.
The python3-requests apt package is available and sufficient.

Also drop the symlink "quickwit -> quickwit-v0.8.2" since the source
directory is itself named "quickwit", and reference the versioned dir
directly via $QW_DIR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Member Author

From @melvynator:

Quickwit supports cardinality aggregation. https://quickwit.io/docs/main-branch/reference/aggregation#cardinality

@alexey-milovidov
Member Author

Cardinality is in v0.9, which is unreleased — the latest binary distribution is v0.8.2.

@alexey-milovidov
Member Author

But we can use the unreleased version.

alexey-milovidov and others added 2 commits May 8, 2026 11:50
Stable Quickwit 0.8.2 has neither the `cardinality` aggregation nor a
`wildcard` query, so 19 of the 43 ClickBench queries had to be reported
as null. The v0.9 line (still unreleased; we use the `v0.9.0-rc` Docker
image) adds both, which lets us express 11 more queries (Q5/6/9/10/11/
12/14/21/22/23/24). 8 queries still depend on scripted/runtime fields
or text-field sort, neither of which v0.9 provides.

Loading switches from the Elasticsearch-compatible bulk endpoint to
`quickwit tool local-ingest`, fed by `zcat hits.json.gz` over stdin.
v0.9's sharded ingest-v2 API caps single-node throughput to a few MB/s
and stalls waiting for shards to scale; `local-ingest` builds splits
directly on the configured storage and the running server picks them
up at the next metastore poll.
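For the `COUNT(DISTINCT)` queries that the v0.9 `cardinality` aggregation unlocks, the request body would look roughly like this (field name from the ClickBench schema; the PR's actual translation may differ):

```python
import json

# Hedged sketch of a COUNT(DISTINCT UserID) translation using the
# `cardinality` aggregation added in v0.9. As in Elasticsearch, it
# returns an approximate distinct count.
count_distinct = {
    "size": 0,
    "aggs": {
        "distinct_users": {"cardinality": {"field": "UserID"}}
    },
}

request_body = json.dumps(count_distinct)
```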

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… timeout

The cloud-init log uploaded after the run is constrained to <1 MiB. Two
sources were chatty enough to risk hitting that limit on a 100M-row
load: `local-ingest`'s per-second progress line and the apt/docker pull
output. Throttle the former to one line per ~30 s with awk, and silence
apt/docker-pull entirely.
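The commit does the throttling with awk; the same logic, sketched in Python for clarity (the ~30 s interval is from the commit, everything else is illustrative):

```python
def throttle(timestamped_lines, interval=30.0):
    """Yield at most one line per `interval` seconds; drop the rest.
    `timestamped_lines` is an iterable of (seconds, line) pairs -- a
    stand-in for wall-clock arrival times of progress lines on stdin."""
    last_emit = None
    for ts, line in timestamped_lines:
        if last_emit is None or ts - last_emit >= interval:
            last_emit = ts
            yield line

feed = [(0, "1%"), (10, "2%"), (31, "3%"), (45, "4%"), (62, "5%")]
kept = list(throttle(feed))  # keeps the lines at t=0, 31, 62
```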

Also add node-config.yaml mounted on top of the image's default config
to bump the searcher's per-request and per-leaf timeouts from 30 s to
600 s. Several high-cardinality nested aggregations (Q17/18/32/33) on
the full dataset run longer than 30 s and were timing out.
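The thread doesn't show node-config.yaml itself, so the key name below is an assumption about Quickwit's `searcher` configuration section, not confirmed by the PR; only the 600 s value comes from the commit:

```yaml
# Assumed shape of the mounted node-config.yaml -- the key name is a
# guess at Quickwit's searcher settings, not taken from the PR.
searcher:
  request_timeout_secs: 600   # per-request timeout, up from 30 s
```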

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@melvynator
Member

But we can use the unreleased version.

Well, v0.8.2 is two years old, and the project has not made a major release since the Datadog acquisition. There is no point in benchmarking a two-year-old build, so the master build is the only viable and reasonable option.

quickwit-oss/tantivy#2337

alexey-milovidov and others added 2 commits May 8, 2026 13:19
ClickBench's run.sh convention drops the OS page cache before each
query. For Quickwit that's not enough — its in-process caches
(partial_request_cache, fast_field_cache, split_footer_cache,
predicate_cache) survive `drop_caches`, and there's no cache-clear
endpoint in the REST API. Without action, warm runs were consistently
~30× faster than cold runs because they were replaying memoized
results.

- Disable `partial_request_cache` in node-config.yaml. This is the
  per-split partial-result cache; keeping it on lets the engine
  short-circuit identical queries.
- Leave `predicate_cache` at its default. It's a predicate-evaluation
  cache (analogous to ClickHouse's query condition cache), not a
  result cache.
- Restart the Quickwit container in run.sh before each non-null query.
  This clears the remaining in-process caches (fast_field_cache,
  split_footer_cache, predicate_cache) so the first run is genuinely
  cold; the 2nd and 3rd runs benefit from caches re-warmed by run 1,
  matching ClickBench's cold/warm convention.

Restart cycle is ~11s on this hardware, ~7 min total overhead across
the 35 non-null queries.
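Assuming the cache is sized rather than toggled, disabling `partial_request_cache` would look something like this in node-config.yaml (setting name is my guess at Quickwit's searcher configuration, not shown verbatim in the PR):

```yaml
# Hedged sketch: zeroing the capacity disables the per-split
# partial-result cache the commit describes.
searcher:
  partial_request_cache_capacity: 0
```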

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Member Author

Switched to the nightly build.
