
Add InfluxDB 3 Core entry #864

Open

alexey-milovidov wants to merge 3 commits into main from add-influxdb-entry

Conversation

@alexey-milovidov
Member

Summary

  • Adds an influxdb/ benchmark entry targeting InfluxDB 3 Core — the open-source, SQL-capable (DataFusion) build of InfluxDB.
  • load.py streams hits.tsv to /api/v3/write_lp. All 105 columns are stored as fields (no tags), with a unique row-index nanosecond timestamp so points don't merge. Field names are lowercased so standard CamelCase ClickBench queries resolve under DataFusion's identifier-case folding.
  • queries.sql is the standard ClickBench set; only Q19 and Q43 are adapted (CAST(EventTime AS TIMESTAMP)) since EventTime is stored as a string field.
  • Removes InfluxDB from the README TODO list.
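A minimal sketch of the encoding step the summary describes. The `/api/v3/write_lp` endpoint, the all-fields/no-tags layout, lowercased field keys, and the unique row-index nanosecond timestamp are from this PR; the `TS_BASE` value, helper names, and the all-strings simplification are illustrative (the real loader presumably types numeric columns as integers/floats):

```python
# Sketch: encode one hits.tsv row as an InfluxDB line-protocol point.
# All columns become fields (no tags); row i gets timestamp TS_BASE + i
# so no two points share a (tags, time) key and merge.

TS_BASE = 1_600_000_000_000_000_000  # illustrative base, nanoseconds

def escape_string_field(value: str) -> str:
    # Line protocol: string field values are double-quoted;
    # backslashes and double quotes inside them must be escaped.
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'

def encode_point(columns, row, row_index, measurement="hits"):
    # Lowercase field keys so the CamelCase ClickBench identifiers
    # resolve under DataFusion's case folding.
    fields = ",".join(
        f"{name.lower()}={escape_string_field(value)}"
        for name, value in zip(columns, row)
    )
    return f"{measurement} {fields} {TS_BASE + row_index}"

line = encode_point(["EventTime", "Referer"],
                    ["2013-07-15 12:00:00", "http://example.com"], 0)
```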

Validation

  • Verified the install + start + create-db + load + query flow on a 1000-row sample of hits.tsv. All 43 queries returned three timings each — no nulls, no errors. Spot-checked Q19 (extract(minute ...)), Q29 (REGEXP_REPLACE), and Q43 (DATE_TRUNC('minute', ...)) — all returned sensible rows.
  • Full 100M-row load not run on this branch — that is best done on a benchmark VM. At ~1 KB per line-protocol point this will take some hours; results to be added after a real run.

Test plan

  • Run benchmark.sh on a c6a.4xlarge VM and capture Load time / Data size / 43 query timings.
  • Add influxdb/results/c6a.4xlarge.json.

🤖 Generated with Claude Code

Adds an entry for the open-source SQL build of InfluxDB. The query engine is
Apache DataFusion; ingestion is line protocol over /api/v3/write_lp because
there is no native CSV/Parquet bulk loader. load.py streams hits.tsv, encodes
each row as a line-protocol point with a unique row-index timestamp, and POSTs
in 1000-row batches. Field names are lowercased so the standard CamelCase
ClickBench queries resolve under DataFusion's identifier folding. Q19 and Q43
cast EventTime (stored as a string field) to TIMESTAMP for extract(minute) and
date_trunc('minute', ...). Removes InfluxDB from the README TODO list.
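The streaming/batching side can be sketched as below. The endpoint path and 1000-row batch size are from the PR; the host, port, and `db` query parameter are assumptions, and the standard-library HTTP client stands in for whatever client load.py actually uses:

```python
# Sketch: group pre-encoded line-protocol points into 1000-row bodies
# and POST each body to the write endpoint.
import urllib.request

BATCH_ROWS = 1000
WRITE_URL = "http://localhost:8181/api/v3/write_lp?db=hits"  # host/port assumed

def iter_batches(lines, size=BATCH_ROWS):
    """Group an iterable of line-protocol strings into newline-joined bodies."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)

def load(lines):
    for body in iter_batches(lines):
        req = urllib.request.Request(WRITE_URL, data=body.encode(), method="POST")
        urllib.request.urlopen(req).read()
```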

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Member Author

Data loading is painfully slow.

alexey-milovidov and others added 2 commits May 8, 2026 14:55
Two issues with the original entry:

1. Loading was very slow. The single-threaded loader with 1000-row
   batches managed ~1 K rows/s on a c7a.metal-class machine — about
   28 hours for the full 100M-row dataset.

2. The captured benchmark log was dominated by noisy output during
   the load: progress every 5 s for hours, plus apt/wget/pip
   verbose output, risked hitting the 1 MiB log cap.

Changes:

- load.py: send batches through a ThreadPoolExecutor (16 workers,
  bounded queue). InfluxDB 3 saturates around 16 concurrent writers
  on this hardware; doubling to 32 only adds ~10%. Progress prints
  every 30 s instead of every 5 s. Bumped BATCH_ROWS from 1000 to
  2000 to halve the HTTP round-trip count without hitting the
  default 10 MiB request-size cap.

- benchmark.sh: silence apt and wget; install python3-requests via
  apt instead of `pip3 install --break-system-packages` (which is
  refused under PEP 668 anyway on noble); bump
  --max-http-request-size to 64 MiB and --wal-max-write-buffer-size
  to 1M as headroom. (We also briefly tried
  --wal-flush-interval 30s; it actively hurt throughput by ~30× —
  each write blocked until the next flush — so default 1s wins.)

Measured on a 1M-row sample, single-node c7a.metal-48xl:

    single-thread, BATCH=1000:  ~1,000 rows/s
    16 workers,    BATCH=2000:  ~30,000 rows/s
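The 16-worker/bounded-queue change can be sketched as follows. The worker count is from the commit; the bound of 2×workers in-flight futures and the `post_batch` callable are assumptions standing in for the HTTP write:

```python
# Sketch: submit batches through a ThreadPoolExecutor while keeping the
# number of in-flight futures bounded, so the whole dataset is never
# buffered in memory at once.
from concurrent.futures import ThreadPoolExecutor

WORKERS = 16  # saturation point observed on this hardware

def load_parallel(batches, post_batch, workers=WORKERS):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        in_flight = []
        for body in batches:
            in_flight.append(pool.submit(post_batch, body))
            # Bounded queue: block on the oldest future once the
            # backlog reaches 2x the worker count (bound is illustrative).
            if len(in_flight) >= 2 * workers:
                results.append(in_flight.pop(0).result())
        for f in in_flight:
            results.append(f.result())
    return results
```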

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nce curl errors

The previous run produced log lines like:

    curl: (18) transfer closed with outstanding read data remaining
    null, ...

Three things were happening:

1. DataFusion was OOMing on heavy queries (Q29's REGEXP_REPLACE was the
   first to fall over) and aborting the response mid-stream. The log
   says: "Memory Exhausted while SpillPool (DiskManager is disabled)".
   InfluxDB 3.9.2 ships with DataFusion's DiskManager hardcoded
   disabled — there is no spill-to-disk fallback, and no flag to
   enable it. The only mitigation we have is to make the in-memory
   budget as large as possible.

2. After the load, the server still has WAL state and large in-memory
   write buffers. That memory isn't available to queries until the
   next snapshot, which made even moderate queries fail with the same
   OOM error.

3. `curl -sS` printed each transfer-closed message to stderr, which
   `tee` captured into the log alongside the `null` row from run.sh.

Changes:

- benchmark.sh: pass `--exec-mem-pool-bytes 80%` so DataFusion gets
  the lion's share of the box for query execution, and restart the
  server between load and queries (drains the WAL into Parquet,
  releases write-buffer memory, gives queries a clean budget).

- run.sh: drop the `-S` from `curl -sS` and add `2>/dev/null`. curl's
  exit code is enough for run.sh to record `null`; the human-readable
  error message just polluted the captured log.

After these changes a 5M-row sample produces 986 bytes of run.sh log
with one clean `[null, null, null]` row for Q29 (still OOMs even at
5M with a regex over Referer; nothing we can do about that without a
DiskManager).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Member Author

The problem: InfluxDB does not work at all.

2026-05-08T19:31:51.501856Z  INFO influxdb3_query_executor: executing sql query database=hits query=SELECT COUNT(*) FROM hits; params=None
2026-05-08T19:32:39.121739Z ERROR influxdb3_server::http: Error while handling request error=query error: error while planning query: order_union_sorted_inputs
caused by
Internal error: should not perform a file scan of overlapping ranges within same file.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues method=POST path=/api/v3/query_sql content_length=Some("75") client_ip=127.0.0.1

Based on the research, this is a known assertion in InfluxDB 3's own physical optimizer (core/iox_query/src/physical_optimizer/sort/regroup_files.rs::group_same_file_sources) that fires when planning detects
overlapping byte/row ranges within one Parquet file. There is no public fix; v3.9.2 still ships the assertion. Our load.py deterministically triggers it: 16 parallel writers all submit batches whose timestamps
interleave through [TS_BASE, TS_BASE+N), so every WAL segment ends up containing the same broad time range, and on flush the resulting Parquet files have near-identical [min_time, max_time]. The sort-pushdown
pass then trips when it tries to regroup their scan ranges.
> A reload is unavoidable — the Parquet layout on disk is what's broken, and queries will keep failing until the data is rewritten in a layout the planner can handle.

The fixes worth considering:

  1. Disjoint time-range chunks (recommended) — split the 100M rows into N contiguous chunks. For each chunk, load all batches in parallel, then kill -TERM the server (drain WAL → Parquet) before starting the next
    chunk. Each Parquet file now covers only one chunk's time range, so files don't overlap. Keeps throughput, predictable layout.
  2. Single-writer load — set WORKERS = 1. Simple, but ~30× slower according to the existing comment in load.py.
  3. Try a different InfluxDB version — research found no version known to be immune, so this is a gamble.
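The chunking arithmetic behind option 1 can be sketched as below. The chunk count and the helper names in the trailing comment are illustrative; the drain step is the kill -TERM / restart cycle described above:

```python
# Sketch of option 1: split the row-index timestamp space into N
# contiguous, non-overlapping chunks. Loading one chunk fully and
# draining the WAL to Parquet before starting the next keeps each
# Parquet file's [min_time, max_time] disjoint from the others.

def chunk_ranges(total_rows, n_chunks):
    """Return (start, end) half-open row-index ranges that cover
    [0, total_rows) with no gaps and no overlap."""
    base, extra = divmod(total_rows, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# e.g. for the full dataset (N_CHUNKS=20 is an illustrative choice):
# for start, end in chunk_ranges(100_000_000, 20):
#     load_rows(start, end)         # 16 parallel writers within the chunk
#     drain_wal_by_restarting()     # hypothetical helper: kill -TERM, wait, restart
```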
