Add InfluxDB 3 Core entry #864
Adds an entry for the open-source SQL build of InfluxDB. The query engine is
Apache DataFusion; ingestion is line protocol over /api/v3/write_lp because
there is no native CSV/Parquet bulk loader. load.py streams hits.tsv, encodes
each row as a line-protocol point with a unique row-index timestamp, and POSTs
in 1000-row batches. Field names are lowercased so the standard CamelCase
ClickBench queries resolve under DataFusion's identifier folding. Q19 and Q43
cast EventTime (stored as a string field) to TIMESTAMP for extract(minute) and
date_trunc('minute', ...). Removes InfluxDB from the README TODO list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
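A minimal sketch of the encoding step described above: one hits.tsv row becomes one line-protocol point with lowercased field names and a unique row-index timestamp. The function names, the measurement name `hits`, and the column types here are illustrative assumptions, not the entry's actual schema or code.

```python
# Hypothetical sketch of load.py's per-row encoding (names are assumptions).

def escape_string_field(v: str) -> str:
    # Line-protocol string field values are double-quoted;
    # backslashes and double quotes must be escaped.
    return '"' + v.replace('\\', '\\\\').replace('"', '\\"') + '"'

def encode_point(row_index: int, columns: list, values: list) -> str:
    fields = []
    for name, value in zip(columns, values):
        key = name.lower()  # lowercase so CamelCase ClickBench queries resolve
        if isinstance(value, float):
            fields.append(f"{key}={value}")
        elif isinstance(value, int):
            fields.append(f"{key}={value}i")  # integer fields take an 'i' suffix
        else:
            fields.append(f"{key}={escape_string_field(str(value))}")
    # The row index doubles as a unique nanosecond timestamp so points
    # never merge on (measurement, tags, time).
    return f"hits {','.join(fields)} {row_index}"
```

Batches of such lines, joined by newlines, would then be POSTed to /api/v3/write_lp.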
> Data loading is painfully slow.
Two issues with the original entry:
1. Loading was very slow. The single-threaded loader with 1000-row
batches managed ~1 K rows/s on a c7a.metal-class machine — about
28 hours for the full 100M-row dataset.
2. The captured benchmark log was dominated by noisy output during
   the load: progress printed every 5 s for hours, plus verbose
   apt/wget/pip output, which risked hitting the 1 MiB log cap.
Changes:
- load.py: send batches through a ThreadPoolExecutor (16 workers,
bounded queue). InfluxDB 3 saturates around 16 concurrent writers
on this hardware; doubling to 32 only adds ~10%. Progress prints
every 30 s instead of every 5 s. Bumped BATCH_ROWS from 1000 to
2000 to halve the HTTP round-trip count without hitting the
default 10 MiB request-size cap.
- benchmark.sh: silence apt and wget; install python3-requests via
apt instead of `pip3 install --break-system-packages` (which is
refused under PEP 668 anyway on noble); bump
--max-http-request-size to 64 MiB and --wal-max-write-buffer-size
to 1M as headroom. (We also briefly tried
--wal-flush-interval 30s; it actively hurt throughput by ~30× —
each write blocked until the next flush — so default 1s wins.)
Measured on a 1M-row sample, single-node c7a.metal-48xl:
single-thread, BATCH=1000: ~1,000 rows/s
16 workers, BATCH=2000: ~30,000 rows/s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
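The "ThreadPoolExecutor (16 workers, bounded queue)" change above can be sketched as follows. This is not the entry's actual load.py; `post_batch`, the bound of 32 in-flight batches, and the return convention are assumptions for illustration, with `post_batch` standing in for the HTTP POST to /api/v3/write_lp.

```python
# Hypothetical sketch of the threaded loader: a ThreadPoolExecutor whose
# in-flight work is bounded so the TSV reader can't run arbitrarily far
# ahead of the writers and buffer the whole file in memory.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

WORKERS = 16        # throughput saturates around 16 concurrent writers
MAX_IN_FLIGHT = 32  # bounded queue: at most 2 pending batches per worker
BATCH_ROWS = 2000   # larger batches halve the HTTP round-trip count

def post_batch(lines):
    # Stand-in for the POST to /api/v3/write_lp; returns rows written.
    return len(lines)

def load(batches):
    written = 0
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        in_flight = set()
        for batch in batches:
            if len(in_flight) >= MAX_IN_FLIGHT:
                # Block until at least one batch finishes before submitting more.
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                written += sum(f.result() for f in done)
            in_flight.add(pool.submit(post_batch, batch))
        done, _ = wait(in_flight)  # drain the remaining in-flight batches
        written += sum(f.result() for f in done)
    return written
```

Blocking on `FIRST_COMPLETED` rather than queueing unboundedly is what keeps memory flat during a 100M-row load.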
…nce curl errors
The previous run produced log lines like:
curl: (18) transfer closed with outstanding read data remaining
null, ...
Three things were happening:
1. DataFusion was OOMing on heavy queries (Q29's REGEXP_REPLACE was the
first to fall over) and aborting the response mid-stream. The log
says: "Memory Exhausted while SpillPool (DiskManager is disabled)".
InfluxDB 3.9.2 ships with DataFusion's DiskManager hardcoded
disabled — there is no spill-to-disk fallback, and no flag to
enable it. The only mitigation we have is to make the in-memory
budget as large as possible.
2. After the load, the server still has WAL state and large in-memory
write buffers. That memory isn't available to queries until the
next snapshot, which made even moderate queries fail with the same
OOM error.
3. `curl -sS` printed each transfer-closed message to stderr, which
`tee` captured into the log alongside the `null` row from run.sh.
Changes:
- benchmark.sh: pass `--exec-mem-pool-bytes 80%` so DataFusion gets
the lion's share of the box for query execution, and restart the
server between load and queries (drains the WAL into Parquet,
releases write-buffer memory, gives queries a clean budget).
- run.sh: drop the `-S` from `curl -sS` and add `2>/dev/null`. curl's
exit code is enough for run.sh to record `null`; the human-readable
error message just polluted the captured log.
After these changes a 5M-row sample produces 986 bytes of run.sh log
with one clean `[null, null, null]` row for Q29 (still OOMs even at
5M with a regex over Referer; nothing we can do about that without a
DiskManager).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
> The problem: InfluxDB does not work at all.
Summary

influxdb/ benchmark entry targeting InfluxDB 3 Core — the open-source, SQL-capable (DataFusion) build of InfluxDB. load.py streams hits.tsv to /api/v3/write_lp. All 105 columns are stored as fields (no tags), with a unique row-index nanosecond timestamp so points don't merge. Field names are lowercased so standard CamelCase ClickBench queries resolve under DataFusion's identifier-case folding. queries.sql is the standard ClickBench set; only Q19 and Q43 are adapted (CAST(EventTime AS TIMESTAMP)) since EventTime is stored as a string field. Removes InfluxDB from the README TODO list.

Validation

…hits.tsv. All 43 queries returned three timings each — no nulls, no errors. Spot-checked Q19 (extract(minute ...)), Q29 (REGEXP_REPLACE), and Q43 (DATE_TRUNC('minute', ...)) — all returned sensible rows.

Test plan

Run benchmark.sh on a c6a.4xlarge VM and capture Load time / Data size / 43 query timings. … influxdb/results/c6a.4xlarge.json.

🤖 Generated with Claude Code