
Add InfluxDB 3 Core entry #864

Open

alexey-milovidov wants to merge 3 commits into main from add-influxdb-entry

Conversation

@alexey-milovidov
Member

Summary

  • Adds an influxdb/ benchmark entry targeting InfluxDB 3 Core — the open-source, SQL-capable (DataFusion) build of InfluxDB.
  • load.py streams hits.tsv to /api/v3/write_lp. All 105 columns are stored as fields (no tags), with a unique row-index nanosecond timestamp so points don't merge. Field names are lowercased so standard CamelCase ClickBench queries resolve under DataFusion's identifier-case folding.
  • queries.sql is the standard ClickBench set; only Q19 and Q43 are adapted (CAST(EventTime AS TIMESTAMP)) since EventTime is stored as a string field.
  • Removes InfluxDB from the README TODO list.
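A minimal sketch of the encoding step the summary describes. The `/api/v3/write_lp` endpoint, the all-fields/no-tags layout, lowercased field keys, and the unique row-index nanosecond timestamp are from this PR; the `TS_BASE` value, helper names, and the all-strings simplification are illustrative (the real loader presumably types numeric columns as integers/floats):

```python
# Sketch: encode one hits.tsv row as an InfluxDB line-protocol point.
# All columns become fields (no tags); row i gets timestamp TS_BASE + i
# so no two points share a (tags, time) key and merge.

TS_BASE = 1_600_000_000_000_000_000  # illustrative base, nanoseconds

def escape_string_field(value: str) -> str:
    # Line protocol: string field values are double-quoted;
    # backslashes and double quotes inside them must be escaped.
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'

def encode_point(columns, row, row_index, measurement="hits"):
    # Lowercase field keys so the CamelCase ClickBench identifiers
    # resolve under DataFusion's case folding.
    fields = ",".join(
        f"{name.lower()}={escape_string_field(value)}"
        for name, value in zip(columns, row)
    )
    return f"{measurement} {fields} {TS_BASE + row_index}"

line = encode_point(["EventTime", "Referer"],
                    ["2013-07-15 12:00:00", "http://example.com"], 0)
```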

Validation

  • Verified the install + start + create-db + load + query flow on a 1000-row sample of hits.tsv. All 43 queries returned three timings each — no nulls, no errors. Spot-checked Q19 (extract(minute ...)), Q29 (REGEXP_REPLACE), and Q43 (DATE_TRUNC('minute', ...)) — all returned sensible rows.
  • Full 100M-row load not run on this branch — that is best done on a benchmark VM. At ~1 KB per line-protocol point this will take some hours; results to be added after a real run.

Test plan

  • Run benchmark.sh on a c6a.4xlarge VM and capture Load time / Data size / 43 query timings.
  • Add influxdb/results/c6a.4xlarge.json.

🤖 Generated with Claude Code

Adds an entry for the open-source SQL build of InfluxDB. The query engine is
Apache DataFusion; ingestion is line protocol over /api/v3/write_lp because
there is no native CSV/Parquet bulk loader. load.py streams hits.tsv, encodes
each row as a line-protocol point with a unique row-index timestamp, and POSTs
in 1000-row batches. Field names are lowercased so the standard CamelCase
ClickBench queries resolve under DataFusion's identifier folding. Q19 and Q43
cast EventTime (stored as a string field) to TIMESTAMP for extract(minute) and
date_trunc('minute', ...). Removes InfluxDB from the README TODO list.
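The streaming/batching side can be sketched as below. The endpoint path and 1000-row batch size are from the PR; the host, port, and `db` query parameter are assumptions, and the standard-library HTTP client stands in for whatever client load.py actually uses:

```python
# Sketch: group pre-encoded line-protocol points into 1000-row bodies
# and POST each body to the write endpoint.
import urllib.request

BATCH_ROWS = 1000
WRITE_URL = "http://localhost:8181/api/v3/write_lp?db=hits"  # host/port assumed

def iter_batches(lines, size=BATCH_ROWS):
    """Group an iterable of line-protocol strings into newline-joined bodies."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)

def load(lines):
    for body in iter_batches(lines):
        req = urllib.request.Request(WRITE_URL, data=body.encode(), method="POST")
        urllib.request.urlopen(req).read()
```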

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Member Author

Data loading is painfully slow.

alexey-milovidov and others added 2 commits May 8, 2026 14:55
Two issues with the original entry:

1. Loading was very slow. The single-threaded loader with 1000-row
   batches managed ~1 K rows/s on a c7a.metal-class machine — about
   28 hours for the full 100M-row dataset.

2. The captured benchmark log was dominated by noisy output during
   the load: progress every 5 s for hours, plus apt/wget/pip
   verbose output, risked hitting the 1 MiB log cap.

Changes:

- load.py: send batches through a ThreadPoolExecutor (16 workers,
  bounded queue). InfluxDB 3 saturates around 16 concurrent writers
  on this hardware; doubling to 32 only adds ~10%. Progress prints
  every 30 s instead of every 5 s. Bumped BATCH_ROWS from 1000 to
  2000 to halve the HTTP round-trip count without hitting the
  default 10 MiB request-size cap.

- benchmark.sh: silence apt and wget; install python3-requests via
  apt instead of `pip3 install --break-system-packages` (which is
  refused under PEP 668 anyway on noble); bump
  --max-http-request-size to 64 MiB and --wal-max-write-buffer-size
  to 1M as headroom. (We also briefly tried
  --wal-flush-interval 30s; it actively hurt throughput by ~30× —
  each write blocked until the next flush — so default 1s wins.)

Measured on a 1M-row sample, single-node c7a.metal-48xl:

    single-thread, BATCH=1000:  ~1,000 rows/s
    16 workers,    BATCH=2000:  ~30,000 rows/s
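The 16-worker/bounded-queue change can be sketched as follows. The worker count is from the commit; the bound of 2×workers in-flight futures and the `post_batch` callable are assumptions standing in for the HTTP write:

```python
# Sketch: submit batches through a ThreadPoolExecutor while keeping the
# number of in-flight futures bounded, so the whole dataset is never
# buffered in memory at once.
from concurrent.futures import ThreadPoolExecutor

WORKERS = 16  # saturation point observed on this hardware

def load_parallel(batches, post_batch, workers=WORKERS):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        in_flight = []
        for body in batches:
            in_flight.append(pool.submit(post_batch, body))
            # Bounded queue: block on the oldest future once the
            # backlog reaches 2x the worker count (bound is illustrative).
            if len(in_flight) >= 2 * workers:
                results.append(in_flight.pop(0).result())
        for f in in_flight:
            results.append(f.result())
    return results
```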

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nce curl errors

The previous run produced log lines like:

    curl: (18) transfer closed with outstanding read data remaining
    null, ...

Three things were happening:

1. DataFusion was OOMing on heavy queries (Q29's REGEXP_REPLACE was the
   first to fall over) and aborting the response mid-stream. The log
   says: "Memory Exhausted while SpillPool (DiskManager is disabled)".
   InfluxDB 3.9.2 ships with DataFusion's DiskManager hardcoded
   disabled — there is no spill-to-disk fallback, and no flag to
   enable it. The only mitigation we have is to make the in-memory
   budget as large as possible.

2. After the load, the server still has WAL state and large in-memory
   write buffers. That memory isn't available to queries until the
   next snapshot, which made even moderate queries fail with the same
   OOM error.

3. `curl -sS` printed each transfer-closed message to stderr, which
   `tee` captured into the log alongside the `null` row from run.sh.

Changes:

- benchmark.sh: pass `--exec-mem-pool-bytes 80%` so DataFusion gets
  the lion's share of the box for query execution, and restart the
  server between load and queries (drains the WAL into Parquet,
  releases write-buffer memory, gives queries a clean budget).

- run.sh: drop the `-S` from `curl -sS` and add `2>/dev/null`. curl's
  exit code is enough for run.sh to record `null`; the human-readable
  error message just polluted the captured log.

After these changes a 5M-row sample produces 986 bytes of run.sh log
with one clean `[null, null, null]` row for Q29 (still OOMs even at
5M with a regex over Referer; nothing we can do about that without a
DiskManager).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Member Author

The problem: InfluxDB does not work at all.

2026-05-08T19:31:51.501856Z  INFO influxdb3_query_executor: executing sql query database=hits query=SELECT COUNT(*) FROM hits; params=None
2026-05-08T19:32:39.121739Z ERROR influxdb3_server::http: Error while handling request error=query error: error while planning query: order_union_sorted_inputs
caused by
Internal error: should not perform a file scan of overlapping ranges within same file.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues method=POST path=/api/v3/query_sql content_length=Some("75") client_ip=127.0.0.1

Based on the research, this is a known assertion in InfluxDB 3's own physical optimizer (core/iox_query/src/physical_optimizer/sort/regroup_files.rs::group_same_file_sources) that fires when planning detects
overlapping byte/row ranges within one Parquet file. There is no public fix; v3.9.2 still ships the assertion. Our load.py deterministically triggers it: 16 parallel writers all submit batches whose timestamps
interleave through [TS_BASE, TS_BASE+N), so every WAL segment ends up containing the same broad time range, and on flush the resulting Parquet files have near-identical [min_time, max_time]. The sort-pushdown
pass then trips when it tries to regroup their scan ranges.
> A reload is unavoidable — the Parquet layout on disk is what's broken, and queries will keep failing until the data is rewritten in a layout the planner can handle.

The fixes worth considering:

  1. Disjoint time-range chunks (recommended) — split the 100M rows into N contiguous chunks. For each chunk, load all batches in parallel, then kill -TERM the server (drain WAL → Parquet) before starting the next
    chunk. Each Parquet file now covers only one chunk's time range, so files don't overlap. Keeps throughput, predictable layout.
  2. Single-writer load — set WORKERS = 1. Simple, but ~30× slower according to the existing comment in load.py.
  3. Try a different InfluxDB version — research found no version known to be immune, so this is a gamble.
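The chunking arithmetic behind option 1 can be sketched as below. The chunk count and the helper names in the trailing comment are illustrative; the drain step is the kill -TERM / restart cycle described above:

```python
# Sketch of option 1: split the row-index timestamp space into N
# contiguous, non-overlapping chunks. Loading one chunk fully and
# draining the WAL to Parquet before starting the next keeps each
# Parquet file's [min_time, max_time] disjoint from the others.

def chunk_ranges(total_rows, n_chunks):
    """Return (start, end) half-open row-index ranges that cover
    [0, total_rows) with no gaps and no overlap."""
    base, extra = divmod(total_rows, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# e.g. for the full dataset (N_CHUNKS=20 is an illustrative choice):
# for start, end in chunk_ranges(100_000_000, 20):
#     load_rows(start, end)         # 16 parallel writers within the chunk
#     drain_wal_by_restarting()     # hypothetical helper: kill -TERM, wait, restart
```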
