Stream CloudFetch result chunks to bound memory and prevent OOM on large results #1509

Open
msrathore-db wants to merge 3 commits into main from fix/issue-1508-fetch-size

msrathore-db commented Jun 24, 2026

Collaborator

Summary

Fixes #1508.

Downloading large query results via CloudFetch could exhaust the JVM heap and throw java.lang.OutOfMemoryError: Java heap space (surfaced as Failed to ready chunk / Download failed for chunk index 0), especially on smaller heaps.

Root cause

The OutOfMemoryError is on-heap. Two factors combined to make on-heap usage scale with the CloudFetch download concurrency (cloudFetchThreadPoolSize, default 16) rather than with available heap:

  1. Each chunk was decompressed by materializing the entire decompressed Arrow payload into an on-heap byte[] (several times the compressed size) before parsing.
  2. Chunk downloads were scheduled with no regard to size, so up to cloudFetchThreadPoolSize of those transient decompressed copies existed at once.

(The parsed Arrow vectors themselves are off-heap, so they were not the cause — the transient on-heap decompression buffers were.)

Fix

  1. Streamed decompression. Decompression is now streamed directly into the Arrow reader (DecompressionUtil.decompressToInputStream), so the decompressed payload is never materialized on-heap alongside the compressed bytes. Chunks are still downloaded, decompressed, and parsed in parallel ahead of consumption — throughput is unaffected.
  2. Bounded concurrent downloads. Chunks carry their byte size from the result manifest, and scheduling is gated on an in-memory byte budget (default: a fraction of the JVM max heap; overridable via the new cloudFetchMaxBytesInMemory connection property) in addition to the existing thread-pool limit. At least one chunk is always allowed so an oversized chunk cannot stall consumption; the budget is released as chunks are consumed.
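The difference between materializing and streaming in item 1 can be illustrated with a minimal sketch. It is not the driver's code: GZIP from the JDK stands in for whatever codec CloudFetch actually uses, a byte-counting loop stands in for the Arrow reader, and the class and method names (`StreamingDecompressionSketch`, `decompressFully`, `decompressToStream`, `consume`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingDecompressionSketch {

  // Materializing pattern: the whole decompressed payload becomes one on-heap byte[].
  public static byte[] decompressFully(byte[] compressed) throws IOException {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return in.readAllBytes();
    }
  }

  // Streaming pattern: return a stream; bytes are inflated only as the consumer reads.
  public static InputStream decompressToStream(byte[] compressed) throws IOException {
    return new GZIPInputStream(new ByteArrayInputStream(compressed));
  }

  // Stand-in for the Arrow reader (assumption): reads through a small fixed buffer
  // and returns the number of decompressed bytes seen.
  public static long consume(InputStream in) throws IOException {
    byte[] buf = new byte[8192];
    long total = 0;
    int n;
    try (in) {
      while ((n = in.read(buf)) != -1) {
        total += n;
      }
    }
    return total;
  }

  // Helper that produces a compressed payload for the demo.
  public static byte[] gzip(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
      out.write(raw);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = new byte[4 * 1024 * 1024]; // zeros, so the payload compresses heavily
    byte[] compressed = gzip(raw);
    System.out.println("compressed bytes:   " + compressed.length);
    System.out.println("materialized bytes: " + decompressFully(compressed).length);
    System.out.println("streamed bytes:     " + consume(decompressToStream(compressed)));
  }
}
```

In the materializing variant the transient heap cost per chunk is the full decompressed size; in the streaming variant it is roughly the compressed bytes plus the consumer's read buffer.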

Applies to both the SQL Execution (SEA) and Thrift result paths, and to RemoteChunkProvider and StreamingChunkProvider.
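The gating rules described under Fix (count limit, byte budget, always allow one chunk, release on consumption) can be sketched as below. This is an illustration under assumptions, not the code in this PR: the class `ByteBudgetSketch`, its methods, and the convention that a non-positive budget means "no byte limit" are all hypothetical.

```java
public class ByteBudgetSketch {
  private final int maxChunks;   // plays the role of the thread-pool limit
  private final long maxBytes;   // byte budget; <= 0 means "no byte limit" in this sketch
  private int chunksInMemory = 0;
  private long bytesInMemory = 0;

  public ByteBudgetSketch(int maxChunks, long maxBytes) {
    this.maxChunks = maxChunks;
    this.maxBytes = maxBytes;
  }

  // Returns true and records the reservation if the chunk may be scheduled now.
  public synchronized boolean tryReserve(long chunkBytes) {
    if (chunksInMemory >= maxChunks) {
      return false; // count limit reached
    }
    boolean fitsBudget = maxBytes <= 0 || bytesInMemory + chunkBytes <= maxBytes;
    boolean alwaysAllowOne = chunksInMemory == 0; // an oversized chunk must not stall
    if (!fitsBudget && !alwaysAllowOne) {
      return false; // byte budget exhausted
    }
    chunksInMemory++;
    bytesInMemory += chunkBytes;
    return true;
  }

  // Called once a chunk has been consumed, returning its bytes to the budget.
  public synchronized void release(long chunkBytes) {
    chunksInMemory--;
    bytesInMemory -= chunkBytes;
  }

  public static void main(String[] args) {
    ByteBudgetSketch gate = new ByteBudgetSketch(16, 100);
    System.out.println(gate.tryReserve(250)); // true: oversized, but nothing else is held
    System.out.println(gate.tryReserve(10));  // false: budget already exceeded
    gate.release(250);
    System.out.println(gate.tryReserve(60));  // true: fits within 100 bytes
    System.out.println(gate.tryReserve(60));  // false: 120 bytes would exceed the budget
  }
}
```

A scheduler built on such a gate would call `tryReserve` with the chunk's manifest byte size before submitting a download and `release` when the consumer moves past the chunk.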

Testing

Verified against a SQL warehouse on a 26-column, 169,769-row table (results split into 8 CloudFetch chunks):

  • Memory: the unpatched driver OOMs at -Xmx128m; the patched driver streams the full result at -Xmx40m (peak ~38 MB), a reduction of more than 3x in required heap.
  • No throughput regression: at -Xmx2g, the patched driver reads the same result in ~14-16 s vs ~22-28 s unpatched, with lower peak heap — streaming decompression removes the per-chunk full-copy allocation and its GC churn, and the byte budget does not throttle when heap is ample.
  • Correctness: a checksum over all 4.4M cells, ordered by a column, is byte-identical between the unpatched and patched driver across both the Thrift and SEA paths and across heap sizes from 64 MB to 1 GB — no truncation or corruption.
  • Unit tests: added coverage for the byte-budget gating (count vs. byte limit, always-allow-one, no-limit), per-chunk byte-size plumbing, streaming decompression, and the config default/override. The full jdbc-core unit suite passes; the only failing tests locally are pre-existing integration/e2e tests unrelated to this change (confirmed failing identically on main).
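For readers who want to reproduce the correctness comparison, one way to compute an order-sensitive checksum over every cell is sketched below. The description does not say how the checksum was actually computed, so the CRC32 choice, the separator bytes, and the class `ResultChecksumSketch` are assumptions; rows are modeled as plain lists rather than a JDBC `ResultSet` to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

public class ResultChecksumSketch {

  // Folds every cell, in row and column order, into a single CRC32 value.
  public static long checksum(List<List<Object>> rows) {
    CRC32 crc = new CRC32();
    for (List<Object> row : rows) {
      for (Object cell : row) {
        if (cell == null) {
          crc.update(0x00); // marker byte for SQL NULL
        } else {
          crc.update(String.valueOf(cell).getBytes(StandardCharsets.UTF_8));
        }
        crc.update(0x1F); // cell separator, so "ab","c" differs from "a","bc"
      }
      crc.update(0x1E); // row separator
    }
    return crc.getValue();
  }

  public static void main(String[] args) {
    List<List<Object>> rows = List.of(List.of(1, "alice"), List.of(2, "bob"));
    System.out.println(Long.toHexString(checksum(rows)));
  }
}
```

Because the checksum depends on row order, both drivers need to read the result with the same ORDER BY for the values to be comparable.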

This pull request and its description were written by Isaac.

NO_CHANGELOG=true

Large query results downloaded via CloudFetch could exhaust the JVM heap
because up to cloudFetchThreadPoolSize chunks were downloaded and held in
memory concurrently, regardless of their size. On small heaps this caused
an OutOfMemoryError while readying the first chunk.

Track each chunk's byte size from the result manifest and gate chunk
scheduling on a configurable in-memory byte budget (default: a fraction of
the JVM max heap) in addition to the existing thread-pool limit. At least
one chunk is always allowed so an oversized chunk cannot stall consumption.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
msrathore-db changed the title from "Bound CloudFetch chunk downloads by an in-memory byte budget to prevent OOM" to "Stream CloudFetch result chunks to bound memory and prevent OOM on large results" Jun 25, 2026
msrathore-db force-pushed the fix/issue-1508-fetch-size branch from 8e41b07 to 7c78368 June 25, 2026 20:52
msrathore-db reopened this Jun 25, 2026
…heap

Each chunk was decompressed by fully materializing the decompressed Arrow
payload into an on-heap byte[] (several times the compressed size) before
parsing. With multiple chunks downloading and decompressing in parallel,
these transient on-heap copies are what exhausted the Java heap on small
heaps.

Decompression is now streamed directly into the Arrow reader, so the
decompressed payload is never held on-heap alongside the compressed bytes
(the parsed vectors are off-heap). Chunks are still downloaded and parsed
in parallel ahead of consumption, so throughput is unaffected; combined
with the in-memory byte budget for concurrent downloads, peak heap for
large CloudFetch results drops substantially.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
msrathore-db force-pushed the fix/issue-1508-fetch-size branch from 7c78368 to f8063a4 June 25, 2026 21:30
…size

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

# Conflicts:
#	NEXT_CHANGELOG.md
msrathore-db force-pushed the fix/issue-1508-fetch-size branch from 8fca624 to 7c29f02 June 25, 2026 23:02

Development

Successfully merging this pull request may close these issues.

[BUG]In 3.x versions fetch size is ignored resulting in out-of-memory when downloading query results
