feat(server): stream monolithic blob PUT straight to object store#7
Merged
Conversation
3bd5da9 to
e37e178
Compare
4f23c9d to
740605c
Compare
persistMonolithicUpload spooled the entire blob to a disk tempfile (via the chunked-upload session machinery), hashed it, then read it back and uploaded to the object store — two full passes plus ~2x disk I/O, and receive/upload run serially. For multi-GiB VM disk/memory blobs that's the bulk of a push (single PUTs were taking 2-4 min). Monolithic PUT already knows the digest up front (it's in the URL), so there's no need to buffer: stream the request body through a sha256 hasher straight into a concurrent multipart upload (no disk), verify the digest once the stream drains, and delete on mismatch so the content-addressed key never keeps unverified bytes. Server-side digest verification is preserved; no client change. The chunked PATCH path (no first-party client uses it) still spools to disk, since its digest is only known at finalize. GCS S3-compat streaming multipart validated against staging: 12 MiB in 3 concurrent 5 MiB parts, sha256 round-trip verified.
740605c to
dd18dc6
Compare
Streaming writes direct to the digest key (kept for speed), then verifies. - fail closed when BlobExists errors: streaming on could overwrite a valid blob with unverified bytes and the mismatch path would then delete it - delete a mismatched blob on a detached, bounded context so cancellation can't suppress it (leaving bytes the dedup trusts); log for a sweep - 500 store failures return INTERNAL_ERROR, not BLOB_UPLOAD_INVALID
Each monolithic streaming upload holds ~256 MiB (64 MiB x 4 parts), so unbounded concurrency OOMs the server. Gate them through a buffered-channel semaphore sized by EPOCH_MAX_STREAMING_UPLOADS (default 8); excess uploads block on TCP backpressure instead of allocating. The dedup short-circuit and the disk-spooled chunked path stay ungated (they don't carry the cost).
dd18dc6 to
1a3e9e0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
persistMonolithicUpload(thePUT /v2/<name>/blobs/<digest>path — the only push path our clients use) spooled the entire blob to a disk tempfile via the chunked-upload session machinery, hashed it, then read it back and uploaded to the object store. So a push was:Two full passes, ~2x disk I/O, and receive and upload run serially. For multi-GiB VM disk/memory blobs this was the bulk of push time (single PUTs were taking 2–4 min).
Change
A monolithic PUT already knows the digest up front (it's in the URL), so there's no reason to buffer. Stream the request body through a sha256 hasher straight into a concurrent multipart upload (
minioConcurrentStreamParts, bounded atPartSize*NumThreads= 64 MiB × 4), verify the digest once the stream drains, delete on mismatch.DIGEST_INVALID.Measured live (deployed to cocoonstack-us, image
redirect-stream-20260618)Server-side PUT durations (pure upload-through-epoch — most accurate):
win10-20260618-2(~10.4 GB, 5 layers) — big layer A4.0K) sampled 3× during a multi-GB PUT → bytes stream straight through, nothing buffered whole. This is the core fix.cocoon snapshot export+ per-blob sha256 buffering, not just the network upload.)Remaining ceiling: bytes still transit epoch (TLS in + multipart out on ~3 CPU), so big layers cap ~80–110 MB/s. GCS-direct push (presigned PUT) is a possible follow-up.
Relation to #6
Independent of #6 (blob GET redirect). Together: pull bypasses the proxy entirely (#6), push stops double-buffering and parallelizes the GCS leg (this). Fully removing epoch from the push data path (presigned PUT direct to GCS, first-party-trust + GCS-CRC32C) is a possible follow-up.