Fix empty object write for small files in fsspec transactions #721

Merged
laughingman7743 merged 1 commit into master from fix/s3-transaction-small-file-write
Jun 6, 2026

laughingman7743 commented Jun 6, 2026

Member

WHAT

Fixes #719 — small files (smaller than the block size, default 5 MiB) written inside an fsspec transaction (autocommit=False) were committed to S3 as empty objects.

The fix in pyathena/filesystem/s3.py is a one-line change applied in two places: S3File._upload_chunk now returns `not final` instead of True. commit() and __init__ are unchanged.

WHY

S3File.close() triggers fsspec's flush(force=True), and fsspec resets self.buffer after the upload only when _upload_chunk returns a value other than False:

# fsspec/spec.py
if self._upload_chunk(final=force) is not False:
    self.offset += self.buffer.seek(0, 2)
    self.buffer = io.BytesIO()   # buffer reset

_upload_chunk previously returned True unconditionally. For a small file the actual upload is deferred to commit():

  • autocommit (normal open()): commit() runs synchronously inside _upload_chunk, before the reset — works.
  • transaction (autocommit=False): commit() is deferred to Transaction.complete(), which runs after fsspec has already reset the buffer, so commit() reads an empty buffer and sends body=b"" to PutObject.

Returning not final fixes this:

  • Mid-stream chunks (final=False) → True: fsspec clears the already-uploaded buffer between multipart parts (PyAthena delegates mid-stream buffer management to fsspec).
  • Final flush (final=True) → False: fsspec leaves the buffer intact, so the deferred commit() reads the real bytes.
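
The interaction described above can be reproduced with a small stdlib-only model. This is a sketch under stated assumptions, not PyAthena's or fsspec's actual code: ToyFile, write_small_file, always_true, and not_final are hypothetical names introduced for illustration, and only the flush() condition is copied from the fsspec snippet quoted earlier.

```python
import io


class ToyFile:
    """Toy stand-in for a buffered fsspec file (hypothetical, not the real class)."""

    def __init__(self, autocommit, return_policy):
        self.autocommit = autocommit
        self.return_policy = return_policy  # maps `final` to _upload_chunk's result
        self.buffer = io.BytesIO()
        self.offset = 0
        self.put_body = None  # body received by the simulated PutObject

    def write(self, data):
        self.buffer.write(data)

    def commit(self):
        # Small-file path: the whole buffer goes out in one simulated PutObject.
        self.put_body = self.buffer.getvalue()

    def _upload_chunk(self, final=False):
        if final and self.autocommit:
            self.commit()  # autocommit: runs before flush() can reset the buffer
        return self.return_policy(final)

    def flush(self, force=False):
        # Same condition as the fsspec snippet quoted in the description.
        if self._upload_chunk(final=force) is not False:
            self.offset += self.buffer.seek(0, 2)
            self.buffer = io.BytesIO()


def always_true(final):
    """Return value before the fix."""
    return True


def not_final(final):
    """Return value after the fix."""
    return not final


def write_small_file(autocommit, return_policy):
    f = ToyFile(autocommit, return_policy)
    f.write(b"hello")
    f.flush(force=True)  # what close() triggers
    if not autocommit:
        f.commit()  # what Transaction.complete() triggers later
    return f.put_body


old_autocommit = write_small_file(True, always_true)
old_transaction = write_small_file(False, always_true)  # empty body: the bug
new_autocommit = write_small_file(True, not_final)
new_transaction = write_small_file(False, not_final)  # real bytes: the fix
```

In this model only the combination of a deferred commit and an unconditional True loses the data, which matches the behavior reported in #719.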

This mirrors how s3fs solves the same problem — s3fs returns False from _upload_chunk to keep the buffer, and commit() reads from it. (s3fs returns False unconditionally because it does its own mid-stream buffer bookkeeping; PyAthena uses not final because it relies on fsspec for that.) The large-file/multipart path's commit() completes the upload from the already-uploaded parts and never reads the buffer.

AioS3File inherits _upload_chunk/commit from S3File, so the async filesystem benefits from the same fix.

This keeps PyAthena's custom S3FileSystem (which intentionally avoids the s3fs/aiobotocore dependency) and supports fsspec transactions properly rather than handing filesystem ownership back to s3fs.

Tests

Unit (no AWS) — tests/pyathena/filesystem/test_s3.py::TestS3File

Drive S3File._upload_chunk/commit/discard with a mocked filesystem (no AWS), asserting the fsspec return-value contract. Parametrized to cover the full matrix of autocommit (transaction vs not) × single vs multipart × commit vs rollback:

| Test | autocommit / transaction | upload | path |
|---|---|---|---|
| test_upload_chunk_small_file[True] | autocommit (no txn) | single | one-shot PutObject, committed in-line |
| test_upload_chunk_small_file[False] | transaction | single | GH-719 regression: deferred commit reads the preserved buffer |
| test_upload_chunk_multipart[True] | autocommit (no txn) | multipart | CompleteMultipartUpload in-line |
| test_upload_chunk_multipart[False] | transaction | multipart | CompleteMultipartUpload deferred to commit |
| test_discard[False] | transaction (rollback) | single | no-op (object never created) |
| test_discard[True] | transaction (rollback) | multipart | AbortMultipartUpload |
| test_upload_chunk_empty_file_touches | autocommit | empty | touch (never PutObject) |

The multipart tests also assert the mid-stream invariant (non-final → True so fsspec resets the buffer between parts; final → False), which guards against a future "simplification" to a plain return False that would corrupt multipart uploads.
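
To illustrate why a plain return False would be unsafe here, the following toy model uploads the current buffer as one part on every flush and relies on the caller to reset the buffer, as the description says PyAthena relies on fsspec. It is a simplified sketch: upload_in_parts, not_final, and always_false are hypothetical names, the model flushes after every chunk instead of on a block-size threshold, and it does no part bookkeeping of its own (unlike s3fs).

```python
import io


def not_final(final):
    """Return value used by the fix."""
    return not final


def always_false(final):
    """The tempting 'simplification' the tests guard against."""
    return False


def upload_in_parts(chunks, return_policy):
    """Toy multipart writer that depends on the caller resetting the buffer."""
    buffer = io.BytesIO()
    parts = []  # bodies of the simulated UploadPart calls

    def upload_chunk(final):
        data = buffer.getvalue()
        if data:
            parts.append(data)  # simulated UploadPart of the whole buffer
        return return_policy(final)

    def flush(force=False):
        nonlocal buffer
        # Same contract as fsspec: reset unless the result is exactly False.
        if upload_chunk(final=force) is not False:
            buffer = io.BytesIO()

    for chunk in chunks:
        buffer.write(chunk)
        flush()  # mid-stream flush
    flush(force=True)  # final flush on close
    return b"".join(parts)


good = upload_in_parts([b"aa", b"bb"], not_final)
bad = upload_in_parts([b"aa", b"bb"], always_false)
```

With `not final` the parts concatenate to the original bytes; with an unconditional False the buffer is never cleared, so earlier bytes are uploaded again in later parts.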

Integration (real AWS) — sync (test_s3.py) and async (test_s3_async.py)

test_write_transaction / test_write_transaction_rollback, parametrized over a small (one-shot PutObject) and a 10 MiB (multipart) size:

  • write inside with fs.transaction: round-trips with the real content;
  • raising inside the transaction leaves no object (CompleteMultipartUpload vs AbortMultipartUpload paths exercised end-to-end, incl. the async executor).

AWS cost. The large (10 MiB) cases are bounded to three uploads total (sync write, sync rollback, async write; the async rollback stays small since AioS3File inherits discard()). These live at the filesystem layer, so they run once each and are not multiplied across cursor types (pandas/arrow/polars). No multi-GiB cases are added.

Verification

  • make lint — passes (ruff + format check + mypy).
  • 21 unit + 7 integration transaction cases pass against real AWS.
  • Confirmed the [False] unit cases fail when the source fix is reverted (regression guard), and the multipart mid-stream assertion fails under a plain return False (anti-simplification guard).

🤖 Generated with Claude Code

laughingman7743 force-pushed the fix/s3-transaction-small-file-write branch 4 times, most recently from b2f8920 to 58ef007, June 6, 2026 07:33
Small files (< block size) written inside an fsspec transaction
(autocommit=False) were committed as empty objects.

S3File._upload_chunk returned True unconditionally, so fsspec's
flush() always reset self.buffer after the final flush. For a small
file the upload is deferred to commit(); in the transaction case
commit() runs after the reset and reads an empty buffer, sending
body=b"" to PutObject.

fsspec only resets the buffer when _upload_chunk returns a value other
than False (spec.py: `if self._upload_chunk(...) is not False:`).
Return `not final` instead: mid-stream chunks (final=False) still
return True so fsspec clears the already-uploaded buffer between parts,
but the final flush returns False, leaving the buffer intact for the
deferred commit() to read. This mirrors s3fs, which returns False to
keep the buffer; commit() is unchanged and reads the buffer as before.

AioS3File inherits _upload_chunk/commit from S3File, so the async
filesystem benefits from the same fix.

Tests:
- Unit (no AWS): _upload_chunk/commit return-value contract with a
  mocked _put_object, covering the small-file transaction regression,
  autocommit, empty-file touch, small-file discard (rollback), and the
  multipart mid-stream `not final` invariant (non-final -> True so
  fsspec resets the buffer between parts; final -> False).
- Integration (sync and async): write inside a transaction round-trips,
  and a raise inside the transaction leaves no object. Parametrized over
  a small (one-shot PutObject) and a 10 MiB (multipart) size so both the
  CompleteMultipartUpload and the discard()/AbortMultipartUpload paths
  are exercised. Large cases are kept to three uploads total and live at
  the filesystem layer (not multiplied across cursor types) to bound AWS
  cost.

Fixes #719

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
laughingman7743 force-pushed the fix/s3-transaction-small-file-write branch from 58ef007 to a6e6b52, June 6, 2026 07:48
laughingman7743 marked this pull request as ready for review June 6, 2026 07:51
laughingman7743 merged commit 88a4576 into master Jun 6, 2026
15 checks passed
laughingman7743 deleted the fix/s3-transaction-small-file-write branch June 6, 2026 10:39

Development

Successfully merging this pull request may close these issues.

pyathena s3 fsspec replacement file system not writing when used with transactions & a context manager
