[io] Write byte counts in streamer field#22430
Conversation
The version 0 and version 1 streamer fields are exactly the same for small objects (<=1GiB). For larger objects, the version 1 streamer field stores the byte count stack before the data. This is up to a future patch. For the moment, we assert that the streamed buffer is never >1GiB.
Test Results 22 files 22 suites 3d 10h 28m 44s ⏱️ For more details on these failures, see this check. Results for commit 3b67b13. ♻️ This comment has been updated with latest results. |
Add version 1 streamer field.
Uses RStreamerField as a test bed for (de-)serializing large objects with TBufferFile.
fd54941 to
3b67b13
Compare
| if (static_cast<std::size_t>(nbytes) > kMaxSmallBuffer) { | ||
| throw RException(R__FAIL("large objects (>1GiB) not supported by the version 0 streamer field")); | ||
| } else { | ||
| assert(buffer.GetByteCounts().empty()); |
There was a problem hiding this comment.
For back-/forwards-porting this assert will have to be removed...
| Both versions have an identical on-disk representation when the streamed object is smaller than 1GiB. | ||
| Only version 1 supports larger streamed objects. | ||
| For large objects, the version 1 streamer field prepends the large byte counts ("byte count stack") to the byte stream. | ||
| The format for the version 1 byte stream is |
There was a problem hiding this comment.
For objects bigger than 1 GiB, the format [...] is:
| The format for the version 1 byte stream is | ||
|
|
||
| - 64bit unsigned integer: number of elements in the large byte count list | ||
| - List if 64bit unsigned integer pairs with the byte count location and byte count value` |
There was a problem hiding this comment.
"List of" plus stray backtick at the end of the line
| const std::size_t szBufCounts = sizeof(std::uint64_t) * (2 * nCounts + 1); | ||
| if (szBufCounts > nbytes) | ||
| throw RException(R__FAIL("invalid byte count size in streamer field: " + std::to_string(nCounts))); | ||
| bufCounts.resize(szBufCounts); |
There was a problem hiding this comment.
This will zero-initialize the elements. Maybe use Internal::MakeUninitArray as for writing?
| std::vector<unsigned char> bufCounts(sizeof(std::uint64_t)); | ||
| fAuxiliaryColumn->ReadV(collectionStart, sizeof(std::uint64_t), bufCounts.data()); | ||
| std::uint64_t nCounts; | ||
| std::size_t pos = Internal::RNTupleSerializer::DeserializeUInt64(bufCounts.data(), nCounts); |
There was a problem hiding this comment.
If using Internal::MakeUninitArray below, this could be a static array on the stack
|
|
||
| namespace { | ||
|
|
||
| class FileRaii { |
There was a problem hiding this comment.
We have this in tree/ntuple/test/ntuple_test.hxx, no?
| using ByteCountLocator_t = std::size_t; // This might become a pair<chunk_number, local_offset> if we implement chunked keys | ||
| using ByteCount_t = std::uint64_t; ///< Type used to store byte count values, can be changed to uint32_t if we implement chunked keys |
There was a problem hiding this comment.
| using ByteCountLocator_t = std::size_t; // This might become a pair<chunk_number, local_offset> if we implement chunked keys | |
| using ByteCount_t = std::uint64_t; ///< Type used to store byte count values, can be changed to uint32_t if we implement chunked keys | |
| using ByteCountLocator_t = std::size_t; ///< This might become a pair<chunk_number, local_offset> if we implement chunked keys | |
| using ByteCount_t = std::uint64_t; ///< Type used to store byte count values, can be changed to uint32_t if we implement chunked keys |
| throw RException(R__FAIL("invalid byte count size in streamer field: " + std::to_string(nCounts))); | ||
| const std::size_t szBufCounts = sizeof(std::uint64_t) * (2 * nCounts + 1); | ||
| if (szBufCounts > nbytes) | ||
| throw RException(R__FAIL("invalid byte count size in streamer field: " + std::to_string(nCounts))); |
There was a problem hiding this comment.
Should this mention szBufCounts (and nbytes) instead of nCounts?
| ClassDefNV(MemberWithCustomStreamer, 2); | ||
| }; | ||
|
|
||
| struct VersionedStreamerField |
There was a problem hiding this comment.
Out of curiosity, are the test related to the VersionedStreamerField linked to the large object streaming test or 'just' nice unrelated additions?
| EXPECT_EQ(-1, ptr->fOne.at(1000)); | ||
| EXPECT_EQ(100000000u, ptr->fTwo.size()); | ||
| EXPECT_EQ(-2, ptr->fTwo.at(2000)); |
There was a problem hiding this comment.
We may want to also test an element towards the end for the vector.
Adds support for streamer field v1, which writes the byte counts of large objects into the stream.
The first two commits should be backported.
Replaces #20608