Fix memory safety issues #3188
Open
texodus wants to merge 1 commit into
Signed-off-by: Andrew Stein <steinlink@gmail.com>
I asked Claude to find memory safety bugs in Perspective and write a PR, while I played Balatro on my phone. Here's its own summary of what it found:
Fixes
Uneven column lengths in columnar table create/update (table.cpp, Table::from_cols and Table::update_cols):
the table was sized from a single column's length while every column was
filled to its own length. Now the table is sized to the longest column and
shorter columns are null-padded, so all writes stay in bounds.
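The sizing rule described above can be sketched in isolation. This is a minimal illustration, not Perspective's code: `Column` and `pad_to_longest` are hypothetical names, and `std::optional` stands in for whatever null representation the engine actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nullable column; not Perspective's column type.
using Column = std::vector<std::optional<double>>;

// Size the table to the longest input column and pad shorter columns with
// nulls, so every output column has exactly `rows` slots.
std::vector<Column> pad_to_longest(const std::vector<std::vector<double>>& input) {
    std::size_t rows = 0;
    for (const auto& col : input) {
        rows = std::max(rows, col.size());
    }

    std::vector<Column> table;
    table.reserve(input.size());
    for (const auto& col : input) {
        Column out(rows, std::nullopt);    // every slot starts as null
        assert(col.size() <= out.size());  // holds because rows is the maximum
        std::copy(col.begin(), col.end(), out.begin());
        table.push_back(std::move(out));
    }
    return table;
}
```

Because the row count is the maximum over all columns, no column can write past the allocated length, whichever column happens to come first.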
NDJSON row under-allocation (table.cpp, Table::from_ndjson): capacity
was estimated from the newline count but one row was written per parsed
object. The table is now grown per row (amortized O(1) via geometric capacity
growth), so concatenated objects without newline separators can no longer
overrun the buffer.
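To see why a newline count can undercount rows, here is a small sketch under stated assumptions. `split_objects` and `newline_estimate` are hypothetical helpers written for this example; the splitter only tracks brace depth and string state and is not a real JSON parser, and the exact form of the original estimate is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical splitter: returns each top-level JSON object found in `input`.
// Rows are appended one at a time; std::vector grows geometrically, so each
// push_back is amortized O(1) and no up-front row estimate is needed.
std::vector<std::string> split_objects(const std::string& input) {
    std::vector<std::string> rows;
    int depth = 0;
    bool in_string = false;
    bool escaped = false;
    std::size_t start = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const char c = input[i];
        if (in_string) {
            // Braces inside string literals must not affect the depth.
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') in_string = false;
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{') {
            if (depth == 0) start = i;
            ++depth;
        } else if (c == '}' && depth > 0) {
            --depth;
            if (depth == 0) {
                assert(start <= i);
                rows.push_back(input.substr(start, i - start + 1));
            }
        }
    }
    return rows;
}

// Assumed shape of a newline-based estimate: one row per newline character.
std::size_t newline_estimate(const std::string& input) {
    return static_cast<std::size_t>(std::count(input.begin(), input.end(), '\n'));
}
```

For an input such as `{"a":1}{"a":2}` followed by a newline and a third object, the splitter yields three rows while the newline estimate is one, which is the mismatch the fix removes.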
Arrow row-count truncation (arrow_loader.cpp, ArrowLoader::row_count):
Arrow's 64-bit row count was silently truncated to 32 bits, under-sizing the
destination table relative to the data written during fill. Oversized or
negative counts are now rejected instead of truncated.
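A checked narrowing conversion of this kind might look like the sketch below. `checked_row_count` is a hypothetical helper, and the choice of an unsigned 32-bit target is an assumption; the description only says the count was truncated to 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: narrow a signed 64-bit row count to 32 bits, returning
// std::nullopt instead of truncating when the value does not fit.
std::optional<std::uint32_t> checked_row_count(std::int64_t rows) {
    constexpr std::int64_t kMax =
        static_cast<std::int64_t>(std::numeric_limits<std::uint32_t>::max());
    if (rows < 0 || rows > kMax) {
        return std::nullopt;  // reject rather than wrap around
    }
    const auto narrowed = static_cast<std::uint32_t>(rows);
    assert(static_cast<std::int64_t>(narrowed) == rows);  // lossless by construction
    return narrowed;
}
```

A plain `static_cast` of `2^32 + 5` to 32 bits yields 5, so a table sized that way would hold five rows while the fill loop writes billions; the checked version refuses the input instead.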
Arrow time32 element width (arrow_loader.cpp, copy_array): time32
values are 32-bit and map to a 4-byte column, but the loader copied 8 bytes
per element, over-reading the source and over-writing the destination. The
loader now copies 4 bytes per element.
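One way to keep the copy width tied to the element type is to derive it from `sizeof` rather than a constant. The sketch below is illustrative only; `copy_elements` is a hypothetical helper and says nothing about how `copy_array` is actually structured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::int32_t) == 4, "time32 values are 4 bytes wide");

// Copy `count` fixed-width elements. The byte count comes from sizeof(T), so
// a 32-bit element type can never be copied at 8 bytes per element.
template <typename T>
void copy_elements(const T* src, T* dst, std::size_t count) {
    assert(count == 0 || (src != nullptr && dst != nullptr));
    if (count == 0) {
        return;  // avoid calling memcpy with null pointers
    }
    std::memcpy(dst, src, count * sizeof(T));
}
```

With `T = std::int32_t` and three elements, exactly 12 bytes move; a hard-coded 8 bytes per element would have moved 24 and touched memory past both buffers.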
first/last aggregate with a missing sort dependency (sparse_tree.cpp, first_last_helper):
the helper assumed the aggregate spec always carried both a value and a sort
dependency and indexed the dependency list unconditionally. A view whose sort
column falls outside the visible set could produce a spec without the sort
dependency, causing an out-of-bounds read. Now guarded (covers first, last,
and last − first).
Residency eviction data race (residency.cpp, residency.h): the shared
pending-eviction vectors were cleared outside the manager's mutex, so
concurrent request-thread eviction passes could double-free. All mutations now
occur under the lock, and a dedicated mutex serializes each eviction cycle.
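The locking pattern in the residency fix can be sketched roughly as follows. Everything here is hypothetical (`EvictionManager`, its members, and the use of integer ids); it only illustrates taking and clearing the pending list under the state lock while a second mutex serializes whole eviction cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical manager; names are illustrative, not Perspective's.
class EvictionManager {
public:
    void mark_pending(int id) {
        std::lock_guard<std::mutex> lock(m_state_mutex);
        m_pending.push_back(id);
    }

    // Run one eviction cycle; returns how many entries this call took.
    std::size_t evict() {
        // Only one eviction cycle runs at a time.
        std::lock_guard<std::mutex> cycle(m_cycle_mutex);
        std::vector<int> batch;
        {
            // Take and clear the pending list in one step, under the lock.
            std::lock_guard<std::mutex> lock(m_state_mutex);
            batch.swap(m_pending);
            assert(m_pending.empty());
        }
        // Each id now belongs to exactly one batch, so it is released once.
        return batch.size();
    }

private:
    std::mutex m_cycle_mutex;  // serializes eviction cycles
    std::mutex m_state_mutex;  // guards m_pending
    std::vector<int> m_pending;
};
```

The lock order is always cycle mutex first, then state mutex, and `mark_pending` takes only the state mutex, so the two locks cannot deadlock against each other in this sketch.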
Unvalidated Arrow input (arrow_loader.cpp, ArrowLoader::initialize):
Arrow's IPC reader does not validate buffer contents, yet the fill paths index
value/offset/dictionary buffers directly. The loaded (remotely supplied) table
is now fully validated (ValidateFull()) before those buffers are trusted, so
a malformed payload (bad offsets, out-of-range dictionary indices, inconsistent
chunk lengths) is rejected instead of read out of bounds. This is the systemic
defense behind the time32 and row-count fixes above. Note: validation is
O(data) per ingested payload; Validate() plus targeted bounds checks would be
a cheaper alternative if ingest throughput matters.
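For a sense of what "targeted bounds checks" on offset and dictionary buffers could involve, here is a hand-written sketch. It is not Arrow's validation code and does not call Arrow; `offsets_valid` and `indices_valid` are hypothetical helpers over plain vectors, and the 32-bit offset and index types are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets for n variable-length values have n + 1 entries. This sketch
// requires at least one entry, a non-negative start, non-decreasing values,
// and a final offset that stays within the value buffer.
bool offsets_valid(const std::vector<std::int32_t>& offsets, std::size_t data_size) {
    if (offsets.empty() || offsets.front() < 0) {
        return false;
    }
    for (std::size_t i = 1; i < offsets.size(); ++i) {
        if (offsets[i] < offsets[i - 1]) {
            return false;
        }
    }
    assert(offsets.back() >= 0);  // follows from the two checks above
    return static_cast<std::size_t>(offsets.back()) <= data_size;
}

// Every dictionary index must refer to an existing dictionary entry.
bool indices_valid(const std::vector<std::int32_t>& indices, std::size_t dict_size) {
    for (const std::int32_t idx : indices) {
        if (idx < 0 || static_cast<std::size_t>(idx) >= dict_size) {
            return false;
        }
    }
    return true;
}
```

Both checks are linear in the buffer length, which matches the O(data) cost noted above; the saving from a targeted approach would come from checking only the buffers the fill paths actually index.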
Out-of-bounds access in expression vector functions (computed_function.cpp,
diff3, norm3, cross_product3, dot_product3): these operate on 3-element
vectors and indexed v[0..2] unconditionally, but their exprtk parameter
sequences ("VVV"/"VV"/"V") enforce vector type, not length. A user expression
can declare shorter vectors (e.g. var v[2] := {1,2}; norm3(v)), causing an
out-of-bounds read and, for diff3/cross_product3, an out-of-bounds write to
the (short) output vector. Each function now clears its result for vectors
shorter than 3 before indexing; this surfaces as an invalid expression
(rejected at view creation) rather than an out-of-bounds access. Vectors of
length 3 are unaffected.
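The length guard can be illustrated without exprtk. In this sketch, the functions are hypothetical stand-ins that share names with the ones listed above; an empty `std::optional` or a `false` return plays the role of the "cleared" result, which is an assumption about how the real code signals failure.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Euclidean norm of the first three elements, or std::nullopt when the
// vector is too short to index v[0..2].
std::optional<double> norm3(const std::vector<double>& v) {
    if (v.size() < 3) {
        return std::nullopt;  // refuse instead of reading out of bounds
    }
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// Cross product into `out`. The output vector is checked too, since a short
// output would otherwise be written out of bounds.
bool cross_product3(const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& out) {
    if (a.size() < 3 || b.size() < 3 || out.size() < 3) {
        return false;
    }
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
    return true;
}
```

Checking the length before the first index keeps both the read path (`norm3`) and the write path (`cross_product3`) inside the buffers the caller actually declared.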