perf(search): reuse IVF_RQ query scratch#6982
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
9788130 to
8476110
Compare
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
@claude review once |
There was a problem hiding this comment.
Substantial perf refactor — let me know if you'd like me to take a closer look at any particular area.
Extended reasoning...
Overview
This PR refactors the IVF_RQ flat search hot path to reuse per-query scratch buffers via a new QueryScratchPool, avoiding several per-query allocations (rotated query vectors, distance tables, residual arrays). It touches 7 files and ~700+ lines of code, including: new trait methods on VectorStore (dist_calculator_with_scratch) and IvfSubIndex (search_with_scratch, accumulate_topk_with_scratch), new types in storage.rs (QueryScratch, QueryScratchPool, QueryScratchCapacity, QueryScratchGuard, QueryResidual), residual-aware rotation paths in bq/storage.rs, and integration in IVFIndex (v2.rs). The benchmarks show modest but real wins (3–9% latency reductions on SIFT1M).
Security risks
No security-sensitive surface here — this is internal indexing code with no auth/crypto/input parsing changes. The main correctness risks are around memory safety: the PR introduces several new unsafe { Vec::set_len } calls in the distance-all path and a SIMD store_unaligned in copy_subtract_f32. The SAFETY comments are reasonable (the scalar fallback in sum_4bit_dist_table_scalar was explicitly updated to fill(0) before saturating_add so the overwrite contract holds on all paths), but these invariants are non-trivial and worth a second look.
Level of scrutiny
This warrants careful human review. It's a substantial refactor of performance-critical code, not a mechanical change. Specific things a human should weigh: (1) the new unsafe set_len invariants in RabitDistCalculator::distance_all_with_scratch and quantize_dist_table_into; (2) the lifetime/borrow design around Cow<'a, [f32]> in RabitDistCalculator and the f32_scratch: &'a mut Vec<f32> borrowed inside the calculator; (3) the new trait surface (default impls forwarding to the non-scratch variants look correct, but this is a public-ish trait API expansion); (4) the use_residual_scratch gating, which only enables the residual fast path when centroids are Float32 — the Float64/Float16 residual path is preserved via the typed Arrow subtract.
Other factors
Coverage on the patch is 92% with 62 uncovered lines (mostly in v2.rs glue and error paths in bq/storage.rs). The PR adds focused unit tests for the scratch pool, scratch reuse, residual centroid path, and a Float64 residual-before-cast precision test, plus a regression test for the scalar SIMD overwrite contract. No bugs were surfaced by the automated review. The author requested a review explicitly, and given the scope and the touch on hot-path memory safety, a maintainer should sign off rather than relying on a shadow approval.
Performance Improvement
IVF_RQ query currently allocates several temporary buffers on the hot path, including rotated query vectors, distance tables, and residual query arrays. These allocations happen repeatedly during partition search and show up in latency-sensitive vector search workloads.
This PR adds runtime query scratch reuse for the IVF_RQ flat search path:
QueryScratchPoolwhen anIVFIndexis loaded or reconstructed; it is not part of serialized/cachedIvfIndexState.num_compute_intensive_cpus()scratch entries from per-index dimensions and partition sizes.q - cinto scratch directly.QueryScratch.DeepSizeOf, including checked-out scratch slots.Benchmarks
GCP VM:
yang-agent-ae1f-ivfrq-rebase-20260528(c4-standard-16)Workload:
lance_ivfrq,sift1m,num_bits=1,target_partition_size=4096,k=10. IVF_PQ was not included.Baseline:
mainat5cf70b27b3ad38ecdcd1547b7af385e05f67598aCurrent PR head:
8476110dd4dd018bba05b8320ff999920a88288enprobes=8nprobes=24, refine_factor=nullnprobes=8nprobes=24, refine_factor=nullValidation
cargo fmt --all -- --checkcargo clippy --all --tests --benches -- -D warningscargo test -p lance-index vector::storage::tests::test_query_scratch_pool -- --nocapturecargo test -p lance-index vector::bq::storage::tests::test_dist_calculator_with_scratch -- --nocapturecargo test -p lance index::vector::ivf::v2::tests:: -- --nocapture