feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds #7014

Merged
BubbleCal merged 5 commits into lance-format:main from gstamatakis95:feat/shared-rabitq-rotation-ivf-rq
Jun 4, 2026

Conversation

@gstamatakis95

@gstamatakis95 gstamatakis95 commented May 31, 2026

Contributor

Closes: #7012

What

Distributed IVF_RQ builds work in the Rust engine (#6359) but could not be driven from Python because the RaBitQ rotation could not be pinned across workers. Each per-fragment build generated its own random rotation, so segments rotated vectors differently, their binary codes were not comparable, and merging corrupted the index.

This adds a way to mint one rotation, broadcast it, and reuse it in every per-fragment build, mirroring how pq_codebook is injected.
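The failure mode described above can be illustrated with a small, self-contained sketch. It is not Lance's implementation: it assumes a toy "fast" rotation made of random sign flips followed by a Walsh-Hadamard transform, and the helper names (`fwht`, `make_signs`, `encode`) are hypothetical. It only shows that 1-bit codes agree across workers when the sign vector is shared and diverge when each worker draws its own.

```python
import random


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def make_signs(dimension, seed):
    """Toy stand-in for minting a rotation: one random sign per dimension."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dimension)]


def encode(vector, signs):
    """Flip signs, apply the transform, and keep one bit per coordinate."""
    rotated = fwht([s * x for s, x in zip(signs, vector)])
    return [1 if x > 0 else 0 for x in rotated]


data_rng = random.Random(0)
vectors = [[data_rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(8)]

# Two workers that receive the same broadcast sign vector.
shared = make_signs(64, seed=42)
worker_a = [encode(v, shared) for v in vectors]
worker_b = [encode(v, shared) for v in vectors]

# A third worker that draws its own rotation, as in the pre-PR behavior.
private = make_signs(64, seed=7)
worker_c = [encode(v, private) for v in vectors]

same_rotation_agrees = worker_a == worker_b
different_rotation_agrees = worker_a == worker_c
```

With the shared sign vector the codes from both workers are identical, so segments built separately stay comparable; the worker with a private rotation produces codes that cannot be merged meaningfully with the others.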

Changes

  • Add build_rq_rotation(dimension, num_bits=1, rotation_type="fast", dtype="float32") that returns one rotation as a JSON string.
  • Add an rq_rotation parameter to create_index_uncommitted, parsed into a new transient RQBuildParams.rotation field and consumed by RabitQuantizer::build.
  • build() reuses the supplied rotation instead of generating a random one, after validating num_bits, code_dim, and the signs length.
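The mint-then-validate flow in the list above might look roughly like the sketch below. The JSON layout is an assumption: the PR description does not show the wire format, so the field names (`num_bits`, `code_dim`, `signs`), the rule that the signs length equals `code_dim`, and the helpers `mint_rotation` and `load_rotation` are all hypothetical and do not correspond to Lance APIs.

```python
import json
import random


def mint_rotation(dimension, num_bits=1, seed=None):
    """Create one rotation and serialize it as JSON (hypothetical schema)."""
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in range(dimension)]
    return json.dumps({"num_bits": num_bits, "code_dim": dimension, "signs": signs})


def load_rotation(payload, num_bits, code_dim):
    """Parse a shared rotation and reject it if it does not match this build."""
    model = json.loads(payload)
    if model["num_bits"] != num_bits:
        raise ValueError(f"num_bits mismatch: {model['num_bits']} != {num_bits}")
    if model["code_dim"] != code_dim:
        raise ValueError(f"code_dim mismatch: {model['code_dim']} != {code_dim}")
    # Assumed rule for this sketch: one sign per code dimension.
    if len(model["signs"]) != code_dim:
        raise ValueError(f"expected {code_dim} signs, got {len(model['signs'])}")
    return model


# Mint once on the driver, then hand the same string to every worker.
payload = mint_rotation(128, seed=1)
model = load_rotation(payload, num_bits=1, code_dim=128)
```

Validating before reuse means a worker configured with different parameters fails fast with a clear error instead of silently writing incompatible codes.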

Notes

  • Only the fast rotation is supported because its sign vector is JSON serializable.
  • The matrix rotation keeps a dense matrix in a binary buffer that the JSON wire format drops, so it is rejected with a clear error.
  • The params proto, the segment builder, and the merge and commit paths are unchanged.

Tests

  • Rust unit tests for shared-rotation reuse, identical codes across builds, mismatch and bad-input rejection, and the matrix-via-JSON rejection.
  • A Python integration test that builds two IVF_RQ segments on separate fragments with one shared rotation, merges, commits, and queries.

@github-actions github-actions Bot added the enhancement (New feature or request) and A-python (Python bindings) labels May 31, 2026
@github-actions

Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@gstamatakis95 gstamatakis95 changed the title from "feat(python): Add support for shared RaBitQ rotation for IVF_RQ" to "feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds" May 31, 2026
@gstamatakis95 gstamatakis95 force-pushed the feat/shared-rabitq-rotation-ivf-rq branch from a8438c2 to a5bd8a8 Compare May 31, 2026 11:00
@gstamatakis95 gstamatakis95 marked this pull request as ready for review May 31, 2026 11:04

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@BubbleCal BubbleCal left a comment

Contributor

Just some naming comments

This is nice and is what's on our roadmap! Thanks for the contribution!

def build_rq_rotation(
    dimension: int,
    num_bits: int = 1,
    rotation_type: str = "fast",

Contributor

let's remove this
rotation_type is an internal param for compatibility

Comment thread python/python/lance/dataset.py Outdated
    *,
    target_partition_size: Optional[int] = None,
    skip_transpose: bool = False,
    rq_rotation: Optional[str] = None,

Contributor

rename to rabitq_model?

    pq_codebook: pa.Array,
    dst_uri: str,
): ...
def build_rq_rotation(

Contributor

rename to build_rq_model

@codecov

codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 95.27559% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/vector/bq/builder.rs | 96.63% | 3 Missing and 1 partial ⚠️ |
| rust/lance-index/src/vector/bq.rs | 66.66% | 1 Missing ⚠️ |
| rust/lance/src/index/vector.rs | 0.00% | 1 Missing ⚠️ |

@gstamatakis95 gstamatakis95 force-pushed the feat/shared-rabitq-rotation-ivf-rq branch from 873329a to c010e39 Compare June 1, 2026 20:05
@gstamatakis95

Contributor Author

@BubbleCal I believe we are ready.

@github-actions github-actions Bot added the A-index (Vector index, linalg, tokenizer) label Jun 2, 2026
@gstamatakis95

gstamatakis95 commented Jun 2, 2026

Contributor Author

Fixed linting @BubbleCal. Sorry for the delay

@gstamatakis95

Contributor Author

@BubbleCal Apologies for the spam, resolved conflicts, tests are passing and overall it looks OK. I don't know if I can trigger any of the workflows myself so please let me know.

@BubbleCal

Contributor

> @BubbleCal Apologies for the spam, resolved conflicts, tests are passing and overall it looks OK. I don't know if I can trigger any of the workflows myself so please let me know.

No worries, it's a limitation that first-time contributors can't trigger the CI

Will keep eyes on it

@gstamatakis95

Contributor Author

Perfect, unsure if I need to do something specific so I'm just waiting for now

@BubbleCal

Contributor

@gstamatakis95 Thanks for the contribution!

@BubbleCal BubbleCal merged commit 38d289d into lance-format:main Jun 4, 2026
30 checks passed

Labels

A-index (Vector index, linalg, tokenizer), A-python (Python bindings), enhancement (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Distributed IVF_RQ builds have no way of sharing the RaBitQ rotation across workers in Python

2 participants