feat: Index Migrator #583

Merged
nkanu17 merged 80 commits into main from feat/index-migrator-pre-release
Jun 4, 2026

Conversation

@nkanu17
Collaborator

@nkanu17 nkanu17 commented Apr 14, 2026

feat: Index Migrator (Pre-release)

Zero-downtime, crash-safe index migration for RedisVL. Plan, apply, and rollback schema changes, including vector quantization, field renames, prefix changes, and algorithm swaps, through a single CLI or programmatic API.

Summary

This PR adds a complete index migration system to RedisVL, enabling users to evolve their index schemas without data loss. The migrator handles the full lifecycle: plan → review → apply → validate → rollback.

Key Capabilities

| Category | Operations |
| --- | --- |
| Index-only | Change algorithm (FLAT ↔ HNSW ↔ SVS-VAMANA), distance metric, HNSW params (M, EF_CONSTRUCTION), make fields sortable |
| Schema + Data | Add/remove fields, rename fields, rename index, change key prefix, change field options (separator, stemming) |
| Vector Quantization | float32 → float16, bfloat16, int8, uint8 with automatic re-encoding and crash-safe backup |
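
To make the re-encoding step concrete, here is a minimal sketch using only the Python standard library. It assumes vectors are stored as little-endian float32 byte blobs; `reencode_float32_to_float16` is a hypothetical name, not a function from `redisvl.migration`, and the real pipeline may well take a different route (for example NumPy).

```python
import struct


def reencode_float32_to_float16(blob: bytes) -> bytes:
    """Re-encode a little-endian float32 vector blob as float16 (hypothetical helper)."""
    if len(blob) % 4 != 0:
        raise ValueError("float32 blob length must be a multiple of 4 bytes")
    count = len(blob) // 4
    values = struct.unpack(f"<{count}f", blob)
    # "e" is the IEEE 754 half-precision format; each value shrinks from 4 to 2 bytes
    return struct.pack(f"<{count}e", *values)


original = struct.pack("<4f", 0.5, -1.25, 2.0, 0.0)
converted = reencode_float32_to_float16(original)
print(len(original), len(converted))
```

The blob halves in size, which is the memory saving quantization is after. The conversion is lossy in general, which is why the original vectors are backed up before any write.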

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    CLI: rvl migrate                          │
│  wizard │ plan │ apply │ rollback │ estimate │ validate     │
├─────────┴──────┴───────┴──────────┴──────────┴──────────────┤
│                  MigrationPlanner                           │
│  Schema diffing, change classification, plan generation     │
├─────────────────────────────────────────────────────────────┤
│            MigrationExecutor (sync + async)                 │
│  enumerate → dump → drop → key-renames → quantize →        │
│  create → index → validate                                  │
├─────────────────────────────────────────────────────────────┤
│  VectorBackup    │  Quantize Pipeline  │  Dtype Helpers     │
│  Crash-safe dump │  Batched R/W/Convert│  Width / detection │
└──────────────────┴─────────────────────┴────────────────────┘

Usage

CLI: Interactive Wizard

# List available indexes
rvl index listall --url redis://localhost:6379

# Interactive wizard: walks through changes step by step
rvl migrate wizard --url redis://localhost:6379

# Generate a plan from a target schema YAML
rvl migrate plan --index my_index --target-schema target_schema.yaml --url redis://localhost:6379

# Apply with crash-safe backup and multi-worker quantization
rvl migrate apply --plan migration_plan.yaml \
  --backup-dir /tmp/migration_backup \
  --workers 4 --batch-size 1000 \
  --url redis://localhost:6379

# Estimate disk space (dry-run, no mutations)
rvl migrate estimate --plan migration_plan.yaml --url redis://localhost:6379

# Rollback if needed
rvl migrate rollback --backup-dir /tmp/migration_backup \
  --index my_index --url redis://localhost:6379

# Validate post-migration
rvl migrate validate --plan migration_plan.yaml --url redis://localhost:6379

CLI: Batch Migration (Multiple Indexes)

# Plan: shared schema-patch applied to indexes matched by pattern,
# explicit --indexes list, or --indexes-file
rvl migrate batch-plan --schema-patch shared_patch.yaml \
  --pattern '*_idx' --url redis://localhost:6379

rvl migrate batch-apply --plan batch_plan.yaml \
  --state batch_state.yaml --url redis://localhost:6379

rvl migrate batch-status --state batch_state.yaml
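
The index-selection step of batch-plan can be pictured with a small sketch. This is not the planner's code: `select_indexes` is a hypothetical helper, and treating `--pattern` as a shell-style glob matched with `fnmatchcase` is an assumption based only on the `'*_idx'` example above.

```python
from fnmatch import fnmatchcase


def select_indexes(available, pattern=None, names=None):
    """Pick target indexes by glob pattern or by an explicit list of names."""
    if (pattern is None) == (names is None):
        raise ValueError("provide exactly one of pattern or names")
    if pattern is not None:
        return sorted(name for name in available if fnmatchcase(name, pattern))
    # Explicit lists are validated so a typo fails before any plan is written
    missing = sorted(set(names) - set(available))
    if missing:
        raise ValueError(f"unknown indexes: {missing}")
    return list(names)


available = ["products_idx", "users_idx", "scratch"]
print(select_indexes(available, pattern="*_idx"))
```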

Programmatic API

from redisvl.migration import MigrationPlanner, MigrationExecutor

# Plan (provide either schema_patch_path or target_schema_path)
planner = MigrationPlanner()
plan = planner.create_plan(
    index_name="my_index",
    target_schema_path="target_schema.yaml",
    redis_url="redis://localhost:6379",
)

# Apply
executor = MigrationExecutor()
report = executor.apply(
    plan,
    redis_url="redis://localhost:6379",
    backup_dir="/tmp/migration_backup",
    num_workers=4,
    batch_size=1000,
)

print(f"Result: {report.result}")  # "succeeded"
print(f"Duration: {report.timings.total_migration_duration_seconds}s")

Async API

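import asyncio

from redisvl.migration import AsyncMigrationExecutor


async def main(plan):
    # `plan` is the MigrationPlan built in the Programmatic API example
    executor = AsyncMigrationExecutor()
    return await executor.apply(
        plan,
        redis_url="redis://localhost:6379",
        backup_dir="/tmp/migration_backup",
        num_workers=4,
    )

report = asyncio.run(main(plan))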

Crash Safety & Resume

Quantization migrations always write a vector backup to disk before mutating data. When --backup-dir is omitted, the executor auto-defaults to ./migration_backups.

  1. Before drop: Original vectors are dumped to a binary backup file on disk
  2. On crash: Re-running the same command detects the backup and resumes from the last completed batch
  3. Rollback: rvl migrate rollback restores original vectors from the backup at any time

The backup file tracks phase (dump → ready → active → completed) and batch progress, so resume skips already-completed work. Backups are retained on disk after success for audit/rollback; cleanup is a manual step.
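
The resume behaviour described above can be sketched roughly as follows. This is an illustration only: `BackupState` and `run_quantize` are hypothetical names, a small JSON file stands in for the binary backup header, and the sketch covers just the active → completed part of the lifecycle, not the dump and ready phases.

```python
import json
import os
import tempfile


class BackupState:
    """Hypothetical stand-in for the backup header: phase plus batch progress."""

    def __init__(self, path):
        self.path = path
        self.phase = "dump"
        self.batches_done = 0
        if os.path.exists(path):
            with open(path) as fh:
                saved = json.load(fh)
            self.phase = saved["phase"]
            self.batches_done = saved["batches_done"]

    def save(self):
        # Write to a temp file and rename so a crash never leaves a torn state file
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump({"phase": self.phase, "batches_done": self.batches_done}, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)


def run_quantize(state, batches, process, crash_after=None):
    """Process batches in order, skipping those a previous run completed."""
    state.phase = "active"
    state.save()
    for index, batch in enumerate(batches):
        if index < state.batches_done:
            continue  # finished before the crash
        if crash_after is not None and index == crash_after:
            raise RuntimeError("simulated crash")
        process(batch)
        state.batches_done = index + 1
        state.save()
    state.phase = "completed"
    state.save()


state_path = os.path.join(tempfile.mkdtemp(), "backup_state.json")
batches = [["doc:1", "doc:2"], ["doc:3", "doc:4"], ["doc:5"]]
processed = []

try:
    run_quantize(BackupState(state_path), batches, processed.extend, crash_after=2)
except RuntimeError:
    pass  # first run dies before the third batch

resumed = BackupState(state_path)
run_quantize(resumed, batches, processed.extend)
print(processed, resumed.phase)
```

Persisting progress after every batch means a resumed run redoes at most one batch of work, provided re-processing a batch is idempotent.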

Performance

  • Pipelined reads/writes: Batch HGET/HSET operations (configurable --batch-size)
  • Multi-worker quantization: --workers N parallelizes vector re-encoding via ThreadPoolExecutor (sync) or asyncio.gather (async)
  • Redis Cluster support: Batched DUMP/RESTORE/DEL for cross-slot key renames (100 keys/pipeline)
  • Disk space estimation: rvl migrate estimate calculates RDB + AOF impact before any mutations
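
As a rough picture of the multi-worker path, the sketch below fans fixed-size batches out to a `ThreadPoolExecutor`. It is a simplified model, not the code in quantize.py: `chunked`, `quantize_batch`, and `quantize_all` are hypothetical names, the records live in memory rather than in Redis, and float32 → float16 conversion via `struct` stands in for whatever the real pipeline does.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def quantize_batch(batch):
    """Convert each (key, float32 blob) pair in one batch to float16."""
    out = []
    for key, blob in batch:
        count = len(blob) // 4
        values = struct.unpack(f"<{count}f", blob)
        out.append((key, struct.pack(f"<{count}e", *values)))
    return out


def quantize_all(records, batch_size, num_workers):
    """Fan batches out to a thread pool; results come back in batch order."""
    batches = list(chunked(records, batch_size))
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(quantize_batch, batches)
        return [pair for batch in results for pair in batch]


records = [(f"doc:{i}", struct.pack("<2f", float(i), 0.5)) for i in range(5)]
converted = quantize_all(records, batch_size=2, num_workers=4)
print([key for key, _ in converted])
```

In a real migration each batch would also read from and write to Redis, and that network wait is presumably where threads help most; the pure-Python conversion shown here gains little from them on its own.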

What's Blocked

| Change | Why | Workaround |
| --- | --- | --- |
| Change vector dimensions | Requires re-embedding | Re-embed with new model, reload data |
| Change storage type (hash ↔ JSON) | Different data format | Export, transform, reload |
| Add a new vector field | Requires vectors for all docs | Add vectors first, then migrate |

New Files

redisvl/migration/ (core module)

| File | Description |
| --- | --- |
| models.py | Data models: MigrationPlan, MigrationReport, MigrationTimings, etc. |
| planner.py | Schema diffing, change classification, plan generation |
| executor.py | Sync migration executor, full apply lifecycle |
| async_executor.py | Async migration executor |
| async_planner.py | Async planner |
| validation.py / async_validation.py | Pre/post-migration validation |
| backup.py | VectorBackup: crash-safe binary backup format |
| quantize.py | Pipelined vector quantization + multi-worker orchestration |
| reliability.py | Dtype helpers: width comparison, vector-dtype detection, idempotent-quantize check |
| wizard.py | Interactive migration wizard |
| batch_planner.py / batch_executor.py | Multi-index batch migration |
| utils.py | Shared utilities (disk estimation, key enumeration, etc.) |

redisvl/cli/migrate.py

CLI with 11 subcommands: helper, wizard, plan, apply, estimate, rollback, validate, batch-plan, batch-apply, batch-resume, batch-status

Tests

| Suite | Tests | Description |
| --- | --- | --- |
| test_migration_planner.py | 23 | Schema diffing, change classification |
| test_migration_wizard.py | 45 | Interactive wizard, adversarial inputs |
| test_vector_backup.py | 26 | Backup create/load/resume/rollback/cleanup |
| test_pipeline_quantize.py | 16 | Pipelined read/write/convert |
| test_executor_backup_quantize.py | 13 | Executor backup integration |
| test_multi_worker_quantize.py | 15 | Multi-worker, resume, dtype scaling |
| test_async_migration_executor.py | 35 | Async executor |
| test_async_migration_planner.py | 4 | Async planner |
| test_batch_migration.py | 42 | Batch planner/executor, overlap detection |
| Integration tests | 5 files | Full end-to-end with live Redis |

Total: 219 unit tests collected, all pre-commit checks clean.

Review Notes

  • This is a pre-release: the API surface is stable for now but may still change based on feedback
  • 8 rounds of automated code review (nkode-review) have been applied, addressing correctness, security, performance, and backward compatibility findings
  • The branch includes removal of the MCP module (previously merged separately). Those deletions are unrelated to the migrator
  • Documentation is in docs/user_guide/how_to_guides/migrate-indexes.md and docs/concepts/index-migrations.md

Note

High Risk
Introduces a large new migration subsystem that can drop/recreate indexes, rename keys/fields, and re-encode vector data; mistakes or edge cases can cause downtime or irreversible data changes despite backups/resume safeguards.

Overview
Adds a new experimental index migrator exposed via rvl migrate (wizard/plan/apply/estimate/rollback/validate plus batch-plan/apply/resume/status) and a new redisvl.migration package exporting sync/async planners, executors, validators, batch tooling, and models.

Implements an async migration executor that supports drop/recreate migrations with key enumeration (FT.AGGREGATE w/ SCAN fallback), field renames (hash/JSON), prefix renames (standalone and cluster-safe), and vector datatype changes; quantization now enforces mandatory on-disk backups (auto-defaulting to ./migration_backups) to enable crash-safe resume and rollback.

Expands documentation substantially: adds a CLI reference page, new migration concept/how-to guides, and updates existing docs/notebooks to include migrate commands; also updates .gitignore, CLAUDE.md, and CLI connection option helpers.

Copilot AI review requested due to automatic review settings April 14, 2026 05:46
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a pre-release RedisVL index migration system (sync + async) with planning, execution, validation, batch workflows, and extensive documentation/tests to enable crash-safe, document-preserving schema evolution.

Changes:

  • Introduces migration core modules (planner/executor/validator/backup/quantize/reliability) plus batch migration support.
  • Adds async equivalents and a new rvl migrate CLI entry point + docs.
  • Adds comprehensive unit/integration tests and benchmark/e2e helper scripts.

Reviewed changes

Copilot reviewed 45 out of 53 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/unit/test_async_migration_planner.py | Adds async planner unit coverage mirroring sync planner tests. |
| tests/unit/test_async_migration_executor.py | Adds async executor + disk space estimator unit tests and dtype-detection tests. |
| tests/integration/test_migration_v1.py | End-to-end integration test for sync plan/apply/validate flow. |
| tests/integration/test_migration_routes.py | Integration coverage for supported migration “routes” (algo/metric/dtype/params). |
| tests/integration/test_field_modifier_ordering_integration.py | Adds integration tests for new/related field modifiers (INDEXEMPTY/UNF/NOINDEX). |
| tests/integration/test_batch_migration_integration.py | Adds integration tests for batch plan/apply/resume/progress callback. |
| tests/integration/test_async_migration_v1.py | End-to-end integration test for async plan/apply/validate flow. |
| tests/benchmarks/visualize_results.py | Adds benchmark visualization script for retrieval/memory/latency charts. |
| scripts/verify_data_correctness.py | Adds manual script to verify float32→float16 migration correctness. |
| scripts/test_migration_e2e.py | Adds large-scale e2e migration benchmark script. |
| scripts/test_crash_resume_e2e.py | Adds crash/resume robustness test script for quantization checkpointing. |
| redisvl/redis/connection.py | Enhances vector attribute parsing to include HNSW params (m, ef_construction). |
| redisvl/migration/validation.py | Adds sync migration validation (schema/doc counts/key samples/functional checks). |
| redisvl/migration/utils.py | Adds YAML helpers, schema canonicalization, readiness polling, disk estimation utilities. |
| redisvl/migration/reliability.py | Adds crash-safety utilities: dtype detection, checkpointing, BGSAVE helpers, undo buffer. |
| redisvl/migration/quantize.py | Adds pipelined (and multi-worker) quantization for vector dtype conversions. |
| redisvl/migration/models.py | Adds migration/batch models and disk space estimate models/helpers. |
| redisvl/migration/batch_planner.py | Adds batch planner for applying a shared patch across many indexes. |
| redisvl/migration/batch_executor.py | Adds batch executor with checkpointing/resume and reporting. |
| redisvl/migration/backup.py | Adds crash-safe on-disk backup format for vector dumps and resume. |
| redisvl/migration/async_validation.py | Adds async validator parity with sync validation checks. |
| redisvl/migration/async_planner.py | Adds async planner wrapper over sync diff/classification logic. |
| redisvl/migration/__init__.py | Exposes new migration/batch APIs at package boundary. |
| redisvl/cli/utils.py | Fixes redis URL scheme building and refactors CLI option helpers. |
| redisvl/cli/main.py | Wires in new migrate CLI command group. |
| docs/user_guide/index.md | Adds migration to user guide landing page highlights. |
| docs/user_guide/how_to_guides/index.md | Adds “Migrate an Index” how-to link and toctree entry. |
| docs/user_guide/cli.ipynb | Updates CLI notebook with rvl migrate commands and reorganizes connection section. |
| docs/concepts/search-and-indexing.md | Updates concept docs to point to new migration workflow and docs. |
| docs/concepts/index.md | Adds “Index Migrations” concept card and toctree entry. |
| docs/concepts/index-migrations.md | New concept doc describing migration modes, supported changes, and sync/async behavior. |
| docs/concepts/field-attributes.md | Expands vector datatype docs + migration support notes for modifiers. |
| docs/api/cli.rst | Adds a full CLI reference including rvl migrate command group. |
| CLAUDE.md | Adds protected directory note (local_docs/). |

Comment thread redisvl/migration/validation.py Outdated
Comment thread redisvl/migration/async_validation.py Outdated
Comment thread redisvl/migration/validation.py Outdated
Comment thread redisvl/migration/async_validation.py Outdated
Comment thread redisvl/migration/quantize.py
Comment thread redisvl/migration/quantize.py Outdated
Comment thread tests/unit/test_async_migration_executor.py
@nkanu17 nkanu17 force-pushed the feat/index-migrator-pre-release branch from 3fe4972 to 3f03bb7 Compare April 14, 2026 05:49
@jit-ci

jit-ci Bot commented Apr 14, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

Copilot AI review requested due to automatic review settings May 1, 2026 15:05
Contributor

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@nkanu17 nkanu17 added the auto:release and auto:minor labels May 1, 2026
@nkanu17 nkanu17 requested a review from tylerhutcherson May 1, 2026 17:03
@nkanu17
Collaborator Author

nkanu17 commented May 1, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5671e6fe69

Comment thread redisvl/migration/executor.py
Comment thread redisvl/migration/async_executor.py
Copilot AI review requested due to automatic review settings May 1, 2026 18:40
@nkanu17 nkanu17 requested a review from abrookins May 1, 2026 18:48
@nkanu17 nkanu17 force-pushed the feat/index-migrator-pre-release branch from 01d3025 to 037c2d8 Compare May 1, 2026 19:20
@nkanu17 nkanu17 requested review from Copilot and vishal-bala May 7, 2026 22:12
@nkanu17 nkanu17 marked this pull request as ready for review May 12, 2026 19:38
Comment thread redisvl/cli/migrate.py
Comment thread redisvl/cli/migrate.py
Comment thread docs/user_guide/cli.ipynb
nkanu17 added 10 commits June 3, 2026 22:50
…, validator, and utilities

Adds the core data structures and planning engine for the Index Migrator:
- models.py: Pydantic models for MigrationPlan, DiffClassification, ValidationResult, MigrationReport
- planner.py: MigrationPlanner with schema introspection, diffing, and change classification
- validation.py: MigrationValidator for post-migration checks
- utils.py: shared helpers for YAML I/O, disk estimation, index listing, timestamps
- connection.py: HNSW parameter extraction for schema introspection
- 15 unit tests for planner logic
- Fix import ordering in utils.py (isort compliance)
- Simplify validation prefix key rewriting to mirror executor logic
- Normalize single-element list prefixes in normalize_target_schema_to_patch

Adds the migration executor and CLI subcommands for plan/apply/validate:
- executor.py: MigrationExecutor with sync apply, key enumeration, index drop/create, quantization, field/key rename
- reliability.py: BatchUndoBuffer, QuantizationCheckpoint, BGSAVE helpers
- cli/migrate.py: CLI with plan, apply, validate, list, helper, estimate subcommands
- cli/main.py: register migrate command
- cli/utils.py: add_redis_connection_options helper
- Integration tests for comprehensive migration, v1, routes, and field modifier ordering
- Fix CLI step labels to match executor order
- Fix GEO coordinates to lat,lon order in integration tests
- Move JSON path to top-level field property in tests
- Use sys.exit() instead of exit() in CLI
- Use transaction=False for quantize pipeline

Adds guided migration builder for interactive plan creation:
- wizard.py: MigrationWizard with index selection, field operations, vector tuning, quantization, and preview
- cli/migrate.py: adds 'wizard' subcommand (rvl migrate wizard --index <name>)
- Unit tests for wizard logic (41 tests)
- Improve field removal to clean up renames by both old_name and new_name
- Resolve update names through rename map in working schema preview
- Add multi-prefix guard to reject indexes with multiple prefixes
- Fix dependent prompts (UNF, no_index) when field is already sortable
- Pass existing field attrs to common attrs prompts for update mode
…CLI flag

Adds non-blocking async migration support:
- async_executor.py: AsyncMigrationExecutor with async apply, BGSAVE, quantization
- async_planner.py: AsyncMigrationPlanner with async create_plan
- async_validation.py: AsyncMigrationValidator with async validate
- async_utils.py: async Redis helpers
- cli/migrate.py: adds --async flag to 'apply' subcommand
- Unit tests for async executor and planner
- Fix SVS client leak in async_planner check_svs_requirements
- Remove dead async_utils.py (functions duplicated in async_executor)
…ands

- batch_planner.py: multi-index plan generation with pattern/list support
- batch_executor.py: checkpointed batch execution with resume capability
- CLI: batch-plan, batch-apply, batch-resume, batch-status subcommands
- 32 unit tests for batch migration logic
… CLI

- Refactor _check_index_applicability to return Tuple[BatchIndexEntry, bool]
  where bool indicates quantization, avoiding redundant create_plan_from_patch
- Replace exit(1) with sys.exit(1) in batch-apply and batch-resume CLI commands
- Sanitize report filenames (colons to underscores) for Windows compat
Comment thread redisvl/migration/executor.py
Copilot AI review requested due to automatic review settings June 4, 2026 04:03
@nkanu17 nkanu17 removed the auto:release and auto:minor labels Jun 4, 2026
Copilot AI review requested due to automatic review settings June 4, 2026 04:08
@nkanu17 nkanu17 changed the title feat: Index Migrator (Pre-release) feat: Index Migrator Jun 4, 2026
Comment thread redisvl/migration/async_planner.py
Comment thread redisvl/cli/migrate.py
Copilot AI review requested due to automatic review settings June 4, 2026 05:06
Copilot AI review requested due to automatic review settings June 4, 2026 05:42

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 3865a72.

    plan.merged_target_schema, field_rename.new_name
)
if not old_path or not new_path or old_path == new_path:
    continue

JSON field rename skipped silently

Medium Severity

For JSON storage, a field rename runs only when both old and new JSON paths resolve via get_schema_field_path. If either path is missing, the executor continues without error while the migration still drops and recreates the index, leaving documents under the old paths and breaking the new schema.
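
One way to avoid the silent skip is to separate the genuine no-op (identical paths) from the unresolved case and raise on the latter. The sketch below is an illustration of that idea, not the fix adopted in the PR: `resolve_rename_paths` is a hypothetical helper, and it assumes the two paths were already looked up (for example via get_schema_field_path) before the index is dropped.

```python
def resolve_rename_paths(old_path, new_path, old_name, new_name):
    """Return (old_path, new_path) to rewrite, or None when nothing needs to move."""
    if not old_path or not new_path:
        # Refuse to continue rather than recreate the index over unrenamed documents
        raise ValueError(
            f"cannot resolve JSON path for rename {old_name!r} -> {new_name!r}"
        )
    if old_path == new_path:
        return None  # same path: a genuine no-op
    return old_path, new_path


print(resolve_rename_paths("$.title", "$.name", "title", "name"))
```

Running this check during planning or validation, before any drop, would let the migration abort while the original index is still intact.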


@nkanu17 nkanu17 added the auto:release and auto:minor labels and removed the auto:minor label Jun 4, 2026
@nkanu17 nkanu17 merged commit 4e568d9 into main Jun 4, 2026
101 of 104 checks passed
@applied-ai-release-bot
Copy link
Copy Markdown

🚀 PR was released in v0.20.0 🚀

@applied-ai-release-bot applied-ai-release-bot Bot added the released label Jun 4, 2026

Labels

auto:minor (Increment the minor version when merged), auto:release (Create a release when this PR is merged), released (This issue/pull request has been released.)

2 participants