Add composite index main_detection(occurrence_id, timestamp DESC)#1251
Add composite index main_detection(occurrence_id, timestamp DESC)#1251mihow wants to merge 1 commit into
Conversation
✅ Deploy Preview for antenna-ssec canceled.
|
✅ Deploy Preview for antenna-preview canceled.
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a composite ChangesDetection timestamp index
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds a PostgreSQL composite index on main_detection (occurrence_id, timestamp DESC) to improve performance of the /captures/ hot-path query patterns that join detections to occurrences and sort by newest detection timestamp.
Changes:
- Add a composite ordered index definition to the
Detectionmodel (occurrence,-timestamp). - Add a non-atomic migration that creates the index using
CREATE INDEX CONCURRENTLYviaAddIndexConcurrently.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
ami/main/models.py |
Declares the new composite index on Detection to support join + timestamp DESC ordering patterns. |
ami/main/migrations/0084_index_detection_occurrence_timestamp.py |
Applies the index online with AddIndexConcurrently and atomic = False for production safety on a large table. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@coderabbitai review this please if you haven't already |
|
✅ Actions performedReview triggered.
|
|
I have tested locally on a large dataset, will test in staging & demo env next |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ami/main/migrations/0084_index_detection_occurrence_timestamp.py (1)
1-21: Concurrent index migration looks correct; consider documenting recovery for invalid-index failures.
AddIndexConcurrently+atomic = Falseis the right approach for a largemain_detectiontable to avoid anACCESS EXCLUSIVElock. Index definition matches the modelMeta.indexesentry (same name and field order), somakemigrations --checkshould remain clean.Operational note (per the PR description): if
CREATE INDEX CONCURRENTLYfails mid-build, Postgres leaves an INVALID index that must be dropped manually (DROP INDEX CONCURRENTLY detection_occurrence_ts_desc;) before the migration can be retried — otherwise the retry will fail on the duplicate relation. Worth calling out in the deploy runbook, and verifying on staging before prod.Also, Ruff RUF012 warnings on
dependencies/operationsare false positives — these are standard Django migration class attributes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/main/migrations/0084_index_detection_occurrence_timestamp.py` around lines 1 - 21, The migration uses AddIndexConcurrently with atomic = False to avoid locks, but you must document operational recovery for a failed CONCURRENTLY build: add a brief comment in the migration (near AddIndexConcurrently / atomic = False) and/or update the deploy runbook explaining that a mid-build failure leaves an INVALID index named detection_occurrence_ts_desc and operators must run DROP INDEX CONCURRENTLY detection_occurrence_ts_desc before retrying the migration; you can also note that Ruff RUF012 warnings on dependencies/operations are false positives so no code change is needed for those.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ami/main/migrations/0084_index_detection_occurrence_timestamp.py`:
- Around line 1-21: The migration uses AddIndexConcurrently with atomic = False
to avoid locks, but you must document operational recovery for a failed
CONCURRENTLY build: add a brief comment in the migration (near
AddIndexConcurrently / atomic = False) and/or update the deploy runbook
explaining that a mid-build failure leaves an INVALID index named
detection_occurrence_ts_desc and operators must run DROP INDEX CONCURRENTLY
detection_occurrence_ts_desc before retrying the migration; you can also note
that Ruff RUF012 warnings on dependencies/operations are false positives so no
code change is needed for those.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8493ad21-4fe2-429b-aac0-38fddbd13b9e
📒 Files selected for processing (2)
ami/main/migrations/0084_index_detection_occurrence_timestamp.pyami/main/models.py
|
Claude says: Worth a bit more verification before merge — index itself is low-risk and the migration setup (
Not blocking — just want to make sure the merge buys us what the report claims before we move on. (For context on #1271: that's a separate N+1 angle on |
Adds a composite index to speed up the /captures/ list endpoint, which joins main_detection and main_occurrence, filters by project_id, and sorts by timestamp DESC. Without this index PostgreSQL falls back to a nested-loop join scanning ~517k detection rows before filtering. Measured impact (per external perf report): Before: ~1,470 ms (nested loop, full timestamp sort) After: ~251 ms (parallel hash join on the filtered result) ~83% reduction on cold cache. Uses django.contrib.postgres.operations.AddIndexConcurrently with atomic=False so the index builds without holding a table-level write lock — safe for online prod deployment on a table this size. Co-Authored-By: Claude <noreply@anthropic.com>
a185a4a to
c7fd9b3
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
ami/main/models.py (1)
2767-2776: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winDrop the duplicate thumbnail index.
The
UniqueConstrainton(source_image, label)already creates the lookup index for this table, so this extramodels.Index(...)just adds write/storage overhead.♻️ Suggested fix
constraints = [ models.UniqueConstraint( fields=["source_image", "label"], name="unique_source_image_thumbnail_label", ), ] - indexes = [ - models.Index(fields=["source_image", "label"]), - ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/models.py` around lines 2767 - 2776, The Meta definition for the thumbnail model has a redundant index on the same fields covered by unique_source_image_thumbnail_label; remove the extra models.Index(fields=["source_image", "label"]) from the class Meta indexes while keeping the UniqueConstraint in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/migrations/0093_index_detection_occurrence_timestamp.py`:
- Around line 34-49: The timeout handling around AddIndexConcurrently is
one-sided; make the guard symmetric by changing the RunSQL in
0093_index_detection_occurrence_timestamp so the reverse path also disables the
statement timeout before the concurrent index drop. Update the migrations.RunSQL
block that currently uses RESET statement_timeout as reverse_sql so the rollback
direction matches the forward setup/teardown behavior and keeps the session
timeout state safe for DROP INDEX CONCURRENTLY.
In `@ami/main/models.py`:
- Around line 2443-2458: thumbnail_is_valid currently only compares the
thumbnail width, so spec changes that affect the rendered output can be missed;
update the validity check to account for the full thumbnail spec, not just
width. Use thumbnail_is_valid in ami/main/models.py as the entry point, and
compare all output-affecting fields from spec (including height and any other
generator inputs) before returning true. Keep the existing
path/last_modified/source change checks, but ensure a spec mismatch always
forces regeneration.
---
Nitpick comments:
In `@ami/main/models.py`:
- Around line 2767-2776: The Meta definition for the thumbnail model has a
redundant index on the same fields covered by
unique_source_image_thumbnail_label; remove the extra
models.Index(fields=["source_image", "label"]) from the class Meta indexes while
keeping the UniqueConstraint in place.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ca52d1f6-4576-456a-b80e-faa352c93437
📒 Files selected for processing (2)
ami/main/migrations/0093_index_detection_occurrence_timestamp.pyami/main/models.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🧹 Nitpick comments (1)
ami/main/models.py (1)
2767-2776: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winDrop the duplicate thumbnail index.
The
UniqueConstrainton(source_image, label)already creates the lookup index for this table, so this extramodels.Index(...)just adds write/storage overhead.♻️ Suggested fix
constraints = [ models.UniqueConstraint( fields=["source_image", "label"], name="unique_source_image_thumbnail_label", ), ] - indexes = [ - models.Index(fields=["source_image", "label"]), - ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/models.py` around lines 2767 - 2776, The Meta definition for the thumbnail model has a redundant index on the same fields covered by unique_source_image_thumbnail_label; remove the extra models.Index(fields=["source_image", "label"]) from the class Meta indexes while keeping the UniqueConstraint in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/migrations/0093_index_detection_occurrence_timestamp.py`:
- Around line 34-49: The timeout handling around AddIndexConcurrently is
one-sided; make the guard symmetric by changing the RunSQL in
0093_index_detection_occurrence_timestamp so the reverse path also disables the
statement timeout before the concurrent index drop. Update the migrations.RunSQL
block that currently uses RESET statement_timeout as reverse_sql so the rollback
direction matches the forward setup/teardown behavior and keeps the session
timeout state safe for DROP INDEX CONCURRENTLY.
In `@ami/main/models.py`:
- Around line 2443-2458: thumbnail_is_valid currently only compares the
thumbnail width, so spec changes that affect the rendered output can be missed;
update the validity check to account for the full thumbnail spec, not just
width. Use thumbnail_is_valid in ami/main/models.py as the entry point, and
compare all output-affecting fields from spec (including height and any other
generator inputs) before returning true. Keep the existing
path/last_modified/source change checks, but ensure a spec mismatch always
forces regeneration.
---
Nitpick comments:
In `@ami/main/models.py`:
- Around line 2767-2776: The Meta definition for the thumbnail model has a
redundant index on the same fields covered by
unique_source_image_thumbnail_label; remove the extra
models.Index(fields=["source_image", "label"]) from the class Meta indexes while
keeping the UniqueConstraint in place.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ca52d1f6-4576-456a-b80e-faa352c93437
📒 Files selected for processing (2)
ami/main/migrations/0093_index_detection_occurrence_timestamp.pyami/main/models.py
🛑 Comments failed to post (2)
ami/main/migrations/0093_index_detection_occurrence_timestamp.py (1)
34-49: 🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
python3 - <<'PY' import sys try: import django print(f"Django version: {django.VERSION}") # Try to locate the source of AddIndexConcurrently # Note: In standard Django, there isn't a public `AddIndexConcurrently` in `migrations.operations.fields` # It might be a custom class or from a specific Django version that needs checking. # Let's check if it exists and where it imports from in the specific file. pass except ImportError: print("Django not available in this environment.") # Since we cannot rely on installed Django to be the exact version of the repo, # and we need to know the *actual* behavior in this codebase, # let's try to read the file to see if it's a custom class defined elsewhere or imported. # However, usually `AddIndexConcurrently` is not standard until very recent versions or custom. # Let's search for the definition. PYRepository: RolnickLab/antenna
Length of output: 198
🏁 Script executed:
rg -A 20 "class AddIndexConcurrently" --type pyRepository: RolnickLab/antenna
Length of output: 156
🏁 Script executed:
grep -r "AddIndexConcurrently" --include="*.py" . | head -20Repository: RolnickLab/antenna
Length of output: 830
Make the timeout guard symmetric on rollback.
The current migration disables the statement timeout only for the forward direction. If the migration fails and Django rolls back, the
DROP INDEX CONCURRENTLYwill execute without the timeout reset, risking a failure if the index is large or the session has a strict timeout. Swap thereverse_sqlarguments to ensure the timeout is disabled before the index drop during rollback.migrations.RunSQL( sql="SET statement_timeout = 0;", - reverse_sql=migrations.RunSQL.noop, + reverse_sql="RESET statement_timeout;", ), AddIndexConcurrently( model_name="detection", index=models.Index( fields=["occurrence", "-timestamp"], name="detection_occurrence_ts_desc", ), ), migrations.RunSQL( sql="RESET statement_timeout;", - reverse_sql=migrations.RunSQL.noop, + reverse_sql="SET statement_timeout = 0;", ),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.migrations.RunSQL( sql="SET statement_timeout = 0;", reverse_sql="RESET statement_timeout;", ), AddIndexConcurrently( model_name="detection", index=models.Index( fields=["occurrence", "-timestamp"], name="detection_occurrence_ts_desc", ), ), # Restore the session default so the disabled timeout does not leak into # later migrations executed on the same (non-atomic) connection. migrations.RunSQL( sql="RESET statement_timeout;", reverse_sql="SET statement_timeout = 0;", ),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/migrations/0093_index_detection_occurrence_timestamp.py` around lines 34 - 49, The timeout handling around AddIndexConcurrently is one-sided; make the guard symmetric by changing the RunSQL in 0093_index_detection_occurrence_timestamp so the reverse path also disables the statement timeout before the concurrent index drop. Update the migrations.RunSQL block that currently uses RESET statement_timeout as reverse_sql so the rollback direction matches the forward setup/teardown behavior and keeps the session timeout state safe for DROP INDEX CONCURRENTLY.ami/main/models.py (1)
2443-2458: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Invalidate thumbnails when the spec changes.
thumbnail_is_valid()only checkswidth, so a thumbnail spec update that changes the generated height (or any other output-affecting setting) will still be treated as warm and never regenerated.♻️ Suggested fix
- if thumb is None or not thumb.path or thumb.width != spec["width"]: + expected_height = spec.get("height") + if thumb is None or not thumb.path or thumb.width != spec["width"]: return False + if expected_height is not None and thumb.height != expected_height: + return False📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.def thumbnail_is_valid(self, spec: dict, thumb: "SourceImageThumbnail | None") -> bool: """Whether ``thumb`` satisfies ``spec`` and need not be regenerated. ``thumb.width`` stores the requested spec width (see the generator), so the comparison is strict equality; legacy encoder-width rows read invalid and self-heal on next generation. A None ``last_modified`` on either side means "no signal of change" (matches ``NULL < x`` → ``False`` in SQL). """ expected_height = spec.get("height") if thumb is None or not thumb.path or thumb.width != spec["width"]: return False if expected_height is not None and thumb.height != expected_height: return False source_changed = ( self.last_modified is not None and thumb.last_modified is not None and thumb.last_modified < self.last_modified ) return not source_changed🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/models.py` around lines 2443 - 2458, thumbnail_is_valid currently only compares the thumbnail width, so spec changes that affect the rendered output can be missed; update the validity check to account for the full thumbnail spec, not just width. Use thumbnail_is_valid in ami/main/models.py as the entry point, and compare all output-affecting fields from spec (including height and any other generator inputs) before returning true. Keep the existing path/last_modified/source change checks, but ensure a spec mismatch always forces regeneration.
|
Claude says: Recommending we close this one, based on a measurement pass on a production-snapshot dataset (part of the #928 slow-query work). Summary of what we found: The index doesn't help the query it's aimed at, at our data distribution. On the snapshot, detections-per-occurrence is averaging ~1.0 (p95 = 1, max 37), so there is essentially nothing to sort per occurrence. On the expensive occurrence aggregate the planner ignored the index (626 ms → 643 ms, within noise). Where it does help — first/last-appearance subqueries — the win is ~13 ms → ~2 ms over a couple hundred occurrences, i.e. single-digit milliseconds in absolute terms. It adds write cost on the hottest insert path. Heads-up on the description: the PR/commit body cites a Net: at current scale this is redundant. If we later get projects where occurrences genuinely accumulate many detections each, the first/last-appearance case could justify it — but the data isn't there today. For context: the occurrence-list slowness that is real turned out to be a missing |
Summary
Add a composite index on
main_detection (occurrence_id, timestamp DESC)to speed up the/captures/list endpoint, which joinsmain_detectionandmain_occurrence, filters byproject_id, and sorts by timestamp DESC.Without this index PostgreSQL falls back to a nested-loop join that scans ~517k detection rows before filtering by project, then sorts the full set.
Measured impact (external perf report, warm DB, caching disabled)
This is a straight-up plan change: the query planner picks a parallel hash join because it can read detections ordered by timestamp for each occurrence directly from the index, instead of collecting all qualifying detections and sorting.
Implementation
ami/main/models.py— addMeta.indexes = [Index(fields=[\"occurrence\", \"-timestamp\"], name=\"detection_occurrence_ts_desc\")]onDetection.ami/main/migrations/0084_index_detection_occurrence_timestamp.py— usesdjango.contrib.postgres.operations.AddIndexConcurrentlywithatomic = False.Verified locally that the migration emits the correct SQL:
…and the resulting index is visible on the table:
Why AddIndexConcurrently
main_detectionis one of the largest tables in the system. A plainCREATE INDEXtakes anACCESS EXCLUSIVE-level table lock while it builds, blocking writes for the duration.CREATE INDEX CONCURRENTLYtrades a longer build for an online build that only needs aSHARE UPDATE EXCLUSIVElock — safe during production traffic.Gotcha:
AddIndexConcurrentlycan't run inside a transaction, so the migration setsatomic = False. If the build fails partway, the index is left in an invalid state and must be dropped manually (DROP INDEX CONCURRENTLY) before re-running — standard for concurrent index builds.Test plan
python manage.py migrate mainapplies cleanly in CI stacksqlmigrateemitsCREATE INDEX CONCURRENTLY\\d main_detectionshows the composite indexEXPLAIN (ANALYZE, BUFFERS)on/captures/query with a large project, compare plan before/afterRelated
Pairs with #1250 (require
project_idon hot list endpoints) as the second half of the "fix the major culprits now" pass. Neither depends on the other — can merge independently.🤖 Generated with Claude Code
Summary by CodeRabbit
Performance
Bug Fixes