Skip to content

Add composite index main_detection(occurrence_id, timestamp DESC)#1251

Closed
mihow wants to merge 1 commit into
mainfrom
perf/index-detection-occurrence-timestamp
Closed

Add composite index main_detection(occurrence_id, timestamp DESC)#1251
mihow wants to merge 1 commit into
mainfrom
perf/index-detection-occurrence-timestamp

Conversation

@mihow

@mihow mihow commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Add a composite index on main_detection (occurrence_id, timestamp DESC) to speed up the /captures/ list endpoint, which joins main_detection and main_occurrence, filters by project_id, and sorts by timestamp DESC.

Without this index PostgreSQL falls back to a nested-loop join that scans ~517k detection rows before filtering by project, then sorts the full set.

Measured impact (external perf report, warm DB, caching disabled)

Stage Exec time vs. baseline
Before — nested loop, ~517k rows scanned ~1,470 ms baseline
After — parallel hash join + index-ordered scan ~251 ms −83%

This is a straight-up plan change: the query planner picks a parallel hash join because it can read detections ordered by timestamp for each occurrence directly from the index, instead of collecting all qualifying detections and sorting.

Implementation

  • ami/main/models.py — add Meta.indexes = [Index(fields=[\"occurrence\", \"-timestamp\"], name=\"detection_occurrence_ts_desc\")] on Detection.
  • ami/main/migrations/0084_index_detection_occurrence_timestamp.py — uses django.contrib.postgres.operations.AddIndexConcurrently with atomic = False.

Verified locally that the migration emits the correct SQL:

CREATE INDEX CONCURRENTLY \"detection_occurrence_ts_desc\"
ON \"main_detection\" (\"occurrence_id\", \"timestamp\" DESC);

…and the resulting index is visible on the table:

\"detection_occurrence_ts_desc\" btree (occurrence_id, \"timestamp\" DESC)

Why AddIndexConcurrently

main_detection is one of the largest tables in the system. A plain CREATE INDEX takes an ACCESS EXCLUSIVE-level table lock while it builds, blocking writes for the duration. CREATE INDEX CONCURRENTLY trades a longer build for an online build that only needs a SHARE UPDATE EXCLUSIVE lock — safe during production traffic.

Gotcha: AddIndexConcurrently can't run inside a transaction, so the migration sets atomic = False. If the build fails partway, the index is left in an invalid state and must be dropped manually (DROP INDEX CONCURRENTLY) before re-running — standard for concurrent index builds.

Test plan

  • python manage.py migrate main applies cleanly in CI stack
  • sqlmigrate emits CREATE INDEX CONCURRENTLY
  • Post-apply \\d main_detection shows the composite index
  • Confirm on staging before prod rollout: EXPLAIN (ANALYZE, BUFFERS) on /captures/ query with a large project, compare plan before/after

Related

Pairs with #1250 (require project_id on hot list endpoints) as the second half of the "fix the major culprits now" pass. Neither depends on the other — can merge independently.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Performance

    • Improved speed and responsiveness when viewing detection records grouped by occurrence, especially where the newest detections are shown first.
    • Queries that rely on ordering detections within each occurrence should now run more efficiently.
  • Bug Fixes

    • Reduced the chance of slow or timing-out database operations during index creation on large datasets.

Copilot AI review requested due to automatic review settings April 17, 2026 20:50
@netlify

netlify Bot commented Apr 17, 2026

Copy link
Copy Markdown

Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit c7fd9b3
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/6a3d733d675c250008998c73

@netlify

netlify Bot commented Apr 17, 2026

Copy link
Copy Markdown

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit c7fd9b3
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a3d733d658e04000890fa5b

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca52d1f6-4576-456a-b80e-faa352c93437

📥 Commits

Reviewing files that changed from the base of the PR and between a185a4a and c7fd9b3.

📒 Files selected for processing (2)
  • ami/main/migrations/0093_index_detection_occurrence_timestamp.py
  • ami/main/models.py

📝 Walkthrough

Walkthrough

Adds a composite Detection index on (occurrence, -timestamp) and a non-atomic migration that creates it concurrently with statement_timeout adjusted around the operation.

Changes

Detection timestamp index

Layer / File(s) Summary
Index declaration and migration scaffold
ami/main/models.py, ami/main/migrations/0093_index_detection_occurrence_timestamp.py
Detection.Meta.indexes adds a composite index on (occurrence, -timestamp), and the migration is marked non-atomic with dependency 0092_alter_sourceimage_checksum_and_more.
Concurrent index build
ami/main/migrations/0093_index_detection_occurrence_timestamp.py
The migration runs SET statement_timeout = 0, creates detection_occurrence_ts_desc with AddIndexConcurrently, and then runs RESET statement_timeout.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I hopped by the Detection row,
and found an index in the glow.
Timestamp ears stood up in line,
while migration magic made it fine.
Thump-thump—queries dash and go!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly states the main change: adding a composite index on detection occurrence and timestamp.
Description check ✅ Passed The PR description is detailed and covers summary, implementation, testing, related work, and deployment notes; only some template sections are missing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/index-detection-occurrence-timestamp

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a PostgreSQL composite index on main_detection (occurrence_id, timestamp DESC) to improve performance of the /captures/ hot-path query patterns that join detections to occurrences and sort by newest detection timestamp.

Changes:

  • Add a composite ordered index definition to the Detection model (occurrence, -timestamp).
  • Add a non-atomic migration that creates the index using CREATE INDEX CONCURRENTLY via AddIndexConcurrently.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
ami/main/models.py Declares the new composite index on Detection to support join + timestamp DESC ordering patterns.
ami/main/migrations/0084_index_detection_occurrence_timestamp.py Applies the index online with AddIndexConcurrently and atomic = False for production safety on a large table.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mihow

mihow commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review this please if you haven't already

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

@mihow Sure! I'll kick off a review of the PR right away.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@mihow

mihow commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator Author

I have tested locally on a large dataset, will test in staging & demo env next

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ami/main/migrations/0084_index_detection_occurrence_timestamp.py (1)

1-21: Concurrent index migration looks correct; consider documenting recovery for invalid-index failures.

AddIndexConcurrently + atomic = False is the right approach for a large main_detection table to avoid an ACCESS EXCLUSIVE lock. Index definition matches the model Meta.indexes entry (same name and field order), so makemigrations --check should remain clean.

Operational note (per the PR description): if CREATE INDEX CONCURRENTLY fails mid-build, Postgres leaves an INVALID index that must be dropped manually (DROP INDEX CONCURRENTLY detection_occurrence_ts_desc;) before the migration can be retried — otherwise the retry will fail on the duplicate relation. Worth calling out in the deploy runbook, and verifying on staging before prod.

Also, Ruff RUF012 warnings on dependencies/operations are false positives — these are standard Django migration class attributes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/migrations/0084_index_detection_occurrence_timestamp.py` around
lines 1 - 21, The migration uses AddIndexConcurrently with atomic = False to
avoid locks, but you must document operational recovery for a failed
CONCURRENTLY build: add a brief comment in the migration (near
AddIndexConcurrently / atomic = False) and/or update the deploy runbook
explaining that a mid-build failure leaves an INVALID index named
detection_occurrence_ts_desc and operators must run DROP INDEX CONCURRENTLY
detection_occurrence_ts_desc before retrying the migration; you can also note
that Ruff RUF012 warnings on dependencies/operations are false positives so no
code change is needed for those.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ami/main/migrations/0084_index_detection_occurrence_timestamp.py`:
- Around line 1-21: The migration uses AddIndexConcurrently with atomic = False
to avoid locks, but you must document operational recovery for a failed
CONCURRENTLY build: add a brief comment in the migration (near
AddIndexConcurrently / atomic = False) and/or update the deploy runbook
explaining that a mid-build failure leaves an INVALID index named
detection_occurrence_ts_desc and operators must run DROP INDEX CONCURRENTLY
detection_occurrence_ts_desc before retrying the migration; you can also note
that Ruff RUF012 warnings on dependencies/operations are false positives so no
code change is needed for those.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8493ad21-4fe2-429b-aac0-38fddbd13b9e

📥 Commits

Reviewing files that changed from the base of the PR and between ae14c7b and a185a4a.

📒 Files selected for processing (2)
  • ami/main/migrations/0084_index_detection_occurrence_timestamp.py
  • ami/main/models.py

@mihow

mihow commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator Author

Claude says: Worth a bit more verification before merge — index itself is low-risk and the migration setup (AddIndexConcurrently + atomic = False) looks right, but a couple things to confirm so we know it's actually moving the needle:

  1. Confirm the index is being picked by the real query. The benchmark in the PR shows a big plan change (nested loop → parallel hash join), but the SQL it was run against isn't shown. The /captures/ list path (SourceImageViewSet) mostly filters detections by source_image_id, not occurrence_id, so the new index probably isn't relevant there. Where it does help is the Occurrence detail prefetch (views.py:1233) which explicitly does Detection.objects.order_by("-timestamp") keyed on occurrence_id — that's a perfect match for (occurrence_id, timestamp DESC). Worth running EXPLAIN (ANALYZE, BUFFERS) on staging against whichever endpoint the original perf report came from and checking the plan actually flips after the index lands.

  2. Use Sentry to pick the next target. Performance issues on the OccurrenceViewSet and SourceImageViewSet transactions in Sentry should show which queries are dominating wall time post-index. If we're still seeing the same long spans after this index, that's a signal the plan never changed for the hot path and we need a different index (or a different fix entirely).

  3. Operational note for prod rollout. If the CREATE INDEX CONCURRENTLY fails partway, Postgres leaves an INVALID index named detection_occurrence_ts_desc and the migration can't be retried until it's dropped (DROP INDEX CONCURRENTLY detection_occurrence_ts_desc;). Worth a line in the deploy notes.

Not blocking — just want to make sure the merge buys us what the report claims before we move on.

(For context on #1271: that's a separate N+1 angle on OccurrenceViewSet.list — different fix, different code path. Mentioning it as another lever for the "slow queries" theme, not as a dependency of this PR.)

Adds a composite index to speed up the /captures/ list endpoint, which
joins main_detection and main_occurrence, filters by project_id, and
sorts by timestamp DESC. Without this index PostgreSQL falls back to a
nested-loop join scanning ~517k detection rows before filtering.

Measured impact (per external perf report):
  Before: ~1,470 ms  (nested loop, full timestamp sort)
  After:  ~251 ms    (parallel hash join on the filtered result)
  ~83% reduction on cold cache.

Uses django.contrib.postgres.operations.AddIndexConcurrently with
atomic=False so the index builds without holding a table-level write
lock — safe for online prod deployment on a table this size.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the perf/index-detection-occurrence-timestamp branch from a185a4a to c7fd9b3 Compare June 25, 2026 18:28

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
ami/main/models.py (1)

2767-2776: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Drop the duplicate thumbnail index.

The UniqueConstraint on (source_image, label) already creates the lookup index for this table, so this extra models.Index(...) just adds write/storage overhead.

♻️ Suggested fix
         constraints = [
             models.UniqueConstraint(
                 fields=["source_image", "label"],
                 name="unique_source_image_thumbnail_label",
             ),
         ]
-        indexes = [
-            models.Index(fields=["source_image", "label"]),
-        ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/models.py` around lines 2767 - 2776, The Meta definition for the
thumbnail model has a redundant index on the same fields covered by
unique_source_image_thumbnail_label; remove the extra
models.Index(fields=["source_image", "label"]) from the class Meta indexes while
keeping the UniqueConstraint in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/migrations/0093_index_detection_occurrence_timestamp.py`:
- Around line 34-49: The timeout handling around AddIndexConcurrently is
one-sided; make the guard symmetric by changing the RunSQL in
0093_index_detection_occurrence_timestamp so the reverse path also disables the
statement timeout before the concurrent index drop. Update the migrations.RunSQL
block that currently uses RESET statement_timeout as reverse_sql so the rollback
direction matches the forward setup/teardown behavior and keeps the session
timeout state safe for DROP INDEX CONCURRENTLY.

In `@ami/main/models.py`:
- Around line 2443-2458: thumbnail_is_valid currently only compares the
thumbnail width, so spec changes that affect the rendered output can be missed;
update the validity check to account for the full thumbnail spec, not just
width. Use thumbnail_is_valid in ami/main/models.py as the entry point, and
compare all output-affecting fields from spec (including height and any other
generator inputs) before returning true. Keep the existing
path/last_modified/source change checks, but ensure a spec mismatch always
forces regeneration.

---

Nitpick comments:
In `@ami/main/models.py`:
- Around line 2767-2776: The Meta definition for the thumbnail model has a
redundant index on the same fields covered by
unique_source_image_thumbnail_label; remove the extra
models.Index(fields=["source_image", "label"]) from the class Meta indexes while
keeping the UniqueConstraint in place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca52d1f6-4576-456a-b80e-faa352c93437

📥 Commits

Reviewing files that changed from the base of the PR and between a185a4a and c7fd9b3.

📒 Files selected for processing (2)
  • ami/main/migrations/0093_index_detection_occurrence_timestamp.py
  • ami/main/models.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 2

🧹 Nitpick comments (1)
ami/main/models.py (1)

2767-2776: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Drop the duplicate thumbnail index.

The UniqueConstraint on (source_image, label) already creates the lookup index for this table, so this extra models.Index(...) just adds write/storage overhead.

♻️ Suggested fix
         constraints = [
             models.UniqueConstraint(
                 fields=["source_image", "label"],
                 name="unique_source_image_thumbnail_label",
             ),
         ]
-        indexes = [
-            models.Index(fields=["source_image", "label"]),
-        ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/models.py` around lines 2767 - 2776, The Meta definition for the
thumbnail model has a redundant index on the same fields covered by
unique_source_image_thumbnail_label; remove the extra
models.Index(fields=["source_image", "label"]) from the class Meta indexes while
keeping the UniqueConstraint in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/migrations/0093_index_detection_occurrence_timestamp.py`:
- Around line 34-49: The timeout handling around AddIndexConcurrently is
one-sided; make the guard symmetric by changing the RunSQL in
0093_index_detection_occurrence_timestamp so the reverse path also disables the
statement timeout before the concurrent index drop. Update the migrations.RunSQL
block that currently uses RESET statement_timeout as reverse_sql so the rollback
direction matches the forward setup/teardown behavior and keeps the session
timeout state safe for DROP INDEX CONCURRENTLY.

In `@ami/main/models.py`:
- Around line 2443-2458: thumbnail_is_valid currently only compares the
thumbnail width, so spec changes that affect the rendered output can be missed;
update the validity check to account for the full thumbnail spec, not just
width. Use thumbnail_is_valid in ami/main/models.py as the entry point, and
compare all output-affecting fields from spec (including height and any other
generator inputs) before returning true. Keep the existing
path/last_modified/source change checks, but ensure a spec mismatch always
forces regeneration.

---

Nitpick comments:
In `@ami/main/models.py`:
- Around line 2767-2776: The Meta definition for the thumbnail model has a
redundant index on the same fields covered by
unique_source_image_thumbnail_label; remove the extra
models.Index(fields=["source_image", "label"]) from the class Meta indexes while
keeping the UniqueConstraint in place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca52d1f6-4576-456a-b80e-faa352c93437

📥 Commits

Reviewing files that changed from the base of the PR and between a185a4a and c7fd9b3.

📒 Files selected for processing (2)
  • ami/main/migrations/0093_index_detection_occurrence_timestamp.py
  • ami/main/models.py
🛑 Comments failed to post (2)
ami/main/migrations/0093_index_detection_occurrence_timestamp.py (1)

34-49: 🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

python3 - <<'PY'
import sys
try:
    import django
    print(f"Django version: {django.VERSION}")
    
    # Try to locate the source of AddIndexConcurrently
    # Note: In standard Django, there isn't a public `AddIndexConcurrently` in `migrations.operations.fields`
    # It might be a custom class or from a specific Django version that needs checking.
    # Let's check if it exists and where it imports from in the specific file.
    pass
except ImportError:
    print("Django not available in this environment.")

# Since we cannot rely on installed Django to be the exact version of the repo,
# and we need to know the *actual* behavior in this codebase, 
# let's try to read the file to see if it's a custom class defined elsewhere or imported.
# However, usually `AddIndexConcurrently` is not standard until very recent versions or custom.
# Let's search for the definition.
PY

Repository: RolnickLab/antenna

Length of output: 198


🏁 Script executed:

rg -A 20 "class AddIndexConcurrently" --type py

Repository: RolnickLab/antenna

Length of output: 156


🏁 Script executed:

grep -r "AddIndexConcurrently" --include="*.py" . | head -20

Repository: RolnickLab/antenna

Length of output: 830


Make the timeout guard symmetric on rollback.

The current migration disables the statement timeout only for the forward direction. If the migration fails and Django rolls back, the DROP INDEX CONCURRENTLY will execute without the timeout reset, risking a failure if the index is large or the session has a strict timeout. Swap the reverse_sql arguments to ensure the timeout is disabled before the index drop during rollback.

         migrations.RunSQL(
             sql="SET statement_timeout = 0;",
-            reverse_sql=migrations.RunSQL.noop,
+            reverse_sql="RESET statement_timeout;",
         ),
         AddIndexConcurrently(
             model_name="detection",
             index=models.Index(
                 fields=["occurrence", "-timestamp"],
                 name="detection_occurrence_ts_desc",
             ),
         ),
         migrations.RunSQL(
             sql="RESET statement_timeout;",
-            reverse_sql=migrations.RunSQL.noop,
+            reverse_sql="SET statement_timeout = 0;",
         ),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        migrations.RunSQL(
            sql="SET statement_timeout = 0;",
            reverse_sql="RESET statement_timeout;",
        ),
        AddIndexConcurrently(
            model_name="detection",
            index=models.Index(
                fields=["occurrence", "-timestamp"],
                name="detection_occurrence_ts_desc",
            ),
        ),
        # Restore the session default so the disabled timeout does not leak into
        # later migrations executed on the same (non-atomic) connection.
        migrations.RunSQL(
            sql="RESET statement_timeout;",
            reverse_sql="SET statement_timeout = 0;",
        ),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/migrations/0093_index_detection_occurrence_timestamp.py` around
lines 34 - 49, The timeout handling around AddIndexConcurrently is one-sided;
make the guard symmetric by changing the RunSQL in
0093_index_detection_occurrence_timestamp so the reverse path also disables the
statement timeout before the concurrent index drop. Update the migrations.RunSQL
block that currently uses RESET statement_timeout as reverse_sql so the rollback
direction matches the forward setup/teardown behavior and keeps the session
timeout state safe for DROP INDEX CONCURRENTLY.
ami/main/models.py (1)

2443-2458: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Invalidate thumbnails when the spec changes.

thumbnail_is_valid() only checks width, so a thumbnail spec update that changes the generated height (or any other output-affecting setting) will still be treated as warm and never regenerated.

♻️ Suggested fix
-        if thumb is None or not thumb.path or thumb.width != spec["width"]:
+        expected_height = spec.get("height")
+        if thumb is None or not thumb.path or thumb.width != spec["width"]:
             return False
+        if expected_height is not None and thumb.height != expected_height:
+            return False
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def thumbnail_is_valid(self, spec: dict, thumb: "SourceImageThumbnail | None") -> bool:
        """Whether ``thumb`` satisfies ``spec`` and need not be regenerated.

        ``thumb.width`` stores the requested spec width (see the generator), so the
        comparison is strict equality; legacy encoder-width rows read invalid and
        self-heal on next generation. A None ``last_modified`` on either side means
        "no signal of change" (matches ``NULL < x`` → ``False`` in SQL).
        """
        expected_height = spec.get("height")
        if thumb is None or not thumb.path or thumb.width != spec["width"]:
            return False
        if expected_height is not None and thumb.height != expected_height:
            return False
        source_changed = (
            self.last_modified is not None
            and thumb.last_modified is not None
            and thumb.last_modified < self.last_modified
        )
        return not source_changed
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/models.py` around lines 2443 - 2458, thumbnail_is_valid currently
only compares the thumbnail width, so spec changes that affect the rendered
output can be missed; update the validity check to account for the full
thumbnail spec, not just width. Use thumbnail_is_valid in ami/main/models.py as
the entry point, and compare all output-affecting fields from spec (including
height and any other generator inputs) before returning true. Keep the existing
path/last_modified/source change checks, but ensure a spec mismatch always
forces regeneration.

@mihow

mihow commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Claude says: Recommending we close this one, based on a measurement pass on a production-snapshot dataset (part of the #928 slow-query work). Summary of what we found:

The index doesn't help the query it's aimed at, at our data distribution. On the snapshot, detections-per-occurrence is averaging ~1.0 (p95 = 1, max 37), so there is essentially nothing to sort per occurrence. On the expensive occurrence aggregate the planner ignored the index (626 ms → 643 ms, within noise). Where it does help — first/last-appearance subqueries — the win is ~13 ms → ~2 ms over a couple hundred occurrences, i.e. single-digit milliseconds in absolute terms.

It adds write cost on the hottest insert path. main_detection is where ML ingestion writes thousands of rows per job; a sixth index there is btree maintenance on every one of those inserts, for a read win the data distribution makes trivial.

Heads-up on the description: the PR/commit body cites a /captures/ 1470 ms → 251 ms improvement, but we couldn't map that to an actual query in the code path — it looks like it may have come from a different change. Worth disregarding so it doesn't muddy the epic's evidence base.

Net: at current scale this is redundant. If we later get projects where occurrences genuinely accumulate many detections each, the first/last-appearance case could justify it — but the data isn't there today.

For context: the occurrence-list slowness that is real turned out to be a missing (project, -determination_score) index for the default list sort, which is now up as #1351 (proven ~150× on the page query and stable under concurrency). The branch here has been rebased onto current main, so it's trivially re-openable if we want it later.

@mihow mihow closed this Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants