fix(cc): handle nloc==0 in DeepSpinPTExpt with phantom-atom padding #5485
Multi-rank spin MD can leave a rank with zero real local atoms when all atoms migrate to other subdomains. The with-comm AOTI artifact hits an intermittent SIGFPE (integer divide by zero) at runtime in inductor-generated shape arithmetic that uses nloc as a divisor.
The graph was traced with nloc_min=1, and inductor lowered an even stricter nloc>=2 runtime check, which is silently bypassed because AOTI_RUNTIME_CHECK_INPUTS is unset by default. Whether the offending divide is actually emitted depends on inductor's code-gen choices, which vary across compiles -- hence the random nature of the failure (reproduced on CI run 26667802665).
Fix: prepend two phantom atoms with empty neighbour lists ahead of the real atoms when nloc_real==0. The AOTI graph then runs with nloc==2, satisfying the inductor specialisation. Phantoms have no neighbours so they contribute zero atomic energy / force / virial, preserving the physically-correct 'this rank has no real atoms' result.
comm_dict's nlocal is set to 2 so border_op writes received ghost features past the phantom slots; outputs are stripped of the phantom prefix before being scattered back to LAMMPS via select_map.
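The padding step described in this commit message can be sketched roughly as follows. This is a minimal illustration only: the `RankInputs` struct, its field names, and `pad_with_phantoms` are hypothetical and do not mirror the actual members of `DeepSpinPTExpt`; the neighbour list is assumed to be a flat array of `nloc * nnei` indices with `-1` marking an empty slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for per-rank model inputs (illustrative names only).
struct RankInputs {
  std::vector<double> coord;  // 3 values per atom, local atoms first, then ghosts
  std::vector<int> atype;     // 1 value per atom
  std::vector<int> nlist;     // nloc rows of nnei neighbour indices, -1 = empty
  int nloc = 0;               // number of local atoms
  int nnei = 0;               // neighbour slots per local atom
};

// Prepend `phantom_n` isolated atoms when the rank has no local atoms.
// Returns the number of phantoms added (0 on the common nloc > 0 path).
int pad_with_phantoms(RankInputs& in, int phantom_n = 2) {
  assert(phantom_n >= 0 && in.nnei >= 0);
  if (in.nloc > 0) {
    return 0;  // common path: inputs are left untouched
  }
  const std::size_t n = static_cast<std::size_t>(phantom_n);
  // Zero coordinates and type 0 for each phantom, placed ahead of any ghosts.
  in.coord.insert(in.coord.begin(), n * 3, 0.0);
  in.atype.insert(in.atype.begin(), n, 0);
  // All-empty neighbour rows keep the phantoms isolated from every other atom.
  in.nlist.insert(in.nlist.begin(), n * static_cast<std::size_t>(in.nnei), -1);
  in.nloc = phantom_n;
  return phantom_n;
}
```

Calling `pad_with_phantoms` on a rank that holds only ghost atoms leaves the ghost data in place behind the two phantom rows, so any index that previously referred to a ghost row has to be shifted by the returned count.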
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 47f15b41a6
Walkthrough
This PR modifies DeepSpinPTExpt::compute to handle ranks with zero real local atoms by prepending phantom atoms for model inputs, adjusting mapping and neighbor-list tensors so phantoms are isolated, zeroing reduced energy when needed, and stripping phantom entries from force and atomic output tensors before returning to LAMMPS.
Changes
Phantom-atom padding for zero-local-atoms corner case
Estimated code review effort: 4 (Complex) | ~45 minutes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/api_cc/src/DeepSpinPTExpt.cc`:
- Around line 487-489: The mapping assignment for real atoms fails to account
for phantom padding: when phantom_n > 0 the indices returned by fwd_map(...) are
in the pre-padding space and must be shifted by phantom_n. Update the loop that
sets mapping[ii] (which uses fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]])
to add phantom_n to the fwd_map result so real-atom targets point into the
post-padding index range; keep all other indexing (ii, bkw_map,
lmp_list.mapping) unchanged.
📒 Files selected for processing (1)
source/api_cc/src/DeepSpinPTExpt.cc
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #5485 +/- ##
==========================================
+ Coverage 81.36% 81.37% +0.01%
==========================================
Files 868 868
Lines 96437 96598 +161
Branches 4233 4241 +8
==========================================
+ Hits 78463 78611 +148
- Misses 16674 16683 +9
- Partials 1300 1304 +4
Codex flagged on PR deepmodeling#5485: phantoms have constant atomic-energy outputs that flow into 'energy_redu'. On the spin path the SpinModel doubles atoms internally, so both real and spin phantom halves contribute -- and 'output_map["energy"]' only exposes the real half after the '[:, :nloc]' slice. Subtracting only that real half (a first attempt) left the spin half leaking into the MPI-reduced LAMMPS total: CI run 26796476553 showed mpi-2 = -2.45 vs mpi-1 ref = -1.49.
Simpler exact fix: a rank with no real local atoms contributes zero to the total energy by definition. The phantoms are pure scaffolding to satisfy inductor's nloc>=2 specialisation; their fitting output is a numerical artifact, not physics. Zero 'ener' directly when phantom_n > 0.
Forces / force_mag / virial are unaffected because phantom outputs are coord-independent (no neighbours), so their derivatives are zero -- no analogous correction is needed there.
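To make the leak concrete, here is a toy sketch with made-up numbers (not taken from the CI runs). It assumes the reduced energy is simply the sum over a "real" half and a "spin" half of phantom atomic energies; the function names are hypothetical and stand in for the logic described above, not for the actual code in `DeepSpinPTExpt.cc`.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Toy reduction: the spin model evaluates a real half and a spin half,
// and the reduced energy sums the atomic energies of both.
double reduced_energy(const std::vector<double>& real_half,
                      const std::vector<double>& spin_half) {
  return std::accumulate(real_half.begin(), real_half.end(), 0.0) +
         std::accumulate(spin_half.begin(), spin_half.end(), 0.0);
}

// First attempt: subtract only the phantom energies visible in the real half.
// The spin-half contribution survives and leaks into the rank total.
double subtract_real_half(double reduced, const std::vector<double>& real_half) {
  return reduced - std::accumulate(real_half.begin(), real_half.end(), 0.0);
}

// Approach described in the commit message: a rank that needed phantoms has
// no real local atoms, so it reports exactly zero energy.
double rank_energy(double reduced, int phantom_n) {
  assert(phantom_n >= 0);
  return phantom_n > 0 ? 0.0 : reduced;
}
```

With two phantoms whose real-half energies are -0.5 each and spin-half energies are -0.25 each, the first attempt still reports -0.5 for an empty rank, whereas `rank_energy` reports zero and leaves ranks without phantoms untouched.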
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
source/api_cc/src/DeepSpinPTExpt.cc (1)
393-401: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Pad `aparam_` when you synthesize phantom locals.
Lines 398-400 change the local-atom count from `0` to `2`, but `aparam_` stays in the pre-padding state. On an empty-local rank with `daparam > 0`, Lines 551-560 still send an empty tensor, so this path can still break for spin models that were exported with atomic parameters.
🧩 Suggested fix
     if (phantom_n > 0) {
       dcoord.insert(dcoord.begin(), static_cast<size_t>(phantom_n) * 3,
                     static_cast<VALUETYPE>(0));
       datype.insert(datype.begin(), static_cast<size_t>(phantom_n), 0);
    +  if (daparam > 0) {
    +    aparam_.insert(aparam_.begin(),
    +                   static_cast<size_t>(phantom_n) * daparam,
    +                   static_cast<VALUETYPE>(0));
    +  }
       nall_real += phantom_n;
       nloc_real = phantom_n;
       nloc = nall_real - nghost_real;
     }
Also applies to: 550-560
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/api_cc/src/DeepSpinPTExpt.cc` around lines 393 - 401, When synthesizing phantom locals (phantom_n > 0) you must pad the per-atom parameter array aparam_ to match the new phantom atoms: insert static_cast<size_t>(phantom_n) * static_cast<size_t>(daparam) default-valued entries at the front of aparam_ (similar to how dcoord and datype are padded) so subsequent send logic (the path around the existing tensor sends) sees the correct size; perform this insert in the same block that updates dcoord, datype, nall_real, nloc_real, and nloc.
♻️ Duplicate comments (1)
source/api_cc/src/DeepSpinPTExpt.cc (1)
487-489: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Offset real-atom mapping targets into the post-padding index space.
This is still using `fwd_map[...]` from the pre-padding layout. After Lines 395-400 prepend two phantom rows, every real/ghost row has moved by `phantom_n`, so the current mapping can resolve to a phantom slot instead of the intended atom row.
🐛 Minimal fix
     for (int ii = phantom_n; ii < nall_real; ii++) {
    -  mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]];
    +  mapping[ii] =
    +      fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]] + phantom_n;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/api_cc/src/DeepSpinPTExpt.cc` around lines 487 - 489, The mapping loop uses fwd_map from the pre-padding layout so indices can point into phantom rows; update the resolved index by offsetting into the post-padding layout by phantom_n. Concretely, in the loop that sets mapping[ii] (which references fwd_map, lmp_list.mapping and bkw_map), add phantom_n to the index into fwd_map (e.g. use fwd_map[ lmp_list.mapping[bkw_map[ii - phantom_n]] + phantom_n ] or otherwise shift the resolved value by phantom_n) so every real/ghost row maps into the post-padding index space.
Thanks for the update. CI is green now, but I think two small correctness issues still need to be addressed before approval:
    mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]];
When
Could you please add a small follow-up commit for these two cases? After that this should be good to approve. — OpenClaw 2026.5.28 (model: gpt-5.5) |
The nloc==0 phantom-atom prefix padded dcoord/datype/dspin/nlist/mapping but
not aparam_, so with dim_aparam > 0 the aparam tensor (shape {1, nloc, daparam})
was built against the padded nloc while aparam_ still held the pre-padding
layout -- a shape mismatch (or a silently-skipped aparam when nloc_real==0 left
aparam_ empty).
Prepend phantom_n * daparam zero rows to aparam_ when phantom_n > 0 and
daparam > 0, so the two phantom local atoms carry zero atomic parameters and the
aparam tensor stays aligned with the padded local atoms. aparam_nall is false on
this path, so aparam_ is a per-local-atom buffer.
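A minimal sketch of the padding described in this commit message, assuming `aparam_` is a flat per-local-atom buffer of `nloc * daparam` values. The free function `pad_aparam` and its signature are hypothetical and only illustrate the alignment argument; the element type is fixed to `double` here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Prepend zero atomic parameters for the phantom atoms so that the buffer
// length matches the padded local-atom count (nloc * daparam values).
void pad_aparam(std::vector<double>& aparam, int phantom_n, int daparam) {
  assert(phantom_n >= 0 && daparam >= 0);
  if (phantom_n > 0 && daparam > 0) {
    const std::size_t count =
        static_cast<std::size_t>(phantom_n) * static_cast<std::size_t>(daparam);
    aparam.insert(aparam.begin(), count, 0.0);
  }
}

// Helper for the shape argument: the buffer must hold exactly one row of
// `daparam` values per (padded) local atom before the tensor is built.
bool aparam_matches_nloc(const std::vector<double>& aparam, int nloc, int daparam) {
  return aparam.size() ==
         static_cast<std::size_t>(nloc) * static_cast<std::size_t>(daparam);
}
```

On an empty rank the buffer starts out empty, so after padding with `phantom_n = 2` it holds exactly `2 * daparam` zeros, which is what a `{1, nloc, daparam}` tensor built against the padded `nloc` expects.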
…in fixture
Give the DPA3 spin fixture (deeppot_dpa3_spin{,_mpi}.pt2) numb_aparam=1 so the
empty-subdomain MPI test exercises the phantom-atom aparam padding added in
DeepSpinPTExpt (rank with nloc_real==0 must prepend zero aparam rows). The spin
LAMMPS runner now supplies a uniform `aparam`, and the C++ with-comm
load-failure test passes a uniform aparam to its DeepSpin compute calls (the
shared fixture now has dim_aparam=1 and there is no default_aparam).
The spin LAMMPS tests are self-consistent (mpi-N vs mpi-1), so no reference
values change. Reuses the existing fixture rather than adding a new model.
…ure test
multi_rank_compute_throws simulated multi-rank via inlist.nswap=1, but the DeepSpinPTExpt dispatch keys on lmp_list.nprocs>1 (nswap is unsound for atom_style spin). With nprocs unset, multi_rank was false and the use_with_comm/with-comm-loader-failed throw never fired, so the compute returned without throwing (note also that nghost==0 skips the message-passing fail-fast block).
Set inlist.nprocs=2 so the test exercises the real multi-rank dispatch throw. Verified locally: both tests in the suite pass against the regenerated numb_aparam=1 DPA3 spin fixture.
In the nloc==0 phantom-padding path, the rebuilt mapping resolved real/ghost rows to fwd_map[...] — a pre-padding local index. Since the phantom prefix shifts every real/ghost row by phantom_n, the resolved target must be shifted by +phantom_n into the post-padding local index space (no-op when phantom_n == 0). This branch is reached by the empty-subdomain test (multi-rank + with-comm + atom_modify map yes populates lmp_list.mapping).
Clarify that the lmp_list.mapping branch with phantom_n>0 is structurally unreachable (set_mapping is single-rank only; phantom_n>0 is multi-rank only), so the +phantom_n shift is a no-op on every reachable path and is kept only to keep the mapping correct if that invariant ever changes. Corrects the prior commit's overstated reachability claim.
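The index shift discussed in the two commit messages above can be illustrated with a small sketch. It assumes the pre-padding mapping has already been resolved to non-negative local indices; `pre_mapping` is a hypothetical stand-in for the values produced by `fwd_map[lmp_list.mapping[bkw_map[...]]]`, and `build_padded_mapping` is not a function in the actual source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the mapping for the padded layout: phantom rows map to themselves,
// and every real/ghost row keeps its target, shifted by phantom_n.
std::vector<int> build_padded_mapping(const std::vector<int>& pre_mapping,
                                      int phantom_n) {
  assert(phantom_n >= 0);
  const std::size_t offset = static_cast<std::size_t>(phantom_n);
  std::vector<int> mapping(pre_mapping.size() + offset);
  // Identity entries for the phantom prefix.
  for (int ii = 0; ii < phantom_n; ++ii) {
    mapping[static_cast<std::size_t>(ii)] = ii;
  }
  // Real/ghost rows: the target index moves into the post-padding space.
  for (std::size_t jj = 0; jj < pre_mapping.size(); ++jj) {
    mapping[offset + jj] = pre_mapping[jj] + phantom_n;
  }
  return mapping;
}
```

With `phantom_n == 0` the result equals the input, which matches the commit's note that the shift is a no-op on every currently reachable path.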
njzjz-bot
left a comment
Re-reviewed the latest head (94600920). The previous aparam_ concern is addressed by padding phantom_n * daparam zero rows, and the added fixture now exercises the empty-subdomain path with numb_aparam=1. CI is green as well.
The mapping + phantom_n change is defensive on the current LAMMPS path (the branch is still structurally unreachable when phantom_n > 0), but it is harmless and keeps the invariant explicit if that changes later.
Looks good to me.
— OpenClaw 2026.5.28 (model: custom-chat-jinzhezeng-group/gpt-5.5)
Problem
Multi-rank spin MD can leave a rank with zero real local atoms (`nloc_real == 0`) when atoms migrate to other subdomains. The with-comm AOTI artifact hits an intermittent SIGFPE (integer divide by zero) at runtime in inductor-generated shape arithmetic that uses `nloc` as a divisor.
Reproduced on master CI run 26667802665.
Root cause:
- The graph was traced with `nloc_min=1` (`serialization.py:362`) and inductor lowered an even stricter `nloc >= 2` runtime-check (visible in the generated `wrapper.cpp`'s `check_input_3`).
- The runtime check only runs when `AOTI_RUNTIME_CHECK_INPUTS` is set (default OFF), so with `nloc = 0` the check is silently bypassed and the compiled graph runs through its own divide-by-zero on shape arithmetic.
Fix
Prepend two phantom atoms with empty neighbour lists when `nloc_real == 0` so the AOTI graph runs with `nloc == 2` and never reaches the integer-divide-by-zero path. Phantoms have no neighbours so they contribute zero atomic energy / force / virial, preserving the physically-correct "this rank has no real atoms" result.
Key details (all in `source/api_cc/src/DeepSpinPTExpt.cc`):
- `dcoord` / `datype` / `dspin` get two zero-valued rows prepended.
- `firstneigh_tensor` gets two `-1` rows prepended (no neighbours).
- `mapping_tensor` gets two identity entries prepended.
- `comm_dict.nlocal` is set to `2` (not the LAMMPS-reported `0`) so `border_op` writes received ghost features past the phantom slots.
- Outputs (`dforce`, `dforce_mag`, `datom_energy`, `datom_virial`) get the phantom prefix stripped before being scattered back to LAMMPS via `select_map`.
Why phantoms rather than `Dim(min=0)` re-export
Bumping the trace constraint to `min=0` would require:
- […] `nloc`-dependent divide in `deepmd/dpmodel/{descriptor,fitting,model}/` and protecting with `xp.maximum(nloc, 1)`;
- `torch.export` re-emitting compatible guards (currently fails because spin-side shape relationships require `nloc >= 1` to be inferable);
- […] `.pt2` archive in `source/tests/infer/`.
The phantom approach is a strict superset of correctness and self-contained in one C++ file. The two approaches aren't mutually exclusive; the `min=0` route can land as a follow-up once the dpmodel audit is done.
Test plan
- `runUnitTests_cc --gtest_filter='*Spin*'`: 42 / 42 spin C++ regression tests pass (12 TF-backend tests skipped, as expected in the PT-only venv).
- `test_pair_deepmd_mpi_dpa3_spin_empty_subdomain` should now pass deterministically. Local Python LAMMPS-MPI verification is blocked by a pre-existing OpenMPI/MPICH ABI mismatch in my local venv (the plugin's `ompi_mpi_*` symbols can't resolve against MPICH's `libmpi.so.12`), so end-to-end verification falls to CI.
Known limitations
- […] `nloc_real > 0` (the `if (phantom_n > 0)` branch never fires), so the common path is unchanged.
- […] `nloc` lower-bound to >2, `phantom_n` will need to track that minimum.
- […] `DeepSpinPTExpt` only. The corresponding non-spin path in `DeepPotPTExpt` has the same code shape; non-spin DPA3 empty-subdomain currently passes in CI but could regress similarly with a future inductor change. Deferred to a follow-up if observed.
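The output-stripping step listed under the key details of the fix (removing the phantom prefix from per-atom outputs before they are scattered back) could look roughly like the sketch below. It assumes each output is a flat buffer with a fixed number of values per atom (for example 3 for forces, 9 for atomic virials); `strip_phantom_prefix` is a hypothetical helper, not the function used in `DeepSpinPTExpt.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Drop the first phantom_n atoms from a flat per-atom output buffer.
// `width` is the number of values stored per atom.
std::vector<double> strip_phantom_prefix(const std::vector<double>& out,
                                         int phantom_n, int width) {
  assert(phantom_n >= 0 && width > 0);
  const std::size_t skip =
      static_cast<std::size_t>(phantom_n) * static_cast<std::size_t>(width);
  assert(out.size() >= skip);
  // Copy everything after the phantom rows; real and ghost rows keep their order.
  return std::vector<double>(out.begin() + static_cast<std::ptrdiff_t>(skip),
                             out.end());
}
```

Because the phantoms always sit at the front of the buffer, a single offset is enough, and with `phantom_n == 0` the function returns an unchanged copy.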