feat: optimize MPI communication with non-blocking operations in eigenvalue solvers #7401
Open
laoba657 wants to merge 22 commits into
…nvalue solvers
- Add MPIRequestTracker and MPICommHelper for non-blocking MPI patterns
- Replace per-band blocking MPI_Bcast with a single MPI_Ibcast in diag_zhegvx
- Replace blocking reduce_pool with non-blocking MPI_Iallreduce in cal_elem
- Add non-blocking send/recv with compute-communication overlap in PLinearTransform
- Add CommStrategy enum with adaptive selection based on problem size
- Add MPI unit tests (correctness, consistency, error handling, performance)
- Add MPI parallel test script for automated multi-process testing
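The commit message does not show how CommStrategy chooses between modes. Below is a minimal sketch of size-based selection. It assumes a hypothetical select_comm_strategy function, two enumerators, and an illustrative 64 KiB threshold, none of which are taken from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: the PR's real enumerators and thresholds are not shown.
enum class CommStrategy { Blocking, NonBlocking };

// Choose a strategy from the number of ranks and the message size in bytes.
// The 64 KiB default is an illustrative assumption, not a measured crossover point.
inline CommStrategy select_comm_strategy(int nproc, std::size_t message_bytes,
                                         std::size_t threshold_bytes = 64 * 1024)
{
    assert(nproc >= 1);
    // A single rank has no communication to overlap, so use the cheaper blocking path.
    if (nproc == 1) return CommStrategy::Blocking;
    // Small messages are latency-bound; request bookkeeping can outweigh any overlap.
    if (message_bytes < threshold_bytes) return CommStrategy::Blocking;
    return CommStrategy::NonBlocking;
}
```

Any real threshold would have to come from measurements on the target interconnect. The rank-count check simply encodes that one process has nothing to overlap.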
ecf98e8 to 08a605a
Replace typed wrappers (nbcast_complex, nreduce_pool_complex) with generic nbcast<T> and nreduce_pool<T> that use mpi_type<T> trait to select the correct MPI_Datatype. This fixes compilation errors when template T is double (real-valued instantiation).
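Here is a sketch of the trait-based datatype dispatch this commit describes. The trait name mpi_type follows the commit. Its specializations, the value() member, and the helper bcast_generic (standing in for nbcast<T>, whose signature is not shown) are assumptions. The sketch calls the blocking MPI_Bcast, but the same trait would supply the datatype argument to MPI_Ibcast. Serial builds compile only the no-op branch.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#ifdef __MPI
#include <mpi.h>
#endif

#ifdef __MPI
// Trait mapping an element type to its MPI_Datatype. The primary template is
// declared but not defined, so an unsupported T is a compile-time error.
// A function is used because MPI datatype handles are not constexpr in all MPI libraries.
template <typename T> struct mpi_type;
template <> struct mpi_type<float>  { static MPI_Datatype value() { return MPI_FLOAT; } };
template <> struct mpi_type<double> { static MPI_Datatype value() { return MPI_DOUBLE; } };
template <> struct mpi_type<std::complex<float>>  { static MPI_Datatype value() { return MPI_CXX_FLOAT_COMPLEX; } };
template <> struct mpi_type<std::complex<double>> { static MPI_Datatype value() { return MPI_CXX_DOUBLE_COMPLEX; } };

// Hypothetical generic broadcast: one template covers real and complex instantiations.
template <typename T>
void bcast_generic(T* buf, std::size_t count, int root, MPI_Comm comm)
{
    MPI_Bcast(buf, static_cast<int>(count), mpi_type<T>::value(), root, comm);
}
#else
// Serial build: the only process already holds the data, so this is a no-op.
template <typename T>
void bcast_generic(T* buf, std::size_t count, int root)
{
    assert(buf != nullptr || count == 0);
    (void)root;
}
#endif
```

With this pattern, instantiating the solver with T = double selects MPI_DOUBLE automatically instead of failing to match a complex-only wrapper.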
diago_david.cpp accidentally contained the diag_mixed_precision function and a PrecisionMode dispatch block from the mixed-precision project. These have been removed; only the MPI non-blocking communication changes remain.
…recv incompatible with GPU device memory
MPI_Iallreduce followed immediately by MPI_Waitall is equivalent to a blocking MPI_Allreduce, but it can deadlock in single-process CI. Replace it with direct blocking calls (MPI_Allreduce, MPI_Bcast), which are simpler and provably correct.
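Posting one MPI_Iallreduce and waiting on it immediately leaves nothing to overlap, so it does the same work as MPI_Allreduce plus request bookkeeping. Non-blocking reductions can only help when independent operations are in flight together, such as the hcc and scc reductions in diago_iter_assist.cpp. Below is a minimal sketch of both forms. The function names are hypothetical, and the buffers are real-valued doubles for simplicity, although the solver matrices may be complex.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __MPI
#include <mpi.h>
#endif

#ifdef __MPI
// Blocking form: two collectives, completed one after the other.
inline void reduce_hcc_scc_blocking(double* hcc, double* scc, std::size_t n, MPI_Comm comm)
{
    MPI_Allreduce(MPI_IN_PLACE, hcc, static_cast<int>(n), MPI_DOUBLE, MPI_SUM, comm);
    MPI_Allreduce(MPI_IN_PLACE, scc, static_cast<int>(n), MPI_DOUBLE, MPI_SUM, comm);
}

// Non-blocking form: both reductions are posted before either is waited on,
// so the library may progress them concurrently. Every rank must post the
// two collectives in the same order on the same communicator.
inline void reduce_hcc_scc_nonblocking(double* hcc, double* scc, std::size_t n, MPI_Comm comm)
{
    MPI_Request reqs[2];
    MPI_Iallreduce(MPI_IN_PLACE, hcc, static_cast<int>(n), MPI_DOUBLE, MPI_SUM, comm, &reqs[0]);
    MPI_Iallreduce(MPI_IN_PLACE, scc, static_cast<int>(n), MPI_DOUBLE, MPI_SUM, comm, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}
#else
// Serial build: a sum over a single rank is the local data, so nothing changes.
inline void reduce_hcc_scc_blocking(double* hcc, double* scc, std::size_t n)
{
    assert((hcc != nullptr && scc != nullptr) || n == 0);
}
#endif
```

Whether the concurrent form is faster depends on the MPI library's progress engine and the interconnect. It is not guaranteed by the API.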
…t MPI communication tests
Collaborator
This PR presents a really interesting idea. Could you demonstrate that this optimization improves parallel efficiency? You may use the runtime results of benchmark cases for illustration.
Author
Performance results for the non-blocking MPI optimization
Test environment
Measured results
VCC Broadcast (per-band Bcast → single Ibcast)
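At np=1, the speedup comes only from eliminating empty function calls; no real multi-process communication takes place. From np≥2 on, the non-blocking version is actually slower because of the extra overhead of MPI_Request allocation and progress-engine polling.
Dual Allreduce (sequential Allreduce → concurrent Iallreduce)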
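Conclusion
In the current single-node shared-memory environment, blocking MPI is already fast enough. The extra overhead of the non-blocking calls dominates, and no clear benefit appears at the communication level. Nevertheless, this change still has value: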
If end-to-end speedup data is needed, I suggest measuring on an InfiniBand cluster using
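To provide the parallel-efficiency numbers requested above, each variant can be timed under identical conditions. Below is a minimal harness sketch. The name time_best_of is hypothetical. Taking the slowest rank's time per run and the minimum over repetitions is one common convention, not necessarily the procedure behind the results reported here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>
#ifdef __MPI
#include <mpi.h>
#endif

// Hypothetical harness: returns the best (minimum) wall time in seconds over
// `reps` runs of `body`. Under MPI, each run starts after a barrier, and the
// slowest rank's time is kept, since a collective ends only when every rank does.
template <typename F>
double time_best_of(int reps, F&& body)
{
    assert(reps > 0);
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < reps; ++r) {
#ifdef __MPI
        MPI_Barrier(MPI_COMM_WORLD);
        const double t0 = MPI_Wtime();
        body();
        double local = MPI_Wtime() - t0, slowest = 0.0;
        MPI_Allreduce(&local, &slowest, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
        best = std::min(best, slowest);
#else
        const auto t0 = std::chrono::steady_clock::now();
        body();
        const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        best = std::min(best, dt.count());
#endif
    }
    return best;
}
```

For the VCC broadcast case, one lambda would run the per-band MPI_Bcast loop and another would run the single broadcast over the same buffer. Both would be measured at several process counts.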
Summary
Optimize MPI communication in eigenvalue solvers by replacing blocking MPI calls with non-blocking alternatives.
Changes
New files:
- source/source_hsolver/mpi_comm_helper.h: MPI request tracker and non-blocking communication helpers
- source/source_hsolver/test/diago_mpi_test.cpp: 6 MPI unit tests
- source/source_hsolver/test/diago_mpi_parallel_test.sh: automated multi-process test script

Modified files:
- diago_david.cpp: non-blocking reduce in cal_elem; a single MPI_Ibcast replaces the per-band loop in diag_zhegvx
- diago_dav_subspace.cpp: same optimizations
- diago_iter_assist.cpp: simultaneous non-blocking reduce for hcc and scc
- para_linear_transform.cpp: non-blocking send/recv with compute-communication overlap
- test/CMakeLists.txt: new test target

Key optimizations
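The para_linear_transform.cpp entry describes overlapping computation with non-blocking send/recv. Below is a minimal sketch of that pattern as a ring shift. It assumes host-memory std::vector<double> blocks of equal size on every rank, a hypothetical ring_overlap function, and a caller-supplied consume callback that stands in for the local matrix product. It does not reflect the PR's data layout. Because the commit history notes a send/recv incompatibility with GPU device memory, it is not meant for device buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
#ifdef __MPI
#include <mpi.h>
#endif

#ifdef __MPI
// Each rank's block visits every rank once. consume(block, owner) runs on the
// block in hand while the next block is already being received from the left.
template <typename F>
void ring_overlap(std::vector<double> block, MPI_Comm comm, F consume)
{
    int rank = 0, nproc = 1;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nproc);
    assert(nproc >= 1);
    const int n = static_cast<int>(block.size());
    const int right = (rank + 1) % nproc, left = (rank - 1 + nproc) % nproc;
    std::vector<double> next(block.size());
    for (int step = 0; step < nproc; ++step) {
        const int owner = (rank - step + nproc) % nproc;  // rank whose block we hold
        const bool more = step + 1 < nproc;
        MPI_Request reqs[2];
        if (more) {
            MPI_Irecv(next.data(), n, MPI_DOUBLE, left, 0, comm, &reqs[0]);
            MPI_Isend(block.data(), n, MPI_DOUBLE, right, 0, comm, &reqs[1]);
        }
        // Overlap: compute while the transfer is in flight. The block is only read,
        // which MPI-3 permits for the buffer of an active send.
        consume(static_cast<const std::vector<double>&>(block), owner);
        if (more) {
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            std::swap(block, next);  // received block becomes the current one
        }
    }
}
#else
// Serial build: only the local block exists, so it is consumed once as owner 0.
template <typename F>
void ring_overlap(std::vector<double> block, F consume)
{
    consume(static_cast<const std::vector<double>&>(block), 0);
}
#endif
```

The gain depends on the computation per block being long enough to hide the transfer. With a single rank, the loop runs once and no messages are posted.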
All MPI code is guarded by #ifdef __MPI with a no-op fallback for serial builds.
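Below is a minimal sketch of a request tracker that follows this #ifdef __MPI / no-op fallback convention. RequestTracker and its methods are hypothetical. The interface of the PR's MPIRequestTracker in mpi_comm_helper.h is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef __MPI
#include <mpi.h>
#endif

// Hypothetical RAII tracker: collects outstanding requests and completes them
// together, including on scope exit, so an early return cannot leak a request.
// It must be destroyed before MPI_Finalize is called.
class RequestTracker {
public:
    RequestTracker() = default;
    RequestTracker(const RequestTracker&) = delete;
    RequestTracker& operator=(const RequestTracker&) = delete;
    ~RequestTracker() { wait_all(); }

#ifdef __MPI
    // Record a request produced by an MPI_I* call.
    void track(MPI_Request req) { reqs_.push_back(req); }
    // Block until every tracked request has completed, then forget them.
    void wait_all()
    {
        if (!reqs_.empty()) {
            MPI_Waitall(static_cast<int>(reqs_.size()), reqs_.data(), MPI_STATUSES_IGNORE);
            reqs_.clear();
        }
    }
    std::size_t pending() const { return reqs_.size(); }

private:
    std::vector<MPI_Request> reqs_;
#else
    // Serial build: no requests can exist, so every operation is a no-op.
    void wait_all() {}
    std::size_t pending() const { return 0; }
#endif
};
```

In MPI builds, each MPI_I* call would write into a local MPI_Request that is then passed to track(). wait_all() must be called before the results are read, and the destructor completes anything still pending.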