
fix:add OpenMP parallelization to BPCG CPU kernel band loops (assisted with Deepseek V4) #7400

Open
Missing-Hex wants to merge 4 commits into deepmodeling:develop from Missing-Hex:fix/bpcg

Linked Issue

Add OpenMP parallelization to BPCG CPU kernels for improved performance

Unit Tests and/or Case Tests for my changes

  • Existing unit tests in source/source_hsolver/kernels/test/ cover the BPCG functionality
  • Verified numerical correctness by running existing test cases with both single-threaded and multi-threaded configurations
  • Performance benchmarking shows a significant speedup with multi-threaded execution
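The single- versus multi-threaded comparison described above can be sketched as follows. This is a minimal illustration, not the project's test code: `band_kernel`, `run_bands`, and `results_match` are hypothetical names, and the per-band work (a squared norm over a band-major array) only stands in for the real BPCG kernels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-band kernel: squared norm of band `ib` in a band-major
// array `psi` of shape [n_band][n_basis]. It reads only band ib and writes
// only out[ib], so different bands can safely run on different threads.
void band_kernel(const std::vector<double>& psi, std::size_t n_basis,
                 std::size_t ib, std::vector<double>& out)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n_basis; ++i) {
        const double v = psi[ib * n_basis + i];
        sum += v * v;
    }
    out[ib] = sum;
}

// Runs the band loop on n_threads threads; without OpenMP it runs serially.
std::vector<double> run_bands(const std::vector<double>& psi, std::size_t n_basis,
                              std::size_t n_band, int n_threads)
{
    std::vector<double> out(n_band, 0.0);
    (void)n_threads;  // unused in non-OpenMP builds
#ifdef _OPENMP
#pragma omp parallel for num_threads(n_threads) schedule(static)
#endif
    for (long ib = 0; ib < static_cast<long>(n_band); ++ib) {
        band_kernel(psi, n_basis, static_cast<std::size_t>(ib), out);
    }
    return out;
}

// True if the two runs agree band by band within relative tolerance rtol.
bool results_match(const std::vector<double>& a, const std::vector<double>& b, double rtol)
{
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double scale = std::max(std::abs(a[i]), std::abs(b[i]));
        if (std::abs(a[i] - b[i]) > rtol * scale) {
            return false;
        }
    }
    return true;
}
```

Because each band's sum is accumulated by a single thread in this sketch, the per-band results do not depend on the thread count; the tolerance keeps the same check usable for kernels whose reductions are split across threads.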

What's changed?

  • Added OpenMP parallelization directives (#pragma omp parallel for) to the outer band loops in line_minimize_with_block_op<CPU> and calc_grad_with_block_op<CPU> in bpcg_kernel_op.cpp
  • Wrapped the MPI reduction calls (Parallel_Reduce::reduce_pool) inside the parallelized loops in #pragma omp critical sections, so that only one thread at a time enters the reduction
  • All OpenMP directives are wrapped with #ifdef _OPENMP for conditional compilation, maintaining backward compatibility with non-OpenMP builds
  • The parallelization targets band-level data parallelism where each band's computation is independent
  • Expected performance improvement scales with the number of bands and available CPU cores
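A minimal sketch of the loop pattern listed above, assuming a band-major array and a hypothetical `reduce_pool_stub` in place of Parallel_Reduce::reduce_pool. The real call performs an MPI reduction over the pool; the stub does nothing, which is only correct for a single process. The per-band normalization is illustrative and is not taken from bpcg_kernel_op.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Parallel_Reduce::reduce_pool. In an MPI build the
// real call sums `value` across the pool; with a single process the local
// value is already the total, so this stub leaves it unchanged.
void reduce_pool_stub(double& value) { (void)value; }

// Normalizes each band of a band-major array [n_band][n_basis] in parallel.
// Each iteration owns one band, so the only call that needs serializing is
// the reduction.
void normalize_bands(std::vector<double>& grad, std::size_t n_basis, std::size_t n_band)
{
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (long ib = 0; ib < static_cast<long>(n_band); ++ib) {
        double* g = grad.data() + static_cast<std::size_t>(ib) * n_basis;
        double norm2 = 0.0;  // declared inside the loop, hence private per thread
        for (std::size_t i = 0; i < n_basis; ++i) {
            norm2 += g[i] * g[i];
        }
#ifdef _OPENMP
#pragma omp critical(bpcg_band_reduce)
#endif
        {
            // Only one thread at a time executes the reduction call.
            reduce_pool_stub(norm2);
        }
        if (norm2 > 0.0) {
            const double inv_norm = 1.0 / std::sqrt(norm2);
            for (std::size_t i = 0; i < n_basis; ++i) {
                g[i] *= inv_norm;
            }
        }
    }
}
```

One caveat applies to MPI builds. A critical section guarantees only that one thread at a time is inside MPI, which requires MPI to be initialized with at least MPI_THREAD_SERIALIZED. It does not fix the order in which bands reach the reduction, while MPI requires collective calls on a communicator to be issued in the same order on every rank. A common alternative is to store the per-band partial sums in an array, perform a single reduction over that array after the parallel loop, and finish the per-band work in a second parallel loop.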

Any changes of core modules? (ignore if not applicable)

  • Modified HSolver module: Enhanced bpcg_kernel_op.cpp with OpenMP parallelization for CPU kernels
  • The changes affect the line_minimize_with_block_op and calc_grad_with_block_op structures, which are critical for BPCG diagonalization performance
  • No changes to class interfaces or virtual functions; the modifications are purely implementation-level optimizations
  • GPU code paths remain unchanged; this optimization only affects CPU execution
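The claim that GPU paths are unaffected can be illustrated with a sketch. It assumes, as the `<CPU>` suffix on line_minimize_with_block_op<CPU> suggests, that each op is a class template specialized per device, so OpenMP pragmas added to the CPU specialization cannot reach the GPU one. `scale_bands_op`, `DEVICE_CPU`, and `DEVICE_GPU` are hypothetical names, not the project's types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical device tags standing in for the project's own device types.
struct DEVICE_CPU {};
struct DEVICE_GPU {};

// Primary template is only declared; each device provides a specialization.
template <typename T, typename Device>
struct scale_bands_op;

// CPU specialization: the OpenMP band loop lives here and nowhere else.
template <typename T>
struct scale_bands_op<T, DEVICE_CPU> {
    void operator()(std::vector<T>& data, std::size_t n_basis, std::size_t n_band,
                    T factor) const
    {
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
        for (long ib = 0; ib < static_cast<long>(n_band); ++ib) {
            T* band = data.data() + static_cast<std::size_t>(ib) * n_basis;
            for (std::size_t i = 0; i < n_basis; ++i) {
                band[i] *= factor;
            }
        }
    }
};

// GPU specialization: declared here for illustration only. A real build would
// typically define it in a separate CUDA/HIP source, so edits to the CPU
// version above never change the code that runs on the device.
template <typename T>
struct scale_bands_op<T, DEVICE_GPU>;
```

Call sites select the device through the template argument, for example `scale_bands_op<double, DEVICE_CPU>{}(data, n_basis, n_band, 0.5)`.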

Missing-Hex changed the title from "fix:add OpenMP parallelization to BPCG CPU kernel band loops" to "fix:add OpenMP parallelization to BPCG CPU kernel band loops (assisted with Deepseek V4)" on May 30, 2026