
I have the following block of code that runs MPI_Bcast. Strangely, the program sometimes hangs when I run it, but not always. The debug output shows that all processes reached line 129 (every rank from 0 to n-1 printed "bcast start"), but none of them ever reached line 132.

128         if (n_procs > 1) {
129             debug("[%d] bcast start\n", dgrid->rank);
130             //  broadcast to other workers to stop their process
131             MPI_Bcast(finished, 1, MPI_INT, root, MPI_COMM_WORLD);
132             debug("[%d] bcast end\n", dgrid->rank);
133         }

What could be a possible cause of this problem? I have tried to look for a solution, but every case I found seems different from mine. Could this be a system-level problem, or is it just my code?

Judging from the terminal output, the root process (0) is often the last one to reach line 129.

Thanks in advance.

Dogemore
  • if you are using Open MPI or one of its derivatives, the root rank might be much faster than the other ranks and hence flood them. If adding `MPI_Barrier(MPI_COMM_WORLD)` before `MPI_Bcast()` gets rid of the hang (a minimal sketch of this is shown after these comments), then you should consider using the `coll/sync` module (it will automatically do that for you) – Gilles Gouaillardet Mar 27 '20 at 05:36
  • I am very new to MPI. I was using MPICH when I posted this question; I have since switched to Open MPI (both from brew). Can you elaborate a bit on the flooding? I should add that the root is the last process to reach line 129, at least judging from the messages printed to stdout. And what is coll/sync? – Dogemore Mar 27 '20 at 05:59
  • @GillesGouaillardet I have also observed `abort trap 6` and `seg fault 11` when using `MPI_Barrier` and `MPI_Finalize`. Not sure if my distro is broken. (My debug log verified all processes reached the function). – Dogemore Mar 27 '20 at 06:11
  • flooding can occur when the MPI library does no flow control and the root process calls `MPI_Bcast()` many times in a row, generating a lot of unexpected messages on the other ranks and hence causing all kinds of problems (memory consumption, slowdown, ...). Anyway, the `SIGSEGV` issue looks unrelated to this and you should debug it the "classical" way (get a core dump, do a post-mortem analysis, ...); also make sure your program has no memory leaks. – Gilles Gouaillardet Mar 27 '20 at 06:44
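
A minimal, self-contained sketch of the barrier-before-broadcast test suggested in the first comment. The buffer, root rank, and prints below are simplified stand-ins for the question's `finished`, `root`, and `debug()`, not the actual code:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, n_procs;
    int finished = 0;          /* stand-in for the question's broadcast buffer */
    const int root = 0;        /* stand-in for the question's root rank */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n_procs);

    if (n_procs > 1) {
        /* Synchronize first: no rank (in particular a fast root) can post the
           broadcast before every rank has arrived here. */
        MPI_Barrier(MPI_COMM_WORLD);

        printf("[%d] bcast start\n", rank);
        MPI_Bcast(&finished, 1, MPI_INT, root, MPI_COMM_WORLD);
        printf("[%d] bcast end\n", rank);
    }

    MPI_Finalize();
    return 0;
}

If the hang disappears with the barrier in place, that points to the flooding scenario described above rather than to a broken MPI installation.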

1 Answer


I've had the same issue on a large Fortran codebase. Fixing it was a headache until I found this Stack Overflow question, so I'll document the solution here in case someone else finds it useful.

The symptom was a segmentation fault (SIGSEGV) with this message:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
    at /usr/src/debug/glibc-2.17-c758a686/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0

on the same line where MPI_BCAST is called.

The call was part of a large loop of broadcasts, each with a different root ID. To simplify, imagine something like:

do i=1,10000
   call MPI_BCAST(value(i),1,MPI_REAL8,owner(i),MPI_COMM_WORLD,ierr)
end do

The system was:

  • CentOS 7
  • OpenMPI 4.0.3
  • gcc/gfortran 9.2.0

The only way we could solve this issue was to put an MPI barrier after each call, like:

do i=1,10000
   call MPI_BCAST(value(i),1,MPI_REAL8,owner(i),MPI_COMM_WORLD,ierr)
   call MPI_BARRIER(MPI_COMM_WORLD,ierr)
end do
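
The barrier effectively throttles the loop: no rank can move on to broadcast i+1 until every rank has completed broadcast i, which prevents the faster ranks from flooding the slower ones with unexpected messages, as described in the comments under the question.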

Thank you all for the great advice!

Federico Perini