
I have the following block of code that runs MPI_Bcast. Strangely, the program sometimes hangs when I run it, but not always. The debug output shows that all processes reached line 129 (every rank from 0 to n-1 printed "bcast start"), but none of them ever reached line 132.

128         if (n_procs > 1) {
129             debug("[%d] bcast start\n", dgrid->rank);
130             //  broadcast to other workers to stop their process
131             MPI_Bcast(finished, 1, MPI_INT, root, MPI_COMM_WORLD);
132             debug("[%d] bcast end\n", dgrid->rank);
133         }

What could be a possible cause of this problem? I have tried to look for a solution, but every case I found seems different from mine. Could this be a system-level problem, or is it just my code?

Judging from the terminal output, the root process (0) is often the last one to reach line 129.

Thanks in advance.

Dogemore
  • if you are using Open MPI or one of its derivatives, the root rank might be much faster than the other ranks and hence flood them. If adding `MPI_Barrier(MPI_COMM_WORLD)` before `MPI_Bcast()` gets rid of the hang (a minimal sketch of this is shown after these comments), then you should consider using the `coll/sync` module (it will automatically do that for you) – Gilles Gouaillardet Mar 27 '20 at 05:36
  • I am very new to MPI. I was using MPICH when I posted this question; I have since switched to Open MPI (both from brew). Can you elaborate a bit on the flooding? I should add that the root is the last process to reach line 129, at least judging from the messages printed to stdout. And what is coll/sync? – Dogemore Mar 27 '20 at 05:59
  • @GillesGouaillardet I have also observed `abort trap 6` and `seg fault 11` when using `MPI_Barrier` and `MPI_Finalize`. Not sure if my distro is broken. (My debug log verified all processes reached the function). – Dogemore Mar 27 '20 at 06:11
  • flooding can occur when the MPI library does no flow control and the root process calls `MPI_Bcast()` many times in a row, generating a lot of unexpected messages on the other ranks and hence causing all kinds of problems (memory consumption, slowdown, ...). Anyway, the `SIGSEGV` issue looks unrelated to this and you should debug it the "classical" way (get a core dump, do a post-mortem analysis, ...); also make sure your program has no memory leaks. – Gilles Gouaillardet Mar 27 '20 at 06:44
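
A minimal, self-contained sketch of the barrier-before-broadcast test suggested in the first comment. The buffer, root rank, and prints below are simplified stand-ins for the question's `finished`, `root`, and `debug()`, not the actual code:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, n_procs;
    int finished = 0;          /* stand-in for the question's broadcast buffer */
    const int root = 0;        /* stand-in for the question's root rank */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n_procs);

    if (n_procs > 1) {
        /* Synchronize first: no rank (in particular a fast root) can post the
           broadcast before every rank has arrived here. */
        MPI_Barrier(MPI_COMM_WORLD);

        printf("[%d] bcast start\n", rank);
        MPI_Bcast(&finished, 1, MPI_INT, root, MPI_COMM_WORLD);
        printf("[%d] bcast end\n", rank);
    }

    MPI_Finalize();
    return 0;
}

If the hang disappears with the barrier in place, that points to the flooding scenario described above rather than to a broken MPI installation.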

1 Answer


I've had the same issue on a large Fortran codebase. Fixing it was a headache until I found this Stack Overflow question, so I'll document the solution here in case someone else finds it useful.

The symptom was a segmentation fault (SIGSEGV) with this message:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
    at /usr/src/debug/glibc-2.17-c758a686/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0

on the same line where MPI_BCAST is called.

The call was part of a large loop of broadcasts, each with a different root ID. To simplify, imagine something like:

do i=1,10000
   call MPI_BCAST(value(i),1,MPI_REAL8,owner(i),MPI_COMM_WORLD,ierr)
end do

The system was:

  • CentOS 7
  • OpenMPI 4.0.3
  • gcc/gfortran 9.2.0

The only way we could solve this issue was to put an MPI barrier after each call, like:

do i=1,10000
   call MPI_BCAST(value(i),1,MPI_REAL8,owner(i),MPI_COMM_WORLD,ierr)
   call MPI_BARRIER(MPI_COMM_WORLD,ierr)
end do
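
The barrier effectively throttles the loop: no rank can move on to broadcast i+1 until every rank has completed broadcast i, which prevents the faster ranks from flooding the slower ones with unexpected messages, as described in the comments under the question.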

Thank you all for the great advice!

Federico Perini