
I have an MPI program that is fairly straightforward, essentially: "initialize, 2 sends from master to slaves, 2 receives on slaves, do a bunch of system calls for copying/pasting and then running a code, tidy up and MPI finalize".

This seems straightforward, but I'm not getting MPI_FINALIZE to work correctly. Below is a snapshot of the program, without all the system copy/paste/call-external-code parts, which I've rolled up into "do codish stuff" type statements.

program mpi_finalize_break
!<variable declarations>
call MPI_INIT(ierr)
icomm = MPI_COMM_WORLD
call MPI_COMM_SIZE(icomm,nproc,ierr)
call MPI_COMM_RANK(icomm,rank,ierr)

!<do codish stuff for a while>
if (rank == 0) then
    !<set up some stuff then call MPI_SEND in a loop over number of slaves>
    call MPI_SEND(numat,1,MPI_INTEGER,n,0,icomm,ierr)
    call MPI_SEND(n_to_add,1,MPI_INTEGER,n,0,icomm,ierr)
else
    call MPI_Recv(begin_mat,1,MPI_INTEGER,0,0,icomm,status,ierr)
    call MPI_Recv(nrepeat,1,MPI_INTEGER,0,0,icomm,status,ierr)
    !<do codish stuff for a while>
endif

print*, "got here4", rank
call MPI_BARRIER(icomm,ierr)
print*, "got here5", rank, ierr
call MPI_FINALIZE(ierr)

print*, "got here6"
end program mpi_finalize_break

Now, the problem I am seeing occurs around the "got here4", "got here5" and "got here6" statements. I get the appropriate number of print statements with corresponding ranks for "got here4" and for "got here5". That is, the master and all the slaves (rank 0 and all other ranks) reach the barrier call, make it through the barrier, and reach MPI_FINALIZE, reporting 0 for ierr on all of them. However, at "got here6", after MPI_FINALIZE, I get all kinds of weird behavior: sometimes one fewer "got here6" than I expect, sometimes six fewer. In those cases the program hangs forever, never closing, and leaves an orphaned process on one (or more) of the compute nodes.

I am running this on a machine with an InfiniBand backbone, with the NFS server shared over InfiniBand (NFS over RDMA). I'm trying to determine how the MPI_BARRIER call can work fine yet MPI_FINALIZE ends up with random orphaned processes (not the same node, nor the same number of orphans, every time). I'm guessing it is related to the various system calls to cp, mv, ./run_some_code, cp, mv, but I wasn't sure whether it might also be related to the speed of InfiniBand, as all of this happens fairly quickly. My intuition could be wrong, of course. Anybody have thoughts? I can post the whole code if that would help, but I believe this condensed version captures the problem. I'm running Open MPI 1.8.4 compiled against ifort 15.0.2, with Mellanox adapters running firmware 2.9.1000.
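For context, here is a minimal sketch of what the elided "do codish stuff" steps presumably look like, assuming the copies and the external run are launched from Fortran with execute_command_line (a plain call system does the same thing for this purpose). The subroutine name, file names, and directory layout below are hypothetical, not from the real code:

! Hypothetical sketch only: the copy / run / move steps issued as shell
! commands from inside the MPI process. Every call below makes the MPI
! process fork a child shell.
subroutine run_external_case(tag)
    implicit none
    character(len=*), intent(in) :: tag
    integer :: stat

    ! stage the input, run the external code, then collect the output
    call execute_command_line('cp template_input.dat run_'//trim(tag)//'/input.dat', exitstat=stat)
    call execute_command_line('cd run_'//trim(tag)//' && ./run_some_code', exitstat=stat)
    call execute_command_line('mv run_'//trim(tag)//'/output.dat results/out_'//trim(tag)//'.dat', exitstat=stat)
end subroutine run_external_case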

Thanks for the help.

Update:

Per the request, I put the "MPI_Abort" in and get the following:

forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
pburn              0000000000438CB1  Unknown               Unknown  Unknown
pburn              0000000000437407  Unknown               Unknown  Unknown
libmpi_usempif08.  00002B5BCB5C5712  Unknown               Unknown  Unknown
libmpi_usempif08.  00002B5BCB5C5566  Unknown               Unknown  Unknown
libmpi_usempif08.  00002B5BCB5B3DCC  Unknown               Unknown  Unknown
libmpi_usempif08.  00002B5BCB594F63  Unknown               Unknown  Unknown
libpthread.so.0    000000345C00F710  Unknown               Unknown  Unknown
libc.so.6          000000345B8DB2ED  Unknown               Unknown  Unknown
libc.so.6          000000345B872AEF  Unknown               Unknown  Unknown
libc.so.6          000000345B866F26  Unknown               Unknown  Unknown
libopen-pal.so.6   00002B5BCC313EB2  Unknown               Unknown  Unknown
libopen-rte.so.7   00002B5BCC0416FE  Unknown               Unknown  Unknown
libmpi.so.1        00002B5BCBD539DF  Unknown               Unknown  Unknown
libmpi_mpifh.so.2  00002B5BCBADCF5A  Unknown               Unknown  Unknown
pburn              0000000000416889  MAIN__                    415  parallel_burn.f90
pburn              00000000004043DE  Unknown               Unknown  Unknown
libc.so.6          000000345B81ED5D  Unknown               Unknown  Unknown
pburn              00000000004042E9  Unknown               Unknown  Unknown

But the code runs correctly otherwise (all the correct output files and things).
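For reference, the change being tested is roughly the following; this is a stripped-down sketch, not the actual parallel_burn.f90:

! Stripped-down illustration of the MPI_Abort-instead-of-MPI_Finalize test.
program abort_instead_of_finalize
    use mpi
    implicit none
    integer :: ierr, rank

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

    print*, "rank", rank, "reached shutdown"
    call MPI_BARRIER(MPI_COMM_WORLD, ierr)

    ! call MPI_FINALIZE(ierr)                 ! the normal, orderly shutdown
    call MPI_ABORT(MPI_COMM_WORLD, 0, ierr)   ! diagnostic: tear everything down immediately
end program abort_instead_of_finalize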

  • Did you report to OpenMPI user list? – Jeff Hammond Apr 22 '15 at 21:55
  • I didn't, as I'd first assumed I was the problem, but the more I investigate the less it makes sense, so I'll cross-post there too. – jackd Apr 22 '15 at 22:11
  • So technically, MPI gives you no guarantee about how many processes exist after `MPI_Finalize`, but every implementation for clusters will have the same number after as before. I'm not sure what the IO flush semantics are in Fortran (even though I use Fortran a lot, it's in a code that has its own flush routines that wrap C), so I'd not use `got here6` as a diagnostic. However, if you find that not all processes exist after `MPI_Finalize`, that is clearly a problem. What happens if you replace `MPI_Finalize()` with `MPI_Abort(MPI_COMM_WORLD,0)`? – Jeff Hammond Apr 22 '15 at 22:16

1 Answer


(This is more a comment than an answer, but I needed space to put the error message....)

Your problem may also come from your "copy/paste/call external code" if there is a `call system` somewhere. With Open MPI, forking a child process from inside an MPI process (which is what `call system` does) is unsafe and strongly discouraged, particularly over InfiniBand. You get a warning for this:

--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
--------------------------------------------------------------------------
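If the shell calls do turn out to be the culprit, one way to reduce the forking is to do the cp/mv-style file staging with plain Fortran I/O instead of `call system('cp ...')`. A minimal sketch, not from the question or the answer; the copy_file name and the stream-I/O approach are illustrative assumptions:

! Fork-free file copy: read the source as a byte stream and write it to the
! destination, so no child process is created inside the MPI job.
subroutine copy_file(src, dst)
    implicit none
    character(len=*), intent(in) :: src, dst
    integer :: in_unit, out_unit, nbytes
    character(len=1), allocatable :: buf(:)

    open(newunit=in_unit, file=src, access='stream', form='unformatted', &
         status='old', action='read')
    inquire(unit=in_unit, size=nbytes)   ! file size in bytes
    allocate(buf(nbytes))
    read(in_unit) buf
    close(in_unit)

    open(newunit=out_unit, file=dst, access='stream', form='unformatted', &
         status='replace', action='write')
    write(out_unit) buf
    close(out_unit)
end subroutine copy_file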
Anthony Scemama