I'm running some jobs using mpiexec (mpich2). When the mpiexec process exits with a nonzero status, it leaves some worker processes behind.
I can print a list of running child jobs with:
$ ps aux | grep mpi
Is there another way to list running/hanging jobs?
If MPI leaves a stray process around (which is odd; this really shouldn't happen), it will be named after the executable you originally launched. So if you started your program with:
mpiexec -n 4 ./a.out
then you'll need to search for
ps aux | grep a.out
which lists all of those processes that are still hanging around. What you suggested usually won't work because once the mpirun or mpiexec process has gone away (due to a crash or completion), there is obviously nothing left by that name to find. The children, however, may still be around for one reason or another.
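The search by executable name can be wrapped in a small shell function. This is a minimal sketch, assuming Linux procps ps; the function name list_leftovers is hypothetical. It matches the executable name exactly (the ps "comm" field) instead of grepping a substring, so the ps and awk processes doing the search never show up in their own output.

```shell
# Hypothetical helper: list_leftovers NAME
# Prints "PID PPID ELAPSED COMMAND-LINE" for every process whose
# executable name (the ps "comm" field) is exactly NAME.
# Assumes Linux procps ps, where comm is the executable's base name
# truncated to 15 characters.
# The trailing "=" on each ps field suppresses the header row; awk then
# compares the 4th column (comm) exactly against NAME.
list_leftovers() {
  ps -eo pid=,ppid=,etime=,comm=,args= |
    awk -v target="$1" '$4 == target'
}

# Usage, after a failed "mpiexec -n 4 ./a.out":
#   list_leftovers a.out
```

The PPID column may help show whether some launcher process still owns each stray worker, and the elapsed time makes long-hung ranks easy to spot. Because Linux truncates comm to 15 characters, an executable with a longer name would have to be passed in its truncated form.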
This may help: ps aux | grep MPICH
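A pipeline like ps aux | grep MPICH usually also reports the grep process itself, since "MPICH" appears on grep's own command line. A minimal sketch that avoids this uses pgrep, assuming the Linux procps-ng version (the -a, -f and -i flags behave differently or are missing in BSD/macOS and busybox pgrep); the function name find_procs is hypothetical.

```shell
# Hypothetical helper: find_procs PATTERN
# Lists PID and full command line of every process whose command line
# matches PATTERN (an extended regex), ignoring case.
# Assumes procps-ng pgrep:
#   -a  print the full command line next to each PID
#   -f  match against the full command line, not just the process name
#   -i  ignore case
# pgrep never lists itself, unlike "ps aux | grep ...".
find_procs() {
  pgrep -a -f -i -- "$1"
}

# Usage:
#   find_procs mpich
```

pgrep only excludes its own process, so a shell whose command line happens to contain the pattern (for example one started with bash -c "...mpich...") can still appear in the results.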