0

I'm running some jobs using mpiexec (MPICH2).
The mpiexec process exits with a nonzero status, leaving some worker processes behind.

I can print a list of running child jobs:

$ ps aux | grep mpi

Is there another way to list running/hanging jobs?

Wesley Bland
Georgy Ivanov

2 Answers

0

If MPI leaves a leftover process behind (which is odd; this really shouldn't be happening), it will be named after the executable that you originally ran. So if you started your program with:

mpiexec -n 4 ./a.out

then you'll need to search for

ps aux | grep a.out

which will give you the list of all the processes that are still hanging around. Grepping for mpi, as you suggested, usually won't work: once the mpirun or mpiexec process has gone away (due to a crash or completion), there is no process by that name left to find. The children, however, may still be around for one reason or another.
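One wrinkle with `ps aux | grep a.out` is that the `grep` process itself shows up in the results. Here is a minimal sketch of a helper that avoids this, assuming a Linux procps-style `ps` (the `stat` column and the `-u`/`-o` combination may differ elsewhere). The function name `list_leftovers` and the `NAME` variable are made up for illustration and are not part of MPICH.

```shell
#!/bin/sh
# Hypothetical helper (not an MPICH tool): list the current user's processes
# whose command line contains the given program name.
# Example: list_leftovers a.out
list_leftovers() {
    # Columns: pid, parent pid, state, elapsed time, full command line.
    # The name is passed to awk through the environment rather than on its
    # command line, so the awk process does not match itself.
    ps -u "$(id -u)" -o pid,ppid,stat,etime,args |
        NAME="$1" awk 'NR == 1 || index($0, ENVIRON["NAME"])'
}
```

The first output line is the `ps` header. A STAT value starting with `Z` marks a process that has already exited but has not been reaped; anything else is still running and can be signalled with `kill` by its PID. The shell you call the helper from may also be listed if its own command line contains the name.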

Wesley Bland
-1

This may help you: ps aux | grep MPICH

  • Nothing started by MPICH actually has MPICH as the name of the process. The executable is called `mpiexec` or `mpirun`. – Wesley Bland Jul 16 '13 at 15:50