I've installed Open MPI, not in /usr/... but in /commun/data/packages/openmpi/. It was compiled with --with-sge.
I've added a new PE in SGE as described in http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/6ml49n2c0/index.html
# /commun/data/packages/openmpi/bin/ompi_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.3)
# qconf -sq all.q | grep pe_
pe_list make orte
Without SGE, the program runs without any problem on several processors:
/commun/data/packages/openmpi/bin/orterun -np 20 ./a.out args
Now I want to submit my program to SGE.
In the Open MPI FAQ, I read:
# Allocate a SGE interactive job with 4 slots
# from a parallel environment (PE) named 'orte'
shell$ qsh -pe orte 4
but my output is:
qsh -pe orte 4
Your job 84550 ("INTERACTIVE") has been submitted
waiting for interactive job to be scheduled ...
Could not start interactive job.
I've also tried the mpirun command embedded in a script:
$ cat ompi.sh
#!/bin/sh
/commun/data/packages/openmpi/bin/mpirun \
/path/to/a.out args
but it fails:
$ cat ompi.sh.e84552
error: executing task of job 84552 failed: execution daemon on host "node02" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 18327) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
error: executing task of job 84552 failed: execution daemon on host "node01" didn't accept task
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
How can I fix this?
Answer on the Open MPI mailing list: http://www.open-mpi.org/community/lists/users/2013/02/21360.php
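
For comparison, here is a minimal sketch of a batch script that requests the parallel environment from inside the script itself, using standard qsub directives (-S, -cwd, -V, -pe) and the NSLOTS variable that SGE sets for parallel jobs. It is not taken from the mailing-list answer and I am not claiming it fixes the error above. The file name ompi_pe.sh and the slot count 4 are assumptions chosen for illustration; the PE name orte and the install path are the ones from this post.

```shell
#!/bin/sh
# Sketch only: write an SGE job script that asks for 4 slots from the
# 'orte' PE. The file name and slot count are illustrative assumptions.
cat > ompi_pe.sh <<'EOF'
#!/bin/sh
# Run the job under /bin/sh, from the submission directory,
# and export the submitting shell's environment to the job.
#$ -S /bin/sh
#$ -cwd
#$ -V
# Request 4 slots from the parallel environment named 'orte'.
#$ -pe orte 4
# NSLOTS is set by SGE to the number of slots actually granted.
/commun/data/packages/openmpi/bin/mpirun -np "$NSLOTS" /path/to/a.out args
EOF

# Show the generated script; it would then be submitted with: qsub ompi_pe.sh
cat ompi_pe.sh
```

With an Open MPI build that has gridengine support, mpirun should be able to pick up the host list from SGE on its own, so no hostfile is passed here.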