We have an IBM HPC 4.2 cluster with 32 compute nodes. We compiled and installed Open MPI 1.10.1 with LSF support.
The problem: we see different behavior between IBM MPI (the MPI shipped with the platform, a.k.a. PMPI) and Open MPI when we run them under LSF.
Example: I compiled the hello_world.c MPI example with both implementations. When I launch the execution without LSF (without bsub), I get:
PMPI : mpirun -np 4 -hostlist "compute000 compute001" ./hello_world_pmpi.exe
Hello world! I'm 1 of 4 on compute000
Hello world! I'm 2 of 4 on compute001
Hello world! I'm 3 of 4 on compute001
Hello world! I'm 0 of 4 on compute000
Openmpi : mpirun -np 4 --host "compute000,compute001" --mca btl self,sm --mca mtl psm ./hello_world_ompi.exe
Hello world! I'm 1 of 4 on compute000
Hello world! I'm 2 of 4 on compute000
Hello world! I'm 3 of 4 on compute001
Hello world! I'm 0 of 4 on compute001
Which is logical. But when I use LSF, things change, and PMPI in particular behaves strangely. I get:
PMPI : bsub -n 4 -R "span[ptile=2]" -o pmpi-%J.out mpirun ./hello_world_pmpi.exe
cat pmpi-xxx.out ...
Hello world! I'm 0 of 1 on compute017
Open MPI : bsub -n 4 -R "span[ptile=2]" -o ompi-%J.out mpirun --mca btl self,sm --mca mtl psm ./hello_world_ompi.exe
cat ompi-xxx.out ...
Hello world! I'm 1 of 4 on compute005
Hello world! I'm 2 of 4 on compute010
Hello world! I'm 3 of 4 on compute010
Hello world! I'm 0 of 4 on compute005
It seems like only one PMPI process is launched instead of 4.
I have the same problem with IMB (the Intel MPI Benchmarks) and HPCC: they complain about a lack of processes. Without LSF they both work fine; under LSF, only Open MPI works correctly.
Any idea?
Thanks in advance
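
For reference, the hello_world.c I'm using is essentially the standard MPI hello world (a sketch; the exact source may differ slightly, but the output format matches what is shown above):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Get_processor_name(name, &len);     /* host the rank runs on */

    printf("Hello world! I'm %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```

It is compiled separately with each implementation's mpicc wrapper to produce hello_world_pmpi.exe and hello_world_ompi.exe.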