By default, most OpenMP implementations set the maximum number of threads equal to the number of logical CPUs. This default does not work well in combination with MPI: if you launch two hybrid MPI+OpenMP processes on the same node, each of them will try to use all available CPUs, which results in oversubscription. That's why on our cluster we set OMP_NUM_THREADS to 1 for all MPI jobs, and users who launch hybrid jobs must explicitly set OMP_NUM_THREADS to the desired number of threads per MPI process.
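A minimal hybrid test program along the following lines (a sketch of what the ./test executable used in the examples below presumably looks like) makes it easy to see how many OpenMP threads each MPI rank gets:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* omp_get_max_threads() reports how many threads a parallel
       region started by this rank would use */
    printf("Hello from %d, omp_get_max_threads returned me %d\n",
           rank, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}

Compile it with the MPI compiler wrapper and OpenMP enabled, e.g. mpicc -fopenmp test.c -o test with GCC (other compilers use a different OpenMP flag).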
The proper way to start hybrid jobs with Open MPI depends on your cluster environment. You should set OMP_NUM_THREADS to some sane value and then either have the resource manager propagate the value to all MPI processes or do that explicitly.
If Open MPI is compiled with tight integration with the cluster resource manager (SGE, LSF, PBS, Torque, etc.), the following (as part of the job script) would suffice:
export OMP_NUM_THREADS=4
mpiexec -n 4 ./program <program arguments>
This will launch 4 MPI processes with 4 OpenMP threads each.
Otherwise, the -x option has to be used in order to pass the value of OMP_NUM_THREADS to the MPI processes:
$ export OMP_NUM_THREADS=4
$ mpiexec -x OMP_NUM_THREADS -H host1,host2,host3,host4 -n 4 ./program
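Note that -x also accepts VAR=value assignments, which sets and exports the variable in one step without exporting it in the shell first (see the mpiexec man page of your Open MPI version), e.g.:
$ mpiexec -x OMP_NUM_THREADS=4 -H host1,host2,host3,host4 -n 4 ./program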
Open MPI 1.8.x introduced process binding by default. You can verify this by adding the --report-bindings option to mpiexec and observing the printed binding information. By default, each MPI process is bound to a single (and different) CPU core. That's why the OpenMP runtime uses only a single thread by default on machines with a single hardware thread per core: launching more than one thread on a single core won't benefit most programs.
To restore the behaviour of Open MPI 1.6.x, add the --bind-to none command-line option, which disables the binding:
$ mpiexec --report-bindings -np 2 ./test
[cluster:28358] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.]
[cluster:28358] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.]
Hello from 0, omp_get_max_threads returned me 1
Hello from 1, omp_get_max_threads returned me 1
$ mpiexec --bind-to none --report-bindings -np 2 ./test
[cluster:28743] MCW rank 1 is not bound (or bound to all available processors)
[cluster:28743] MCW rank 0 is not bound (or bound to all available processors)
Hello from 0, omp_get_max_threads returned me 64
Hello from 1, omp_get_max_threads returned me 64
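Keep in mind that with the binding disabled both ranks now see all 64 hardware threads, so in a real hybrid run you should still set OMP_NUM_THREADS explicitly (e.g. to 32 for two processes on this particular node) to avoid the oversubscription described at the beginning:
$ mpiexec --bind-to none -x OMP_NUM_THREADS=32 -np 2 ./test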
When you specify the -H option, each host listed there provides a single slot. When you ask Open MPI to launch more processes than there are slots (i.e. you oversubscribe the hosts), the library switches off the default binding. That's why -H localhost -np 1 results in two threads being used on your machine: you provide one slot and ask for one process, so that process gets bound to a single core, and in your case each core appears to have two hardware threads. -H localhost -np 2 asks for two processes on a single slot, i.e. oversubscription, so the library disables the binding mechanism and the two MPI processes have access to all 8 hardware threads. -H server,server -np 2 provides two slots for two processes, i.e. no oversubscription, so the binding remains active.
$ mpiexec -H localhost --report-bindings -np 1 ./test
[cluster:21425] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.]
Hello from 0, omp_get_max_threads returned me 1
$ mpiexec -H localhost --report-bindings -np 2 ./test
[cluster:38895] MCW rank 1 is not bound (or bound to all available processors)
[cluster:38895] MCW rank 0 is not bound (or bound to all available processors)
Hello from 1, omp_get_max_threads returned me 64
Hello from 0, omp_get_max_threads returned me 64
$ mpiexec -H localhost,localhost --report-bindings -np 2 ./test
[cluster:39329] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.]
[cluster:39329] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.][./././././././.]
Hello from 1, omp_get_max_threads returned me 1
Hello from 0, omp_get_max_threads returned me 1
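Finally, instead of disabling the binding altogether, Open MPI 1.8.x should also let you keep it enabled while reserving several cores per process via the PE modifier of --map-by, combined with a matching OMP_NUM_THREADS. The following is only a sketch and the exact syntax is an assumption, so consult the mpiexec man page of your version:
$ export OMP_NUM_THREADS=4
$ mpiexec --map-by slot:PE=4 --report-bindings -np 2 ./test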