My program uses MPI+pthreads: n-1 of the MPI processes are pure MPI code, and only one MPI process uses pthreads. That last process has just 2 threads (the main thread and one pthread). Suppose the HPC cluster I want to run this program on consists of compute nodes with 12 cores each. How should I write my batch script to maximise utilization of the hardware?
Below is the batch script I wrote. I set export OMP_NUM_THREADS=2 because the last MPI process has 2 threads, so I have to assume that the others have 2 threads each as well.
I then allocate 6 MPI processes per node, so that each node can run 6 x OMP_NUM_THREADS = 12 threads (the number of cores on each node), even though all MPI processes but one actually have only 1 thread.
#BSUB -J LOOP.N200.L1000_SIMPLE_THREAD
#BSUB -o LOOP.%J
#BSUB -W 00:10
#BSUB -M 1024
#BSUB -N
#BSUB -a openmpi
#BSUB -n 20
#BSUB -m xxx
#BSUB -R "span[ptile=6]"
#BSUB -x
export OMP_NUM_THREADS=2
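To put numbers on this layout, here is a rough estimate of how many cores it reserves versus how many threads actually run. It is only a sketch: I am assuming that LSF fills each node up to the ptile limit before moving to the next one, and that -x reserves every allocated node in full. The variable names are mine, chosen for illustration.

```shell
#!/bin/sh
# Rough utilization estimate for the script above (illustrative only).
# Assumptions: nodes are filled up to ptile, and -x reserves whole nodes.
cores_per_node=12          # cores on each compute node
ranks=20                   # from "#BSUB -n 20"
ptile=6                    # from span[ptile=6]

# 19 single-threaded ranks plus 1 rank with 2 threads
threads=$(( (ranks - 1) + 2 ))

# Ceiling division: number of nodes needed at ptile ranks per node
nodes=$(( (ranks + ptile - 1) / ptile ))

# With exclusive nodes, every core of every allocated node is reserved
reserved=$(( nodes * cores_per_node ))

echo "nodes=$nodes reserved_cores=$reserved busy_threads=$threads"
```

Under those assumptions this prints nodes=4 reserved_cores=48 busy_threads=21, so fewer than half of the reserved cores would be doing any work.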
How can I write a better script for this?