I'm working on an 18-node cluster running TORQUE/PBS Pro/Open MPI.
Setup: 2 CPUs per node, 12 cores per CPU (so 24 allowable processes per node).
If I submit PBS jobs that need an uneven split across the nodes, e.g. a job that requires, say, 58 processes, I can split it via:
#PBS -l nodes=2:ppn=24+1:ppn=10
which assigns 2 nodes using all 24 cores each, and 1 node using 10 cores. So I should now have 58 tasks running.
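To spell out the arithmetic behind that request, here is a small shell sketch. count_requested_slots is a hypothetical helper I wrote for illustration, not a PBS command, and it only handles the plain count:ppn=N form joined with +.

```shell
# Hypothetical helper (not part of PBS): sum the slots requested by a
# "-l nodes=" spec. Only understands chunks of the form COUNT:ppn=N.
count_requested_slots() {
    spec=$1
    total=0
    old_ifs=$IFS
    IFS='+'                          # chunks are separated by '+'
    for chunk in $spec; do
        nodes=${chunk%%:*}           # text before the first ':'
        ppn=${chunk##*ppn=}          # text after 'ppn='
        total=$((total + nodes * ppn))
    done
    IFS=$old_ifs
    echo "$total"
}

count_requested_slots "2:ppn=24+1:ppn=10"   # prints 58
```

That is 2 x 24 + 1 x 10 = 58, which is the number I expect to see reported.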
However, when I execute qstat -a, the output says I only have 48 tasks running. It never seems to count the unevenly split node/s.
So, are those extra 10 processes actually running? What's going on? Is the output from qstat just incorrect?
I've scrounged through all the PBS READMEs and man pages I could find, but no luck.
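One check that could be run from inside the job script, independent of qstat, is to count the entries in the node file. This is only a sketch: it assumes the scheduler sets $PBS_NODEFILE with one line per allocated slot (which is how I understand TORQUE to behave for nodes/ppn requests; PBS Pro may differ). slots_per_host and total_slots are hypothetical helper names.

```shell
# Sketch: inspect the slots the scheduler actually handed to the job.
# Assumes the node file has one line per allocated slot.

# Print "<count> <hostname>" for each host in the node file.
slots_per_host() {
    sort "$1" | uniq -c
}

# Print the total number of slots (lines) in the node file.
total_slots() {
    wc -l < "$1" | tr -d ' '
}

# Only runs inside a job, where PBS_NODEFILE is set and readable.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    slots_per_host "$PBS_NODEFILE"
    echo "total slots: $(total_slots "$PBS_NODEFILE")"
fi
```

If the total comes out as 58 with 10 entries for the third host, the allocation matches the request and the discrepancy would be in how qstat reports it; if it comes out as 48, the third chunk is not being allocated.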