I am considering which process launcher, `mpirun` or `srun`, is better at optimizing resource use. Say one compute node in a cluster has 16 cores in total, and I have a job I want to run with 10 processes.
If I launch it with `mpirun -n 10`, will it be able to detect that my request needs fewer cores than a single node provides, and automatically assign all 10 processes to cores on a single node? Unlike `srun`, which has `-N <number>` to specify the number of nodes, `mpirun` doesn't seem to have such a flag. I am thinking that running all processes on one node can reduce communication time.
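For concreteness, this is roughly the kind of submission I have in mind (just a sketch, assuming a Slurm batch script and Open MPI; the job name and `./my_program` are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=my_job     # placeholder name
#SBATCH --nodes=1             # pin the whole job to a single 16-core node
#SBATCH --ntasks=10           # 10 MPI processes

# Option A: let Slurm launch the tasks
srun ./my_program

# Option B: launch inside the same allocation with Open MPI's mpirun;
# as far as I understand it picks up the allocated node list from Slurm,
# so "-n 10" should also stay on the single node
# mpirun -n 10 ./my_program
```
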
In the example above, let's further assume that each node has 2 CPUs with the cores distributed equally, so 8 cores per CPU, and that the specification says there is 48 GB of memory per node (i.e. 24 GB per CPU, or 3 GB per core). Suppose each spawned process in my job requires 2.5 GB, so all 10 processes together use 25 GB. When does one say that a program exceeds the memory limit? Is it when the total required memory:
- exceeds the per-node memory (hence my program is good, 25 GB < 48 GB), or
- exceeds the per-CPU memory (hence my program is bad, 25 GB > 24 GB), or
- when the memory per process exceeds the per-core memory (hence my program is good, 2.5 GB < 3 GB)?
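
To make the two readings concrete, here is how I could see expressing the requirement either way in a job script (again only a sketch assuming Slurm; the 2.5 GB/process figure is from the example above and `./my_program` is a placeholder):

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10

# Reading 1: a per-node budget -- 10 processes x 2.5 GB = 25 GB in total,
# which is under the 48 GB the node has
#SBATCH --mem=25G

# Reading 2: a per-core budget -- 2.5 GB for each allocated core,
# which is under the 3 GB/core figure
# (this directive would replace --mem above; the two are mutually exclusive)
##SBATCH --mem-per-cpu=2500M

srun ./my_program
```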