
Is there a way to request full machines? At my department, when I run a large job, some processes get allocated to shared machines. I am not sure why, but processes on those shared machines are sometimes slowed down severely, possibly because of what the other user is running.

I want to avoid this, so ideally I would be able to request unshared nodes when invoking qsub. Is this possible?

We are using SGE, and different nodes have different numbers of cores, so I can't just use ppn=4.

tlamadon
  • On any sanely-managed cluster Grid Engine is configured to provide MPI jobs exclusive access to the nodes on which they execute. If this is not true on your cluster it is, by definition, insanely-managed, and your problem is off-topic here. The solution is to implement sane cluster management, which is not soluble by programming (alone). – High Performance Mark Dec 10 '13 at 12:05
  • Are you sure that you have SGE and not PBS/Torque? Both batch systems name the job submission executable `qsub`, but `ppn=...` is PBS/Torque syntax. Things are done differently in SGE: you request slots with e.g. `-pe mpi 32`, and the configuration of the `mpi` parallel environment controls how slots are allocated across the available nodes. Note that SGE does not support exclusive allocation, and one (or one's admins) has to be very creative. – Hristo Iliev Dec 10 '13 at 12:18
  • Actually, I am using OGS/GE. It seems that `-l exclusive=true` could be implemented, but it is not at my department. hristo-iliev, thank you for your comment; this means that I won't be able to find a way around this in the short run. Would there be a way to specify a list of nodes? – tlamadon Dec 10 '13 at 12:39
  • From a user's perspective, the only way to select a specific list of nodes is to specify `-l hostname="regexpr"`, where `regexpr` is a regular boolean expression that matches the nodes in the list, e.g. `-l hostname="node001|node002|node012"`. – Hristo Iliev Dec 10 '13 at 13:07
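
The node-list approach from the last comment could be scripted roughly as follows. This is only a sketch: the node names, the `mpi` parallel environment, the slot count of 32, and the script name `job.sh` are placeholders taken from the comments or invented for illustration, and your site's configuration may differ. The script prints the `qsub` command instead of submitting it.

```shell
#!/bin/sh
# Hypothetical node names; replace with the hosts you want the job to land on.
nodes="node001 node002 node012"

# Join the names with "|" to build the host expression for -l hostname=...
host_expr=$(printf '%s' "$nodes" | tr ' ' '|')

# Assumptions: a parallel environment called "mpi", 32 slots, and a job script
# called job.sh. The command is only printed; run the printed line to submit.
echo "qsub -pe mpi 32 -l hostname=\"$host_expr\" job.sh"
```

Note that this only restricts where the scheduler may place the job. It does not by itself stop other users' jobs from running on the same nodes, so it helps only if the listed nodes are otherwise free. Where the administrators have set up an exclusive resource, as mentioned above, `-l exclusive=true` would be the more direct request.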

0 Answers