Is there a way to limit the number of cores allocated to a single job, similar to how I can limit memory by making "-l h_vmem=3g" a default parameter to qsub via the sge_request config file?
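For context, this is the kind of cluster-wide default I mean (a minimal sketch; $SGE_ROOT/$SGE_CELL/common/sge_request is the cluster-wide file, and a per-user ~/.sge_request also works):

```
# $SGE_ROOT/$SGE_CELL/common/sge_request
# Options listed here are applied to every qsub by default.
-l h_vmem=3g
```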
More specifically, the scenario I want to avoid is:
- A user creates or uses a program that runs multiple processes within a single job, e.g. with Python's multiprocessing module (see the sketch after this list).
- The user submits the job to a queue with no parallel environment, so the job consumes a single slot as seen by qstat, yet multiple CPUs are utilized (visible with top on the execution host).
- Under this scenario, one or more users can exceed the total number of allowed slots (set in the queue configuration, qconf -mq) or CPUs (set in the host configuration, qconf -me) for that host.
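To make the first bullet concrete, here is a minimal sketch of the kind of job script I mean; the name burn.py, the worker count of 10, and the busy-loop payload are placeholders:

```python
#!/usr/bin/env python3
# burn.py -- occupies one SGE slot but keeps 10 CPUs busy.
from multiprocessing import Pool

def burn(_):
    # Placeholder CPU-bound work so each worker pins one core.
    total = 0
    for i in range(10 ** 7):
        total += i * i
    return total

if __name__ == "__main__":
    pool = Pool(processes=10)  # 10 worker processes, but only 1 slot in qstat
    pool.map(burn, range(10))
    pool.close()
    pool.join()
```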
For instance, in my queue configuration I limit the number of slots for the queue to half the CPUs on that host:
slots 10
At the host level, I set resource limits to reflect the hardware on that execution host:
complex_values slots=20,num_proc=20,h_vmem=60g
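For reference, I view and edit these with the usual qconf commands (all.q and node01 are placeholder names here):

```
qconf -sq all.q      # show queue config; "slots 10" is edited with qconf -mq all.q
qconf -se node01     # show exec host config; complex_values is edited with qconf -me node01
```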
For single-threaded/single-processor jobs, the limits at the queue and host level are obeyed: if I submit 30 single-core jobs, 10 run and 20 wait.
However, when I submit 3 multiprocessor jobs, each using 10 CPUs, all 3 run on the same execution host at the same time.
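To reproduce (a sketch; single_core.sh and burn_job.sh are hypothetical wrappers, the latter running burn.py from above):

```
# 30 single-core jobs: the slot limit holds, 10 run and 20 wait.
for i in $(seq 30); do qsub single_core.sh; done

# 3 jobs that each fork 10 workers: all 3 start, so 30 CPUs are busy on a 20-CPU host.
for i in $(seq 3); do qsub burn_job.sh; done
```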
While these limits can be respected by asking users to submit through a parallel environment and request the requisite number of slots (i.e. 10 slots per job, as sketched below), intentional or unintentional over-subscription is still possible outside a parallel environment.
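For completeness, this is the parallel-environment route I mean (a sketch; the PE name smp is an assumption, created with qconf -ap smp and added to the queue's pe_list):

```
# Relevant lines of the PE definition (qconf -sp smp):
pe_name            smp
slots              999
allocation_rule    $pe_slots   # keep all requested slots on one host

# Submission that correctly charges 10 slots for a 10-process job:
qsub -pe smp 10 burn_job.sh
```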
Is there a way to avoid this?
Thanks in advance for the help.