The problem of submitting jobs to SGE so that they run on complete nodes has been addressed in this forum before. Several solutions have been suggested: one is to configure SGE to allow the option -l excl=TRUE, another is to ask SGE for hard memory or load limits.
I'm using my university's cluster for my master's thesis. The parallel environment openmpi is configured with the fill-up allocation strategy, and the nodes of the cluster typically have 16 or 20 cores each. The problem is that some users, instead of requesting a number of cores that is a multiple of 16 (or 20), launch their jobs with an arbitrary number of cores. As a result, when I launch a job with -pe openmpi 16, SGE will sometimes spread the slots over 3 nodes (e.g. 5 + 1 + 10), which makes the computations very slow because of the inter-node MPI communication.
I asked the administrator to configure the cluster to allow -l excl=TRUE, but he refused to change the configuration before running tests (and I don't know how long that will take).
Now I have an idea that might give me a result similar to -l excl=TRUE without changing the cluster configuration:
- Write a script that scans the queue and estimates the number of cores to request from SGE so that it tops up all the partially used nodes, leaving only completely free nodes available.
- Launch a fake (filler) job with the computed number of cores that simply waits for a certain amount of time.
- Launch my real job (e.g. -pe openmpi 32, i.e. 2 × 16 cores), which should then land on whole free nodes.
- Delete the fake job so that other users can use its cores.
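To make the first step concrete, here is a rough, untested sketch of how the "count the filler slots" part could look. It assumes that `qstat -f` prints one line per queue instance with a resv/used/tot. column such as `0/6/16` (as it does on the SGE installations I have seen); the function name `filler_slots` is just mine:

```shell
#!/bin/sh
# filler_slots: read "qstat -f"-style lines on stdin and print the number of
# slots needed to top up every partially used node, so that afterwards only
# completely free nodes remain available.
filler_slots() {
    awk '
        # Look for the resv/used/total column, e.g. "0/6/16".
        match($0, /[0-9]+\/[0-9]+\/[0-9]+/) {
            split(substr($0, RSTART, RLENGTH), a, "/")
            used = a[2]; total = a[3]
            # Only nodes that are partially used need topping up;
            # fully free and fully busy nodes are skipped.
            if (used > 0 && used < total)
                sum += total - used
        }
        END { print sum + 0 }
    '
}
```

The surrounding steps could then be something like `NFILL=$(qstat -f | filler_slots)`, followed by `qsub -b y -pe openmpi "$NFILL" sleep 3600` for the filler job, then the real `qsub -pe openmpi 32 job.sh`, and finally `qdel` on the filler job's id once the real job has started. One obvious weakness: this is racy, since another user's job may start between the scan and the filler submission.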
Can someone provide an example of such a script, or point out problems with this approach?