The problem of submitting jobs to SGE so that they run on complete nodes has been addressed in this forum before. Several solutions have been suggested: one is to configure SGE to allow the option -l excl=TRUE, another is to ask SGE for hard memory or load limits.
I'm using my university's cluster for my master's thesis. The parallel environment openmpi is configured with the fill-up allocation strategy, and the nodes of the cluster typically have 16 or 20 cores each. The problem is that some users, instead of requesting a number of cores that is a multiple of 16 (or 20), launch their jobs with an arbitrary number of cores. As a result, when I launch a job with -pe openmpi 16, SGE will sometimes spread the slots over 3 nodes (e.g. 5 + 1 + 10), which makes the computations very slow because of the inter-node MPI communication.
I asked the administrator to configure the cluster to allow -l excl=TRUE, but he refused to change the configuration before running tests (and I don't know how long that will take).
Now I have an idea that might give me a result similar to -l excl=TRUE without changing the cluster configuration:
- Write a script that scans the queue and estimates the number of cores to request from SGE so that it tops up all the partially used nodes, leaving only completely free nodes available.
- Launch a fake (filler) job with the computed number of cores that simply waits for a certain amount of time.
- Launch my real job (e.g. -pe openmpi 32, i.e. 2 × 16 cores), which should then land on whole free nodes.
- Delete the fake job so that other users can use its cores.
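To make the first step concrete, here is a rough, untested sketch of how the "count the filler slots" part could look. It assumes that `qstat -f` prints one line per queue instance with a resv/used/tot. column such as `0/6/16` (as it does on the SGE installations I have seen); the function name `filler_slots` is just mine:

```shell
#!/bin/sh
# filler_slots: read "qstat -f"-style lines on stdin and print the number of
# slots needed to top up every partially used node, so that afterwards only
# completely free nodes remain available.
filler_slots() {
    awk '
        # Look for the resv/used/total column, e.g. "0/6/16".
        match($0, /[0-9]+\/[0-9]+\/[0-9]+/) {
            split(substr($0, RSTART, RLENGTH), a, "/")
            used = a[2]; total = a[3]
            # Only nodes that are partially used need topping up;
            # fully free and fully busy nodes are skipped.
            if (used > 0 && used < total)
                sum += total - used
        }
        END { print sum + 0 }
    '
}
```

The surrounding steps could then be something like `NFILL=$(qstat -f | filler_slots)`, followed by `qsub -b y -pe openmpi "$NFILL" sleep 3600` for the filler job, then the real `qsub -pe openmpi 32 job.sh`, and finally `qdel` on the filler job's id once the real job has started. One obvious weakness: this is racy, since another user's job may start between the scan and the filler submission.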
Can someone provide an example of such a script, or point out problems with this approach?