I am interested in performing weak scaling tests on an HPC cluster. In order to achieve this, I run several small tests on 1,2,4,8,16,32,64 nodes with each simulation taking less than a minute to maximum 1 hour. However, the jobs stay in queue (1 hour queue) for several days before the test results are available.
I have two questions:
Is there a way to prioritize the jobs in the job scheduler given that most tests are less than a minute for which I have to wait several days?
Can and to what extent such a job scheduling policy invite abuse of HPC resources. Consider a hypothetical example of an HPC simulation on 32 nodes, which is divided into several small 1 hour simulations that get prioritized because of the solution provided in point 1. above?
Note: the job scheduling and management system used at the HPC center is MOAB. Each cluster node is equipped with 2 Xeon 6140 CPUs@2.3 GHz (Skylake), 18 cores each.