
I am interested in performing weak-scaling tests on an HPC cluster. To achieve this, I run several small tests on 1, 2, 4, 8, 16, 32, and 64 nodes, with each simulation taking anywhere from under a minute to at most one hour. However, the jobs sit in the queue (a 1-hour queue) for several days before the test results are available.

I have two questions:

  1. Is there a way to prioritize these jobs in the job scheduler, given that most tests take less than a minute yet I have to wait several days for them?

  2. Can such a job scheduling policy invite abuse of HPC resources, and to what extent? Consider a hypothetical HPC simulation on 32 nodes that is divided into several small 1-hour simulations, each of which gets prioritized because of the solution to question 1 above.

Note: the job scheduling and management system used at the HPC center is Moab. Each cluster node is equipped with two Xeon 6140 (Skylake) CPUs @ 2.3 GHz, with 18 cores each.

nae9on

1 Answer


Moab's fairshare scheduler may do what you want; if it doesn't out of the box, it may allow tweaking to prioritize jobs within the range you're interested in: http://docs.adaptivecomputing.com/mwm/7-1-3/help.htm#topics/fairness/6.3fairshare.html.
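Independent of fairshare tuning, requesting an accurate, short walltime can also help, since Moab's backfill can slot small, short jobs into idle gaps between larger reservations. A minimal sketch of submitting the whole weak-scaling series this way (the script name `scaling_test.sh` and the 10-minute walltime are placeholders; `ppn=36` matches the 2 x 18-core nodes described above):

```shell
#!/bin/sh
# Submit one short job per node count for the weak-scaling study.
# Each node has 2 x 18-core Xeon 6140 CPUs, i.e. ppn=36.
for nodes in 1 2 4 8 16 32 64; do
    # A tight walltime request makes the job a good backfill candidate.
    # 'echo' shows the command instead of submitting; drop it on the cluster.
    echo msub -l "nodes=${nodes}:ppn=36,walltime=00:10:00" scaling_test.sh
done
```

Removing the `echo` submits for real; overestimating walltime by hours would largely negate the backfill advantage.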

Aaron Altman