
I am interested in performing weak-scaling tests on an HPC cluster. To achieve this, I run several small tests on 1, 2, 4, 8, 16, 32, and 64 nodes, with each simulation taking anywhere from under a minute to at most one hour. However, the jobs sit in the queue (a 1-hour queue) for several days before the test results are available.

I have two questions:

  1. Is there a way to prioritize these jobs in the job scheduler, given that most tests take less than a minute yet I have to wait several days for them?

  2. Can such a job scheduling policy invite abuse of HPC resources, and to what extent? Consider a hypothetical HPC simulation on 32 nodes that is divided into several small 1-hour simulations, each of which gets prioritized because of the solution to question 1 above.

Note: the job scheduling and management system used at the HPC center is Moab. Each cluster node is equipped with two Xeon 6140 (Skylake) CPUs @ 2.3 GHz, with 18 cores each.

nae9on

1 Answer


Moab's fairshare scheduler may do what you want; if it doesn't out of the box, it may allow tweaking to prioritize jobs within the range you're interested in: http://docs.adaptivecomputing.com/mwm/7-1-3/help.htm#topics/fairness/6.3fairshare.html.
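Independent of fairshare tuning, requesting an accurate, short walltime can also help, since Moab's backfill can slot small, short jobs into idle gaps between larger reservations. A minimal sketch of submitting the whole weak-scaling series this way (the script name `scaling_test.sh` and the 10-minute walltime are placeholders; `ppn=36` matches the 2 x 18-core nodes described above):

```shell
#!/bin/sh
# Submit one short job per node count for the weak-scaling study.
# Each node has 2 x 18-core Xeon 6140 CPUs, i.e. ppn=36.
for nodes in 1 2 4 8 16 32 64; do
    # A tight walltime request makes the job a good backfill candidate.
    # 'echo' shows the command instead of submitting; drop it on the cluster.
    echo msub -l "nodes=${nodes}:ppn=36,walltime=00:10:00" scaling_test.sh
done
```

Removing the `echo` submits for real; overestimating walltime by hours would largely negate the backfill advantage.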

Aaron Altman