
I am trying to run qsub jobs on an SGE (Sun Grid Engine) cluster that supports a maximum of 688 jobs. I would like to know if there is any way to find out the total number of jobs that are currently running on the cluster, so I can submit jobs based on the current cluster load.

I plan to do something like: sleep for 1 minute, check whether the number of jobs on the cluster is < 688, and if so, submit more jobs.

And just to clarify: my question pertains to the total number of jobs submitted on the cluster, not just the jobs I have currently submitted.

Thanks in advance.

anonuser0428

2 Answers


You can use qstat to list the jobs of all users; combined with awk and wc, this can be used to find the total number of jobs on the cluster:

qstat -u "*" | awk '{if ($5 == "r" || $5 == "qw") print $0;}' | wc -l

The above command also takes into account jobs that are queued and waiting to be scheduled on a compute node.

However, the cluster sysadmins may disallow users from checking on jobs that don't belong to them. You can verify whether you can see other users' jobs by running:

qstat -u "*"

If you know for a fact that another user is running a job and yet you can't see it with the above command, it's most likely that the sysadmins disabled that option.
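If you can see all jobs, you can wire that count into the sleep-and-check loop you described. Here is a rough sketch only; the job_*.sh pattern is a placeholder for however you actually submit your work, and the 688 threshold is taken from your question:

    #!/bin/bash
    # Throttled submission loop (sketch); MAX_JOBS and the job_*.sh
    # pattern are assumptions -- adjust them for your site.
    MAX_JOBS=688

    count_jobs() {
        # Count running (r) and queued (qw) jobs across all users
        qstat -u "*" | awk '{if ($5 == "r" || $5 == "qw") print $0;}' | wc -l
    }

    for job_script in job_*.sh; do
        # Wait until the cluster drops below the limit before submitting
        while [ "$(count_jobs)" -ge "$MAX_JOBS" ]; do
            sleep 60
        done
        qsub "$job_script"
    done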

Afterthought: from my understanding, you're just a regular cluster user, so why bother submitting jobs this way? Why not just submit all the jobs you want; if the cluster can't schedule them right away, it will simply put them in the qw state and schedule them whenever SGE deems appropriate.

Florin Stingaciu
  • Well, yes, that should be the way to do it, but there are two limits on the cluster: a 300-job per-user limit and a 688-job cluster-wide limit. Although users shouldn't have to worry about the cluster limit (that's what the queued/waiting state is for), the cluster I am working on doesn't let users submit jobs once there are more than 688 jobs on it. qsub fails in that case, which is why I'm trying this approach as a workaround. – anonuser0428 Jul 29 '14 at 20:33
  • Aha -- You should look into using that script in conjunction with [crontab](http://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/); see the sketch below. – Florin Stingaciu Jul 29 '14 at 20:35
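For illustration, a minimal crontab entry along those lines could look like the following; the script path and the five-minute interval are placeholders, not anything from this thread:

    # Hypothetical crontab entry: run the submission script every 5 minutes
    */5 * * * * /path/to/submit_throttled_jobs.sh >> /path/to/submit.log 2>&1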

Depending on how the cluster is configured, using a job array (the -t option for qsub) might get around this limit.

I have similar limits set for the maximum number of jobs a single user can submit. That limit applies to individual qsub submissions, not to a single job array submission with potentially many tasks (the number of tasks in an array is capped by a separate configuration variable, max_aj_tasks).
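As a rough illustration (the task range and script name below are placeholders), a single array submission looks like this:

    # Hypothetical example: one qsub submission that expands to 1000 tasks
    qsub -t 1-1000 my_task.sh
    # Inside my_task.sh, $SGE_TASK_ID holds the task's index (1..1000)

Whether the individual tasks then count against the cluster-wide limit depends on the site configuration, as noted above.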

Vince