
I submit headless NetLogo jobs to an HPC server using the following script:

#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
    --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
    --experiment test \
    --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv

Below is a snapshot of the cluster queue, obtained using:

qstat -g c

[screenshot: output of qstat -g c]

I would like to know whether I can increase the CQLOAD for my simulations, and also what it signifies. I couldn't find a clear explanation online.

CPU USAGE CHECK:

qhost -u abhishekb

[screenshot: output of qhost -u abhishekb]

When I run BehaviorSpace on my PC through the GUI, assigning high priority to the task makes it use nearly 99% of the CPU, which makes it run faster. I wish to accomplish the same here.

EDIT: [screenshot]

EDIT 2:

[screenshot]

– Abhishek Bhatia
3 Answers


A typical HPC environment is designed to run only one MPI process (or OpenMP thread) per CPU core, which therefore has access to 100% of the CPU time, and this cannot be increased further. In contrast, on a classical desktop/server machine, a number of processes compete for CPU time, and it is indeed possible to increase the performance of one of them by setting the appropriate priority with nice.

It appears that CQLOAD is the mean load average for that computing queue. If you are not using all the CPU cores in it, it is not a useful indicator. Besides, even the load average per core for your runs simply reflects the efficiency of the code on this HPC cluster. For instance, a value of 0.7 per core would mean that the code spends 70% of its time doing calculations, while the remaining 30% is probably spent waiting to communicate with the other computing nodes (which is also necessary).

Bottom line, the only way you can increase the CPU percentage use on an HPC cluster is by optimising your code. Normally though, people are more concerned about the parallel scaling (i.e. how the time to solution decreases with the number of CPU cores) than with the CPU percentage use.

– rth

1. CPU percentage load

I agree with @rth's answer regarding the use of Linux job priority / renice to increase the CPU percentage - it's

  • almost certain not to work

and, (as you've found)

  • you're unlikely to be able to do it, as you won't have superuser privileges on the nodes (it's pretty unlikely you can even log into the worker nodes - probably only the head node)

The CPU usage of your model as it runs is mainly a function of your code structure - if it runs at 100% CPU locally, it will probably run like that on the node while it is running.

Here are some answers to the more specific parts of your question:

2. CQLOAD

You ask

CQLOAD (what does it mean too?)

The docs for this are hard to find, but you link to the spec of your cluster, which tells us that its scheduling engine is Sun's Grid Engine. Man pages are available online (you can access them locally too - in particular by typing man qstat).

If you search through for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:

OUTPUT FORMATS

...

an average of the normalized load average of all queue hosts. In order to reflect each host's different significance the number of configured slots is used as a weighting factor when determining cluster queue load. Please note that only hosts with a np_load_value are considered for this value. When queue selection is applied only data about selected queues is considered in this formula. If the load value is not available at any of the hosts '-NA-' is printed instead of the value from the complex attribute definition.

This means that CQLOAD gives an indication of how utilized the processors in the queue are. Your output screenshot above shows 0.84, so the average load on the (in-use) processors in all.q is 84%. This doesn't seem too low.
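As a quick illustration of reading that column, the CQLOAD value (2nd column) can be pulled out of the qstat -g c output with awk. The sample output below is hypothetical, mimicking the layout described above - on the cluster you would pipe the real qstat -g c output instead:

```shell
#!/bin/bash
# Hypothetical sample of `qstat -g c` output (header line, separator
# line, then one line per cluster queue); replace with real output.
sample='CLUSTER QUEUE   CQLOAD USED RES AVAIL TOTAL aoACDS cdsuE
--------------------------------------------------------------
all.q             0.84   72   0   24    96    0      0'

# Skip the two header lines and print the queue name and its CQLOAD.
echo "$sample" | awk 'NR > 2 {print $1, $2}'
```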

3. Number of nodes reserved

In a related question, you state that colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder whether the real problem here is that you're reserving a lot of nodes (even if just for a short time) for a job that they can see could work with fewer.

You might want to experiment with using fewer nodes (unless your runs become very slow) - that is achieved by altering the line #$ -pe mpi 24 - maybe take the number 24 down. You can work out (roughly) how many nodes you need by timing how long 1 model run takes on your computer and then using

N = ((time to run 1 job) * number of runs in experiment) / (time you want the run to take)
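For example, a sketch of that calculation with made-up figures (substitute your own timings):

```shell
#!/bin/bash
# Made-up figures: adjust these to your own measurements.
t_one=10      # minutes one model run takes on your computer
runs=100      # number of runs in the BehaviorSpace experiment
t_target=60   # minutes you want the whole experiment to take

# N = (t_one * runs) / t_target, rounded up to a whole slot count.
n=$(( (t_one * runs + t_target - 1) / t_target ))
echo "$n"     # with these figures, request '#$ -pe mpi 17'
```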
– J Richard Snape
  • Did this answer help / tell you what you needed? If so - it would be nice to accept... If not, can you clarify what it hasn't addressed? – J Richard Snape Mar 25 '15 at 13:03

So you want to make your program run faster on Linux by giving it a higher priority than all other processes?

In that case you have to modify something called the program's niceness. This is normally done by invoking the command nice when you first start the program, or the command renice while the program is already running. A process can have a niceness from -20 to 19 (inclusive), where lower values give the process a higher priority. For security reasons, you can only decrease a process's niceness if you are the superuser (root).

So if you want to make a process run with higher priority then from within bash do

[abhishekb@hpc ~]$ start_process &
[abhishekb@hpc ~]$ jobs -x sudo renice -n -20 -p %+

Or just use the last command and replace the %+ with the process id of the process you want to increase the priority for.
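Note that (as the comments below discuss) renice needs a process id, not an SGE job id, and negative niceness needs root, which you won't have on a shared cluster. A small local sketch of the mechanics: it lowers a throwaway process's priority with nice (which needs no root) and reads the niceness back with ps:

```shell
#!/bin/bash
# Start a throwaway process with a niceness of 10 (lowering priority
# never needs root; raising it, i.e. negative values, does).
nice -n 10 sleep 5 &
pid=$!

# Read the niceness back; `ps -o ni=` prints just that column.
niceness=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "$niceness"

kill "$pid"
```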

– randomusername
  • Since I run it on my college's HPC server I am not the superuser. Any way around it? – Abhishek Bhatia Feb 28 '15 at 19:08
  • @AbhishekBhatia No, sadly not. The super user is the only one who can decrease the niceness of a program from what it would otherwise be. Because if you could put a process in place with too high a priority, it can hog *all* of the cpu time and no one would even be able to use the cores your program has requested. – randomusername Feb 28 '15 at 19:20
  • @AbhishekBhatia Looking through the man page for nice(2), you can do it without superuser privileges every time by having the superuser give your program the ``CAP_SYS_NICE`` capability (see capabilities(7)). – randomusername Feb 28 '15 at 19:35
  • please check the edit above in question, it doesn't seem to work. – Abhishek Bhatia Mar 01 '15 at 10:13
  • @AbhishekBhatia Whoops, I was missing the ``-x`` command line argument to ``jobs``. And the output from ``qstat`` is showing a "job-id", not the "process id" which is what ``renice`` needs. Also, note the "or" in the last paragraph: if you're using the ``jobs ...`` form then type it in exactly as I have typed it. – randomusername Mar 01 '15 at 19:54
  • There is still some issue. Check the "Edit 2" in the question please. – Abhishek Bhatia Mar 02 '15 at 15:53
  • @AbhishekBhatia again, you're currently passing the "job-id" of the process, ``renice`` doesn't understand what a "job-id" is. You need to use the "process id", so don't call ``qstat``, instead call ``ps`` or something like that to lookup the "process id" of what you want to use. – randomusername Mar 02 '15 at 16:02
  • How can I find the process id? Please help. – Abhishek Bhatia Mar 03 '15 at 15:46
  • On a single system, you can figure out pid's using `ps` or `pstree`. I'm not sure how to do that on an HPC, but you would have to get the superuser to allow you to nice your processes, and s/he would probably know how to get the relevant parameter for nicing your processes--*if s/he lets you do that*. Actually, SGE probably has better ways of increasing the priority of jobs, but again, that's up to the superuser. Keep in mind that if you are allowed extra CPU time, then someone else is losing CPU time, and if there's enough CPU available for everyone, none of this matters. – Mars Mar 21 '15 at 18:28