I was running NetLogo headless on an HPC cluster using BehaviorSpace. Another user of the cluster (not a NetLogo user) complained that my jobs use the CPU cores only to a very small extent and that I should increase the utilization. I don't know exactly how to do so, please help. I am guessing renice won't be of any help.

Code:

#!/bin/bash
#$ -N NewPara3-d
#$ -q all.q
#$ -pe mpi 30
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
    --model /home/abhishekb/models/Test_results3-d.nlogo \
    --experiment 3-d \
    --table /home/abhishekb/csvresults/Test_results3-d.csv 
Seth Tisue
Abhishek Bhatia
  • What kind of computer do you mean by HPC? [Wikipedia](http://en.wikipedia.org/wiki/Supercomputer) says "High-performance computing", but there are many environments, such as PBS, Slurm, etc. – delaye Mar 01 '15 at 21:25
  • It is high-performance computing, running CentOS. Full details: http://it.iiitd.edu.in/HPC_final_doc.pdf – Abhishek Bhatia Mar 02 '15 at 15:47
  • Ask the other users what they mean or check with the cluster admin. I would assume your system load is not matching what you are requesting. This can either be an issue with how you submit your job or you may be hitting some bottleneck that is causing your job to wait on I/O to disk or between processes. Someone that knows the specific environment should be able to help. – chuck Mar 02 '15 at 18:43
  • ***This is a guess*** - you are probably requesting lots of cores for your job and then not using them very hard (maybe for the reasons suggested by @chuck). This can annoy other users because while you have the CPU cores reserved and doing very little, they can't use them. A plan of action: run your model using BehaviorSpace (i.e. headless) on your own machine and check that it is utilising all the CPU(s) at (close to) 100%. If not, work on your script until it does (if you're using `--threads`, try adjusting that number...). – J Richard Snape Mar 03 '15 at 10:19
  • If you still need help - we probably need to see the actual command you are deploying to the cluster (e.g. `java -Xmx1024m -Dfile.encoding=UTF-8 -cp NetLogo.jar org.nlogo.headless.Main --model Fire.nlogo ...`) and the model (or cut-down version, in particular any custom file I/O you might have) – J Richard Snape Mar 03 '15 at 10:20
  • @JRichardSnape I have no custom I/O and --threads is 1, that is, one thread per processor. I tried running it on my machine at real-time priority and the utilization turned out to be ~100%. I don't understand the issue. – Abhishek Bhatia Mar 03 '15 at 15:23
  • @JRichardSnape You are saying allocating more RAM would help? – Abhishek Bhatia Mar 03 '15 at 15:23
  • No - I can't say without seeing the exact command line you're using (and possibly your model, although no custom I/O probably means we don't need that). Try omitting `threads` altogether - the docs say this will then "default to one thread per processor". Depending on cluster setup it may be that the job is deployed per machine and threads=1 might only start 1 thread even if the machine has e.g. 8 cores available. ***N.B. This is a guess (hence comment, not answer yet) - someone with more expertise might contradict this, but it is worth a try.*** I don't think job priority is the root cause. – J Richard Snape Mar 03 '15 at 15:38
  • @JRichardSnape Have added the code. Yeah I guess you're right with the priority thing. This should provide more details http://stackoverflow.com/questions/28628527/netlogo-hpc-cpu-percentage-use-increase . Please check. – Abhishek Bhatia Mar 03 '15 at 15:47

2 Answers

In the comments you link to your related question, where you're trying to use Linux process priority to make jobs run faster / use more CPU.

There you ask

CQLOAD (what does it mean too?)

The docs for this are hard to find, but you link to the spec of your cluster, which tells us that its scheduling engine is Sun's *Grid Engine*. Man pages are here (you can access them locally too, in particular by typing man qstat).

If you search the man page for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:

OUTPUT FORMATS

...

an average of the normalized load average of all queue hosts. In order to reflect each hosts different significance the number of configured slots is used as a weighting factor when determining cluster queue load. Please note that only hosts with a np_load_value are considered for this value. When queue selection is applied only data about selected queues is considered in this formula. If the load value is not available at any of the hosts '-NA-' is printed instead of the value from the complex attribute definition.

This means that CQLOAD gives an indication of how utilized the processors are in the queue. Your output shows 0.84: the average load on processors in all.q is 84%. This doesn't seem too low.
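If you want to watch that number for a single queue, a small helper like the one below may be handy. It is only a sketch: the function name cqload_for_queue is made up for illustration, and it assumes the default qstat -g c layout, with the cluster queue name in the first column and CQLOAD in the second.

```shell
# Print the CQLOAD value for one cluster queue.
# Reads `qstat -g c` output on stdin; assumes the default column layout
# (queue name in column 1, CQLOAD in column 2).
# cqload_for_queue is a hypothetical helper name, not part of Grid Engine.
cqload_for_queue() {
    awk -v queue="$1" '$1 == queue { print $2 }'
}

# On the cluster you would pipe the real output in:
#   qstat -g c | cqload_for_queue all.q
```

Reading from stdin means you can also save the qstat output to a file while your job runs and inspect it later.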

You state colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder if it's just because you're using a lot of nodes (even if just for a short time).

You might want to experiment with using fewer nodes (unless your results are very slow). You do that by altering the line #$ -pe mpi 30, for example by reducing the number 30. You can work out roughly how many nodes you need by timing how long one model run takes on your computer and then using

N = ((time to run 1 job) * (number of runs in experiment)) / (time you want the run to take)
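As a rough worked example of that estimate, here is a small shell sketch. The function name slots_needed and the figures (120 seconds per run, 500 runs, a one-hour target) are invented for illustration, and rounding up to a whole number is my own addition. It also assumes the runs spread evenly over the requested slots, which may not hold on your cluster.

```shell
# Estimate N = ceil(seconds_per_run * runs / target_seconds).
# Arguments: $1 = seconds for one run, $2 = number of runs in the experiment,
#            $3 = seconds you are willing to wait for the whole experiment.
# slots_needed is a hypothetical helper name; integer arithmetic only.
slots_needed() {
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Hypothetical figures: 120 s per run, 500 runs, results wanted within 1 hour.
slots_needed 120 500 3600   # prints 17
```

The result is the number you would try in place of 30 in the #$ -pe mpi 30 line.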
J Richard Snape
  • If you need more info, you might need to give some details of your experiment (how many runs it actually triggers) and how long a single run takes (e.g. on your desktop computer). – J Richard Snape Mar 03 '15 at 16:34
  • Did this answer help / tell you what you needed? If so - it would be nice to accept... If not, can you clarify what it hasn't addressed? – J Richard Snape Mar 25 '15 at 13:03

I'm not an expert, but the scheduler on the cluster seems to be supported by OpenMole.

OpenMole is a nice solution for embedding your NetLogo model transparently in many environments. It could be one solution ...

delaye