My question is based on THIS question.
I should consider using --array=0-60000%200
to limit the number of jobs running in parallel to 200 in Slurm. However, it seems to take up to a minute to launch a new job every time an old job finishes, and given the number of jobs I am planning to run, I might be wasting a lot of time this way.
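For reference, the array-based submission would look roughly like the sketch below (the job name, output pattern and command are just placeholders for my actual setup):

#!/bin/bash
#SBATCH --job-name=myArray          # placeholder job name
#SBATCH --array=0-60000%200         # 60001 tasks, at most 200 running at a time
#SBATCH --output=slurm-%A_%a.out    # one log per task (%A = array job ID, %a = task index)

# each array task reads its own index from SLURM_ARRAY_TASK_ID
./myProgram "$SLURM_ARRAY_TASK_ID"  # placeholder command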
I wrote a "most probably" very inefficient alternative, consisting in a script that launches the jobs, checking the number of jobs in the queue and adding jobs if I am still bellow the max number of jobs allowed and while I reached the max number of parallel jobs, sleep for 5 seconds, as follows:
#!/bin/bash
# submit the procedure $1 times ($1=60000)
for ((i = 0; i < $1; i++))
do
    # wait while the number of my queued/running jobs is at the limit;
    # squeue -h suppresses the header line so wc -l counts jobs only
    q=$(squeue -h -u myuserName | wc -l)
    while [ "$q" -ge 200 ]   # max number of parallel jobs set to 200
    do
        sleep 5
        q=$(squeue -h -u myuserName | wc -l)
    done
    # submit the next job with sbatch
    sbatch ...
done
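I call the script with the total number of jobs as its only argument, e.g. ./submit_all.sh 60000 (the file name here is just an example).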
It seems to do a better job than my previous method. Nevertheless, I would like to know: how inefficient is this implementation in practice, and why? Could I be harming the scheduling efficiency of other users on the same cluster?
Thank you.