
I have to run a series of Python scripts for about 10'000 objects. Each object is characterised by the arguments in one row of my catalogue. On my computer, to test the scripts, I was simply using a bash script like:

totrow=$(wc -l < catalogue.txt)

for (( i = 1; i <= ${totrow}; i++ )); do
    arg1=$(awk 'NR=='${i} catalogue.txt)
    arg2=$(awk 'NR=='${i} catalogue.txt)
    arg3=$(awk 'NR=='${i} catalogue.txt)

    python3 script1.py ${arg1} ${arg2} ${arg3}
done

which runs the script for each row of the catalogue. Now I want to run everything on a supercomputer (with a Slurm system). What I would like to do is run, say, 20 objects on 20 CPUs at the same time (so 20 rows at once) and continue in this way through the entire catalogue.

Any suggestions? Thanks!

LollaSap
  • Just wondering: What is the purpose of `arg2` and `arg3`? You initialized them to the same value as `arg1`. For your local machine, I think you could replace your script with `xargs -a catalogue.txt -d\\n -I_ python3 script1.py _ _ _`, not only would this be shorter, but also more efficient as you would read `catalogue.txt` only once instead of 1 + 10'000 × 3 times. – Socowi Jul 27 '21 at 10:28

1 Answer


You could set this up as an array job. Put the inner part of your loop into a something.slurm file, and set i equal to the array task ID ($SLURM_ARRAY_TASK_ID) at the top of this file (a .slurm file is just a normal shell script with job information encoded in comments). Then use sbatch --array=1-$totrow something.slurm to launch the jobs.
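
As a rough sketch, reusing the row extraction from the loop in the question, something.slurm could look like the following (the #SBATCH values are placeholders to adapt to your cluster; arg2 and arg3 still duplicate arg1, as noted in the comments):

#!/bin/bash
#SBATCH --job-name=catalogue
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00

# Each array task processes one row of the catalogue.
i=${SLURM_ARRAY_TASK_ID}

arg1=$(awk 'NR=='${i} catalogue.txt)
arg2=$(awk 'NR=='${i} catalogue.txt)
arg3=$(awk 'NR=='${i} catalogue.txt)

python3 script1.py ${arg1} ${arg2} ${arg3}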

This will schedule each Python call as a separate task, and number them from 1 to $totrow. SLURM will run each of them on the next available CPU, possibly all at the same time.
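
For example (assuming the job file above is saved as something.slurm), the launch could be:

totrow=$(wc -l < catalogue.txt)

# one array task per catalogue row, at most 20 running simultaneously
sbatch --array=1-${totrow}%20 something.slurm

The %N suffix on --array caps how many array tasks run at the same time, which matches the 20-at-a-time goal; without it, Slurm may run as many tasks concurrently as it has free CPUs.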

Matthias Fripp