
What is the best way to process lots of files in parallel via Slurm?

I have a lot of files (let's say 10,000) in a folder, and each file takes about 10 seconds to process. Naturally, I want to set the sbatch job array size to match (#SBATCH --array=1-10000%100), but it seems I can't set the array size above a certain limit (probably 1000). How do you handle job array sizes in this situation? Since each file doesn't take much time to process, it seems to me that I should assign multiple files to one job rather than one file per job, right?
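For example, I imagine something like the following, where each array task handles a chunk of 100 files (the path, chunk size, and program name are just placeholders):

#!/bin/bash
#SBATCH --array=1-100      # 100 array tasks, each handling ~100 files
#SBATCH --time=00:30:00    # roughly 100 files x 10 s per task, plus some margin

# Build the file list once; the glob is assumed to expand identically in every task.
files=( /path/to/folder/* )

# Pick this task's slice of 100 files based on the array task ID.
chunk=100
start=$(( (SLURM_ARRAY_TASK_ID - 1) * chunk ))

for f in "${files[@]:start:chunk}"; do
    my_program "$f"        # placeholder for the actual processing command
done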

Thank you

Bra

1 Answer


If the processing time is only about 10 seconds per file, you should consider packing the tasks into a single job, both because such short jobs can take longer to schedule than to run and because there is a limit on the number of jobs in an array.

Your submission script could look like this:

#!/bin/bash
#SBATCH --ntasks=16 # or any other number depending on the size of the cluster and the maximum allowed wall time
#SBATCH --mem-per-cpu=...
#SBATCH --time=... # based on the number of files and number of tasks

# Dispatch one srun task per file, with at most $SLURM_NTASKS instances running concurrently
find . -name file_pattern -print0 | xargs -I{} -0 -P $SLURM_NTASKS srun -n1 -c1 --exclusive name_of_the_program {}

Make sure to replace all the ..., file_pattern, and name_of_the_program placeholders with appropriate values.

The script will look for all files matching file_pattern in the submission directory and run the name_of_the_program program on each of them, limiting the number of concurrent instances to the number of CPUs (more precisely, the number of tasks) requested. Note the use of --exclusive here, which is specific to this use case and has been deprecated in favour of --exact in recent Slurm versions.
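If your Slurm version is recent enough to provide --exact (the replacement mentioned above), the dispatch line would presumably become the following, with the rest of the script unchanged:

# Same packing pattern, but using --exact instead of the deprecated --exclusive
find . -name file_pattern -print0 | xargs -I{} -0 -P $SLURM_NTASKS srun -n1 -c1 --exact name_of_the_program {}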

damienfrancois