19

You need to run, say, 30 srun jobs, but ensure that each job runs on a node from a particular list of nodes (nodes with the same performance, so that timings can be compared fairly). How would you do it?

What I tried:

  • srun --nodelist=machineN[0-3] <some_cmd>: runs <some_cmd> on all of the listed nodes simultaneously (what I need is to run <some_cmd> on one available node from the list)

  • srun -p partition seems to work, but needs a partition that contains exactly machineN[0-3], which is not always the case.

Ideas?

Ayrat

2 Answers

24

Update: Slurm version 23.02 fixed this, as stated in the release notes: "Allow for --nodelist to contain more nodes than required by --nodes."
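
On Slurm 23.02 or newer, the --nodelist approach from the question should therefore work once it is combined with --nodes=1, so that one node out of the list is requested per job. The following is a minimal sketch under that assumption, not a tested recipe: the RUNS and NODES values mirror the question, while the run_all helper and the DRY_RUN switch are hypothetical names introduced here so the generated command lines can be inspected without a cluster.

```shell
#!/bin/sh
# Sketch for Slurm >= 23.02: request one node out of a larger --nodelist.
# NODES and RUNS mirror the question; run_all and DRY_RUN are illustrative names.
NODES='machineN[0-3]'
RUNS=30

# run_all CMD...: launch CMD RUNS times, one node per run.
# With DRY_RUN=1 the srun command lines are printed instead of executed.
run_all() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo srun --nodes=1 --nodelist="$NODES" "$@"
        else
            srun --nodes=1 --nodelist="$NODES" "$@"
        fi
        i=$((i + 1))
    done
}
```

Calling DRY_RUN=1 run_all <some_cmd> prints the 30 command lines; without DRY_RUN, the runs execute one after another, because each srun call blocks until its job step finishes.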


You can go in the opposite direction and use the --exclude option, which both srun and sbatch accept:

srun --exclude=machineN[4-XX] <some_cmd>

Slurm will then consider only nodes that are not in the excluded list. If the list is long and complicated, it can be saved in a file; srun treats the --exclude value as a filename when it contains a "/" character.
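
When the exclude list is tedious to write by hand, it can be derived as "all nodes minus the nodes you want". The helper below is a sketch of that set difference in plain shell; complement is a hypothetical name, and it only does text processing, so it makes no assumption about the cluster beyond node names containing no whitespace.

```shell
#!/bin/sh
# complement ALL WANTED: print, comma-separated, every name in ALL
# that does not appear in WANTED. Both arguments are whitespace- or
# newline-separated lists of node names.
complement() {
    # $1 and $2 are left unquoted on purpose so the lists split into words.
    printf '%s\n' $1 | sort -u \
        | grep -vxF -e "$(printf '%s\n' $2)" \
        | paste -sd, -
}
```

One way to feed it, assuming the standard sinfo and scontrol tools are available: take the full list from sinfo -h -N -o "%N" and the wanted list from scontrol show hostnames 'machineN[0-3]', then pass the result of complement to srun --exclude=... <some_cmd>.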

Another option is to check whether the Slurm configuration includes "features" with

sinfo  --format "%20N %20f"

If the "features" column shows a comma-delimited list of the features each node has (CPU family, network connection type, etc.), you can select the subset of nodes that have a specific feature using

srun --constraint=<some_feature> <some_cmd>
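
Before relying on a feature for timing comparisons, it may be worth checking that it selects exactly the nodes you expect. The filter below is a small sketch for that check; nodes_with_feature is a hypothetical helper, and it assumes input lines of the form "node feature1,feature2", which is what sinfo -h -N -o "%N %f" is expected to print.

```shell
#!/bin/sh
# nodes_with_feature FEATURE: read "node feat1,feat2,..." lines on stdin
# and print the names of the nodes whose feature list contains FEATURE.
nodes_with_feature() {
    awk -v want="$1" '{
        n = split($2, feats, ",")
        for (i = 1; i <= n; i++)
            if (feats[i] == want) { print $1; break }
    }'
}
```

For example, sinfo -h -N -o "%N %f" | nodes_with_feature <some_feature> | sort -u lists the candidate nodes; sort -u removes the duplicates that appear when a node belongs to several partitions.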
damienfrancois
9

You can use the -w option. It was tested with Slurm version 17.11.10.

For example:

srun -p partition  -w node10 hostname
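
Since -w pins a job to the nodes it names, one way to get "one node from a list" per job is to choose the node yourself before calling srun. The sketch below cycles through the list round-robin; pick_node is a hypothetical helper, and this approach does not check whether the chosen node is free, so a job simply waits for its named node.

```shell
#!/bin/sh
# pick_node INDEX NODE...: print the node for run number INDEX (1-based),
# cycling through the given nodes in order.
pick_node() {
    idx=$1
    shift                          # "$@" is now the node list
    shift $(( (idx - 1) % $# ))    # drop the nodes before the chosen one
    printf '%s\n' "$1"
}
```

In a loop over i from 1 to 30, the call would look like srun -p partition -w "$(pick_node "$i" machineN0 machineN1 machineN2 machineN3)" <some_cmd>.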
Fırat Yilmaz
  • Using your approach, is it possible to specify a *list* of nodes (instead of one specific node)? – Ayrat Feb 28 '19 at 14:08
  • Yes, the command srun -p partition -w node10,node11 hostname returns the hostnames of the two compute nodes. You can also use a range expression such as "-w node[10-11]". – Fırat Yilmaz Mar 01 '19 at 07:28
  • ...but I don't need to run it on _both_ nodes, only on one node from the list. Thanks. (Unfortunately I cannot test it myself right now, because I have no access to srun.) – Ayrat Mar 02 '19 at 10:52
  • If you know the $hostname of the compute node that you want your job to run on, then srun -p partition -w $hostname command should be enough. If you have to define a node list, as in srun -p partition --nodelist /path/to/nodelist, and want to choose one node from it, that seems inconvenient to me, and I have never tried it, so I don't know whether such a mechanism exists. When you choose a partition, you are already using the node list defined for that partition; -w then lets you choose one or more compute nodes from it. – Fırat Yilmaz Mar 05 '19 at 11:09