When configuring a Slurm cluster, you need an identical copy of the configuration file slurm.conf on every node. If the cluster uses GPUs, there is an additional configuration file, gres.conf, that must also be present on all nodes. My question: does gres.conf differ from node to node depending on that node's hardware, or is it identical on all nodes (like slurm.conf)? Assume the nodes are not identical and have different GPU configurations.

Durai Arasan

1 Answer

Since Slurm version 14.3.0, gres.conf accepts a NodeName parameter, so the same file can be used on all nodes.

From the NEWS file:

gres.conf - Add "NodeName" specification so that a single gres.conf file can be used for a heterogeneous cluster.

It will thus look something like this:

NodeName=node001 Name=gpu File=/dev/nvidia0
NodeName=node002 Name=gpu File=/dev/nvidia[0-1]
...

Before that, the gres.conf file had to be distinct for each node.
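
If the node list is long, the single shared file could be generated rather than written by hand. The following is only a minimal sketch, assuming each node's GPUs appear as consecutive /dev/nvidiaN devices starting at 0. The helper names `gres_line` and `render_gres_conf` and the node-to-GPU-count mapping are hypothetical and not part of Slurm.

```python
def gres_line(node: str, gpu_count: int) -> str:
    """Return one gres.conf line for a node (hypothetical helper, not a Slurm API).

    Assumes the node's GPUs are exposed as /dev/nvidia0 .. /dev/nvidia{N-1}.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    # A single device is written plainly; several devices use a bracket range.
    devices = "0" if gpu_count == 1 else f"[0-{gpu_count - 1}]"
    return f"NodeName={node} Name=gpu File=/dev/nvidia{devices}"


def render_gres_conf(gpus_per_node: dict[str, int]) -> str:
    """Join one line per node into the text of a single shared gres.conf."""
    return "\n".join(gres_line(n, c) for n, c in gpus_per_node.items()) + "\n"


if __name__ == "__main__":
    # Example mapping matching the two nodes shown in the answer.
    print(render_gres_conf({"node001": 1, "node002": 2}), end="")
```

The resulting text would then be copied unchanged to every node, in the same way as slurm.conf.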

damienfrancois