
On newly installed and configured compute nodes in our small cluster I am unable to submit Slurm jobs using a batch script and the 'sbatch' command. After submission, the requested node goes into the 'drained' state. However, I can run the same command interactively using 'srun'.

Works:
srun -p debug --ntasks=1 --nodes=1 --job-name=test --nodelist=node6 -l echo 'test'

Does not work:
sbatch test.slurm
with test.slurm:

#!/bin/sh
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --nodelist=node6
#SBATCH --partition=debug

echo 'test'

It gives me:

PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug         up    1:00:00      1  drain node6

and I have to resume the node.
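For reference, I resume it with the standard scontrol command (nothing site-specific here):

scontrol update NodeName=node6 State=RESUME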

All nodes run Debian 9.8 and use InfiniBand and NIS. I have made sure that all nodes have the same configuration, the same package versions, and the same daemons running, so I don't see what I am missing.

  • You can see the reason the node went to drain with `scontrol show node node6 | grep Reason`, or check the Slurm controller's log files. – Keldorn Mar 26 '19 at 03:02
  • Thanks, Keldorn. That provided some useful information. We finally fixed the issue. – Iomsn Mar 27 '19 at 09:48

1 Answer


It turned out the issue was related to NIS. I just needed to add this line to the end of /etc/passwd:

+::::::

and restart slurmd on the node:

/etc/init.d/slurmd restart
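The +:::::: entry is the standard glibc compat-mode marker that merges in all NIS passwd entries. My understanding is that without it slurmd could not resolve the submitting user's account on the node, so launching the batch job failed and Slurm drained the node. To check that lookups work on the node (someuser below is just a placeholder for any NIS account):

getent passwd someuser
id someuser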