
I am trying to run dask on a research cluster managed by slurm.

Launching a job with a classical sbatch script works. But when I do:

from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=12, memory='24 GB', processes=1, interface='ib0')
cluster.scale(1)

The last step returns:

No handlers could be found for logger "dask_jobqueue.core"
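That message is Python 2's way of saying a logger tried to emit a record but no handler was configured, so whatever dask_jobqueue wanted to report is not shown. A minimal sketch of what could be added before creating the cluster to surface those messages (only the standard logging module is assumed):

import logging

# Attach a root handler so dask_jobqueue's log records are actually printed
logging.basicConfig(level=logging.DEBUG)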

When running squeue, no jobs appear.

All the tests pass, and LocalCluster() works on the login node.

These are the package versions, with Python 2.7:

dask                      0.18.2                     py_0    conda-forge
dask-core                 0.18.2                     py_0    conda-forge
dask-jobqueue             0.3.0                      py_0    conda-forge
distributed               1.22.0                   py27_0    conda-forge

Any clue where to look?

LCT

1 Answer


I recommend using SLURM to investigate the state of the jobs.

  1. Are they running? Or are they stuck in the queue?
  2. Did they run properly? What do the logs say?
MRocklin
  • It seems that they never get submitted to SLURM. I do not see them queued at all. – LCT Aug 04 '18 at 16:33
  • I recommend looking at `cluster.job_script()` to see if the script used by dask-jobqueue, given your input arguments, is enough to run jobs on your cluster. You might want to consult with your cluster administrator to learn if there are other arguments you should provide, like queue or project. – MRocklin Aug 04 '18 at 19:59
  • `cluster.job_header` is another one to check. – Ray Bell Aug 10 '18 at 19:33
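A minimal sketch of the checks suggested in the comments above; the queue and project values are placeholders for site-specific settings, not options taken from the original post:

from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=12, memory='24 GB', processes=1, interface='ib0')

# Inspect the full sbatch script that dask-jobqueue would submit
print(cluster.job_script())

# Or just the #SBATCH header lines
print(cluster.job_header)

# If the script lacks site-specific options, pass them explicitly, for example:
cluster = SLURMCluster(cores=12, memory='24 GB', processes=1, interface='ib0',
                       queue='some_partition',   # placeholder: your SLURM partition
                       project='some_account')   # placeholder: your SLURM account
cluster.scale(1)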