
I'm setting up a Dask Python cluster at work (30 machines, 8 cores each on average). People use only a portion of their CPU power, so dask-workers will run in the background at low priority. All workers listen to a dask-scheduler on my master node. It works perfectly when I'm the only user, but it's going to be used by several people concurrently, so I need to be able to administer this cluster:

  • Authenticate users, reject unknowns
  • Identify who submitted which jobs
  • Restrict number of submitted jobs per user
  • Restrict timeout for computation per job
  • Kill any job as admin

dask.distributed provides little of the functionality described above out of the box. Could you please advise on a solution (possibly a hybrid of Dask + something else)?

stkubr

1 Answer


Usually people use a cluster manager like Kubernetes, YARN, SLURM, SGE, PBS, or something else. That system handles user authentication, resource management, and so on. A user then uses one of the Dask-kubernetes, Dask-yarn, or Dask-jobqueue projects to create their own short-lived scheduler and workers on the cluster on an as-needed basis.
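For concreteness, here is a minimal sketch of the dask-jobqueue approach on a SLURM cluster. It assumes dask-jobqueue is installed and a SLURM scheduler is reachable; all parameter values are illustrative, and SLURM itself (not Dask) enforces the per-user authentication, job quotas, walltime limits, and `scancel`-style job killing asked about in the question. It is a configuration sketch, not runnable outside such a cluster.

```python
# Hypothetical sketch: a per-user, short-lived Dask cluster on SLURM
# via dask-jobqueue. Resource values below are assumptions, not recommendations.
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    cores=8,              # cores per SLURM worker job
    memory="16GB",        # memory per SLURM worker job
    walltime="01:00:00",  # SLURM kills the job after this timeout
)
cluster.scale(jobs=4)     # ask SLURM for four worker jobs

client = Client(cluster)  # connect to this user's ephemeral scheduler
# ... submit work with client.submit(...) or client.compute(...) ...
client.close()
cluster.close()           # tear the cluster down when done
```

Because each user runs their own scheduler and workers inside jobs they submitted themselves, the cluster manager's accounting already answers "who submitted which jobs," and an admin can kill any of them with the manager's normal tools.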

MRocklin