I'm starting up a Dask cluster in an automated way by ssh-ing into a bunch of machines and running dask-worker. I noticed that I sometimes run into problems when processes from a previous experiment are still running. What's the best way to clean up after Dask? killall dask-worker dask-scheduler doesn't seem to do the trick, possibly because Dask somehow starts up new processes in their place.

1 Answer
If you start a worker with dask-worker, you will notice in ps that it starts more than one process: there is a "nanny" process responsible for restarting the worker in case it somehow crashes. There may also be "semaphore" processes around for communicating between the two, depending on which form of process spawning you are using.

The correct way to stop all of these is to send a SIGINT (i.e., a keyboard interrupt) to the parent process. A KILL signal might not give it the chance to stop and clean up the child process(es). If some situation (e.g., an ssh hangup) caused a more radical termination, or a session didn't send any stop signal at all, then you will probably have to grep the output of ps for dask-like processes and kill them all.
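A minimal shell sketch of that cleanup sequence. The process-name pattern is an assumption; check ps aux on your machines for the exact names your deployment uses:

```shell
# List surviving dask-like processes. The [d]ask trick keeps the
# grep process itself out of the results. (|| true so an empty
# match doesn't abort a script running under `set -e`.)
ps aux | grep -E '[d]ask-(worker|scheduler)' || true

# Prefer SIGINT so the nanny gets a chance to shut down its
# children cleanly.
pkill -INT -f 'dask-(worker|scheduler)' || true

# Give them a grace period, then force-kill anything still alive.
sleep 5
pkill -KILL -f 'dask-(worker|scheduler)' || true
```

Run via ssh on each machine, this is idempotent: it exits cleanly whether or not any dask processes were found.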
