I am trying to run this benchmark on a small Dask cluster made of two nodes. The remote worker is deployed simply with the dask-worker command, and it appears correctly in the client output of the benchmark. I've also tried running some simple functions, such as sleep, and they work smoothly.
When I run the benchmark, it eventually gets stuck on the add function (at 2036/2047), as if the last 11 tasks never complete. The worker's logs contain many messages like the following:
distributed.worker - INFO - Can't find dependencies for key add-efe22746-c80b-42f1-a02d-1217928ba4ec
distributed.worker - INFO - Dependent not found: add-37c59ee3-e3ed-4643-ae13-dd96291207bd 1 . Asking scheduler
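The task count is consistent with a pairwise tree reduction, assuming the benchmark sums 1024 leaf results (its code is not shown here, so this shape is a guess): 1024 leaf tasks plus 1023 add tasks give 2047. In such a tree, the later add tasks combine results computed earlier, often on different workers, so they are the tasks most likely to need a worker-to-worker transfer, which is what the "Can't find dependencies" messages are about. A minimal pure-Python sketch; tree_reduce is a hypothetical helper, and submit can be a plain local call or, on a real cluster, client.submit.

```python
from operator import add


def tree_reduce(leaves, submit):
    """Pairwise (tree) reduction with `add`.

    `submit(func, *args)` runs or schedules one task. Returns the final
    result (or future) and the number of add tasks that were submitted.
    """
    level = list(leaves)
    n_adds = 0
    while len(level) > 1:
        next_level = []
        # Combine neighbours two at a time; each pair becomes one add task.
        for i in range(0, len(level) - 1, 2):
            next_level.append(submit(add, level[i], level[i + 1]))
            n_adds += 1
        # An odd element out is carried up to the next level unchanged.
        if len(level) % 2:
            next_level.append(level[-1])
        level = next_level
    return level[0], n_adds


def run_locally(func, *args):
    """Stand-in for client.submit that just calls the function."""
    return func(*args)


if __name__ == "__main__":
    total, n_adds = tree_reduce(range(1024), run_locally)
    print(total, n_adds, 1024 + n_adds)
```

On a cluster, tree_reduce(range(1024), client.submit) would instead return a Future for the total (call .result() on it), and each add task's inputs would have to be fetched from whichever worker holds them.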
I first hit this problem with my own code, but now that the benchmark shows it too, I believe it has more to do with my setup. Setting up a scheduler and a worker is so simple that I hardly see what could have gone wrong. Is there something I'm missing about how to deploy workers, anything special one must be careful about?
Edit: On the master node I run both the scheduler and a worker. If I kill the worker on this node, the benchmark seems to work fine, although none of this node's cores are then used. Is this how I'm supposed to configure the cluster, that is, with no worker on the master node?
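Since Dask workers fetch task dependencies directly from each other, one possible cause worth ruling out is that the worker on the master node advertises an address the remote worker cannot reach, such as a loopback address. A minimal diagnostic sketch, assuming dask.distributed is installed; loopback_workers and check_cluster are hypothetical helpers, and the scheduler address you pass in is a placeholder. It relies on Client.scheduler_info() returning a dict whose "workers" entry is keyed by worker address (e.g. "tcp://10.0.0.5:39211").

```python
import ipaddress
from urllib.parse import urlsplit


def loopback_workers(info):
    """Return worker addresses (from a scheduler_info()-style dict) whose
    host is a loopback address that other machines cannot connect to."""
    flagged = []
    for addr in info.get("workers", {}):
        host = urlsplit(addr).hostname  # "tcp://127.0.0.1:4000" -> "127.0.0.1"
        if host is None:
            continue
        if host == "localhost":
            flagged.append(addr)
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(addr)
        except ValueError:
            pass  # some other hostname; not checked by this sketch
    return sorted(flagged)


def check_cluster(scheduler_address):
    """Connect to a running scheduler and report loopback worker addresses."""
    from dask.distributed import Client  # imported lazily; needs dask.distributed

    with Client(scheduler_address) as client:
        return loopback_workers(client.scheduler_info())
```

If check_cluster reports an address, the worker on the master node is probably only reachable from that node, so the remote worker cannot pull its results.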