
I am trying to run this benchmark on a small dask cluster of two nodes. The remote worker is deployed with the dask-worker command and shows up correctly in the client output in the benchmark. I have also run some simple functions, such as sleep, and they work fine.

When I run the benchmark, it eventually gets stuck on the add function at 2036/2047, as if the last 11 tasks never complete. The worker's logs show many messages like the following:

distributed.worker - INFO - Can't find dependencies for key add-efe22746-c80b-42f1-a02d-1217928ba4ec
distributed.worker - INFO - Dependent not found: add-37c59ee3-e3ed-4643-ae13-dd96291207bd 1 . Asking scheduler
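To check whether the stuck tasks all wait on the same few keys, the log can be tallied with a short script. This is only a sketch: the two regular expressions are written against the message shapes quoted above, and other versions of distributed may word them differently. The function name summarize_missing is made up for this example. It also assumes that the first message names the blocked task and the second names the dependency it is waiting for.

```python
import re
from collections import Counter

# Patterns written against the two message shapes quoted above (an assumption;
# other distributed versions may format these messages differently).
CANT_FIND = re.compile(r"Can't find dependencies for key (\S+)")
NOT_FOUND = re.compile(r"Dependent not found: (\S+)")


def summarize_missing(lines):
    """Count how often each key appears in the two message types."""
    blocked = Counter()  # keys named in "Can't find dependencies for key ..."
    missing = Counter()  # keys named in "Dependent not found: ..."
    for line in lines:
        match = CANT_FIND.search(line)
        if match:
            blocked[match.group(1)] += 1
            continue
        match = NOT_FOUND.search(line)
        if match:
            missing[match.group(1)] += 1
    return blocked, missing


sample = [
    "distributed.worker - INFO - Can't find dependencies for key add-efe22746-c80b-42f1-a02d-1217928ba4ec",
    "distributed.worker - INFO - Dependent not found: add-37c59ee3-e3ed-4643-ae13-dd96291207bd 1 . Asking scheduler",
]
blocked, missing = summarize_missing(sample)
print(len(blocked), "blocked keys;", len(missing), "missing dependencies")
```

Feeding it the full worker log (for example, the lines of a saved log file) would show whether the number of distinct blocked keys matches the 11 tasks that never finish.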

I first hit this problem with my own code, but now that the benchmark shows it too, I believe it has more to do with my setup. Setting up a scheduler and a worker is simple enough that I can hardly see what went wrong. Is there anything special one must be careful about when deploying workers?

Edit: The master node runs both the scheduler and a worker. If I kill the worker on this node, everything seems to work fine, although none of this node's cores are used. Is this how the cluster is supposed to be configured, that is, with no worker on the master node?

Aratz

1 Answer


I just ran that same notebook and unfortunately wasn't able to reproduce those warnings. My hope is that they've been cleaned up since you originally asked the question.

As always, if you're able to provide a minimal reproducible failure, a bug report on the GitHub issue tracker is welcome.

MRocklin