I'm attempting to run the Dask-MPI "Getting Started" (http://mpi.dask.org/en/latest/) example in a fresh Anaconda environment.
I set up the environment with:
conda create -n dask-mpi -c conda-forge python=3.7 dask-mpi
conda activate dask-mpi
Inside the environment, I run:
mpirun -np 4 dask-mpi --scheduler-file ./scheduler.json
Then, from a Python interpreter on the same machine (and in the same folder), I run:
from dask.distributed import Client
client = Client(scheduler_file='/path/to/scheduler.json')
This results in the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/distributed/client.py", line 712, in __init__
    self.start(timeout=timeout)
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/distributed/client.py", line 858, in start
    sync(self.loop, self._start, **kwargs)
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/distributed/utils.py", line 331, in sync
    six.reraise(*error[0])
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/distributed/utils.py", line 316, in f
    result[0] = yield future
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info) # type: ignore
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/distributed/client.py", line 954, in _start
    yield self._ensure_connected(timeout=timeout)
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info) # type: ignore
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/distributed/client.py", line 1015, in _ensure_connected
    timedelta(seconds=timeout), self._update_scheduler_info()
  File "/home/nleaf/anaconda3/envs/dask-mpi/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
tornado.util.TimeoutError: Timeout
The terminal where I ran dask-mpi shows no output indicating that anything is trying to connect. I have verified that the port in question, 8786, is open, and I've confirmed with a debugger that the client reads the correct address from the scheduler file.
I've tried this in several different environments and on a few different machines, including a fresh Ubuntu 18.04 Docker container. I'm at a loss as to what step I might be missing.
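For reference, the address and port checks described above can be reproduced with a small script along these lines. This is only a minimal sketch: the helper names are hypothetical, and it assumes the scheduler file is JSON with an "address" entry of the form tcp://host:port.

```python
import json
import socket
from urllib.parse import urlparse


def read_scheduler_address(path):
    """Return the "address" entry from a scheduler file.

    Assumption: the file is JSON and contains an "address" key.
    """
    with open(path) as f:
        info = json.load(f)
    return info["address"]


def split_host_port(address):
    """Split an address such as tcp://10.0.0.5:8786 into (host, port)."""
    parsed = urlparse(address)
    return parsed.hostname, parsed.port


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling split_host_port(read_scheduler_address("./scheduler.json")) and passing the result to can_connect shows whether the address written to the file is reachable from the machine running the client. A successful TCP connection only shows that something is listening on that port, not that the scheduler is answering client requests.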