I'm on dask 1.1.1 (latest version) and I have started a dask scheduler at the command line with this command:
$ dask-scheduler --port 9796 --bokeh-port 9797 --bokeh-prefix my_project
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://10.1.0.107:9796
distributed.scheduler - INFO - bokeh at: :9797
distributed.scheduler - INFO - Local Directory: /tmp/scheduler-pdnwslep
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Register tcp://10.1.25.4:36310
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.1.25.4:36310
distributed.core - INFO - Starting established connection
Then I tried to start a client to connect to the scheduler using this code:
from dask.distributed import Client
c = Client('10.1.0.107:9796', set_as_default=False)
But when I try to do that, I get an error:
...
  File "/root/anaconda3/lib/python3.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
tornado.gen.TimeoutError: Timeout

During handling of the above exception, another exception occurred:

...
  File "/root/anaconda3/lib/python3.7/site-packages/distributed/comm/core.py", line 195, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://10.1.0.107:9796' after 10 s: connect() didn't finish in time
This has been hardcoded in a system that's been running for months now, so I'm writing this question mainly to verify that I'm not doing anything wrong programmatically. I think something must be wrong with the environment. Does everything look right to you? What kinds of things outside of dask and Python could be blocking the connection: certificates, differing package versions, something else?
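
To separate dask from the network, one check I can think of is a plain TCP connection test run from the client machine. This is only a minimal sketch using the standard library; `can_connect` is a hypothetical helper name of my own, not part of dask, and the 10 second default simply mirrors the timeout in the error above:

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout seconds.

    Hypothetical helper for illustration; it does not speak the dask protocol,
    it only checks that the TCP handshake completes.
    """
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect('10.1.0.107', 9796)` on the machine that runs the client would, I assume, return False if something like a firewall or routing rule is in the way, which would point away from dask itself. If it returns True, the port is reachable and the cause is more likely on the dask side, for example mismatched versions.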