dask-worker connects to dask-scheduler without problems; the trouble starts after I submit tasks. Judging by the task stream, the workers do perform the computation. The error log from the dask worker is very long and I don't understand it: it reports a timeout and "connection refused", but which connection is being refused? As far as I know, there are no firewalls between the two machines (they are on a LAN).
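
To rule out basic reachability problems between the machines, a plain TCP probe can be run from the machine that reports the error. This is only a minimal sketch using the standard library; `can_connect` is a hypothetical helper name, not part of dask or distributed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it only tests that something is
    listening and reachable, not that it speaks the dask protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and unreachable hosts.
        return False
```

For example, `can_connect("10.0.0.50", 8786)` checks the scheduler address shown in the log below, and `can_connect("127.0.0.1", 34876)` checks the address that the error message names.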

Note that the same or similar-looking errors occur over and over again. Eventually the computation fails with "ValueError: Could not find dependent array-original-0effb3cc096e32a82e95557c88b795fd. Check worker logs".

distributed.nanny - INFO -         Start Nanny at: 'tcp://10.0.0.42:36199'
distributed.worker - INFO -       Start worker at:      tcp://10.0.0.42:44304
distributed.worker - INFO -              bokeh at:            10.0.0.42:8789
distributed.worker - INFO -               http at:            10.0.0.42:40349
distributed.worker - INFO -              nanny at:            10.0.0.42:36199
distributed.worker - INFO - Waiting to connect to:       tcp://10.0.0.50:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         40
distributed.worker - INFO -                Memory:                  121.64 GB
distributed.worker - INFO -       Local Directory:            worker-qdz2_s09
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:             tcp://10.0.0.50:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - ERROR - Worker stream died during communication: tcp://127.0.0.1:34876
Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 185, in connect
    quiet_exceptions=EnvironmentError)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
tornado.gen.TimeoutError: Timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/worker.py", line 1617, in gather_dep
    who=self.address)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 479, in send_recv_from_rpc
    comm = yield self.pool.connect(self.addr)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 583, in connect
    connection_args=self.connection_args)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 194, in connect
    _raise(error)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 177, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: in <distributed.comm.tcp.TCPConnector object at 0x7fcbfc5e6f98>: ConnectionRefusedError: [Errno 111] Connection refused
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 297, 0, 0)
distributed.worker - INFO - Dependent not found: array-original-7a8cba4415f43af718833379b651ccb6 0 .  Asking scheduler
distributed.worker - INFO - Dependent not found: array-original-0effb3cc096e32a82e95557c88b795fd 0 .  Asking scheduler
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 263, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 292, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 256, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 278, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 284, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 275, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 285, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 301, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 295, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 303, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 271, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 281, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 287, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 305, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 282, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 173, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 178, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 190, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 185, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 195, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 194, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 177, 0, 0)
distributed.worker - ERROR - Worker stream died during communication: tcp://127.0.0.1:34876
Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 185, in connect
    quiet_exceptions=EnvironmentError)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
tornado.gen.TimeoutError: Timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/worker.py", line 1617, in gather_dep
    who=self.address)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 479, in send_recv_from_rpc
    comm = yield self.pool.connect(self.addr)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 583, in connect
    connection_args=self.connection_args)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 194, in connect
    _raise(error)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 177, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: in <distributed.comm.tcp.TCPConnector object at 0x7fcbfc50b4a8>: ConnectionRefusedError: [Errno 111] Connection refused
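
As for which connection is refused: the log itself names the target, `tcp://127.0.0.1:34876`. That is a loopback address, and it matches neither the scheduler (`10.0.0.50:8786`) nor this worker's own address (`10.0.0.42:44304`). The traceback passes through `gather_dep`, so the failure happens while the worker is fetching a dependency. One possible reading is that another worker advertised a loopback address, but that is a guess the log does not confirm. A minimal sketch for pulling the failing targets out of a long log, using only the standard library (`refused_targets` and `is_loopback` are hypothetical helper names):

```python
import ipaddress
import re
from collections import Counter

# Matches the target address in lines such as:
#   OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: ...
FAILED_CONNECT = re.compile(r"Timed out trying to connect to 'tcp://([^:']+):(\d+)'")


def refused_targets(log_text):
    """Count how often each (host, port) appears as a failed connection target."""
    return Counter(
        (m.group(1), int(m.group(2))) for m in FAILED_CONNECT.finditer(log_text)
    )


def is_loopback(host):
    """True if host is a loopback IP literal; hostnames are reported as False."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


# Two lines shortened from the log above, used as sample input.
sample = (
    "distributed.worker - INFO - Start worker at: tcp://10.0.0.42:44304\n"
    "OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s\n"
)

for (host, port), count in refused_targets(sample).items():
    kind = "loopback" if is_loopback(host) else "non-loopback"
    print("{}:{} failed {} time(s) ({})".format(host, port, count, kind))
```

Feeding the full worker log into `refused_targets` would show whether every failure points at the same loopback address or at several different peers.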
pletnes