
I have a standard Kubernetes cluster set up using the Dask Docker images, but not the Dask Helm charts. I tried running an existing script on the cluster, but it doesn't seem to run and keeps throwing errors.

Cluster details: 1 notebook, 1 scheduler, 1 worker, and 1 shared volume. I read some of the threads about KilledWorker errors and looked into the logs, but couldn't figure it out.

    distributed.worker - ERROR - None
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/distributed/worker.py", line 814, in handle_scheduler
        comm, every_cycle=[self.ensure_communicating, self.ensure_computing]
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 748, in run
        yielded = self.gen.send(value)
      File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 457, in handle_stream
        msgs = yield comm.read()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/distributed/comm/tcp.py", line 218, in read
        frames, deserialize=self.deserialize, deserializers=deserializers
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
        yielded = next(result)
      File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 85, in from_frames
        res = _from_frames()
      File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 71, in _from_frames
        frames, deserialize=deserialize, deserializers=deserializers
      File "/opt/conda/lib/python3.7/site-packages/distributed/protocol/core.py", line 126, in loads
        value = _deserialize(head, fs, deserializers=deserializers)
      File "/opt/conda/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 189, in deserialize
        dumps, loads, wants_context = families[name]
    KeyError: None

1 Answer


I had the same problem and found a solution.

In Dask 2.3.0 the distributed serialization protocol changed slightly. Your client is probably at 2.3.0 or higher while the scheduler and workers aren't. Upgrade your cluster so that either everything is at 2.3.0 or above, or everything is below it.
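You can confirm a mismatch with `Client.get_versions(check=True)`, which raises if scheduler or worker package versions differ from the client's. As a minimal sketch of the rule above (the version strings are illustrative, not from your cluster), the check is simply that every component sits on the same side of the 2.3.0 boundary:

```python
def parse_version(v):
    # "2.3.0" -> (2, 3, 0); keeps only the first three numeric components
    return tuple(int(x) for x in v.split(".")[:3])

def serialization_compatible(client_ver, scheduler_ver, worker_vers, boundary="2.3.0"):
    # All components must be on the same side of the 2.3.0 serialization change.
    b = parse_version(boundary)
    sides = {parse_version(v) >= b for v in (client_ver, scheduler_ver, *worker_vers)}
    return len(sides) == 1

print(serialization_compatible("2.5.2", "2.2.0", ["2.2.0"]))  # False: client is past 2.3.0, cluster isn't
print(serialization_compatible("2.5.2", "2.5.2", ["2.5.2"]))  # True: everything matches
```

In practice the easiest fix is to pin the same `daskdev/dask` image tag for the notebook, scheduler, and worker pods rather than comparing versions by hand.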
