
I am new to Dask and would like to test running it on a cluster. The cluster has a head server and several other nodes. Once I am logged in to the head server, I can reach the other nodes with a simple passwordless ssh. I would like to run a simple function over every element of a large array. The function, defined below, converts a numpy datetime64 value to a Python datetime object.

    import xarray as xr
    import numpy as np
    from dask import compute, delayed
    import dask.multiprocessing
    from datetime import datetime, timedelta

    def convertdt64(dt64):
        ts = (dt64 - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
        return datetime.utcfromtimestamp(ts)
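For reference, a minimal self-contained sketch of the same conversion is given here. It assumes the input is a numpy datetime64 array; the sample values and the helper name `convert_all` are hypothetical and only serve as illustration. The second helper casts the whole array at once, which may be worth comparing against one task per element.

```python
from datetime import datetime, timedelta

import numpy as np

# Epoch written without a timezone suffix; numpy datetime64 values are timezone-naive.
EPOCH = np.datetime64("1970-01-01T00:00:00")


def convertdt64(dt64):
    """Convert a single numpy.datetime64 value to a naive datetime (UTC)."""
    seconds = (dt64 - EPOCH) / np.timedelta64(1, "s")
    return datetime(1970, 1, 1) + timedelta(seconds=float(seconds))


def convert_all(arraydata):
    """Convert a whole datetime64 array in one cast (hypothetical helper)."""
    # Casting to microsecond precision and then to object yields datetime objects.
    return np.asarray(arraydata).astype("datetime64[us]").astype(object)


# Made-up sample data standing in for the real array.
arraydata = np.array(
    ["2020-01-01T00:00:00", "2020-01-01T12:30:00"], dtype="datetime64[s]"
)

one_by_one = [convertdt64(x) for x in arraydata]
vectorized = convert_all(arraydata)
```

Both approaches should give the same datetime objects for this sample; whether the whole-array cast is fast enough to make the cluster unnecessary depends on the size of the real data.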

Then, in the terminal, I apply this function to each element of a 1D array of size N.

    values = [delayed(convertdt64)(x) for x in arraydata]
    results1 = compute(*values, scheduler='processes')

This uses some cores on the head server and it works, though slowly. Then I tried to launch the function on several nodes of the cluster by using the Client as below:

    from dask.distributed import Client
    client = Client("10.140.251.254:8786")
    results = compute(*values, scheduler='distributed')

It does not work at all. I get some warnings and one error message, shown below.

distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available
distributed.comm.tcp - WARNING - Closing dangling stream in <TCP local=tcp://10.140.251.254:57257 remote=tcp://10.140.251.254:8786>

CancelledError: convertdt64-0205ad5e-214b-4683-b5c4-b6a2a6d8e52f

I also tried dask.bag and got the same error message. Why might the parallel computation on the cluster fail? Is it due to some server or network configuration, or to my incorrect use of the Dask client? Thanks in advance for your help!

Best wishes

Shannon X


1 Answer


...then I tried to launch the function on several nodes of the cluster by using the Client as below:

I had similar issues trying to run tasks on the scheduler. The nodes connect just fine. Attempting to submit tasks, however, results in cancellation.
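One way to check whether the machine running the client can reach the scheduler port at all is a plain TCP connection test. This is only a sketch using the standard library; the helper name `can_connect` is hypothetical, and the temporary local listener below merely stands in for a real scheduler address so that the snippet runs anywhere.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Stand-in for a scheduler: a listener on a free local port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    reachable = can_connect("127.0.0.1", port)

# After the listener is closed, the same port is normally no longer reachable.
reachable_after_close = can_connect("127.0.0.1", port)
```

On a real cluster the call would be something like `can_connect("10.140.251.254", 8786)`, run from the machine where the client is started. A successful connection only shows that the port is open; it does not rule out other causes of the cancellation.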

The documented examples were either local or from the same node as the scheduler. When I moved my client to the scheduler node the problem went away.
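For comparison, a local setup of the kind the documentation shows can be sketched as follows. It assumes `dask` and `distributed` are installed, and it starts an in-process cluster with `Client(processes=False)` instead of connecting to a remote scheduler address, so the client and scheduler share one machine. The function `double` is a made-up placeholder for the real work.

```python
from dask import compute, delayed
from dask.distributed import Client


def double(x):
    """Placeholder task (hypothetical); stands in for the real function."""
    return 2 * x


# In-process scheduler and workers; the dashboard is disabled to keep this minimal.
client = Client(processes=False, dashboard_address=None)
try:
    # With a Client active, dask.compute sends delayed tasks to it by default.
    values = [delayed(double)(x) for x in range(5)]
    results = compute(*values)

    # The same work expressed with the futures interface.
    futures = client.map(double, range(5))
    gathered = client.gather(futures)
finally:
    client.close()
```

If this works locally but the same calls are cancelled when the client runs elsewhere, that points toward the setup between client and scheduler rather than the task code itself.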