I am new to Dask and would like to run a test of Dask on a cluster. The cluster has a head server and several other nodes; once I log in to the head server, I can ssh into the other nodes without a password. I would like to run a simple function over a large array. The function, defined below, converts a numpy datetime64 value to a Python datetime object.
```python
import xarray as xr
import numpy as np
from dask import compute, delayed
import dask.multiprocessing
from datetime import datetime, timedelta

def convertdt64(dt64):
    # seconds since the Unix epoch (note: the 'Z' suffix is deprecated in NumPy)
    ts = (dt64 - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')
    return datetime.utcfromtimestamp(ts)
```
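On a single value the function behaves as expected (the input timestamp here is just an illustrative example):

```python
import numpy as np
from datetime import datetime

def convertdt64(dt64):
    # seconds since the Unix epoch
    ts = (dt64 - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')
    return datetime.utcfromtimestamp(ts)

print(convertdt64(np.datetime64('2021-06-01T12:00:00')))
# -> 2021-06-01 12:00:00
```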
Then, in the terminal, I iterate over a 1-D array of size N by applying this function:
```python
values = [delayed(convertdt64)(x) for x in arraydata]
results1 = compute(*values, scheduler='processes')
```
This uses some cores on the head server and it works, though slowly. I then tried to launch the function on several nodes of the cluster by creating a Client:
```python
from dask.distributed import Client

client = Client("10.140.251.254:8786")
results = compute(*values, scheduler='distributed')
```
It does not work at all. I get some warnings and one error message:
```
distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available
distributed.comm.tcp - WARNING - Closing dangling stream in <TCP local=tcp://10.140.251.254:57257 remote=tcp://10.140.251.254:8786>
CancelledError: convertdt64-0205ad5e-214b-4683-b5c4-b6a2a6d8e52f
```
I also tried dask.bag and got the same error message. What could be the reason that the parallel computation on the cluster does not work? Is it due to some server/network configuration, or am I using the Dask client incorrectly? Thanks in advance for your help!
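For reference, my dask.bag attempt looked roughly like the sketch below (reconstructed from memory; the small array and the `scheduler='threads'` argument are only there so the snippet runs standalone, whereas my real run targeted the cluster):

```python
import numpy as np
import dask.bag as db
from datetime import datetime

def convertdt64(dt64):
    # seconds since the Unix epoch
    ts = (dt64 - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')
    return datetime.utcfromtimestamp(ts)

# small stand-in for my real arraydata (illustrative)
arraydata = np.arange('2020-01-01', '2020-01-04', dtype='datetime64[D]')

# split the array into partitions and map the function over them
bag = db.from_sequence(arraydata, npartitions=2)
results = bag.map(convertdt64).compute(scheduler='threads')
print(results)
```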
Best wishes
Shannon X