I have a sample data set on my local machine and I'm trying to do some basic operations on a cluster.
import dask.dataframe as ddf
from dask.distributed import Client

# Connect to the scheduler
client = Client('Ip address of the scheduler')

# Read the CSV into a dask dataframe
csvdata = ddf.read_csv('Path to the CSV file')
The client is connected to a scheduler, which in turn is connected to two workers (on other machines).
My questions may be pretty trivial.
Should this CSV file also be present on the worker nodes? I seem to get file-not-found errors.
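My guess is that ddf.read_csv actually runs on the workers, so the path has to resolve on each worker machine. For example, if the file were copied to the same path on every machine (the path below is hypothetical), I assume this would work without any scattering:

# Hypothetical path that exists on every worker (e.g. a shared NFS mount)
csvdata = ddf.read_csv('/shared/data/sample.csv')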
Using:

futures = client.scatter(csvdata)
x = ddf.from_delayed([future], meta=df)

# Price is a column in the data
df.Price.sum().compute(get=client.get)
# returns "dd.Scalar<series-..., dtype=float64>"

How do I access the result?

client.submit(sum, x.Price)
# returns "distributed.utils - ERROR - 6dc5a9f58c30954f77913aa43c792cc8"
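For reference, here is the full pattern I think I'm supposed to follow, pieced together from the linked answer below: read the file locally with pandas (since the workers can't see my filesystem), scatter the resulting DataFrame, and rebuild a dask dataframe from the future. The scheduler address and file path are placeholders, and I'm not sure this is right:

import pandas as pd
import dask.dataframe as ddf
from dask.distributed import Client

client = Client('Ip address of the scheduler')

# Read the CSV on the client; the workers never touch my local disk
df = pd.read_csv('Path to the CSV file')

# Scatter the pandas DataFrame to the workers; scattering a list
# returns one future per element
[future] = client.scatter([df])

# Rebuild a dask dataframe on the cluster from that future
x = ddf.from_delayed([future], meta=df)

# My understanding is that .compute() should turn the lazy
# dd.Scalar into a concrete float here
total = x.Price.sum().compute(get=client.get)
print(total)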
Also, I did refer to Loading local file from client onto dask distributed cluster and http://distributed.readthedocs.io/en/latest/manage-computation.html.
I think I'm mixing up a lot of things here and my understanding is muddled. Any help would be really appreciated.