
I'm trying to read a unique local file from each worker, but I get the same result across all the workers instead of a unique result from each one. Can someone please point out what I'm doing wrong?

from dask.distributed import Client, progress
c = Client()
c

import dask.dataframe as dd

filename_1 = '/tmp/1990.csv'
filename_2 = '/tmp/1991.csv'
filename_3 = '/tmp/1992.csv'

future_1 = c.submit(dd.read_csv, filename_1, workers='172.18.0.3')
future_2 = c.submit(dd.read_csv, filename_2, workers='172.18.0.5')
future_3 = c.submit(dd.read_csv, filename_3, workers='172.18.0.6')

future_1.result().head()
future_2.result().head()
future_3.result().head()

Instead of unique data from each file, all three futures give me the same result.

Rsokolov

1 Answer


You probably want to use pandas.read_csv here rather than dask.dataframe.read_csv. dask.dataframe.read_csv is lazy: the submitted call mostly builds a task graph, and the actual file reads happen later on whichever workers the scheduler chooses, so the workers= pinning doesn't carry over to those reads.

https://docs.dask.org/en/latest/delayed-best-practices.html#don-t-call-dask-delayed-on-other-dask-collections
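A minimal sketch of that suggestion, keeping the worker addresses and file paths from the question (adjust them to match your own cluster):

import pandas as pd
from dask.distributed import Client

c = Client()

filename_1 = '/tmp/1990.csv'
filename_2 = '/tmp/1991.csv'
filename_3 = '/tmp/1992.csv'

# pandas.read_csv runs eagerly inside the submitted task, so the read
# itself happens on the pinned worker and sees that worker's local file.
future_1 = c.submit(pd.read_csv, filename_1, workers='172.18.0.3')
future_2 = c.submit(pd.read_csv, filename_2, workers='172.18.0.5')
future_3 = c.submit(pd.read_csv, filename_3, workers='172.18.0.6')

# Each future now holds a concrete pandas DataFrame, not a lazy graph.
future_1.result().head()
future_2.result().head()
future_3.result().head()

Each result here is a plain pandas DataFrame shipped back from the worker you pinned, rather than a lazy dask collection whose tasks could run anywhere.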

MRocklin