Two options
Network file systems
As suggested in the comments, there are various ways to make your local file accessible to the other machines in your cluster using normal network file system solutions (NFS, a shared mount, etc.). This is a great choice if it is available to you.
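For example (a minimal sketch; the mount path is hypothetical), if the file is visible at the same path on every worker, dask.dataframe can read it directly in parallel and no manual scattering is needed:

import dask.dataframe as dd
from dask.distributed import Client

client = Client('scheduler-address:8786')

ddf = dd.read_csv('/mnt/shared/myfile.csv')  # path is hypothetical; every worker reads its own chunks
ddf = ddf.persist()                          # materialize in distributed memory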
Load and scatter locally
If that doesn't work then you can always load data locally and scatter it out to the various workers of your cluster. If your file is larger than the memory of your single computer then you might have to do this piece by piece.
Single pass
If everything fits in memory then I would load the data normally with pandas and then scatter it out to a single worker. You can repartition it afterwards and spread it across the other workers if desired:
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client
client = Client('scheduler-address:8786')
df = pd.read_csv('myfile.csv')
future = client.scatter(df) # send dataframe to one worker
ddf = dd.from_delayed([future], meta=df) # build dask.dataframe on remote data
ddf = ddf.repartition(npartitions=20).persist() # split
client.rebalance(ddf) # spread around all of your workers
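A quick way to sanity-check the result (using the names from the example above; the exact numbers depend on your data):

print(ddf.npartitions)   # 20 after the repartition above
print(len(ddf))          # row count, computed on the cluster
print(ddf.head())        # pulls a few rows back to the local process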
Multiple bits
If you have multiple small files then you can load and scatter them iteratively, perhaps in a for loop, and then build a dask.dataframe from the resulting futures:
futures = []
for fn in filenames:
    df = pd.read_csv(fn)          # load one file locally
    future = client.scatter(df)   # ship it to a worker
    futures.append(future)

ddf = dd.from_delayed(futures, meta=df)  # combine the remote pieces into one dask.dataframe
In this case you could probably skip the repartition and rebalance steps.
If you have a single large file then you would probably have to do some splitting of it yourself, either with pd.read_csv(..., chunksize=...) or by cutting it up by hand.
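Here is a minimal sketch of the chunked approach (the file name and chunk size are hypothetical; meta follows the same pattern as the examples above):

import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

client = Client('scheduler-address:8786')

futures = []
# chunksize= makes read_csv yield smaller pandas DataFrames one at a time
for chunk in pd.read_csv('mybigfile.csv', chunksize=1000000):  # name and size are hypothetical
    futures.append(client.scatter(chunk))  # ship each piece to the cluster as it is read

ddf = dd.from_delayed(futures, meta=chunk)  # same meta pattern as above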