I have created a single (remote) scheduler and ten workers on different machines on the same network, and I am trying to distribute a DataFrame to them from a client. My problem is that the scatter takes 30 minutes.
import pandas as pd
from dask.distributed import Client
# scheduler_addr holds the address of the remote scheduler
df = pd.DataFrame({i: range(10) for i in range(10)})
client = Client(scheduler_addr)
future = client.scatter(df, broadcast=True)
This code works, but it is too slow to be usable. With broadcast=False the scatter is reasonably fast. I created the scheduler and the workers with default arguments. How should this be done instead?
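The difference between the two modes can be measured with a small sketch like the one below. Note that `time_scatter` is a hypothetical helper name, not part of dask, and the sketch assumes the default synchronous Client, where, as far as I understand, `scatter` blocks until the data has been sent.

```python
import time


def time_scatter(client, data, broadcast):
    """Return (future, seconds) for one client.scatter call.

    `client` is assumed to be a connected dask.distributed Client
    (any object with a compatible scatter method will do).
    """
    start = time.perf_counter()
    future = client.scatter(data, broadcast=broadcast)
    elapsed = time.perf_counter() - start
    return future, elapsed


# Usage against the real cluster (client and df as in the question):
#   for flag in (False, True):
#       _, seconds = time_scatter(client, df, broadcast=flag)
#       print(f"broadcast={flag}: {seconds:.2f}s")
```

Running the loop once per mode gives a like-for-like comparison of broadcast=False and broadcast=True on the same DataFrame.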
My dask.distributed version is 2022.01.0.