I have a function that returns a DataFrame, and I am trying to run it in parallel using dask.
I append the delayed objects for the DataFrames to a list. However, the run time of my code is the same with and without dask.delayed.
I use the reduce function from functools along with pd.merge to merge my DataFrames.
Any suggestions on how to improve the run-time?
The visualized graph and the code are shown below.
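from functools import reduce
from dask import delayed

d = []
for lot in lots:
    # Select the rows belonging to this lot
    lot_data = data[data["LOTID"] == lot]
    # Build the transition matrix lazily as a delayed object
    trmat = delayed(LOT)(lot, lot_data).transition_matrix(lot)
    d.append(trmat)

# Outer-merge all transition matrices on the ('from', 'to') key columns
df = delayed(reduce)(lambda x, y: x.merge(y, how='outer', on=['from', "to"]), d)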
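To make the problem reproducible, here is a minimal self-contained sketch of the same pattern. The `transition_matrix` function, the `STEP` column, and the sample data are hypothetical stand-ins, since the real `LOT` class and data are not shown; only the structure (one delayed task per lot, then a single delayed `reduce` over the list) mirrors the code above.

```python
from functools import reduce

import pandas as pd
from dask import delayed


def transition_matrix(lot, lot_data):
    # Hypothetical stand-in for LOT(lot, lot_data).transition_matrix(lot):
    # counts consecutive step-to-step transitions within one lot.
    steps = lot_data["STEP"].tolist()
    pairs = pd.DataFrame({"from": steps[:-1], "to": steps[1:]})
    return pairs.groupby(["from", "to"]).size().reset_index(name=lot)


def merge_outer(x, y):
    # Same merge as in the question: outer join on the key columns.
    return x.merge(y, how="outer", on=["from", "to"])


# Made-up sample data; the real data has a LOTID column as in the question.
data = pd.DataFrame({
    "LOTID": ["A", "A", "A", "B", "B", "B"],
    "STEP": ["s1", "s2", "s3", "s1", "s3", "s2"],
})
lots = data["LOTID"].unique()

d = []
for lot in lots:
    lot_data = data[data["LOTID"] == lot]
    d.append(delayed(transition_matrix)(lot, lot_data))

# One delayed task that receives every per-lot result and merges them in sequence.
df = delayed(reduce)(merge_outer, d)
result = df.compute()
```

In this sketch the per-lot tasks are independent of each other, but the whole merge chain sits inside the single `reduce` task, so the merges themselves run one after another. Calling `df.visualize()` (which requires graphviz) should show that shape.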