Assume a dask dataframe with X partitions and a pandas dataframe with the same number X of rows. Row i of the pandas dataframe contains data relevant to partition i of the dask dataframe.
I would like to assign each pandas row's values to a new column in the corresponding dask dataframe partition.
import pandas as pd
import dask
import numpy as np

# default dask dataframe with 30 partitions
ddf = dask.datasets.timeseries()

# one row of per-partition data for each of the 30 partitions
df0 = pd.DataFrame({'A': np.random.randint(0, 100, size=30),
                    'B': np.random.randint(0, 100, size=30)})
A very inefficient way to do this would be:

df_list = []
for n in range(ddf.npartitions):
    df_list.append(ddf.partitions[n])

for i, df in enumerate(df_list):
    # broadcast the scalar from row i of df0 into partition i
    df['A'] = df0['A'].iloc[i]
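(To end up with a single dask dataframe again, I assume the list would then have to be stitched back together, e.g. with dd.concat:)

import dask.dataframe as dd

# recombine the modified per-partition pieces into one dask dataframe
result = dd.concat(df_list)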
How can I achieve the same result while staying in dask? Maybe with map_partitions?
If it's not possible in dask, how can I do this more efficiently, avoiding the loops?
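For reference, this is roughly what I was imagining; a minimal, untested sketch (assign_from_row is just a name I made up) that relies on map_partitions passing the partition_info keyword, which contains the current partition number:

def assign_from_row(part, lookup, partition_info=None):
    # partition_info is supplied by map_partitions; it can be None
    # while dask infers metadata, so fall back to row 0 in that case
    i = partition_info['number'] if partition_info is not None else 0
    return part.assign(A=lookup['A'].iloc[i])

ddf2 = ddf.map_partitions(assign_from_row, df0)

Is something along these lines the idiomatic way to do it?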