I couldn't figure out how to compute the delayed objects returned by a df.groupby(...).apply()
operation. I would really appreciate it if someone could help. Here is the sample code I wrote:
import pandas as pd
import dask
df = pd.DataFrame(columns=['id','id2','val1'])
df['id'] = ['A','A','A','B','C','C','D','D']
df['id2']=['a','a','b','a','a','b','b','b']
df['val1']= [1,2,3,4,5,6,7,8]
@dask.delayed
def dask_test(group, val_col):
    # Add a 'test' column holding twice the value of val_col, row by row
    for idx, row in group.iterrows():
        group.loc[idx, 'test'] = 2 * group.loc[idx, val_col]
    return group
tmp_grp = df.groupby(['id','id2']).apply(dask_test,'val1')
The output of tmp_grp is:
id  id2
A   a      Delayed('copy-f0e26845-fc3a-4bb7-8609-47b923c0...
    b      Delayed('copy-9b6cecf5-9fa4-4301-ba2d-dec5478d...
B   a      Delayed('copy-7b538f4b-ac3f-4c83-b37b-e620d0ba...
C   a      Delayed('copy-c722fa78-c46e-422a-88a5-b9e48cac...
    b      Delayed('copy-01454a03-fd28-4fa5-b487-563ccc66...
D   b      Delayed('copy-f6cf94bd-d457-4495-bb2e-1db0152c...
dtype: object
I don't know how to pull the delayed objects out of this Series and compute them.
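For reference, here is a minimal sketch of the kind of result I am after. It sidesteps groupby.apply and builds one delayed object per group by hand, which is an assumption on my part and not necessarily the recommended way. The helper double_column is hypothetical and stands in for dask_test; it uses a vectorized assign instead of the row loop.

```python
import dask
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "C", "C", "D", "D"],
    "id2": ["a", "a", "b", "a", "a", "b", "b", "b"],
    "val1": [1, 2, 3, 4, 5, 6, 7, 8],
})

@dask.delayed
def double_column(group, val_col):
    # Hypothetical stand-in for dask_test: returns a copy of the group
    # with a new 'test' column equal to twice val_col.
    return group.assign(test=2 * group[val_col])

# Build one Delayed object per (id, id2) group explicitly.
delayed_groups = [
    double_column(group, "val1") for _, group in df.groupby(["id", "id2"])
]

# dask.compute takes any number of delayed objects and returns a tuple
# holding one computed result per input.
computed = dask.compute(*delayed_groups)

# Stitch the per-group DataFrames back together in the original row order.
result = pd.concat(computed).sort_index()
print(result)
```

Presumably the same dask.compute(*...) call could be pointed at the values of tmp_grp, but I have not confirmed that groupby.apply hands back delayed objects that compute to the expected groups.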
Thank you so much in advance.