I need to create a column on a dask dataframe whose values depend on a condition on another column. In pandas this is fairly straightforward:
ddf['TEST_VAR'] = ['THIS' if x == 200607 else
                   'NOT THIS' if x == 200608 else
                   'THAT' if x == 200609 else 'NONE'
                   for x in ddf['shop_week']]
In dask, however, I have to do the same thing as shown below:
def f(x):
    if x == 200607:
        y = 'THIS'
    elif x == 200608:
        y = 'THAT'
    else:
        y = 1
    return y
ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
ddf1.compute()
Questions:
- Is there a better/more straightforward way to achieve it?
- I can't modify the first dataframe ddf; I need to create ddf1 to see the changes. Is a dask dataframe an immutable object?
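For reference, here is a minimal sketch of one direction that might simplify this, assuming the conditions are plain equality checks as in the pandas snippet above. The names LABELS, label_week and add_label_column are hypothetical and only for illustration. The per-element logic becomes a dictionary lookup, and the helper assumes that dask's Series.apply accepts a meta argument and that dask supports column assignment with ddf[name] = series. The dask part is defined but not executed here, so treat it as untested.

```python
# Hypothetical lookup table mirroring the pandas snippet in the question.
LABELS = {200607: "THIS", 200608: "NOT THIS", 200609: "THAT"}


def label_week(x, default="NONE"):
    """Return the label for a shop_week value, or `default` if unmapped."""
    return LABELS.get(x, default)


def add_label_column(ddf, source="shop_week", target="TEST_VAR"):
    """Attach the label column lazily, without calling compute() first.

    Assumption: `ddf` is a dask dataframe. `meta` tells dask the name and
    dtype of the resulting series so it does not have to infer them.
    """
    ddf[target] = ddf[source].apply(label_week, meta=(target, "object"))
    return ddf


# The pure-Python part can be checked without dask installed.
print([label_week(w) for w in (200607, 200608, 200609, 200610)])
```

If this works as assumed, it would avoid materialising the whole column with list(... .compute()) before assigning it back.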