I'm trying to port parts of my application from pandas to dask, and I hit a roadblock when using a lambda function in a groupby on a dask DataFrame.
import dask.dataframe as dd
dask_df = dd.from_pandas(pandasDataFrame, npartitions=2)
dask_df = dask_df.groupby(
    ['one', 'two', 'three', 'four'],
    sort=False
).agg({'AGE': lambda x: x * x})
This code fails with the following error:
ValueError: unknown aggregate lambda
My actual lambda function is more complex than the one shown here, but its content doesn't matter: the error is always the same. The documentation has a very similar example, so this should work, and I'm not sure what I'm missing.
The same groupby works in pandas, but I need to improve its performance.
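For reference, here is a minimal sketch of the pandas version that runs without error. The sample data is made up for illustration, and the lambda is a stand-in (sum of squares) rather than my real function; I'm assuming a reducing lambda here, since an aggregation has to return one value per group.

```python
import pandas as pd

# Hypothetical sample data: the column names match the snippet above,
# the values are invented for illustration only.
pandas_df = pd.DataFrame({
    'one':   ['a', 'a', 'b', 'b'],
    'two':   ['x', 'x', 'y', 'y'],
    'three': [1, 1, 2, 2],
    'four':  [True, True, False, False],
    'AGE':   [10, 20, 30, 40],
})

# Stand-in reducing lambda (sum of squares), not the real function.
result = pandas_df.groupby(
    ['one', 'two', 'three', 'four'],
    sort=False
).agg({'AGE': lambda x: (x * x).sum()})

print(result)
```

With pandas alone this produces one row per group; it is only the dask version of the same call that raises the error above.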
I'm using dask 0.12.0 with Python 3.5.