I have created a dask dataframe from a list of delayed geopandas reads, each of which yields a pandas dataframe, following the example here: https://gist.github.com/mrocklin/e7b7b3a65f2835cda813096332ec73ca
daskdf = dd.from_delayed(lazy_dataframes, meta=lazy_dataframes[0].compute())
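For reference, the lazy frames are built roughly like this (the paths are placeholders for my actual shapefiles):

import dask
import geopandas

@dask.delayed
def load_shapefile(path):
    # each task reads one shapefile into a GeoDataFrame (a pandas subclass)
    return geopandas.read_file(path)

shapefile_paths = ["site1.shp", "site2.shp"]  # placeholders
lazy_dataframes = [load_shapefile(p) for p in shapefile_paths]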
All dtypes seem reasonable:
daskdf.dtypes
left          float64
bottom        float64
right         float64
top           float64
score          object
label          object
height        float64
area          float64
geometry     geometry
shp_path       object
geo_index      object
Year            int64
Site           object
dtype: object
but a dask dataframe groupby operation fails:
daskdf.groupby(['Site']).height.mean().compute()
...
"/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/dask/dataframe/utils.py", line 577, in _nonempty_series
data = np.array([entry, entry], dtype=dtype)
builtins.TypeError: data type not understood
whereas pandas has no problem with the same operation on the same data:
daskdf.compute().groupby(['Site']).height.mean()
Site
SOAP 15.102355
Name: height, dtype: float64
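My working guess is that the geometry column (a geopandas extension dtype) is what dask's meta machinery trips over when it constructs the non-empty stand-in series. As a sketch of how I could test that (not a confirmed fix), selecting only the columns the groupby needs would keep geometry out of the meta:

# hypothesis check: leave the geometry extension dtype out of the computation
daskdf[['Site', 'height']].groupby('Site').height.mean().compute()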
What might be happening with the metadata types that could cause this? As I scale this workflow up, I would like to be able to perform distributed operations on persisted data.
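For context, the scaled-up pattern I have in mind looks roughly like this (the scheduler address is a placeholder):

from dask.distributed import Client

client = Client('tcp://scheduler-address:8786')  # placeholder address
daskdf = daskdf.persist()  # keep partitions in cluster memory
daskdf.groupby('Site').height.mean().compute()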