To learn dask, I am implementing a K-Means algorithm. To update the means, I want to use a groupby, but that means converting my dask.array into a dask.dataframe and then back into a dask.array:

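def update(X, Label):
    '''Update the means using the labels computed by assign.'''
    # Convert the array to a dataframe so that groupby is available
    Y = X.to_dask_dataframe()
    # Group rows by cluster label, average each group, return a dask.array
    return Y.groupby(Label.to_dask_dataframe()).mean().values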

Is there a way to do this without the conversion?

Maxime Maillot
    What you're doing here is not too bad. Transforming between dask.arrays and dask.dataframes is about as cheap as converting between numpy arrays and pandas dataframes. If your data types are homogeneous then this should be about the cost of a memory-copy. – MRocklin Apr 10 '17 at 16:00
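
For illustration, here is a minimal sketch of the in-memory analogue the comment compares against: the same array to dataframe to array round trip, written with NumPy and pandas rather than dask. The function name `update_means` and the sample data are hypothetical, made up for this example, and it assumes every row has a label.

```python
import numpy as np
import pandas as pd


def update_means(X, labels):
    """Return one mean row per cluster label, ordered by label value."""
    # Wrap the NumPy array in a DataFrame so that groupby is available
    frame = pd.DataFrame(X)
    # Group rows by their label, average each group, convert back to NumPy
    return frame.groupby(labels).mean().to_numpy()


# Hypothetical sample data: four 2-D points assigned to two clusters
X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array([0, 0, 1, 1])
means = update_means(X, labels)
```

With homogeneous float data like this, the conversion in each direction is roughly a memory copy, which is the cost the comment describes for the dask version as well.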

0 Answers