2

Is this the correct way to call compute()?

def call_minmax_duration(data):
    mmin = dd.DataFrame.min(data).compute()
    mmax = dd.DataFrame.max(data).compute()
    return mmin, mmax
Dervin Thunk

1 Answer

3

Two things.

First, your data variable should be a dask.dataframe object, such as one created by dd.from_pandas(...) or dd.read_csv(...).

Second, it's probably better to compute both results at once, so that shared intermediates only need to be computed once.

Example

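import dask.dataframe as dd
df = dd.read_csv('2016-*-*.csv')

# A single compute call evaluates both reductions together,
# so the work of reading the CSV files is shared between them
mmin, mmax = dd.compute(df.mycolumn.min(), df.mycolumn.max())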
MRocklin