Is this the correct way to call compute()?
def call_minmax_duration(data):
    mmin = dd.DataFrame.min(data).compute()
    mmax = dd.DataFrame.max(data).compute()
    return mmin, mmax
Two things.
First, your data variable should be a dask.dataframe object, such as might be created by dd.from_pandas(...) or dd.read_csv(...).
Second, it's probably better to compute both results at once, so that shared intermediates only need to be computed once:
import dask.dataframe as dd

df = dd.read_csv('2016-*-*.csv')
# A single compute call evaluates both reductions together
mmin, mmax = dd.compute(df.mycolumn.min(), df.mycolumn.max())