I want to sort a dataset (a netCDF file) along the time dimension within each year and then average the sorted values. The problem is that dask only supports 'topk' sorting, which consumes all the memory if it includes the whole range of values. Xarray only supports sorting by 1D arrays. NumPy's sort does the job, but it also consumes all the memory. Is there any way to sort a whole large dataset along some axis with dask to reduce the memory footprint?
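One approach that seems to work is to keep the time axis in a single chunk per block and let `xarray.apply_ufunc` with `dask='parallelized'` run `numpy.sort` on each block independently, so only one block is in memory at a time. A minimal sketch with synthetic data (the dimension names, grid size, and chunking are assumptions, not taken from your actual file):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the netCDF dataset: two full years of daily
# values on a small grid. Chunk so each block holds the ENTIRE time axis
# (time: -1) but only a piece of the spatial grid; sorting is then
# block-local and never needs the whole array in memory at once.
time = pd.date_range("2000-01-01", periods=731, freq="D")
da = xr.DataArray(
    np.random.rand(len(time), 4, 4),
    dims=("time", "y", "x"),
    coords={"time": time},
).chunk({"time": -1, "y": 2, "x": 2})

# apply_ufunc moves core dims to the last axis, so np.sort runs with
# axis=-1 inside each chunk; transpose restores the original dim order.
sorted_da = xr.apply_ufunc(
    np.sort,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    kwargs={"axis": -1},
    dask="parallelized",
    output_dtypes=[da.dtype],
).transpose("time", "y", "x")
```

Note that `dask='parallelized'` requires the core dimension (`time` here) to be in a single chunk per block, which is exactly why the `.chunk({"time": -1, ...})` call matters.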
I am trying to apply `xarray.apply_ufunc` to a groupby object with dask parallelization, but I am getting an error saying that I need to specify `output_dtypes` (which I have specified). Am I doing something wrong, or does `apply_ufunc` not support groupby objects? Code: `xarray.apply_ufunc(numpy.sort, dataset.groupby('time.year'), 0, dask='parallelized', output_dtypes=[numpy.float64])`. Error: `ValueError: output dtypes (output_dtypes) must be supplied to apply_func when using dask='parallelized'` – wol Oct 25 '20 at 11:40
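As far as I can tell, `apply_ufunc` expects plain DataArray/Dataset inputs rather than GroupBy objects, which would explain the misleading error message. A hedged sketch of the workaround: map a small wrapper over the groups and call `apply_ufunc` on each group inside it, then take the per-year mean of the sorted values. The data below is a synthetic stand-in, not the actual dataset:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in: two full calendar years of daily data.
time = pd.date_range("2000-01-01", periods=731, freq="D")
da = xr.DataArray(
    np.random.rand(len(time), 2, 2),
    dims=("time", "y", "x"),
    coords={"time": time},
).chunk({"time": -1})

def sort_along_time(group):
    # Each group handed to us by .map() is a plain DataArray, so
    # apply_ufunc accepts it (unlike the GroupBy object itself).
    return xr.apply_ufunc(
        np.sort,
        group,
        input_core_dims=[["time"]],
        output_core_dims=[["time"]],
        kwargs={"axis": -1},
        dask="parallelized",
        output_dtypes=[group.dtype],
    ).transpose("time", "y", "x")

# Sort within each year, then average the sorted values per year.
sorted_by_year = da.groupby("time.year").map(sort_along_time)
yearly_mean = sorted_by_year.groupby("time.year").mean("time")
```

Because the original array has `time` in one chunk, each yearly group comes out as a single chunk along `time`, satisfying the single-chunk requirement on core dimensions.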