I want to sort a dataset (a netCDF file) along the time dimension within each year and then average the sorted values. The problem is that dask only supports 'topk' sorting, which consumes all the memory if it includes the whole range of values. Xarray only supports sorting by 1D arrays. NumPy's sort does the job, but it also consumes all the memory. Is there any way to sort a whole large dataset along some axis with dask to reduce the memory footprint?
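One approach that seems to work is to keep the time axis in a single chunk per block and let `xarray.apply_ufunc` with `dask='parallelized'` run `numpy.sort` on each block independently, so only one block is in memory at a time. A minimal sketch with synthetic data (the dimension names, grid size, and chunking are assumptions, not taken from your actual file):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the netCDF dataset: two full years of daily
# values on a small grid. Chunk so each block holds the ENTIRE time axis
# (time: -1) but only a piece of the spatial grid; sorting is then
# block-local and never needs the whole array in memory at once.
time = pd.date_range("2000-01-01", periods=731, freq="D")
da = xr.DataArray(
    np.random.rand(len(time), 4, 4),
    dims=("time", "y", "x"),
    coords={"time": time},
).chunk({"time": -1, "y": 2, "x": 2})

# apply_ufunc moves core dims to the last axis, so np.sort runs with
# axis=-1 inside each chunk; transpose restores the original dim order.
sorted_da = xr.apply_ufunc(
    np.sort,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    kwargs={"axis": -1},
    dask="parallelized",
    output_dtypes=[da.dtype],
).transpose("time", "y", "x")
```

Note that `dask='parallelized'` requires the core dimension (`time` here) to be in a single chunk per block, which is exactly why the `.chunk({"time": -1, ...})` call matters.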
I am trying to apply `xarray.apply_ufunc` to a groupby object with dask parallelization, but I am getting an error saying that I need to specify `output_dtypes` (which I have specified). Am I doing something wrong, or does `apply_ufunc` not support groupby objects? Code: `xarray.apply_ufunc(numpy.sort, dataset.groupby('time.year'), 0, dask='parallelized', output_dtypes=[numpy.float64])`. Error: `ValueError: output dtypes (output_dtypes) must be supplied to apply_func when using dask='parallelized'` – wol Oct 25 '20 at 11:40
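As far as I can tell, `apply_ufunc` expects plain DataArray/Dataset inputs rather than GroupBy objects, which would explain the misleading error message. A hedged sketch of the workaround: map a small wrapper over the groups and call `apply_ufunc` on each group inside it, then take the per-year mean of the sorted values. The data below is a synthetic stand-in, not the actual dataset:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in: two full calendar years of daily data.
time = pd.date_range("2000-01-01", periods=731, freq="D")
da = xr.DataArray(
    np.random.rand(len(time), 2, 2),
    dims=("time", "y", "x"),
    coords={"time": time},
).chunk({"time": -1})

def sort_along_time(group):
    # Each group handed to us by .map() is a plain DataArray, so
    # apply_ufunc accepts it (unlike the GroupBy object itself).
    return xr.apply_ufunc(
        np.sort,
        group,
        input_core_dims=[["time"]],
        output_core_dims=[["time"]],
        kwargs={"axis": -1},
        dask="parallelized",
        output_dtypes=[group.dtype],
    ).transpose("time", "y", "x")

# Sort within each year, then average the sorted values per year.
sorted_by_year = da.groupby("time.year").map(sort_along_time)
yearly_mean = sorted_by_year.groupby("time.year").mean("time")
```

Because the original array has `time` in one chunk, each yearly group comes out as a single chunk along `time`, satisfying the single-chunk requirement on core dimensions.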