
I'm trying to take an array and resample it with a custom function. From this post, Apply function along time dimension of XArray,

import numpy as np

def special_mean(x, drop_min=False):
    # mean of x, optionally leaving out the single smallest value
    s = np.sum(x)
    n = len(x)
    if drop_min:
        s = s - x.min()
        n -= 1
    return s / n

is an example special_mean.

I have a dataset that is:

<xarray.Dataset>
Dimensions:  (lat: 100, lon: 130, time: 7305)
Coordinates:
  * lon      (lon) float32 -99.375 -99.291664 -99.208336 ... -88.708336 -88.625
  * lat      (lat) float32 49.78038 49.696426 49.61247 ... 41.552795 41.46884
    lev      float32 1.0
  * time     (time) datetime64[ns] 2040-01-01 2040-01-02 ... 2059-12-31
Data variables:
    tmin     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    tmax     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    prec     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    relh     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    wspd     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    rads     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
Attributes:
    history:  Fri Jun 14 10:32:22 2019: ncatted -a _FillValue,,o,d,9e+20 IBIS...

And then I apply a resample, which collapses the lat and lon dimensions:

data.resample(time='1MS').map(special_mean)


<xarray.Dataset>
Dimensions:  (time: 240)
Coordinates:
  * time     (time) datetime64[ns] 2040-01-01 2040-02-01 ... 2059-12-01
    lev      float32 1.0
Data variables:
    tmin     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    tmax     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    prec     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    relh     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    wspd     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    rads     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>

How do I apply this function so that I retain the 'lon' and 'lat' coordinates, as when doing

data.resample(time='1MS').mean()
blueduckyy

2 Answers


Here's one example of how you can use xr.apply_ufunc(). Listing "time" as the only input core dimension tells xarray that special_mean consumes the time dimension, so lat and lon are broadcast through to the result unchanged.

import numpy as np
import xarray as xr

data = xr.tutorial.open_dataset('air_temperature')

def special_mean(x, drop_min=False):
    # mean of x, optionally leaving out the single smallest value
    s = np.sum(x)
    n = len(x)
    if drop_min:
        s = s - x.min()
        n -= 1
    return s / n

def special_func(data):
    # reduce over "time" only; lat/lon pass through unchanged
    return xr.apply_ufunc(special_mean, data, input_core_dims=[["time"]],
                          kwargs={'drop_min': True}, dask='allowed', vectorize=True)

data.resample(time='1MS').map(special_func)  # .apply is a deprecated alias of .map

<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 24)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 2013-02-01 ... 2014-12-01
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Data variables:
    air      (time, lat, lon) float64 244.6 244.7 244.7 ... 297.7 297.7 297.7
bwc
  • As shown in the documentation, vectorize=True is very slow for a dataset of my size. Do you have any idea how I can make it a bit quicker? – blueduckyy Feb 25 '20 at 01:49
  • Also, is there a way to pass variables into special_func that are out of scope? The use case would be applying drop_min once versus twice, or something like that. – blueduckyy Feb 25 '20 at 02:44
  • @blueduckyy try converting your data into a dask array using `.chunk()` and then switching the argument to `dask='parallelized'` :) It'll allow the ufunc to operate lazily on your data using dask; you can then load the data into memory later on using `da.compute()`. Have a look at my other answer to see if it helps: https://stackoverflow.com/questions/38960903/applying-numpy-polyfit-to-xarray-dataset/60517358#60517358 – Andrew Williams Mar 06 '20 at 21:13
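Building on those two comments, here is a minimal sketch of both ideas together, assuming the tutorial air_temperature data and a float result. Rewriting special_mean with axis-aware NumPy calls removes the need for vectorize=True (which calls the function once per lat/lon point), and dask='parallelized' lets the reduction run lazily per chunk; extra settings such as drop_min travel through as keyword arguments:

import numpy as np
import xarray as xr

def special_mean(x, drop_min=False):
    # axis-aware variant: reduces the last axis (the "time" core dim)
    # in one NumPy call, so no per-point Python loop is needed
    s = np.sum(x, axis=-1)
    n = x.shape[-1]
    if drop_min:
        s = s - np.min(x, axis=-1)
        n -= 1
    return s / n

def special_func(data, drop_min=True):
    return xr.apply_ufunc(
        special_mean, data,
        input_core_dims=[["time"]],
        kwargs={'drop_min': drop_min},  # out-of-scope values travel via kwargs
        dask='parallelized',            # build a lazy dask graph per chunk
        output_dtypes=[float],
    )

# make "time" a single chunk so it can serve as a core dimension
data = xr.tutorial.open_dataset('air_temperature').chunk({'time': -1})
monthly = data.resample(time='1MS').map(special_func, drop_min=True)
result = monthly.compute()  # evaluate the graph, in parallel where possible

The same pattern should carry over to the question's dataset, whose variables are already dask arrays; the rechunking step matters because the yearly time chunks there won't align with monthly groups.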

I suspect that you can do what you want with the apply_ufunc method (although, as a disclaimer, I do not know the xarray API well).
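To make that suggestion concrete, a minimal sketch for a single variable, reusing special_mean and the dataset from the question (both assumed to be defined as above):

import xarray as xr

# keep lat/lon by declaring "time" as the only core (reduced) dimension
monthly_tmin = data['tmin'].resample(time='1MS').map(
    lambda g: xr.apply_ufunc(
        special_mean, g,
        input_core_dims=[["time"]],
        kwargs={'drop_min': True},
        dask='allowed',
        vectorize=True,
    )
)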

MRocklin