
xarray can compute a weighted rolling mean via the rolling object's .construct() method, as stated in an answer on SO and also in the docs.

The weighted rolling mean example in the docs doesn't quite look right as it seems to give the same answer as the ordinary rolling mean.

import xarray as xr
import numpy as np

arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5),
                   dims=('x', 'y'))
arr.rolling(y=3, center=True).mean()
#<xarray.DataArray (x: 3, y: 5)>
#array([[nan, 0.5, 1. , 1.5, nan],
#       [nan, 3. , 3.5, 4. , nan],
#       [nan, 5.5, 6. , 6.5, nan]])
#Dimensions without coordinates: x, y

weight = xr.DataArray([0.25, 0.5, 0.25], dims=['window'])
arr.rolling(y=3, center=True).construct('window').dot(weight)
#<xarray.DataArray (x: 3, y: 5)>
#array([[nan, 0.5, 1. , 1.5, nan],
#       [nan, 3. , 3.5, 4. , nan],
#       [nan, 5.5, 6. , 6.5, nan]])
#Dimensions without coordinates: x, y

Here is a simpler example, for which I would like to get the syntax right:

da = xr.DataArray(np.arange(1,6), dims='x')
da.rolling(x=3, center=True).mean()
#<xarray.DataArray (x: 5)>
#array([nan,  2.,  3.,  4., nan])
#Dimensions without coordinates: x

weight = xr.DataArray([0.5, 1, 0.5], dims=['window'])
da.rolling(x=3, center=True).construct('window').dot(weight)
#<xarray.DataArray (x: 5)>
#array([nan,  4.,  6.,  8., nan])
#Dimensions without coordinates: x

It returns 4, 6, 8. I thought it would compute:

((1 x 0.5) + (2 x 1) + (3 x 0.5)) / 3 = 4/3
((2 x 0.5) + (3 x 1) + (4 x 0.5)) / 3 = 2
((3 x 0.5) + (4 x 1) + (5 x 0.5)) / 3 = 8/3
i.e. 1.33, 2, 2.67
Ray Bell

1 Answer


In the first example, you use evenly spaced data for arr. With symmetric weights that sum to 1 (here [0.25, 0.5, 0.25]), the weighted mean of evenly spaced values is therefore the same as the simple mean.
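To see why the two agree, here is a small check on a single three-point window using plain NumPy. The names `start` and `step` are made up for illustration; any evenly spaced window should behave the same way, because the contributions of the two outer points cancel around the centre value.

```python
import numpy as np

# Hypothetical evenly spaced window: centre value `start`, spacing `step`.
start, step = 3.0, 0.5
window = np.array([start - step, start, start + step])

# Symmetric weights that sum to 1, as in the docs example.
weights = np.array([0.25, 0.5, 0.25])

simple = window.mean()       # ordinary mean of the window
weighted = window @ weights  # weighted sum; a mean because the weights sum to 1

# Both reduce to the centre value, so they cannot differ on linear data.
print(simple, weighted)
```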

If you consider non-linear data, the results differ:

In [50]: arr = xr.DataArray((np.arange(0, 7.5, 0.5)**2).reshape(3, 5),
    ...:                    dims=('x', 'y'))
    ...:                    

In [51]: arr.rolling(y=3, center=True).mean()
Out[51]: 
<xarray.DataArray (x: 3, y: 5)>
array([[      nan,  0.416667,  1.166667,  2.416667,       nan],
       [      nan,  9.166667, 12.416667, 16.166667,       nan],
       [      nan, 30.416667, 36.166667, 42.416667,       nan]])
Dimensions without coordinates: x, y

In [52]: weight = xr.DataArray([0.25, 0.5, 0.25], dims=['window'])
    ...: arr.rolling(y=3, center=True).construct('window').dot(weight)
    ...: 
Out[52]: 
<xarray.DataArray (x: 3, y: 5)>
array([[   nan,  0.375,  1.125,  2.375,    nan],
       [   nan,  9.125, 12.375, 16.125,    nan],
       [   nan, 30.375, 36.125, 42.375,    nan]])
Dimensions without coordinates: x, y

For the second example, you use [0.5, 1, 0.5] as the weights, which sum to 2, not 1. The result is therefore a weighted sum, and the first non-NaN item is (1 x 0.5) + (2 x 1) + (3 x 0.5) = 4.

If you want the weighted mean rather than the weighted sum, use [0.25, 0.5, 0.25] instead.
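As a sketch of the difference, the snippet below keeps the weights [0.5, 1, 0.5] from the question and normalizes after the fact. The variable names are illustrative, and the input is built as floats so the NaN padding at the edges needs no dtype promotion. Dividing by the sum of the weights gives the weighted mean; dividing by the window length gives the values the question expected, which is a different quantity.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(1.0, 6.0), dims="x")
weight = xr.DataArray([0.5, 1.0, 0.5], dims=["window"])

# Rolling windows stacked along a new 'window' dimension, then reduced with dot.
windows = da.rolling(x=3, center=True).construct("window")
weighted_sum = windows.dot(weight)

# Weighted mean: normalize by the sum of the weights (2 here).
weighted_mean = weighted_sum / weight.sum()

# What the question computed by hand: weighted sum divided by window length (3).
sum_over_length = weighted_sum / weight.sizes["window"]

print(weighted_sum.values)
print(weighted_mean.values)
print(sum_over_length.values)
```

Normalizing by `weight.sum()` means the same code works for any set of weights without having to rescale them by hand first.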

Keisuke FUJII
  • So the sum of the weights matters. If I use weights of [0.25, 0.5, 0.25], I am back to your original point that for evenly spaced data the result is the same as the simple `mean`: `weight = xr.DataArray([0.25, 0.5, 0.25], dims=['window'])` followed by `da.rolling(x=3, center=True).construct('window').dot(weight)` gives 2, 3, 4. I think what I want is to take the `sum` and then divide by the length of the window: `weight = xr.DataArray([0.5, 1, 0.5], dims=['window'])` followed by `da.rolling(x=3, center=True).construct('window').dot(weight) / 3` gives 1.33, 2, 2.67. – Ray Bell May 25 '18 at 13:50
  • Actually, the term `weighted rolling mean` in the docs is not quite accurate; it is a `weighted rolling sum`. It is a mean only if the weights are normalized (sum to 1). `da.rolling(x=3, center=True).construct('window')` just returns a DataArray, so you can inspect it to see what happens when `.dot(weight)` is called on it. – Keisuke FUJII May 26 '18 at 13:09
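Following the suggestion in the comment above, this sketch inspects the intermediate array returned by `.construct('window')` and repeats the reduction by hand. The `manual` variable is only for illustration; it is assumed here to match `.dot(weight)` because both multiply each window by the weights and sum over the `window` dimension.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(1.0, 6.0), dims="x")
weight = xr.DataArray([0.5, 1.0, 0.5], dims=["window"])

# Each position along x gets its own length-3 window; edges are padded with NaN.
windows = da.rolling(x=3, center=True).construct("window")
print(windows.dims)    # ('x', 'window')
print(windows.values)  # rows: [nan, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, nan]

# The dot product is a multiply-and-sum over 'window': a weighted sum, not a mean.
via_dot = windows.dot(weight)

# skipna=False keeps the NaN at the edges, as dot does.
manual = (windows * weight).sum("window", skipna=False)

print(via_dot.values)
print(manual.values)
```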