1

I'm trying to obtain an array containing the moving averages along the rows of a 2-dimensional numpy array, based on a certain 'window' (i.e. the number of rows included in the average) and an 'offset'. I've come up with the code below which I know is not efficient:

import numpy as np
def f(array, window, offset):
    x = np.empty(array.shape)
    x[:,:] = np.NaN
    for row_num in range(array.shape[0]):
        first_row = row_num - window - offset
        last_row = row_num - offset + 1
        if first_row >= 0:
            x[row_num] = np.nanmean(array[first_row:last_row], axis=0)
    return x

I've found a potential solution here, adapted below for my code:

import math
from scipy.ndimage import uniform_filter
def g(array, window, offset):
    return uniform_filter(array, size=(window+1,1), mode='nearest', origin=(math.ceil((window+1)/2-1),0))

This solution, however, has 3 problems:

  • First, I'm not sure how to implement the 'offset'
  • Second, I'm not sure whether it is indeed more efficient
  • Third, and most importantly, it doesn't work when the input array contains np.nan. The moment np.nan is found, it gets dragged down in the moving average, instead of following the np.nanmean behaviour.

Is there an efficient way to achieve what I'm trying to get?

Update

As suggested by Ehsan, I've implemented the code below (with a small modification), which works as my original code for any offset above 0:

from skimage.util import view_as_windows
def h(array, window, offset):
    return np.vstack(([[np.NaN]*array.shape[-1]]*(window+offset),np.vstack(np.nanmean(view_as_windows(array,(window+1,array.shape[-1])),-2)[:-offset])))

I'm just not sure how to make it work for any offset (in particular, offset=0). Also, this solution seems to consume more time than the original one:

a = np.arange(10*11).reshape(10,11)

%timeit f(a, 5, 2)
%timeit h(a, 5, 2)
>>> 36.6 µs ± 709 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> 67.5 µs ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I was wondering if there's any alternative which is less time consuming

Mike
  • 102
  • 6
  • why do you have +1 in `last_row`? your actual window size with this code is window+1. Is that what you intended to do? – Ehsan Sep 23 '20 at 23:51

1 Answers1

2

This will provide you the same output as your code, but I think you might want to rethink the extra +1 in last_row definition, since it skips the last row and your actual window size would be window+1:

from skimage.util import view_as_windows
def f(array, window, offset):
    return np.vstack(([[np.NaN]*array.shape[-1]]*(window+offset),np.vstack(np.nanmean(view_as_windows(array,(window+1,array.shape[-1])),-2)[:array.shape[0]-window-offset])))

sample output:

a = np.arange(7*6).reshape(7,6)
f(a, 2, 1)
#[[nan nan nan nan nan nan]
# [nan nan nan nan nan nan]
# [nan nan nan nan nan nan]
# [ 6.  7.  8.  9. 10. 11.]
# [12. 13. 14. 15. 16. 17.]
# [18. 19. 20. 21. 22. 23.]
# [24. 25. 26. 27. 28. 29.]]

Comparison using benchit:

#@OP's solution
def f1(array, window, offset):
    x = np.empty(array.shape)
    x[:,:] = np.NaN
    for row_num in range(array.shape[0]):
        first_row = row_num - window - offset
        last_row = row_num - offset + 1
        if first_row >= 0:
            x[row_num] = np.nanmean(array[first_row:last_row], axis=0)
    return x
#@Ehsan's solution
def f2(array, window, offset):
    return np.vstack(([[np.NaN]*array.shape[-1]]*(window+offset),np.vstack(np.nanmean(view_as_windows(array,(window+1,array.shape[-1])),-2)[:array.shape[0]-window-offset])))

in_ = {n:[np.arange(n*10).reshape(n,10), 2,2] for n in [10,100,500,1000,4000]}

The proposed solution f2 is significantly faster. You have to note that most vectorized solutions are efficient on larger arrays.

enter image description here

Ehsan
  • 12,072
  • 2
  • 20
  • 33
  • That solution works perfectly for offset=1 but with other values for the offset, the output array gets a different size than the input array. I found that using [:-offset] instead of [:-1] at the end solves that problem for any offset different than 0 - still haven't found a way to make it work even with offset=0 – Mike Sep 24 '20 at 08:30
  • Also @Ehsan, this seems to be less efficient from a time consumption standpoint: `a = np.arange(10*11).reshape(10,11)` `%timeit f(a, 5, 2)` `%timeit h(a, 5, 2)` `>>> 36.6 µs ± 709 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)` `>>> 67.5 µs ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)` – Mike Sep 24 '20 at 09:03
  • @Mike Thanks for catching that mistake. Please find the edit on my post to fix the error and now it works on any offset (including 0) – Ehsan Sep 24 '20 at 23:22