
I have data sampled at essentially random intervals. I would like to compute a weighted moving average using numpy (or other python package). I have a crude implementation of a moving average, but I am having trouble finding a good way to do a weighted moving average, so that the values towards the center of the bin are weighted more than values towards the edges.

Here I generate some sample data and then take a moving average. How can I most easily implement a weighted moving average? Thanks!

import numpy as np
import matplotlib.pyplot as plt

#first generate some data points for a randomly sampled noisy sine wave
x = np.random.random(1000)*10
noise = np.random.normal(scale=0.3,size=len(x))
y = np.sin(x) + noise

#plot the data
plt.plot(x,y,'ro',alpha=0.3,ms=4,label='data')
plt.xlabel('Time')
plt.ylabel('Intensity')

#define a moving average function
def moving_average(x,y,step_size=.1,bin_size=1):
    bin_centers  = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    for index in range(0,len(bin_centers)):
        bin_center = bin_centers[index]
        items_in_bin = y[(x>(bin_center-bin_size*0.5) ) & (x<(bin_center+bin_size*0.5))]
        bin_avg[index] = np.mean(items_in_bin)

    return bin_centers,bin_avg

#plot the moving average
bins, average = moving_average(x,y)
plt.plot(bins, average,label='moving average')

plt.show()

The output: Data and moving average

Using the advice from crs17 to use "weights=" in the np.average function, I came up with a weighted moving average function, which uses a Gaussian function to weight the data:

def weighted_moving_average(x,y,step_size=0.05,width=1):
    bin_centers  = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    #We're going to weight with a Gaussian function
    def gaussian(x,amp=1,mean=0,sigma=1):
        return amp*np.exp(-(x-mean)**2/(2*sigma**2))

    for index in range(0,len(bin_centers)):
        bin_center = bin_centers[index]
        weights = gaussian(x,mean=bin_center,sigma=width)
        bin_avg[index] = np.average(y,weights=weights)

    return (bin_centers,bin_avg)
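
It can be called and plotted in the same way as the moving_average function above, something like:

#plot the weighted moving average
bins, average = weighted_moving_average(x,y)
plt.plot(bins, average, label='weighted moving average')
plt.legend()
plt.show()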

Results look good: Working weighted average using numpy

DanHickstein
  • Try searching for information relating to weights for a digital low pass filter. – Adam Burry Aug 29 '13 at 17:56
    You already have [exponentially weighted moment functions](http://pandas.pydata.org/pandas-docs/dev/computation.html#exponentially-weighted-moment-functions) implemented in pandas. – Viktor Kerkez Aug 29 '13 at 17:58
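
(Regarding the pandas suggestion above: in recent pandas versions those exponentially weighted functions are exposed via Series.ewm. A rough sketch is below; note that, as discussed in the answer comments further down, these assume evenly spaced samples rather than randomly sampled x values, so they don't directly solve this problem.)

import pandas as pd

#exponentially weighted moving average of y, ordered by x (span chosen arbitrarily)
order = np.argsort(x)
ewma = pd.Series(y[order], index=x[order]).ewm(span=20, adjust=False).mean()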

2 Answers


You could use numpy.average, which allows you to specify weights:

>>> bin_avg[index] = np.average(items_in_bin, weights=my_weights)

So to calculate the weights you could find the x coordinates of each data point in the bin and calculate their distances to the bin center.
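
For example, reusing the variable names from the question's moving_average function, a simple triangular weighting might look roughly like this (an untested sketch, not a drop-in replacement):

#inside the loop over bin_centers, weight each point by its distance from the center
in_bin = (x > (bin_center - bin_size*0.5)) & (x < (bin_center + bin_size*0.5))
items_in_bin = y[in_bin]
distances = np.abs(x[in_bin] - bin_center)
my_weights = 1.0 - distances/(bin_size*0.5)   #1 at the bin center, falling to 0 at the edges
bin_avg[index] = np.average(items_in_bin, weights=my_weights)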

crs17
  • Yes! I had no idea about this average function and how it could be weighted! I posted my complete solution at the bottom of my question. – DanHickstein Aug 29 '13 at 21:06

This won't give an exact solution, but it will make your life easier, and will probably be good enough... First, average your samples in small bins. Once you have resampled your data to be equispaced, you can use stride tricks and np.average to do a weighted average:

from numpy.lib.stride_tricks import as_strided

def moving_weighted_average(x, y, step_size=.1, steps_per_bin=10,
                            weights=None):
    # This ensures that all samples are within a bin
    number_of_bins = int(np.ceil(np.ptp(x) / step_size))
    bins = np.linspace(np.min(x), np.min(x) + step_size*number_of_bins,
                       num=number_of_bins+1)
    bins -= (bins[-1] - np.max(x)) / 2
    bin_centers = bins[:-steps_per_bin] + step_size*steps_per_bin/2

    # Histogram trick: per-bin counts and per-bin sums of y give the mean of each small bin.
    # (An empty bin will produce a divide-by-zero warning and a NaN here.)
    counts, _ = np.histogram(x, bins=bins)
    vals, _ = np.histogram(x, bins=bins, weights=y)
    bin_avgs = vals / counts
    n = len(bin_avgs)
    # Build a sliding-window (strided) view: one row of steps_per_bin bin averages per output point.
    windowed_bin_avgs = as_strided(bin_avgs,
                                   (n-steps_per_bin+1, steps_per_bin),
                                   bin_avgs.strides*2)

    weighted_average = np.average(windowed_bin_avgs, axis=1, weights=weights)

    return bin_centers, weighted_average

You can now do something like this:

#plot the moving average with triangular weights
weights = np.concatenate((np.arange(0, 5), np.arange(0, 5)[::-1]))
bins, average = moving_weighted_average(x, y, steps_per_bin=len(weights),
                                        weights=weights)
plt.plot(bins, average,label='moving average')

plt.show()

The output: Moving average with triangular weights

Jaime
  • Thanks for the solution! This also looks like it will work, but I find the "weights" method to be a little more intuitive. – DanHickstein Aug 29 '13 at 21:10
  • @DanHickstein It seems like what you have coded would be awfully slow for even moderately large datasets, but you are the only one who can decide if it's fast enough for you. – Jaime Aug 29 '13 at 21:21
  • Ah, good point! I had not checked on the speed - only got it working for the demonstration. – DanHickstein Aug 29 '13 at 21:46
  • I started using much larger datasets and this method is super-fast. I love it. It seems like something like this should be implemented in a python package. The rolling_mean and ewma functions in pandas are not meant for randomly spaced x-values, so they are not really appropriate. Anyway, thanks again for this solution. – DanHickstein Oct 15 '13 at 16:38