
I need to implement a moving average in my own way: the input contains only the non-zero samples, but the output should be computed for every time tick, including the empty ticks that are not present in the input.

Code example:

time_step = 120    
window_size = time_step * 30
ma_array = []

def my_rolling_mean():
    window_start_iter = extent_df.itertuples()
    window_end_iter = extent_df.itertuples()
    window_start_tuple = window_start_iter.next()
    window_end_tuple = None
    next_window_end_tuple = window_end_iter.next()
    rolling_sum = 0

    for t_i_start in xrange(start_log_time, end_log_time - window_size, time_step):
        t_i_end = t_i_start + window_size

        # drop samples that have fallen out of the window on the left
        while window_start_tuple[0][0] < t_i_start:  # [0][0] is the TIME level of the index
            rolling_sum -= window_start_tuple[1]     # [1] is the TOTAL_RR value
            window_start_tuple = window_start_iter.next()

        # add samples that have entered the window on the right
        while next_window_end_tuple[0][0] < t_i_end:
            window_end_tuple = next_window_end_tuple
            next_window_end_tuple = window_end_iter.next()
            rolling_sum += window_end_tuple[1]

        # average over the number of ticks in the window, counting empty ticks as zero
        ma_i = float(rolling_sum) / ((t_i_end - t_i_start) / time_step)
        ma_array.append(ma_i)
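
For reference, this is the kind of setup the function expects. The data below is an illustrative stand-in, not my real log (a slice of the real `extent_df` is shown further down):

import pandas as pd

# Stand-in for my data: sparse samples on a (TIME, EXTENT) MultiIndex with a
# single TOTAL_RR column; TIME advances in multiples of time_step, but many
# ticks are missing.
sample_times = [120, 240, 600, 1200, 4800, 7200, 9600, 12000]
index = pd.MultiIndex.from_tuples([(t, 0) for t in sample_times],
                                  names=['TIME', 'EXTENT'])
extent_df = pd.DataFrame({'TOTAL_RR': [10, 20, 50, 87, 42, 13, 5, 7]},
                         index=index)

start_log_time = sample_times[0]    # first tick covered by the log
end_log_time = sample_times[-1]     # last tick covered by the log

my_rolling_mean()                   # appends one average per tick to ma_array
print len(ma_array)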

*pandas.rolling_mean* is about 100 times faster than *my_rolling_mean*:

In [342]: extent_df[:10]
Out[342]: 
             TOTAL_RR
TIME EXTENT          
120  0             10
240  0             20
360  0             30
480  0             40
600  0             50
720  0             60
840  0             87
960  0             87
1080 0             87
1200 0             87

In [343]: len(extent_df)
Out[343]: 9110

In [344]: %timeit my_rolling_mean()
10 loops, best of 3: 26.3 ms per loop

In [345]: %timeit pd.rolling_mean(extent_df, 3600)
1000 loops, best of 3: 232 µs per loop

Please advise how to improve performance.

Thank you in advance,
Slava

  • In pandas these functions are implemented in cython, which can explain the performance difference. – joris Aug 04 '13 at 19:06
  • @joris, yes, I've seen that, but still, more than 100 times?! – Vyacheslav Shkolyar Aug 04 '13 at 19:16
  • 100 times is not implausible: http://pandas.pydata.org/pandas-docs/dev/enhancingperf.html#cython-writing-c-extensions-for-pandas – Andy Hayden Aug 04 '13 at 19:27
  • Also you can speed it up by extracting just the calculation part and using `rolling_apply` that takes an extra func argument and performs generic rolling computations. [moving-rolling-statistics-moments](http://pandas.pydata.org/pandas-docs/dev/computation.html#moving-rolling-statistics-moments) – Viktor Kerkez Aug 04 '13 at 22:51
  • Can you add some detail about what your data looks like, what you want to calculate from it, and the result you want? – HYRY Aug 05 '13 at 00:53
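
A sketch of the vectorized route the comments point toward, under my assumptions: `extent_df`, `start_log_time`, and `end_log_time` as above, one sample per tick, every TIME a multiple of `time_step`, and a missing tick meaning a value of zero. The idea is to reindex the sparse samples onto the full tick grid first, so the stock rolling mean already divides by the number of ticks:

import numpy as np
import pandas as pd

time_step = 120
window_size = time_step * 30

# Collapse the (TIME, EXTENT) index to TIME only (level 0), then reindex onto
# the full tick grid so that empty ticks become explicit zero-valued rows.
series = extent_df['TOTAL_RR'].copy()
series.index = extent_df.index.get_level_values(0)
full_grid = np.arange(start_log_time, end_log_time + time_step, time_step)
dense = series.reindex(full_grid, fill_value=0)

# A rolling mean over window_size / time_step ticks now matches what
# my_rolling_mean computes (window sum divided by the tick count); note that
# the pandas window is trailing, so the result is shifted by
# window_size - time_step relative to ma_array.  In newer pandas this would be
# dense.rolling(window_size // time_step).mean().
ma = pd.rolling_mean(dense, window_size // time_step)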
