I need to implement a moving average in my own way: the input contains only the samples with non-zero values, but the output should be calculated for every time tick, including the empty ones that are missing from the input.
Code example:
# extent_df, start_log_time and end_log_time are defined elsewhere.
time_step = 120
window_size = time_step * 30
ma_array = []

def my_rolling_mean():
    # Two iterators over the same frame: one trails the window start,
    # the other leads the window end.
    window_start_iter = extent_df.itertuples()
    window_end_iter = extent_df.itertuples()
    window_start_tuple = window_start_iter.next()
    window_end_tuple = None
    next_window_end_tuple = window_end_iter.next()
    rolling_sum = 0
    for t_i_start in xrange(start_log_time, end_log_time - window_size, time_step):
        t_i_end = t_i_start + window_size
        # Drop the samples that fell out of the window on the left.
        while window_start_tuple[0][0] < t_i_start:  # time
            rolling_sum -= window_start_tuple[1]  # value
            window_start_tuple = window_start_iter.next()
        # Add the samples that entered the window on the right.
        while next_window_end_tuple[0][0] < t_i_end:
            window_end_tuple = next_window_end_tuple
            next_window_end_tuple = window_end_iter.next()
            rolling_sum += window_end_tuple[1]
        # Divide by the number of ticks in the window, not the number of samples.
        ma_i = float(rolling_sum) / ((t_i_end - t_i_start) / time_step)
        ma_array.append(ma_i)
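To make the intended result concrete, here is a minimal pure-Python reference sketch of the output I am after. It is not the code I timed. It assumes the samples are held in a plain dict keyed by tick time, and the function and parameter names (`sparse_moving_average`, `samples`, `start`, `end`, `step`, `window`) are made up for illustration. Ticks missing from the input count as zero, and each window sum is the difference of two prefix sums. Unlike the loop above, it also emits the last full window that ends exactly at `end`.

```python
def sparse_moving_average(samples, start, end, step, window):
    """Mean over [t, t + window) for every tick t; missing ticks count as 0.

    samples is a hypothetical input format: dict of tick time -> value.
    """
    ticks_per_window = window // step
    # Dense series on the full grid; ticks absent from the input are zero.
    dense = [samples.get(t, 0) for t in range(start, end, step)]
    # prefix[i] holds the sum of dense[:i].
    prefix = [0]
    for value in dense:
        prefix.append(prefix[-1] + value)
    # Each window sum is a difference of two prefix sums.
    return [
        (prefix[i + ticks_per_window] - prefix[i]) / float(ticks_per_window)
        for i in range(len(dense) - ticks_per_window + 1)
    ]


# Two non-zero samples on a 10-unit grid, window of two ticks.
print(sparse_moving_average({0: 3, 20: 6}, 0, 50, 10, 20))
# [1.5, 3.0, 3.0, 0.0]
```

A reference like this is slow, but it may be useful for checking that a faster version still divides by the number of ticks in the window rather than by the number of samples present.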
*pandas.rolling_mean* is about 100 times faster than *my_rolling_mean*:
In [342]: extent_df[:10]
Out[342]:
             TOTAL_RR
TIME EXTENT
120  0             10
240  0             20
360  0             30
480  0             40
600  0             50
720  0             60
840  0             87
960  0             87
1080 0             87
1200 0             87
In [343]: len(extent_df)
Out[343]: 9110
In [344]: %timeit my_rolling_mean()
10 loops, best of 3: 26.3 ms per loop
In [345]: %timeit pd.rolling_mean(extent_df, 3600)
1000 loops, best of 3: 232 µs per loop
Please advise how to improve performance.
Thank you in advance,
Slava