Similar to this answer, I can calculate multiple rolling means:

d1 = df.set_index('DateTime').sort_index()
ma_1h = d1.groupby('Event').rolling('H').mean()
ma_2h = d1.groupby('Event').rolling('2H').mean()

But how can I do this efficiently for a whole list of windows?

window_array = ['H','3H','6H','9H'] # etc

I would also like the rolling means to be added back into my original DataFrame.

Adam Jaamour
philshem

2 Answers

I believe you need to convert the window strings to offsets, build one rolling-mean result per window in a list comprehension, and finally concat them:

from pandas.tseries.frequencies import to_offset

df1 = pd.concat([d1.groupby('Event').rolling(to_offset(x)).mean() for x in window_array], 
                axis=1, 
                keys=window_array)

Sample:

import pandas as pd
from pandas.tseries.frequencies import to_offset

rng = pd.date_range('2017-04-03', periods=10, freq='38T')
df = pd.DataFrame({'DateTime': rng, 'a': range(10), 'Event': [4] * 3 + [3] * 3 + [1] * 4})
print(df)

window_array = ['H', '3H', '6H', '9H']

# time-based rolling windows need a sorted DatetimeIndex
d1 = df.set_index('DateTime').sort_index()
# one rolling mean of column 'a' per window, placed side by side and labelled by window
a = pd.concat([d1.groupby('Event')['a'].rolling(to_offset(x)).mean() for x in window_array],
              axis=1,
              keys=window_array)
print(a)
                             H   3H   6H   9H
Event DateTime                               
1     2017-04-03 03:48:00  6.0  6.0  6.0  6.0
      2017-04-03 04:26:00  6.5  6.5  6.5  6.5
      2017-04-03 05:04:00  7.5  7.0  7.0  7.0
      2017-04-03 05:42:00  8.5  7.5  7.5  7.5
3     2017-04-03 01:54:00  3.0  3.0  3.0  3.0
      2017-04-03 02:32:00  3.5  3.5  3.5  3.5
      2017-04-03 03:10:00  4.5  4.0  4.0  4.0
4     2017-04-03 00:00:00  0.0  0.0  0.0  0.0
      2017-04-03 00:38:00  0.5  0.5  0.5  0.5
      2017-04-03 01:16:00  1.5  1.0  1.0  1.0
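
The result above is indexed by (Event, DateTime), so it still has to be attached to the original frame. A minimal sketch of one way to do that, assuming the (Event, DateTime) pairs are unique as in the sample: join on those two columns. The `a_mean_<window>` column names are only illustrative, and the lowercase aliases ('1h', '38min') are used here because recent pandas versions deprecate 'H' and 'T'.

```python
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10, freq='38min')
df = pd.DataFrame({'DateTime': rng, 'a': range(10),
                   'Event': [4] * 3 + [3] * 3 + [1] * 4})
window_array = ['1h', '3h', '6h', '9h']

# time-based rolling windows need a sorted DatetimeIndex
d1 = df.set_index('DateTime').sort_index()

# one rolling mean per window, indexed by (Event, DateTime)
means = pd.concat(
    [d1.groupby('Event')['a'].rolling(w).mean() for w in window_array],
    axis=1,
    keys=[f'a_mean_{w}' for w in window_array],  # illustrative column names
)

# match df's Event/DateTime columns against the MultiIndex of `means`
result = df.join(means, on=['Event', 'DateTime'])
print(result)
```

`result` keeps the original row order and columns of `df` and gains one column per window.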
jezrael
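window_array = ['H','3H','6H','9H'] # etc
for window in window_array:
    # assumes a single value column 'a', as in the sample above;
    # transform returns values aligned to d1's own index, so they can be assigned directly
    d1[window] = d1.groupby('Event')['a'].transform(lambda s: s.rolling(window).mean())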
jo9k