At first I was using a really slow approach:
df.groupby("name")["P"].transform(lambda x: x.rolling(5, min_periods=1).mean())
So I started looking into how to speed this up. I read about convolution and learned how to use it. This was the function I developed:
import numpy as np
import pandas as pd

def convo_mean_group(df, attr, window_size, group_attr):
    groups = df.groupby(group_attr)[attr]
    idx, vals = [], []
    for _, grp in groups:
        # Trailing windowed sums: full convolution with a ones kernel,
        # truncated to the group's length.
        c = np.convolve(grp, np.ones(window_size))[:len(grp)]
        # Effective window length at each position, so partial windows
        # at the start of each group behave like min_periods=1.
        divisor = np.ones(len(grp)) * window_size
        n = min(len(grp), window_size)
        divisor[:n] = np.arange(1, n + 1)
        idx.extend(grp.index)
        vals.extend(c / divisor)
    return pd.Series(vals, index=idx).sort_index()
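To make the idea concrete, here is the core convolution trick isolated from the groupby, checked against pandas' own rolling mean on a small made-up array (the values are illustrative, not from my real data):

```python
import numpy as np
import pandas as pd

x = np.array([3.75, 10.0, 7.0, 6.0, 4.5])
window = 5

# Trailing windowed sums: full convolution with ones, truncated to len(x).
sums = np.convolve(x, np.ones(window))[:len(x)]

# Effective window length at each position (mimics min_periods=1).
counts = np.minimum(np.arange(1, len(x) + 1), window)

conv_mean = sums / counts
pandas_mean = pd.Series(x).rolling(window, min_periods=1).mean().to_numpy()

print(np.allclose(conv_mean, pandas_mean))  # True
```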
Compared to the transform-based code it is significantly faster. However, I later found that a simpler built-in solution exists, which is even marginally faster:
df.groupby("name", sort=False)["P"].rolling(5, min_periods=1).mean()
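One wrinkle worth noting (shown on a small made-up frame, not my real data): this groupby-rolling version returns a Series with a MultiIndex of (group key, original row label), so aligning it with the original frame or with the transform output needs the group level dropped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "a", "b", "a", "b"],
    "P": [1.0, 2.0, 3.0, 4.0, 5.0],
})

res = df.groupby("name", sort=False)["P"].rolling(5, min_periods=1).mean()
# res is indexed by (name, original row label); drop the group level
# and sort to line it up with df's index again.
aligned = res.droplevel("name").sort_index()

via_transform = df.groupby("name")["P"].transform(
    lambda x: x.rolling(5, min_periods=1).mean()
)
print(np.allclose(aligned.to_numpy(), via_transform.to_numpy()))  # True
```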
I still think the convolution approach can be improved, since computing a rolling average with a convolution is much faster when there is no need to group. So I guess the pandas groupby and the iteration over the groups slow it down. Is there a faster way to do this?
Here is a data sample, if it helps:
name P
10 Leg It Liam 3.75
11 Leg It Liam 10.00
12 Leg It Liam 7.00
13 Hollyhill Island 6.00
14 Hollyhill Island 4.50
15 Hollyhill Island 3.50
16 Ab 2.50
17 Hollyhill Island 2.38
18 Ab 3.50
19 Hollyhill Island 3.75
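For easy reproduction, the sample above can be rebuilt like this:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Leg It Liam", "Leg It Liam", "Leg It Liam",
                 "Hollyhill Island", "Hollyhill Island", "Hollyhill Island",
                 "Ab", "Hollyhill Island", "Ab", "Hollyhill Island"],
        "P": [3.75, 10.00, 7.00, 6.00, 4.50, 3.50, 2.50, 2.38, 3.50, 3.75],
    },
    index=range(10, 20),
)
```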