At first I was using a really slow approach:

df.groupby("name")["P"].transform(lambda x: x.rolling(5, min_periods=1).mean())

So I started looking into how to speed this up. I read about convolution and learned how to use it. This was the function I developed:

import numpy as np
import pandas as pd

def convo_mean_group(df, attr, window_size, group_attr):
    groups = df.groupby(group_attr)[attr]
    index, values = [], []
    kernel = np.ones(window_size)
    for _, grp in groups:
        n = len(grp)
        # Trailing window sums: the first n entries of the full convolution.
        sums = np.convolve(grp, kernel)[:n]
        # The first window_size - 1 rows of a group hold fewer values (min_periods=1).
        divisor = np.minimum(np.arange(1, n + 1), window_size)
        index.extend(grp.index)
        values.extend(sums / divisor)
    # Restore the original row order.
    return pd.Series(values, index=index).sort_index()

Compared to the code that uses transform, it is significantly faster. However, I later found that a simpler solution is available that is even marginally faster:

df.groupby("name", sort=False)["P"].rolling(5, min_periods=1).mean()
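One thing to watch with this version: the result is indexed by (name, original index) rather than by the original index alone, so it does not line up with df directly. A minimal sketch of realigning it, assuming the index of df is unique and using the rows from the data sample below (the variable names rolled, aligned and expected are only for illustration):

```python
import pandas as pd

# Rows copied from the data sample in this question.
df = pd.DataFrame(
    {
        "name": ["Leg It Liam"] * 3 + ["Hollyhill Island"] * 3
                + ["Ab", "Hollyhill Island", "Ab", "Hollyhill Island"],
        "P": [3.75, 10.00, 7.00, 6.00, 4.50, 3.50, 2.50, 2.38, 3.50, 3.75],
    },
    index=range(10, 20),
)

rolled = df.groupby("name", sort=False)["P"].rolling(5, min_periods=1).mean()

# Drop the group level of the MultiIndex and put the rows back in df's order.
aligned = rolled.droplevel(0).reindex(df.index)

# Reference: the slow transform-based version, which is already aligned with df.
expected = df.groupby("name")["P"].transform(
    lambda x: x.rolling(5, min_periods=1).mean()
)
```

After this step, aligned can be assigned as a new column of df in the same way as the output of transform.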

I still think the convolution approach can be improved, since computing the rolling average with convolution is much faster when there is no need to group.
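For reference, the ungrouped case I mean looks roughly like this. It is a sketch only: convo_mean is a name made up for this example, and it assumes a non-empty input without NaN values and min_periods=1 semantics.

```python
import numpy as np
import pandas as pd

def convo_mean(values, window_size):
    """Trailing rolling mean via convolution, matching min_periods=1."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Full convolution with a kernel of ones yields trailing window sums;
    # the first n entries line up with the input positions.
    sums = np.convolve(values, np.ones(window_size))[:n]
    # Early positions contain fewer than window_size values.
    counts = np.minimum(np.arange(1, n + 1), window_size)
    return sums / counts

p = pd.Series([3.75, 10.00, 7.00, 6.00, 4.50, 3.50, 2.50, 2.38, 3.50, 3.75])
fast = convo_mean(p, 5)
reference = p.rolling(5, min_periods=1).mean().to_numpy()
```

Here fast and reference should agree up to floating-point rounding; the whole computation is a single NumPy call per series, with no Python-level loop.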

So I guess the pandas groupby and the iteration over the groups slow it down. Is there a faster way to do this?

Here is a data sample, if it helps:

                name      P
10       Leg It Liam   3.75
11       Leg It Liam  10.00
12       Leg It Liam   7.00
13  Hollyhill Island   6.00
14  Hollyhill Island   4.50
15  Hollyhill Island   3.50
16                Ab   2.50
17  Hollyhill Island   2.38
18                Ab   3.50
19  Hollyhill Island   3.75
Borut Flis
Convoluting with a constant array is unnecessary. The change in a rolling mean at index i is value[i]/w_len - value[i-w_len]/w_len, which can be vectorized quite easily and summed to get the rolling mean. – kubatucka Oct 05 '21 at 09:52

However, the most time-consuming part would still be the groupby. Consider using categorical variables. – kubatucka Oct 05 '21 at 09:55
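One possible reading of these two comments, sketched below, replaces the convolution with a cumulative sum and replaces groupby with integer group codes. This is an assumption about what was meant, not a confirmed or benchmarked solution: cumsum_mean_group is a hypothetical name, pd.factorize stands in for a categorical column, and the sketch assumes a non-empty frame with no missing values in either column. Differences of cumulative sums can also accumulate small floating-point errors on long series.

```python
import numpy as np
import pandas as pd

def cumsum_mean_group(df, attr, window_size, group_attr):
    # Integer group codes instead of a groupby object.
    codes, _ = pd.factorize(df[group_attr])
    # Stable sort: rows of each group become contiguous and keep their order.
    order = np.argsort(codes, kind="stable")
    sorted_codes = codes[order]
    values = df[attr].to_numpy(dtype=float)[order]
    n = len(values)
    rows = np.arange(n)

    # Position of each row within its group (0, 1, 2, ...).
    is_start = np.r_[True, sorted_codes[1:] != sorted_codes[:-1]]
    group_start = np.maximum.accumulate(np.where(is_start, rows, 0))
    pos = rows - group_start

    # csum[i] is the sum of values[:i]; a window sum is a difference of two entries.
    csum = np.concatenate(([0.0], np.cumsum(values)))
    counts = np.minimum(pos + 1, window_size)  # clipped at the group start
    sums = csum[1:] - csum[rows - counts + 1]

    # Scatter the results back to the original row order.
    out = np.empty(n)
    out[order] = sums / counts
    return pd.Series(out, index=df.index)

df = pd.DataFrame(
    {
        "name": ["Leg It Liam"] * 3 + ["Hollyhill Island"] * 3
                + ["Ab", "Hollyhill Island", "Ab", "Hollyhill Island"],
        "P": [3.75, 10.00, 7.00, 6.00, 4.50, 3.50, 2.50, 2.38, 3.50, 3.75],
    },
    index=range(10, 20),
)
result = cumsum_mean_group(df, "P", 5, "name")
reference = df.groupby("name")["P"].transform(
    lambda x: x.rolling(5, min_periods=1).mean()
)
```

The sort and the scatter back add some cost of their own, so whether this beats groupby(...).rolling(...) on real data would need to be measured.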
