Speed up finding the average of top 5 numbers from a rolling window in python

Question

I would like to create a column of the average of the 5 highest values from a rolling window of 30. Using a for loop is very slow with a large DataFrame. I tried using rolling() with nlargest(), but it wouldn't work. Any suggestions for speeding this up?

def top_values(df, column, days):
    top5 = df.nlargest(days, column)
    top = top5[column].sum() / days

x = 0
w = 0
for i in df.index:
    if x > 30:
        df['tops'][x] = top_values(df[w:x], 'column', 5)
        w += 1
        x += 1

score 4 · Accepted Answer · answered Jul 20 '18 at 15:20

One method would be to use a lambda function in your rolling, and for instance get the mean of the first 5 elements of a sorted list:

df['column'].rolling(30).apply(lambda x: np.mean(sorted(x,reverse=True)[:5]))

Minimal Example:

On a dataframe of 15 elements, we can get the mean value of the top 3 values in a window of 5 to demonstrate:

>>> df
    column
0       48
1        9
2       36
3       71
4       59
5       16
6        9
7       18
8       43
9        3
10      54
11      23
12      12
13      38
14      54

>>> df['column'].rolling(5).apply(lambda x: np.mean(sorted(x,reverse=True)[:3]))
0           NaN
1           NaN
2           NaN
3           NaN
4     59.333333
5     55.333333
6     55.333333
7     49.333333
8     40.000000
9     25.666667
10    38.333333
11    40.000000
12    40.000000
13    38.333333
14    48.666667
Name: column, dtype: float64

Awesome. Greatly improves the speed. Thank you. – slard Jul 20 '18 at 16:13 — slard, Jul 20 '18 at 16:13

Speed up finding the average of top 5 numbers from a rolling window in python

1 Answers1