6

I have a pandas data frame like this;

>df

leg    speed
  1       10
  1       11
  1       12
  1       13
  1       12
  1       15
  1       19
  1       12
  2       10
  2       10
  2       12
  2       15
  2       19
  2       11
  :        :

I want to make a new column roll_speed where it takes a rolling average speed of the last 5 positions. But I wanna put more detailed condition in it.

  1. Groupby leg(it doesn't take into account the speed of the rows in different leg.
  2. I want the rolling window to be changed from 1 to 5 maximum according to the available rows. For example in leg == 1, in the first row there is only one row to calculate, so the rolling speed should be 10/1 = 10. For the second row, there are only two rows available for calculation, the rolling speed should be (10+11)/2 = 10.5.

    leg    speed   roll_speed
      1       10           10    # 10/1
      1       11           10.5  # (10+11)/2
      1       12           11    # (10+11+12)/3
      1       13           11.5  # (10+11+12+13)/4
      1       12           11.6  # (10+11+12+13+12)/5
      1       15           12.6  # (11+12+13+12+15)/5
      1       19           14.2  # (12+13+12+15+19)/5
      1       12           14.2  # (13+12+15+19+12)/5
      2       10           10    # 10/1
      2       10           10    # (10+10)/2
      2       12           10.7  # (10+10+12)/3
      2       15           11.8  # (10+10+12+15)/4
      2       19           13.2  # (10+10+12+15+19)/5
      2       11           13.4  # (10+12+15+19+11)/5
      :        :           
    

My attempt:

df['roll_speed'] = df.speed.rolling(5).mean()

But it just returns NA for rows where less than five rows are available for calculation. How should I solve this problem? Thank you for any help!

cs95
  • 379,657
  • 97
  • 704
  • 746
Makoto Miyazaki
  • 1,743
  • 2
  • 23
  • 39

2 Answers2

7

Set the parameter min_periods to 1

df['roll_speed'] = df.groupby('leg').speed.rolling(5, min_periods = 1).mean()\
.round(1).reset_index(drop = True)

    leg speed   roll_speed
0   1   10  10.0
1   1   11  10.5
2   1   12  11.0
3   1   13  11.5
4   1   12  11.6
5   1   15  12.6
6   1   19  14.2
7   1   12  14.2
8   2   10  10.0
9   2   10  10.0
10  2   12  10.7
11  2   15  11.8
12  2   19  13.2
13  2   11  13.4
Vaishali
  • 37,545
  • 5
  • 58
  • 86
2

Using rolling(5) will get you your results for all but the first 4 occurences of each group. We can fill the remaining values with the expanding mean:

(df.groupby('leg').speed.rolling(5)
    .mean().fillna(df.groupby('leg').speed.expanding().mean())
).reset_index(drop=True)

0     10.000000
1     10.500000
2     11.000000
3     11.500000
4     11.600000
5     12.600000
6     14.200000
7     14.200000
8     10.000000
9     10.000000
10    10.666667
11    11.750000
12    13.200000
13    13.400000
Name: speed, dtype: float64
user3483203
  • 50,081
  • 9
  • 65
  • 94