1

I am trying to create a new column in an existing df. The values of the new column are created by a combination of the groupby and rolling sum. How do I do this?

I've tried two approaches both resulting in either NaN values or 'incompatible index of the inserted column with frame index'

df = something like this:


    HomeTeam    FTHP
0   Bristol Rvs 0
1   Crewe           0
2   Hartlepool  3
3   Huddersfield    1

and I've tried:

(1)

df['new'] = df.groupby('HomeTeam')['FTHP'].rolling(4).sum()

(2)

df['new'] = df.groupby('HomeTeam').FTHP.apply(lambda x: x.rolling(4).mean())

(1) outputs the following which are the values that I would like to add in a new column.

HomeTeam        
Brighton     12      NaN
             36      NaN
             49      NaN
             72      2.0
             99      2.0

And I am trying to add these values in a new columns next to the appropriate HomeTeam. Resulting in a NaN for the first three (as it is rolling(4)) and pick up values after, something like:


    HomeTeam    FTHP      RollingMean
0   Bristol Rvs 0         NaN
1   Crewe           0         NaN
2   Hartlepool  3         NaN
3   Huddersfield    1         NaN
INDRAJITH EKANAYAKE
  • 3,894
  • 11
  • 41
  • 63

1 Answers1

0

To ensure alignment on the original (non-duplicated) index:

df.groupby('HomeTeam', as_index=False)['FTHP'].rolling(4).sum().reset_index(0, drop=True)

With a df:

  HomeTeam  FTHP
A        a     0
B        b     1
C        b     2
D        a     3
E        b     4

grouping with as_index=False adds an ngroup value as the 0th level, preserving the original index in the 1st level:

df.groupby('HomeTeam', as_index=False)['FTHP'].rolling(2).sum()
#0  A    NaN
#   D    3.0
#1  B    NaN
#   C    3.0
#   E    6.0
#Name: FTHP, dtype: float64

Drop level=0 to ensure alignment on the original index. Your original index should not be duplicated, else you get a ValueError.

ALollz
  • 57,915
  • 7
  • 66
  • 89