Recombine groupby rolling sum with original pandas DataFrame

Question

I have a pandas DataFrame of the form:

import pandas as pd

df = pd.DataFrame({
    'a': [1,2,3,4,5,6],
    'b': [0,1,0,1,0,1]
})

I want to group the data by the value of 'b' and add a new column 'c' containing a rolling sum of 'a' within each group. I then want to recombine the groups into a single ungrouped DataFrame that includes the 'c' column. I have got as far as:

for i, group in df.groupby('b'):
    group['c'] = group.a.rolling(
        window=2,
        min_periods=1,
        center=False
    ).sum()

But there are several problems with this approach:

Operating on each group in a for loop is likely to be slow for a large DataFrame (like my actual data).
I can't find an elegant way to save column 'c' for each group and add it back to the original DataFrame. I could append 'c' for each group to an array, zip it with an analogous index array, etc., but that seems very hacky. Is there a built-in pandas method that I am missing here?

Nickil Maveli · Accepted Answer · 2016-11-29T06:28:23.373

If using groupby is a must, you could use groupby.apply to compute everything in one go:

df['c'] = df.groupby('b')['a'].apply(lambda x: x.rolling(2, min_periods=1).sum())
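Depending on the pandas version, apply may prepend the group key to the result's index, in which case the direct assignment above may not line up with df. A minimal sketch of a variant that I'd expect to be less version-sensitive, assuming the same df as in the question: transform returns a result indexed like the input, so it can be assigned straight back.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [0, 1, 0, 1, 0, 1]
})

# transform applies the function to each group and returns a Series
# with the same index as df, so the assignment aligns row by row
df['c'] = df.groupby('b')['a'].transform(
    lambda x: x.rolling(2, min_periods=1).sum()
)
```

This is a swap of transform for apply, not what the answer originally used; the rolling logic inside the lambda is unchanged.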

Starting with v0.19.1, you can directly call rolling()/expanding() methods on groupby objects as shown:

df['c'] = df.groupby('b').rolling(2, min_periods=1)['a'].sum().sort_index(level=1).values

Both give you:

df
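For reference, here is a sketch that reproduces the result end to end with the df from the question. Instead of sorting and taking .values, it drops the group level from the index and lets pandas align on the original row labels; that choice is an assumption of this sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [0, 1, 0, 1, 0, 1]
})

# Rolling sum of 'a' within each 'b' group; the result is indexed by
# (b, original row label)
rolled = df.groupby('b')['a'].rolling(2, min_periods=1).sum()

# Drop the group level so only the original row labels remain, then
# assign; pandas aligns on the index, so row order does not matter
df['c'] = rolled.reset_index(level=0, drop=True)

print(df)
#    a  b     c
# 0  1  0   1.0
# 1  2  1   2.0
# 2  3  0   4.0
# 3  4  1   6.0
# 4  5  0   8.0
# 5  6  1  10.0
```

Group 0 holds a = 1, 3, 5, giving 1, 1+3, 3+5; group 1 holds a = 2, 4, 6, giving 2, 2+4, 4+6.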

edited Nov 29 '16 at 06:28

answered Nov 28 '16 at 13:51


you can do this directly: FYI http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#groupby-syntax-with-window-and-resample-operations (though I realize not documented except in the whatsnew) – Jeff Nov 28 '16 at 23:54
and if anyone wants to enhance the docs: https://github.com/pandas-dev/pandas/issues/14759 – Jeff Nov 28 '16 at 23:59
