Pandas groupby multiple columns with rolling date offset - How?

Question

I am trying to compute a rolling sum across partitioned data over a moving two-business-day window. It feels like this should be both easy and widely used, but the solution is beyond me.

#generate sample data
import pandas as pd
import numpy as np
import datetime
vals = [-4,17,-4,-16,2,20,3,10,-17,-8,-21,2,0,-11,16,-24,-10,-21,5,12,14,9,-15,-15]
grp = ['X']*6 + ['Y'] * 6 + ['X']*6 + ['Y'] * 6
typ = ['foo']*12+['bar']*12
dat = ['19/01/18','19/01/18','22/01/18','22/01/18','23/01/18','24/01/18'] * 4
#create dataframe with sample data
df = pd.DataFrame({'group': grp,'type':typ,'value':vals,'date':dat})
df.date = pd.to_datetime(df.date, format='%d/%m/%y')  # strings are day-first (dd/mm/yy)
df.head(12)

gives the following (only the first 12 rows are shown):

    date        group  type  value
0   19/01/2018  X      foo   -4
1   19/01/2018  X      foo   17
2   22/01/2018  X      foo   -4
3   22/01/2018  X      foo   -16
4   23/01/2018  X      foo   2
5   24/01/2018  X      foo   20
6   19/01/2018  Y      foo   3
7   19/01/2018  Y      foo   10
8   22/01/2018  Y      foo   -17
9   22/01/2018  Y      foo   -8
10  23/01/2018  Y      foo   -21
11  24/01/2018  Y      foo   2

The desired results are (all rows shown here):

    date        group  type  2BD Sum
1   19/01/2018  X      foo   13
2   22/01/2018  X      foo   -7
3   23/01/2018  X      foo   -18
4   24/01/2018  X      foo   22
5   19/01/2018  Y      foo   13
6   22/01/2018  Y      foo   -12
7   23/01/2018  Y      foo   -46
8   24/01/2018  Y      foo   -19
9   19/01/2018  X      bar   -11
10  22/01/2018  X      bar   -19
11  23/01/2018  X      bar   -18
12  24/01/2018  X      bar   -31
13  19/01/2018  Y      bar   17
14  22/01/2018  Y      bar   40
15  23/01/2018  Y      bar   8
16  24/01/2018  Y      bar   -30

I have viewed this question and tried

df.groupby(['group','type']).rolling('2d',on='date').agg({'value':'sum'}
).reset_index().groupby(['group','type','date']).agg({'value':'sum'}).reset_index()

This would work fine if 'value' were always positive, but that is not the case here. I have tried many other approaches that raised errors, which I can list if that would help. Can anyone help?

it's the sum of the first 4 rows. - current business day + previous business day — Bonners, May 18 '18 at 14:52
Hm, so the logic is not quite clear to me. Aren't you trying to do a 2-day rolling sum? Edit: Now I think I understand part of it. So, in your first line you want `13`, which is the sum of only one business day, is that correct? — rafaelc, May 18 '18 at 14:53
That's exactly right. Sorry if my explanation is not as clear as it could be. I have got too close to the detail. Essentially I want something like pyspark.sql.window partitionBy().orderBy().rangeBetween() using dates. — Bonners, May 18 '18 at 15:11
Very similar to this question: https://stackoverflow.com/questions/50702986/pandas-rolling-function-with-monthly-offset/66057187#66057187 — Sid Kwakkel, Feb 05 '21 at 04:54

Mrml91 · Answer 1 · 2020-03-20T21:50:02.983

I expected the following to work:

g = lambda ts: ts.rolling('2B', on='date')['value'].sum()
df.groupby(['group', 'type']).apply(g)

However, I get an error as a business day is not a fixed frequency.
This leads me to suggest the following solution, which is a lot uglier:

# total value per business day within each (group, type)
value_per_bday = lambda df: df.resample('B', on='date')['value'].sum()
df = df.groupby(['group', 'type']).apply(value_per_bday).stack()
# sum over the current and the previous business day
value_2_bdays = lambda x: x.rolling(2, min_periods=1).sum()
df = df.groupby(level=['group', 'type'], group_keys=False).apply(value_2_bdays)

It may read better as a single function; your pick.

def resample_and_sum(x):
    x = x.resample('B', on='date')['value'].sum()
    x = x.rolling(2, min_periods=1).sum()
    return x

df = df.groupby(['group', 'type']).apply(resample_and_sum).stack()
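
For comparison, here is a minimal sketch of another way to get the sums asked for in the question without relying on the shape of the `apply` output. It is not part of the answer above. It first totals `value` per (group, type, date), then shifts a copy of those totals forward by one business day with `pd.offsets.BDay(1)` and joins it back, so each row sees its own total plus the previous business day's. The names `daily`, `prev`, `prev_value` and `result` are made up for illustration, and the sketch assumes every date in the data is itself a business day.

```python
import pandas as pd

# Sample data from the question
vals = [-4, 17, -4, -16, 2, 20, 3, 10, -17, -8, -21, 2,
        0, -11, 16, -24, -10, -21, 5, 12, 14, 9, -15, -15]
grp = ['X'] * 6 + ['Y'] * 6 + ['X'] * 6 + ['Y'] * 6
typ = ['foo'] * 12 + ['bar'] * 12
dat = ['19/01/18', '19/01/18', '22/01/18', '22/01/18', '23/01/18', '24/01/18'] * 4
df = pd.DataFrame({'group': grp, 'type': typ, 'value': vals, 'date': dat})
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y')

# Step 1: one total per (group, type, date), keeping order of first appearance
daily = df.groupby(['group', 'type', 'date'], sort=False, as_index=False)['value'].sum()

# Step 2: copy the totals and move each date to the next business day,
# so a Friday total lines up with the following Monday
prev = daily.rename(columns={'value': 'prev_value'})
prev['date'] = prev['date'] + pd.offsets.BDay(1)

# Step 3: left join on the shifted dates; days with no previous
# business day in the data contribute 0
result = daily.merge(prev, on=['group', 'type', 'date'], how='left')
result['2BD Sum'] = result['value'] + result['prev_value'].fillna(0)
result = result[['date', 'group', 'type', '2BD Sum']]
print(result)
```

On the sample data this reproduces the sums listed under the desired results (as floats, because of the `fillna`). Since the join is on the actual previous business day rather than on the previous row, a group that is missing a day simply gets no contribution from it.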
