I have a multi-index based on dates and a category, which looks like this:
D
Date G
2004-01-01 A 1
A 2
A 3
A 4
B 101
B 102
B 103
C 201
C 202
C 203
C 204
C 205
2004-01-02 A 5
A 6
A 7
A 8
B 104
B 105
B 106
C 206
C 207
C 208
C 209
C 210
...
Now I want to apply a rolling window to aggregate my data, but group the aggregation by the second level index. With a function like min() and a timewindow of three days (no minimal periods), this would result in something like this:
D
Date G
2004-01-01 A 1
B 101
C 201
2004-01-02 A 1
B 101
C 201
2004-01-03 A 1
B 101
C 201
2004-01-04 A 5
B 104
C 206
2004-01-05 A 9
B 107
C 211
...
I have already tried the resampling method, but I couldn't get overlapping timewindows with this method.
This is what I already have:
# result dataframe with initiated index
df_agg = pd.DataFrame(index=df.index.unique(), columns=['D'])
# get all unique categories in df
groups = df.reset_index(level='G')['G'].unique()
# iterate over categories in G
for group in groups:
df_group = df.iloc[df.index.get_level_values('G')==group]
df_group_agg = pd.DataFrame()
df_group_agg['D'] = df_group.reset_index(level='G')\
.groupby('G')['D'].rolling('3d').apply(np.min)
# swap index level because of groupby
df_group_agg = df_group_agg.swaplevel(0,1)
# write data into correct place
df_agg.loc[df.index.get_level_values('G')==group, 'D'] = df_group_agg['D']
But this leaves me with something like this:
D
Date G
2004-01-01 A 1
A 1
A 1
A 1
B 101
B 101
B 101
C 201
C 201
C 201
C 201
C 201
2004-01-02 A 1
A 1
A 1
A 1
B 101
B 101
B 101
C 201
C 201
C 201
C 201
C 211
...
I could try to remove all the unnecessary rows, but this seems inefficient to me, as the aggregation function has to be calculated much more often than needed.
Is there are more elegant way to achieve my goal? Do I have to iterate manually?
Edit:
Here is some code to generate the described example:
from datetime import datetime
import numpy as np
import pandas as pd
dates = pd.date_range(start=datetime(2004, 1, 1),
end=datetime(2004, 3, 1),
freq='D')
df = pd.DataFrame(data={'Date': np.repeat(dates,4+3+5),
'G': (['A']*4+['B']*3+['C']*5)*len(dates)} )
df['D'] = 0
df.loc[df['G']=='A','D'] = range(1,len(df.loc[df['G']=='A','D'])+1)
df.loc[df['G']=='B','D'] = range(101,len(df.loc[df['G']=='B','D'])+101)
df.loc[df['G']=='C','D'] = range(201,len(df.loc[df['G']=='C','D'])+201)
df = df.set_index(['Date','G'])