Pandas: complex resampling and groupby with Multiindex

Question

I have the following DataFrame:

df = pd.DataFrame({
'Trader': 'Carl Mark Carl Joe Joe Carl Joe Carl'.split(),
'Product': list('AAAABBAA'),
'Quantity': [5,2,5,10,1,5,2,3],
'Start' : [
    DT.datetime(2013,1,1,9,0),
    DT.datetime(2013,1,1,8,5),
    DT.datetime(2013,2,5,14,0),
    DT.datetime(2013,2,5,16,0),
    DT.datetime(2013,2,8,20,0),                                      
    DT.datetime(2013,2,8,16,50),
    DT.datetime(2013,2,8,7,0),
    DT.datetime(2013,7,4,8,0)]})

df = df.set_index(['Start', 'Trader', 'Product'])

I would like to resample this DataFrame on a daily basis and compute the following complex statistics.

For each product: What is the weekly average difference between between the EMA Value (the day before) and Traders accutal purchases
For each product: What is the weekly average variance of Traders purchasing dates

As a first step therefore I am unstaking the Product and Trader columns and calculate the ema using:

df1 = df.unstack(level=[1,2]).fillna(0).resample('D', 'mean').fillna(0)
df2 = pd.ewma(df1, span=7)

Update:

Thanks to the comments by Wouter Overmeire I could combine the shifted trend values with the original values using:

df2 = df2.shift(freq='D')
df3 = pd.concat([df1, df2], keys=['acctual', 'trend'], axis=1)
df4 = df3.stack(['Trader', 'Product']).fillna(0)

Based on this DateFrame I try to answer the two questions outlined above using a group by function. Unfortunately, I cannot abstract from the trader column as it is still in the result of the group by function. Does anybody has an idea?

df4 = df4.reset_index()
def f (df):
    df['Diff'] = abs(df[('acctual', 'Quantity')] - df[('trend', 'Quantity')])
    return df


df5 = df4.groupby([df4['Start'].map(lambda x: x.year), df4['Start'].map(lambda x: x.week),    df4['Product']])
df5.apply(f)

I would deeply appreciate any help.

Thanks

Andy

What went wrong with concat propsosed in the link above? e.g. pd.concat([df1, df2], keys=['AVG-DIFF', 'AVG-VAR'], axis=1) works fine. — Wouter Overmeire, May 27 '13 at 12:17
Hi Wouter, thanks for your help. The problem with the concat method is that it does not work for me when I try to unstack the Trader column. I updated my original description to be more precise. — Andy, May 27 '13 at 12:53
*Trader* is a column level, if you want to pivot this you need to use stack and not unstack (which if for pivoting a level on the index). — Wouter Overmeire, May 27 '13 at 13:09
Thanks, that solves the first issue. But how do I shift the trend column by 1 period? — Andy, May 27 '13 at 13:35
The *trend* columns of df3 is df2. So you can do df2.shift(freq='D').stack('Trader'). I do not know if it is possible to shift on a DateTime level of a MultiIndex (operating directly on df4). Not really a problem here, since data on a single level DatatimeIndex is available in df2. — Wouter Overmeire, May 27 '13 at 14:11
Hi Wouter, thanks for the help. I posted an update to my original question. I really appreciate if you could have a look at it. — Andy, May 27 '13 at 16:05
To access a MultiIndex column a tuple is needed. So in function f: df['Diff'] = abs(df[('acctual', 'Quantity')] - df[('trend', 'Quantity')]). Also you need to return the adapted frame (add return df at the end of the function). — Wouter Overmeire, May 28 '13 at 06:54
Hi Wouter, thanks again for your help. I worked as you said. However, I still have a problem with the groupby function as it still includes the Trader column although I did not specify it that way. I would really appreciate your help — Andy, May 28 '13 at 07:35
You do not need groupby/apply to compute the diff column. df4['Diff'] = abs(df4[('acctual', 'Quantity')] - df4[('trend', 'Quantity')]). From this aggregated data can be computed with groupby e.g df4.groupby([df4['Start'].map(lambda x: x.year), df4['Start'].map(lambda x: x.week), df4['Product']]).sum(). Not that the latter does not have a Trader column. Are you sure that the groupby you are doing is what you want? Some funky mapping Trader <-> week. — Wouter Overmeire, May 28 '13 at 11:31

Pandas: complex resampling and groupby with Multiindex

Update:

0 Answers0