I have the following DataFrame:
df = pd.DataFrame({
'Trader': 'Carl Mark Carl Joe Joe Carl Joe Carl'.split(),
'Product': list('AAAABBAA'),
'Quantity': [5,2,5,10,1,5,2,3],
'Start' : [
DT.datetime(2013,1,1,9,0),
DT.datetime(2013,1,1,8,5),
DT.datetime(2013,2,5,14,0),
DT.datetime(2013,2,5,16,0),
DT.datetime(2013,2,8,20,0),
DT.datetime(2013,2,8,16,50),
DT.datetime(2013,2,8,7,0),
DT.datetime(2013,7,4,8,0)]})
df = df.set_index(['Start', 'Trader', 'Product'])
I would like to resample this DataFrame on a daily basis and compute the following complex statistics.
For each product: What is the weekly average difference between between the EMA Value (the day before) and Traders accutal purchases
For each product: What is the weekly average variance of Traders purchasing dates
As a first step therefore I am unstaking the Product and Trader columns and calculate the ema using:
df1 = df.unstack(level=[1,2]).fillna(0).resample('D', 'mean').fillna(0)
df2 = pd.ewma(df1, span=7)
Update:
Thanks to the comments by Wouter Overmeire I could combine the shifted trend values with the original values using:
df2 = df2.shift(freq='D')
df3 = pd.concat([df1, df2], keys=['acctual', 'trend'], axis=1)
df4 = df3.stack(['Trader', 'Product']).fillna(0)
Based on this DateFrame I try to answer the two questions outlined above using a group by function. Unfortunately, I cannot abstract from the trader column as it is still in the result of the group by function. Does anybody has an idea?
df4 = df4.reset_index()
def f (df):
df['Diff'] = abs(df[('acctual', 'Quantity')] - df[('trend', 'Quantity')])
return df
df5 = df4.groupby([df4['Start'].map(lambda x: x.year), df4['Start'].map(lambda x: x.week), df4['Product']])
df5.apply(f)
I would deeply appreciate any help.
Thanks
Andy