I have a DataFrame that look like this:
In [55]:df
Out [55]:
WEIGHT TR_YTD TR_3Y_Ann Categ1 Categ2 Number
0 0.131214 -0.28 8.49 1 a 0
1 0.052092 1.69 13.70 3 b 1
2 0.045993 2.50 NaN 1 a 2
3 0.041450 -4.57 5.07 1 c 3
4 0.040769 7.64 17.49 2 a 4
5 0.039791 0.07 0.21 1 a 5
6 0.039271 -6.14 8.88 3 a 6
7 0.038340 -8.13 NaN 1 c 7
8 0.038227 9.26 13.78 2 a 8
9 0.033878 0.02 11.45 1 a 9
10 0.029455 5.91 24.86 3 b 10
I have weights, performances and categories (in reality I have many more columns but this is a minimum working example). I would like to:
- group first by each categ alone and then by combinations of the first two categories
- aggregate several columns with different functions using a dictionary (as shown in Wes McKinney's book)
- calculate the weighted average performances using my user defined function
- take into into account NaNs in my data
- do some sorting in my output
The code:
def get_wavg(df, field):
return np.average(df[field], weights=df['WEIGHT'])
groups = [df['Categ1'], df['Categ2'], [df['Categ1'],df['Categ2']]]
funcdict = {'Number':'count',
'WEIGHT': 'sum', \
'TR_YTD': lambda x: get_wavg(x,'TR_YTD'), \
'TR_3Y_Ann': lambda x: get_wavg(x,'TR_3Y_Ann')}
for group in groups:
# preparing list to sort dynamically
# sorting by the first layers of groups (excluding last) and then weights
groupnames = [x.name for x in group]
sortinglist = groupnames[:-1]
sortinglist.append('WEIGHTS')
ascendinglist = [True]*(len(groupnames)-1)+[False]
# apply agg functions
grouped = df.groupby(group)
grouped = grouped.agg(funcdict)
# sorting
grouped = pd.DataFrame(grouped)
grouped.reset_index(inplace=True)
grouped.sort_values(by=sortinglist,ascending=ascendinglist,inplace=True)
grouped.set_index(groupnames,inplace=True)
print (grouped)
Problem is that I can't find the correct syntax to make it work with my user defined function. If I use np.nanmean
instead it works but doesn't give me the result I am looking for.
What is the correct way to do this?