2

I'm trying to create a pivot with pandas where the aggregation function is truncated mean that takes 10% of each end of data.

    pivot = pd.pivot_table(data[(data['New_category_ID'] > 0)&(data['YYYY'] == 2016)], index = 'New_category',  values=['GrossRevenue_GBP','MOVC_GBP','PM_GBP'],aggfunc=stats.trim_mean(0.1))

I'm getting this following error:

TypeError: trim_mean() missing 1 required positional argument: 'proportiontocut'

stats.trim_mean() takes two arguments one being the data the other being the % of data to remove from each end of the data. In this case I'm specifying 10%.

How do I get around the need to specify the column/dataframe it should take the trimmed mean of given I am trying to create pivot table?

Would really appreciate some help

D_usv
  • 433
  • 7
  • 21
  • `aggfunc=lambda x: stats.trim_mean(x, 0.1)` – ayhan Aug 09 '17 at 11:19
  • @ayhan - thank you so much. It runs perfectly, however when I validate with Excel it seems that the trim mean from Excel is different from that given from Python. Need to decide now which one is correct... If you make it an answer I will approve it – D_usv Aug 09 '17 at 11:53
  • That's probably a difference caused by how they calculate the percentiles (there are several approaches). – ayhan Aug 09 '17 at 12:19
  • Most probably..On further validation, when I say do truncated mean of 0.5 (i.e. the median) in Python and compare this to the median calculated in Excel they are the same. However when I do truncated mean in Excel of 0.5 it gives a different result to the median calculated in Excel. I will go with the Python truncated mean results. Thanks again – D_usv Aug 09 '17 at 12:49

1 Answers1

4

You can use a lambda function:

pivot = pd.pivot_table(data[(data['New_category_ID'] > 0) & (data['YYYY']==2016)], 
                       index='New_category',
                       values=['GrossRevenue_GBP', 'MOVC_GBP', 'PM_GBP'],
                       aggfunc=lambda x: stats.trim_mean(x, 0.1))
ayhan
  • 70,170
  • 20
  • 182
  • 203