
I have a large dataframe representing scores of products belonging to various product groups.

I need to:

  1. Group all rows by beer_style

  2. For each beer_style calculate the mean of that style

  3. For each beer_style, subtract that style's mean from each of its elements (beers). The result (Value - Mean) should replace the original Value; no additional column is needed.
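
For reference, the three steps above can be sketched with `groupby(...).transform`, which returns per-group results aligned to the original rows. This is only a minimal sketch on a toy frame: the data values are invented for illustration, the name `centered` is hypothetical, and only two of the five review columns are shown.

```python
import pandas as pd

# Toy data, invented for illustration; the real frame has more rows and columns.
df = pd.DataFrame({
    "beer_style": ["IPA", "IPA", "Stout", "Stout", "Stout"],
    "review_overall": [4.0, 5.0, 3.0, 4.0, 5.0],
    "review_taste": [3.5, 4.5, 2.0, 3.0, 4.0],
})

score_cols = ["review_overall", "review_taste"]

# Steps 1 and 2: compute each style's mean and broadcast it back to every row
# of that style, so the result has the same index as df.
style_means = df.groupby("beer_style")[score_cols].transform("mean")

# Step 3: replace each value with (value - mean of its style).
# Working on a copy keeps the original df intact.
centered = df.copy()
centered[score_cols] = df[score_cols] - style_means

print(centered)
```

Writing the result to a copy rather than back to `df` leaves the original scores available for other calculations.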

Here is what I tried:

    def normalize(group):  # Normalize each group: rows sharing the same beer_style
        group.review_overall -= group.review_overall.mean()
        group.review_aroma -= group.review_aroma.mean()
        group.review_appearance -= group.review_appearance.mean()
        group.review_palate -= group.review_palate.mean()
        group.review_taste -= group.review_taste.mean()
        return group

    df = df.groupby('beer_style').apply(normalize)
    df.describe()

I got the table, but the numbers look suspicious: the MEAN values for all 5 parameters (see above) are very close to zero.

I am not sure that my code correctly implements my goal.
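
One way to check this: once each style's mean has been subtracted, every per-style mean is zero up to floating-point error, so the overall mean reported by `describe()` should also be near zero. Means close to zero are therefore what correct centering produces, not a sign of a bug by themselves. A minimal sketch on invented toy data (the name `centered` is hypothetical):

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "beer_style": ["IPA", "IPA", "Stout", "Stout", "Stout"],
    "review_overall": [4.0, 5.0, 3.0, 4.5, 5.0],
})

# Subtract each style's mean from the rows of that style.
centered = df.copy()
centered["review_overall"] = (
    df["review_overall"]
    - df.groupby("beer_style")["review_overall"].transform("mean")
)

# Mean of each style after centering: expected to be ~0 for every style.
per_style_means = centered.groupby("beer_style")["review_overall"].mean()

# Overall mean after centering: also ~0, because every group sums to ~0.
overall_mean = centered["review_overall"].mean()

print(np.allclose(per_style_means, 0.0))
print(np.isclose(overall_mean, 0.0))
```

A more telling check is to compare a few rows by hand against the original values minus their style's mean.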

Please help.

Toly
  • Are you sure you didn't clobber the original value of `df`? You reassign the results of your normalization back to `df` in your penultimate line. Generally, you would want to use a different variable in case you want to do some other calculations on the original data. – Alexander Sep 08 '15 at 04:55
  • @Alexander - great point! This is only an example. I always make tempDF from the original file and work on it. As a side note, I would hope there is a pre-built normalization function in numpy or a similar library that could be used. I would be very grateful for a reference. – Toly Sep 08 '15 at 05:18

0 Answers