I have the following dataframe:

 id        type    side       score       
 601166    p       right      2  
 601166    p       left       6        
 601166    p       right      2  
 601166    p       left       4      
 601166    r       left       2  
 601166    r       left       2  
 601166    r       right      6  
 601166                       2  
 601009    r       left       6  
 601009    r       right      8  
 601939    p       left       2  
 601939    p       left       2  

I have calculated the average score for each id, type and side with:

df_result=df.groupby(["id", "type","side"])["score"].mean()

 id        type    side       mean       
 601166    p       right      2  
 601166    p       left       5        
 601166    r       right      6  
 601166    r       left       2   
 601166                       2       

Now I would like to calculate the average score for each id and type, weighting the per-side averages: the side (left or right) with the lower average score counts for 75%, and the side with the higher average score counts for 25%.

As an example, take id 601166: first calculate the average for each side. For type p, the side with the lowest average (right, 2) counts for 75% and the other side (left, 5) counts for 25%. Rows with empty values can be skipped.

 id        type         mean       
 601166    p            2.75
 601166    r            3  
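
Written out, the weighting described above looks like this. This is only a sketch of the arithmetic; the symbols (m for the weighted mean, s with a bar for a per-side average) are notation introduced here for illustration and are not part of the code.

```latex
\[
m = 0.75 \cdot \min(\bar{s}_{\text{left}}, \bar{s}_{\text{right}})
  + 0.25 \cdot \max(\bar{s}_{\text{left}}, \bar{s}_{\text{right}})
\]
% id 601166, type p: right average 2, left average 5
\[
m_{p} = 0.75 \cdot 2 + 0.25 \cdot 5 = 2.75
\]
% id 601166, type r: left average 2, right average 6
\[
m_{r} = 0.75 \cdot 2 + 0.25 \cdot 6 = 3
\]
```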

Any idea how I can add this to my code?

olive
  • Does your weight need to be grouped by type? To be clear, you just want to say the higher number (between left and right) gets a weight of 25 and the other 75? Should this be another column, or do you actually want it to be concatenated to the mean? – David Maddox Oct 25 '21 at 20:32
  • The weight can be added as an extra column to make it easier to check the logic, but in the end I just need one value for the mean, based on these weights – olive Oct 26 '21 at 08:49

1 Answer

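Would something like this suffice? Take the per-side means you already have, group them again by id and type, and weight the lowest side mean by 0.75 and the highest by 0.25:

# Mean score per id, type and side
df_result = df.groupby(["id", "type", "side"])["score"].mean()
# Regroup the side means by id and type
g = df_result.groupby(["id", "type"])
# Lowest side mean counts for 75%, highest for 25%
g.min() * 0.75 + g.max() * 0.25

Output:
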
id      type
601009  r       6.50
601166  p       2.75
        r       3.00
601939  p       2.00
Name: score, dtype: float64
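
For anyone who wants to check the logic end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the sample data from the question and keeps the lowest and highest side means as extra columns, as asked for in the comments. The column names `low`, `high` and `weighted_mean` are made up for illustration, and the row with no type or side is assumed to hold NaN, which `groupby` drops by default.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the row without type/side is assumed to be NaN.
df = pd.DataFrame({
    "id": [601166] * 8 + [601009] * 2 + [601939] * 2,
    "type": ["p", "p", "p", "p", "r", "r", "r", np.nan, "r", "r", "p", "p"],
    "side": ["right", "left", "right", "left", "left", "left", "right",
             np.nan, "left", "right", "left", "left"],
    "score": [2, 6, 2, 4, 2, 2, 6, 2, 6, 8, 2, 2],
})

# Step 1: mean score per id, type and side (rows with NaN keys are dropped).
side_means = df.groupby(["id", "type", "side"])["score"].mean()

# Step 2: per id and type, pick the lowest and highest side mean.
g = side_means.groupby(level=["id", "type"])
result = pd.DataFrame({"low": g.min(), "high": g.max()})

# Step 3: lowest side mean counts for 75%, highest for 25%.
result["weighted_mean"] = 0.75 * result["low"] + 0.25 * result["high"]

print(result)
```

When only one side is present for an id and type (as for 601939), the minimum and maximum are the same value, so the weighted mean is simply that side's mean.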
hyit