0

I have a df of players and their penalty points. It looks something like this:

# playerID  penalty 
#   A         0
#   B         2
#   A         4
#   C         2
#   A         0
#   B         0
#   B         2 

I want to get a "player rank" which depends on the "penalty" and number of played games (A three games, C one game). If there is a player X with average penalty = 3 over 10 games and player Y with average penalty = 3 over 4 games then the "player rank" of the player X should be lower then the "player rank" of the player Y.

My idea was to divide the average penalty of player i by sqrt(n(i)), where n(i) is number of games played by player i. Second idea was to divide average penalty by log(n(i)+1).

Do you think its the right approach? Was somebody dealing with a similar problem?

Thanks for your responses.

K-tan
  • 79
  • 8
  • If you are asking for general strategies for ranking, that's not really a specific programming problem that's appropriate for Stack Overflow. The "right approach" depends on lot on what type of inference you want to make. This may be a better fit for [stats.se] where more theoretical statistical discussions are on-topic. – MrFlick Mar 31 '20 at 20:31

1 Answers1

0

I think you want to transfer this post to Cross Validated. Anyway, this is the solution to your problem via data.table.

library(data.table)
df = data.frame( playerID = c( "A", "B", "A", "C", "A", "B", "B"),  
                 penalty = c( 0, 2, 4, 2, 0, 0, 2 ) )

setDT(df)

df[ , games := .N, by = playerID ]
df[ , mean_penalty := mean( penalty ), by = playerID ]
df[ , `:=` ( rank1 = mean_penalty / sqrt(games),
             rank2 = mean_penalty /  log(games+1) ) ]
final = unique(df, by = "playerID" )

> final
   playerID penalty games mean_penalty     rank1     rank2
1:        A       0     3     1.333333 0.7698004 0.9617967
2:        B       2     3     1.333333 0.7698004 0.9617967
3:        C       2     1     2.000000 2.0000000 2.8853901
Francesco Grossetti
  • 1,555
  • 9
  • 17