Having read How Not To Sort By Average Rating I thought I should give it a try.
CREATE FUNCTION `mydb`.`LowerBoundWilson95` (pos FLOAT, neg FLOAT)
RETURNS FLOAT DETERMINISTIC
RETURN
IF(
pos + neg <= 0,
0,
(
(pos + 1.9208) / (pos + neg)
-
1.96 * SQRT(
(pos * neg) / (pos + neg) + 0.9604
)
/ (pos + neg)
)
/
(
1 + 3.8416
/ (pos + neg)
)
);
Running some tests, I discover that objects with pos=0
and neg>0
have very small, but non-negative scores, whereas an object with pos=neg=0
has a score of zero, ranking lower.
I am of the opinion that an unrated object should be listed above one which has no positive ratings but some negatives.
I reasoned that "the individual ratings are all really expressions of deviation from some baseline, so I'll move the baseline, I'll give every object a 'neutral' initial score," so I came up with this:
CREATE FUNCTION `mydb`.`AdjustedRating` (pos FLOAT, neg FLOAT)
RETURNS FLOAT DETERMINISTIC
RETURN
(
SELECT `mydb`.`LowerBoundWilson95` (pos+4, neg+4)
);
Here are some sample outputs for AdjustedRating
\ pos 0 1 2
neg
0 | 0.215 | 0.188 | 0.168
1 | 0.266 | 0.235 | 0.212
2 | 0.312 | 0.280 | 0.235
This is closer to the sort of scores I want and as a numerical hack I guess it's workable, but I can't mathematically justify it
Is there a better way, a "right" way?