I'm trying to apply the Bayesian rating formula, but if I cast hundreds of thousands of 1-star votes (on a scale of 1 to 5), the final rating comes out greater than 5.

For example, a given item has no votes, and after it receives 170,000 votes of 1 star its final rating is 5.23. If I cast only 100 votes, the rating has a normal value.

Here is what I have in PHP.

<?php
// these values came from DB
$total_votes     = 2936;    // total of votes for all items
$total_rating    = 582.955; // sum of all ratings
$total_items     = 202;

// now the specific item, it has no votes yet
$this_num_votes  = 0;
$this_score      = 0;
$this_rating     = 0;

// simulating a lot of votes with 1 star
for ($i=0; $i < 170000; $i++) { 
    $rating_sent = 1; // the new rating, always 1

    $total_votes++; // adding 1 to total
    $total_rating = $total_rating+$rating_sent; // adding the new rating to the sum of ratings

    $avg_num_votes = ($total_votes/$total_items); // Average number of votes in all items
    $avg_rating = ($total_rating/$total_items);   // Average rating for all items
    $this_num_votes = $this_num_votes+1;          // Number of votes for this item
    $this_score = $this_score+$rating_sent;       // Sum of all votes for this item
    $this_rating = $this_score/$this_num_votes;   // Rating for this item

    $bayesian_rating = ( ($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating) ) / ($avg_num_votes + $this_num_votes);
}
echo $bayesian_rating;
?>

Even if I flood it with votes of 1 or 2:

$rating_sent = rand(1,2);

The final rating after 100,000 votes is over 5.

I just did a new test using

$rating_sent = rand(1,5);

After 100,000 votes I got a value completely out of range (10.53). I know that in a normal situation no item will get 170,000 votes while all the other items get none. But I wonder whether there is something wrong with my code or whether this is expected behavior of the Bayesian formula given such a massive number of votes.

Edit

Just to make it clear, here is a better explanation for some variables.

$avg_num_votes   // SUM(votes given to all items)/COUNT(all items)
$avg_rating      // SUM(rating of all items)/COUNT(all items)
$this_num_votes  // COUNT(votes given for this item)
$this_score      // SUM(rating for this item)
$bayesian_rating // is the formula itself

The formula is: ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes). Taken from here

rlcabral
  • What are the values of the variables you use to calculate `$bayesian_rating`? `$avg_num_votes` and others. – Ishtar May 15 '11 at 21:46
  • I edited the question to add a better explanation for some variables. I'm starting to think that when an item gets too many votes while other items get no new votes, the rating of this item tends to infinity. – rlcabral May 15 '11 at 22:12
  • But what are the actual values? Can you print them? – Ishtar May 15 '11 at 22:24
  • rrenaud already found the problem. – rlcabral May 15 '11 at 22:29

1 Answer

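You need to divide by total_votes rather than total_items when calculating avg_rating. As written, total_rating grows with every vote while total_items stays fixed at 202, so avg_rating climbs far above the 1 to 5 scale (about 844 after 170,000 one-star votes) and drags the Bayesian rating up with it.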

I made the changes and got something that behaves much better here.

http://codepad.org/gSdrUhZ2
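
For reference, here is a minimal sketch of that change. It is not the codepad code itself. The helper function bayesian_rating() is a hypothetical name introduced here, and the starting $total_rating is an assumed figure (2,936 votes averaging 3 stars), because the sketch treats $total_rating as the sum of every individual vote value.

```php
<?php
// Hypothetical helper (not from the original post): the formula from the
// question, written as a weighted average of the global mean and the item mean.
function bayesian_rating(float $avg_num_votes, float $avg_rating, int $this_num_votes, float $this_rating): float
{
    return (($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating))
        / ($avg_num_votes + $this_num_votes);
}

// Assumed starting values: $total_rating is taken to be the sum of all
// individual vote values, here 2936 votes averaging 3 stars (illustrative only).
$total_votes  = 2936;
$total_rating = 8808.0;
$total_items  = 202;

// The specific item starts with no votes.
$this_num_votes  = 0;
$this_score      = 0;
$bayesian_rating = 0.0;

// Simulate the same flood of 1-star votes as in the question.
for ($i = 0; $i < 170000; $i++) {
    $rating_sent = 1;

    $total_votes++;
    $total_rating += $rating_sent;
    $this_num_votes++;
    $this_score += $rating_sent;

    $avg_num_votes = $total_votes / $total_items;   // average votes per item
    $avg_rating    = $total_rating / $total_votes;  // average rating per vote (the fix)
    $this_rating   = $this_score / $this_num_votes; // mean rating of this item

    $bayesian_rating = bayesian_rating($avg_num_votes, $avg_rating, $this_num_votes, $this_rating);
}

echo $bayesian_rating, PHP_EOL;
```

With avg_rating computed per vote, both averages stay between 1 and 5, and the formula is a weighted average of the two, so the result cannot leave that range no matter how many votes are added.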

Rob Neuhaus