14

I'm creating a site where people can rate an object of their choice by giving it a star rating (say, out of 5 stars). Objects are organized into a hierarchy of tags and categories, e.g. electronics>graphics cards>pci express>... or maintenance>contractor>plumber.

If another user searches for a specific category or tag, the results must return the highest "rated" object in that category. However, the system would be flawed if an object that only 1 person rated 5 stars outranked an object that 1000 users rated 4.5 stars on average. Logically, the object rated by 1000 users deserves more credibility than the object rated by 1 user, even though it has a "lower" score.

Conversely, it's more reasonable to trust an object with 500 user ratings and a score of 4.8 than an object with 1000 user ratings and a score of 4.5, for example.

What algorithm can achieve this weighting?

Jamie Ramsamy
  • Without having a good answer for you, I would say that an object which was rated by 1000 users has attracted more attention to itself than an object with only 500 ratings, regardless of what the ratings are. – Doc Brown Feb 23 '11 at 21:06
  • Another observation: a 4-star rating from someone who rates everything as 3, 4 or 5 is worth less than a 4-star rating from someone who uses the whole range. – Peter Taylor Feb 23 '11 at 22:20
  • Related: http://fulmicoton.com/posts/bayesian_rating/ – Palec Dec 23 '14 at 16:31

4 Answers

10

A great answer to this question is here: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

drewrobb
  • +1 - Nice. Still, the formula on the page "consider[s] only positive and negative ratings (i.e. not a 5-star scale)". Any idea how to expand it to a 5-star rating scale? – Justin Feb 23 '11 at 21:24
  • Map 5 stars to 1, 1 star to 0, and interpolate the rest. Replace the observed fraction of positive ratings with the average rating. However, this throws away information about the rating distribution: it only uses the average and the total number of ratings. I'm not sure how to take the distribution into account, but it might not be very important. – drewrobb Feb 23 '11 at 22:32
  • I don't think the Wilson interval works that way. It's designed for binomial variables (i.e.: only two outcomes). When you're comparing small sizes, these details do indeed matter. – mhum Feb 24 '11 at 04:19
  • @mhum: Doesn't it depend on what you assume a 5 star rating means? I could assume that a person is making 4 independent binary ratings, and the 5 star rating is just their sum. (Although you would have to multiply n by 4 here to be correct). – drewrobb Feb 24 '11 at 17:12
  • That's a very interesting idea that certainly could work. I was thinking of the 5-star rating as a discrete probability distribution on 5 values. – mhum Feb 25 '11 at 03:37
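
The mapping suggested in the comments above can be sketched roughly as follows. This is a minimal Python sketch, not the formula from the linked article verbatim: the function name `wilson_lower_bound` and its parameters are made up for illustration, and, as mhum points out, the Wilson interval is designed for binomial data, so feeding it a rescaled star average is a heuristic.

```python
import math

def wilson_lower_bound(avg_stars, n, z=1.96, max_stars=5):
    """Lower bound of the Wilson score interval for a rescaled star average.

    avg_stars: average rating on a 1..max_stars scale (assumed input)
    n:         number of ratings
    z:         normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n == 0:
        return 0.0
    # Map 1 star to 0 and max_stars to 1, interpolating in between.
    p = (avg_stars - 1) / (max_stars - 1)
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)

# The three cases from the question, sorted by the lower bound.
candidates = [("one 5-star vote", 5.0, 1),
              ("1000 votes at 4.5", 4.5, 1000),
              ("500 votes at 4.8", 4.8, 500)]
ranking = sorted(candidates,
                 key=lambda c: wilson_lower_bound(c[1], c[2]),
                 reverse=True)
```

Sorting by this lower bound puts the 500-vote 4.8 object first, then the 1000-vote 4.5 object, and the single 5-star vote last, which is the ordering the question asks for.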
3

You can use the Bayesian average when sorting by recommendation.

antonakos
  • Could you add more information? Seems pretty vague on Wikipedia: "Note that the additional information incorporated into the mean calculation [can be] a value subjectively determined by the person calculating the average as relevant and serving the purpose of the calculation." – Justin Feb 23 '11 at 21:26
  • I agree that it's written in an overly general way. Probably the simplest explanation is that you invisibly start every object with a bunch of rating values somewhere around the average. You need not calculate the actual average; you can just pick it arbitrarily and it still works. – jprete Feb 23 '11 at 22:13
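
To make jprete's description concrete, here is a minimal Python sketch of a Bayesian average. The names `bayesian_average`, `prior_mean` and `prior_weight` are illustrative, and the default values (10 invisible votes at 3.0 stars) are arbitrary assumptions you would tune for your own site.

```python
def bayesian_average(avg_stars, n, prior_mean=3.0, prior_weight=10):
    """Average rating after adding `prior_weight` invisible votes at `prior_mean`.

    With few real votes the result stays near prior_mean; with many
    real votes it approaches the observed average.
    """
    return (prior_weight * prior_mean + n * avg_stars) / (prior_weight + n)

one_vote = bayesian_average(5.0, 1)          # (30 + 5) / 11
many_votes = bayesian_average(4.5, 1000)     # (30 + 4500) / 1010
fewer_but_higher = bayesian_average(4.8, 500)  # (30 + 2400) / 510
```

With these assumed defaults the single 5-star vote scores about 3.18, the 1000-vote object about 4.49, and the 500-vote object about 4.76, so the lone vote no longer wins.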
2

I'd be tempted to have a cutoff (say, fifty votes, though this is obviously traffic dependent) below which you consider the item unranked. That would significantly reduce the motivation for spam/idiot rankings (especially if each vote is tied to a user account), and it would also get you a simple, quick to implement, and reasonably reliable system.

Dan
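
A cutoff like the one described above might look like this in Python. It is only a sketch: the `(name, average, votes)` tuple layout and the function name `rank_with_cutoff` are assumptions for illustration, and fifty is just the example threshold from the answer.

```python
def rank_with_cutoff(items, min_votes=50):
    """Split items into a ranked list and an unranked list.

    items: iterable of (name, average_rating, vote_count) tuples.
    Items with fewer than min_votes votes are returned as unranked;
    the rest are sorted by average rating, highest first.
    """
    ranked = [it for it in items if it[2] >= min_votes]
    unranked = [it for it in items if it[2] < min_votes]
    ranked.sort(key=lambda it: it[1], reverse=True)
    return ranked, unranked

ranked, unranked = rank_with_cutoff([
    ("A", 5.0, 1),
    ("B", 4.5, 1000),
    ("C", 4.8, 500),
])
```

Here "A" ends up unranked because it has a single vote, while "C" and "B" are ranked in that order.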
1
sigmoid_function(value) = 1/(1+e^(-value));

rating = sigmoid_function(number_of_voters) + sigmoid_function(average_rating);
Vikas Kumar