
Wilson's Confidence Interval (WCI) takes as input the number of TRUE and FALSE outcomes, i.e. "upvotes" and "downvotes". From these counts it produces an interval, and its lower bound is commonly used as the rating.

For the purpose of my project, I think the WCI is perfect. However, a binary upvote/downvote is not enough to describe the thing I am rating.

That's where a 5-star rating comes in, and this is where I need someone to disprove my logic. I think that if I implemented a 5-star rating on top of the WCI, the following should work without hacking the internals of the confidence interval.

For each star in the rating widget we assign a unique integer value. Each value counts as either positive (upvotes) or negative (downvotes), so the values would be:

1/5 stars: -2
2/5 stars: -1
3/5 stars: 1
4/5 stars: 2
5/5 stars: 3

To summarise: the minimum vote of 1 star counts as 2 downvotes, 2 stars as 1 downvote, the middle vote of 3 stars as 1 upvote, 4 stars as 2 upvotes, and the maximum of 5 stars as 3 upvotes.
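A minimal Python sketch of the mapping described above, assuming ratings arrive as integers 1 to 5 and using the standard Wilson score lower bound at z = 1.96 (roughly 95% confidence). The names `STAR_TO_VOTES`, `to_up_down` and `wilson_lower_bound` are hypothetical, chosen for illustration only.

```python
import math

# The proposed mapping: star value -> signed vote count
# (negative numbers are downvotes, positive numbers are upvotes).
STAR_TO_VOTES = {1: -2, 2: -1, 3: 1, 4: 2, 5: 3}


def to_up_down(ratings):
    """Convert a list of 1-5 star ratings into (upvotes, downvotes)."""
    up = down = 0
    for stars in ratings:
        votes = STAR_TO_VOTES[stars]
        if votes > 0:
            up += votes
        else:
            down -= votes  # votes is negative here
    return up, down


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of upvotes."""
    n = up + down
    if n == 0:
        return 0.0
    phat = up / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, (centre - margin) / (1 + z * z / n))


up, down = to_up_down([5, 4, 1, 3])
print(up, down)  # 6 2
print(round(wilson_lower_bound(up, down), 3))
```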

Please disprove this logic: why won't this work? Maybe it goes against the "average person's understanding" of a star rating system?

Michael Rich
  • There are other things you might want to compare this with. For instance, you could run four confidence intervals in parallel (one for "at least two stars", one for "at least three stars", and so on), or you could work out a confidence interval for the mean number of stars, or for the median number of stars. There are many ways to reduce a distribution over 5 possibilities to a single number, and which one you want probably depends on exactly what you want to do with that number. – mcdowella Oct 27 '13 at 06:12
  • https://www.evanmiller.org/ranking-items-with-star-ratings.html might help – Suzana Jan 21 '22 at 10:17
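mcdowella's first suggestion, one interval per "at least k stars" threshold, might be sketched in Python like this. It assumes integer ratings 1 to 5 and the Wilson score lower bound at z = 1.96; the function names are hypothetical. How to combine the four numbers into a single ranking is left open, as in the comment.

```python
import math


def wilson_lower_bound(successes, n, z=1.96):
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    phat = successes / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return max(0.0, (centre - margin) / (1 + z * z / n))


def at_least_k_bounds(ratings, z=1.96):
    """One Wilson lower bound per threshold: P(rating >= k) for k = 2..5."""
    n = len(ratings)
    return {
        k: wilson_lower_bound(sum(1 for r in ratings if r >= k), n, z)
        for k in range(2, 6)
    }


bounds = at_least_k_bounds([5, 5, 4, 3, 1])
for k, lb in bounds.items():
    print(f"at least {k} stars: {lb:.3f}")
```

Because the thresholds are nested, the lower bounds can only stay the same or shrink as k grows.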

3 Answers


It's easy to think of the following 'workaround', which converts a multi-level rating system to the binary upvote/downvote style of ranking (which can then be scored using the lower bound of the Wilson score confidence interval):

Let's say you have the popular 5-star rating system, so each vote has a value of 1, 2, 3, 4 or 5.

To 'convert' these ratings to up/down votes, use the following rule:

For each star rating, add:

*     - 0.00 to up votes and 1.00 to down votes (i.e. a full down vote)
**    - 0.25 to up votes and 0.75 to down votes
***   - 0.50 to up votes and 0.50 to down votes
****  - 0.75 to up votes and 0.25 to down votes
***** - 1.00 to up votes and 0.00 to down votes (i.e. a full up vote)

After reducing the 5-star ratings to up/down votes, we can proceed with the usual score calculation (the lower bound of the Wilson score interval) described in Evan Miller's article.
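A minimal Python sketch of this conversion, assuming integer ratings 1 to 5 and z = 1.96. The general rule behind the table is that s stars adds (s - 1) / 4 to up votes and the rest of a full vote to down votes; `stars_to_fractional_votes` is a hypothetical name.

```python
import math


def stars_to_fractional_votes(ratings):
    """Split 1-5 star ratings into fractional (up, down) vote totals.

    A rating of s stars adds (s - 1) / 4 to up votes and the remainder,
    1 - (s - 1) / 4, to down votes, which reproduces the table above.
    """
    up = sum((s - 1) / 4 for s in ratings)
    down = len(ratings) - up  # each rating contributes exactly 1 vote in total
    return up, down


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval; fractional counts are fine."""
    n = up + down
    if n == 0:
        return 0.0
    phat = up / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return max(0.0, (centre - margin) / (1 + z * z / n))


up, down = stars_to_fractional_votes([5, 4, 2])
print(up, down)  # 2.0 1.0
print(round(wilson_lower_bound(up, down), 3))
```

The formula only needs the upvote fraction and the total, so fractional counts go through unchanged; it still treats each rating as one yes/no trial, which is exactly the assumption the answer asks others to scrutinise.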

As I am not a statistician or mathematician, I would love to hear from others whether this makes sense and what the issues with this approach might be.

Nikolay Suvandzhiev
  • I take a similar approach but multiply the votes by 5, as if each user had 5 votes available, so 1 star would mean 1 upvote and 5 downvotes. That also saves me some floating-point operations. – Mr. Goferito Nov 03 '18 at 14:42

First, try to understand the intuition behind the WCI or, even simpler, behind the normal approximation interval (http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval).

The intuition behind all these interval calculations is simple: you calculate the sample mean and its standard deviation, and the interval is mean ± z*std, where z is the normal quantile for the desired confidence level (e.g. 1.96 for 95%).

In your case, calculating the mean is simple: it is just the mean of the ratings. Let p1 be the fraction of 1-star ratings, p2 the fraction of 2-star ratings, and so on up to p5, so that p1 + p2 + ... + p5 = 1, and assume these fractions come from n ratings. The mean of your data is then E(x) = 1*p1 + 2*p2 + ... + 5*p5.

The variance of the sample mean is ( E(x^2) - (E(x))^2 )/n = ( (p1*1^2 + p2*2^2 + ... + p5*5^2) - (1*p1 + 2*p2 + ... + 5*p5)^2 )/n

Since std = sqrt(var), calculating the normal approximation interval is straightforward. I will leave extending this to the WCI to you.
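The recipe above (fractions p1 to p5, mean, variance of the mean, then mean ± z*std) translates directly into a short Python sketch. It assumes integer ratings 1 to 5 and z = 1.96; `mean_rating_interval` is a hypothetical name, and this is the normal approximation only, not the WCI extension the answer leaves as an exercise.

```python
import math
from collections import Counter


def mean_rating_interval(ratings, z=1.96):
    """Normal approximation interval for the mean of 1-5 star ratings.

    With p_k the fraction of k-star ratings:
    mean = sum(k * p_k), var of the mean = (sum(k^2 * p_k) - mean^2) / n,
    and the interval is mean +/- z * sqrt(var).
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    p = {k: counts[k] / n for k in range(1, 6)}
    mean = sum(k * p[k] for k in range(1, 6))
    second_moment = sum(k * k * p[k] for k in range(1, 6))
    var_of_mean = (second_moment - mean ** 2) / n
    std = math.sqrt(max(var_of_mean, 0.0))  # guard against rounding below zero
    return mean - z * std, mean + z * std


low, high = mean_rating_interval([5, 5, 5, 1])
print(round(low, 3), round(high, 3))
```

With few ratings the interval can extend beyond the 1 to 5 scale, because the normal approximation ignores the bounds of the scale.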

ElKamina
  • Doh! I wish I had taken stats in college... Somehow I took all the calculus and other math classes but not stats, and now this lack of knowledge is biting me in the butt. @ElKamina, I'd be very grateful if you could expand your answer and ideally provide an implementation in Ruby (like the one on this page http://www.evanmiller.org/how-not-to-sort-by-average-rating.html); that would be amazing for anyone looking for a way to use this 5-star tweak with Wilson's Confidence Interval. – Alex Le Sep 16 '15 at 06:36

The biggest problem with this scheme is that a single 5-star rating (3 upvotes) weighs as much as three 2-star ratings (1 downvote each). Also, an item with 300 3-star ratings (which should be a mediocre score) gets 300 upvotes and no downvotes, exactly like an item with 100 5-star ratings (which should be a perfect score), so both end up with the same score.
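The arithmetic behind both objections can be checked with a few lines of Python, assuming the question's mapping (1 star: -2, 2 stars: -1, 3 stars: 1, 4 stars: 2, 5 stars: 3); `up_down` is a hypothetical helper. Since the Wilson lower bound depends only on the up/down counts, identical counts mean identical scores.

```python
# The question's mapping: star value -> signed vote count.
STAR_TO_VOTES = {1: -2, 2: -1, 3: 1, 4: 2, 5: 3}


def up_down(ratings):
    """Tally (upvotes, downvotes) under the question's mapping."""
    up = sum(STAR_TO_VOTES[s] for s in ratings if STAR_TO_VOTES[s] > 0)
    down = sum(-STAR_TO_VOTES[s] for s in ratings if STAR_TO_VOTES[s] < 0)
    return up, down


# One 5-star rating carries as many votes as three 2-star ratings.
print(up_down([5]), up_down([2, 2, 2]))  # (3, 0) (0, 3)

# 300 mediocre ratings and 100 perfect ratings produce identical counts.
print(up_down([3] * 300), up_down([5] * 100))  # (300, 0) (300, 0)
```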

What you could do instead is calculate a Wilson confidence interval for the fraction of ratings at each possible score. The lower bound of each interval then serves as the weight of that score in a weighted average.
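A minimal Python sketch of this suggestion, assuming integer ratings 1 to 5 and z = 1.96. The answer does not say how to normalise the weighted average; dividing by the sum of the weights is an assumption here, and `weighted_star_score` is a hypothetical name.

```python
import math
from collections import Counter


def wilson_lower_bound(successes, n, z=1.96):
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    phat = successes / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return max(0.0, (centre - margin) / (1 + z * z / n))


def weighted_star_score(ratings, z=1.96):
    """Weighted average of the star values 1-5.

    The weight of each value s is the Wilson lower bound on the fraction of
    ratings equal to s. Dividing by the sum of the weights is an assumption.
    """
    n = len(ratings)
    counts = Counter(ratings)
    weights = {s: wilson_lower_bound(counts[s], n, z) for s in range(1, 6)}
    total = sum(weights.values())
    if total == 0:
        return 0.0  # no ratings: no evidence for any value
    return sum(s * w for s, w in weights.items()) / total


print(round(weighted_star_score([5] * 10), 3))  # 5.0
print(round(weighted_star_score([3] * 300), 3))  # 3.0
print(round(weighted_star_score([5, 4, 4, 2]), 3))
```

Because the weights are normalised, the result stays on the 1 to 5 scale whenever at least one rating exists, so the 300 three-star ratings now score 3 rather than tying with the 100 five-star ratings.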

Apocalisp