
I have a database of clubs and the ratings people have given them.

Currently, I average the ratings for each club and then sort these averages in descending order to get a list of the highest-rated clubs.

The problem is that there should be some weighting based on how many ratings a club has. A club might get 5 ratings of 5.0 and end up at the top of the list, level with a club that has 16K ratings and also averages 5.0.

What I'm looking for is an algorithm that factors in the number of ratings, so that the ranking is weighted by how many ratings each club has.

Currently my algorithm is:

(sum of club ratings) / (total number of ratings), which gives me the average.

This does not incorporate any weighting.
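To make the problem concrete, here is a minimal Python sketch of the plain-average ranking described above. The function name `rank_by_average`, the club names, and the sample data are hypothetical and only serve to illustrate; the real data lives in a database.

```python
from collections import defaultdict


def rank_by_average(ratings):
    """Rank clubs by plain average rating, highest first.

    `ratings` is an iterable of (club, rating) pairs (a hypothetical shape).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for club, rating in ratings:
        totals[club] += rating
        counts[club] += 1
    # (sum of club ratings) / (number of ratings for that club)
    averages = {club: totals[club] / counts[club] for club in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up data: a club with 5 ratings and a club with 16,000 ratings,
# all of them 5.0, plus one lower-rated club.
ratings = (
    [("small_club", 5.0)] * 5
    + [("big_club", 5.0)] * 16000
    + [("other_club", 4.0), ("other_club", 3.0)]
)
ranking = rank_by_average(ratings)
```

With this data, `small_club` and `big_club` both average exactly 5.0, so the plain average cannot tell them apart even though one has 5 ratings and the other 16,000.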

Haris
somejkuser
  • For that you need the range of the actual rating. Like suppose, your actual rating can be from `0k` to `100k`. Then that has to be mapped to `0k` to `5k`. – Haris Nov 06 '15 at 16:27
  • @Haris Can you further explain and possibly provide the algorithm for weighted rating values – somejkuser Nov 06 '15 at 16:29

2 Answers


Let's suppose your ratings can range from 0k to 100k (as you said, some club has a 16k rating). Now you want that normalized to a range of 0k to 5k.

Let's say 0k to 100k is the actual range (A_lower to A_higher).

And 0k to 5k is the normalized range (N_lower to N_higher).

You want to convert 16k, which is A_rating (the actual rating), to a normalized value, N_rating (between 0 and 5k).


The formula you can use for this is

N_rating = A_rating * ( (N_higher - N_lower) / (A_higher - A_lower) )

(This form assumes both ranges start at 0, as they do here.)

Let's take an example.

If the actual rating is 25k, the range of the actual rating is 0 to 100k, and you want it normalized to between 0 and 5k, then

N_rating = 25 * ( (5 - 0) / (100 - 0) )
=> N_rating = 1.25
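As a rough sketch, the formula above could be written in Python as follows. The function name `normalize` is hypothetical, and the handling of non-zero lower bounds is an assumption added here (standard min-max rescaling); with both lower bounds at 0 it reduces to the formula as written.

```python
def normalize(a_rating, a_lower, a_higher, n_lower, n_higher):
    """Map a_rating from [a_lower, a_higher] onto [n_lower, n_higher].

    The n_lower / a_lower offsets are an added assumption; they vanish
    when both ranges start at 0, as in the worked example.
    """
    if a_higher == a_lower:
        raise ValueError("actual range must have non-zero width")
    scaled = (a_rating - a_lower) * (n_higher - n_lower) / (a_higher - a_lower)
    return n_lower + scaled


# The worked example: 25 on a 0..100 scale mapped to a 0..5 scale.
n_rating = normalize(25, 0, 100, 0, 5)  # 1.25
```

Note that this only rescales a value from one range to another; it does not take the number of ratings into account.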

EDIT

A little more explanation

We normalize when values are spread over a large range and we want to represent them in a smaller one.

Q) What is a normalized value?

It is the value that occupies the same relative position as the actual value (25k) would if the actual range (0 to 100k) were replaced by the smaller range (0 to 5k).

Q) Why am I dividing the normalized range by the actual range and then multiplying by the actual rating?

To understand this, let's use a little unitary-method logic.

You have a value of 25 when the range is 0 to 100, and you want to know what it would be normalized to if the range were 0 to 5. So,

//We take values we already know: the upper bounds of both ranges
100 corresponds to 5 //the upper bound of each range
//In the unitary method this goes like
If 100 is 5

//then 

1 is (5 / 100)

//and

x is x * (5 / 100) //we put 25 in place of x here

Q) Why did you choose 0 to 5k as the normalized range?

I chose it because you mentioned your rating should be below 5k. You can choose any range you wish.

Haris
  • I'd like to understand how this weight works. What is a normalized value and why am i taking the division of a normalized range to a actual range and then multiplying by the actual rating?Also, why did you choose 0 to 5k as the normalized range. I really just want to understand the algorithm before using it. – somejkuser Nov 06 '15 at 16:52
  • @jkushner, i have edited the answer and tried to explain, take a look. :) – Haris Nov 06 '15 at 17:11

What about simply adding the number of ratings, weighted by a very small value? This is just a very basic idea:

(sum of club ratings) / (total number of ratings) + 0.00000001 * (number of club ratings)

This way, clubs with the same average get ranked by their number of ratings.
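A minimal Python sketch of this idea, assuming each club is stored as a (sum of ratings, number of ratings) pair. The names `EPSILON`, `score`, and the sample clubs are hypothetical, not part of any real schema.

```python
EPSILON = 0.00000001  # the small multiplier from the formula above


def score(rating_sum, rating_count):
    """Average rating plus a tiny bonus proportional to the rating count."""
    return rating_sum / rating_count + EPSILON * rating_count


# Made-up clubs as (sum of ratings, number of ratings).
clubs = {
    "small": (25.0, 5),         # average 5.0 from 5 ratings
    "big": (80000.0, 16000),    # average 5.0 from 16,000 ratings
    "solid": (2350.0, 500),     # average 4.7 from 500 ratings
}

# Highest score first.
ranked = sorted(clubs, key=lambda name: score(*clubs[name]), reverse=True)
```

Here `big` ranks above `small` because the two averages tie and the count breaks the tie. `solid` stays last: the bonus is far too small to lift a 4.7 average over a 5.0 one, so the count acts purely as a tie-breaker.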

MrSmith42
  • I apologize I don't understand this algorithm at all. Can you explain how this algorithm works in full detail explaining each step. thanks – somejkuser Nov 06 '15 at 16:58
  • It is simply a little modification of your average calculation. It adds a little weight on the number of ratings. This way when clubs have same average rating the one with more ratings is lightly rated higher. – MrSmith42 Nov 11 '15 at 08:33
  • I like your idea. My question is why did you choose such a small multiplier (0.00000001) – somejkuser Nov 11 '15 at 15:57
  • To avoid that no matter how high the difference in the number of ratings is, only the number of ratings matters, if the average rating is exactly equal. – MrSmith42 Nov 11 '15 at 16:25
  • Sounds good. Ill test your theory out and let you know if it works. I appreciate the assistance. – somejkuser Nov 11 '15 at 16:30
  • shouldn't the last variable be the number of club ratings? – somejkuser Nov 11 '15 at 16:53
  • someone who scores an average of 4.7 with 500 votes gets a lower score than someone who scores an average of 5.0 with just 1 vote because the multiplier never allows the 4.7 to achieve a higher value if added to this small die number. – somejkuser Nov 11 '15 at 17:47
  • @jkushner that was what I intended only clubs with same average rating are ordered by the number of ratings. – MrSmith42 Nov 13 '15 at 07:21