I have an MS SQL Server 2008 database where I store places that serve food (cafés, restaurants, diners etc.). On a web site connected to this database people can rate the places on a scale from 1 to 3.
On the web site there's a page where people can view a top list with the top 25 (best rated) places in a certain city. The database structure looks something like this (there is more info stored in the tables, but here's the relevant info):
A place is situated in a city and votes are placed on a place.
Up until now I've just calculated an average vote score for each place where I divide the sum of all votes for a certain place with the number of votes for that place, something like this (pseudo code):
vote_count = total number of votes for the place
vote_sum = total sum of all the votes for the place
vote_score = vote_sum/vote_count
I also have to handle divide by zero if a place has no votes. All this is done inside the stored procedure that fetches the other data that I want to display in the top list. Here is the current stored procedure that fetches the top 25 places with the highest vote score:
ALTER PROCEDURE [dbo].[GetTopListByCity]
(
@city_id Int
)
AS
SELECT TOP 25 dbo.Places.place_id,
dbo.Places.city_id,
dbo.Places.place_name,
dbo.Places.place_alias,
dbo.Places.place_street_address,
dbo.Places.place_street_number,
dbo.Places.place_zip_code,
dbo.Cities.city_name,
dbo.Cities.city_alias,
dbo.Places.place_phone,
dbo.Places.place_lat,
dbo.Places.place_lng,
ISNULL(SUM(dbo.Votes.vote_score),0) AS vote_sum,
(SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id) AS vote_count,
COALESCE((CONVERT(FLOAT,SUM(dbo.Votes.vote_score))/(CONVERT(FLOAT,(SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id)))),0) AS vote_score
FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id
LEFT OUTER JOIN dbo.Votes ON dbo.Places.place_id = dbo.Votes.place_id
WHERE dbo.Places.city_id = @city_id
AND dbo.Places.hidden = 0
GROUP BY dbo.Places.place_id,
dbo.Places.city_id,
dbo.Places.place_name,
dbo.Places.place_alias,
dbo.Places.place_street_address,
dbo.Places.place_street_number,
dbo.Places.place_zip_code,
dbo.Cities.city_name,
dbo.Cities.city_alias,
dbo.Places.place_phone,
dbo.Places.place_lat,
dbo.Places.place_lng
ORDER BY vote_score DESC, vote_count DESC, place_name ASC
RETURN
As you can see it fetches more than just the vote score - I need data about the place, the city it's situated in and so on. This works fine, but there is one big problem: the vote score is too simple because it doesn't take in to account the number of votes. With the simple calculation method a place that has one vote with the score 3 will end up higher in the list than a place that has fourteen votes with the score 3 and one vote with the score 2:
3/1 = 3
(14*3 + 1*2) = 44/15 = 2.933333333333
To fix this I've been looking into using some form of weighted average/weighted index. I've found an example of a true bayesian estimate that looks promising. It looks like this:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the place (mean) = (Rating)
v = number of votes for the place = (votes)
m = minimum number of votes required to be listed in the Top 25 (unsure how many, but somewhere between 2-5 seems realistic)
C = the mean vote across the whole database
The problems begin when I try to implement this weighted rating in a stored procedure - it quickly becomes complicated and I get tangled into parenthesis and loose track of what the stored procedure does.
Now I need some help with two questions:
Is this a suitable method for calculating a weighted index for my site?
How would this (or another suitable calculation method) look like when implemented in a stored procedure?