
I have two tables.

Table A: Contains a list of songs, song artwork, mp3 link, tags etc.

Table B: Contains registered user info, user id, username etc.

I am about to add a star rating system to the songs and would like to let any registered user vote only once per song.

So my plan initially was to create a third table and use a JOIN:

Table C: Containing songID, total_score (the sum of all votes cast) and vote_count (the number of votes), then perform a calculation client-side in jQuery to return the average vote.

I figured this would be optimal for performance since I will be dealing with extremely large datasets.

Of course using this method I would have no protection against users voting as many times as they want.

Therefore, my question is: what database setup would best protect against cheating (i.e. storing and checking the user IDs of voters in table C) without degrading the performance of filtering/sorting the songs themselves, as this is key to the project?

I hope I made this request clear, my apologies if not.

gordyr
  • Try something along the lines of the following: http://stackoverflow.com/questions/4519779/php-star-rating-system-concept/4519969#4519969 The image_vote table uses a clustered composite primary key (image_id, user_id), so lookups will be very fast – Jon Black Nov 28 '11 at 14:30
  • That's a huge help, f00... Thanks a lot. :-) – gordyr Nov 28 '11 at 15:46

4 Answers


Make a vote table: ([userID, songID], rating)

Maybe put an index on songID for faster access.
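A minimal runnable sketch of this vote table, using SQLite in place of MySQL for illustration; the table and column names follow the answer, while the CHECK constraint and index name are my additions:

```python
import sqlite3

# Vote table as suggested: composite primary key (userID, songID)
# enforces one vote per user per song at the database level.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE vote (
        userID  INTEGER NOT NULL,
        songID  INTEGER NOT NULL,
        rating  INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
        PRIMARY KEY (userID, songID)  -- one vote per user per song
    )
""")
# Secondary index on songID so per-song aggregates stay fast
# (the primary key leads with userID, so it doesn't cover this).
conn.execute("CREATE INDEX idx_vote_song ON vote (songID)")

conn.execute("INSERT INTO vote VALUES (1, 100, 5)")
conn.execute("INSERT INTO vote VALUES (2, 100, 3)")

# A second vote by user 1 on song 100 violates the primary key.
try:
    conn.execute("INSERT INTO vote VALUES (1, 100, 4)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

avg = conn.execute(
    "SELECT AVG(rating) FROM vote WHERE songID = 100"
).fetchone()[0]
print(duplicate_allowed, avg)  # False 4.0
```

The composite primary key is the cheat protection: no application-level check is needed, because a duplicate vote simply fails to insert.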

Tom van der Woerdt
  • Thanks Tom, this is fine and was my first thought. However, let's assume a 500,000-song dataset with each song having between 400-1000 votes. Would query performance in this scenario be optimal with a simple storage method like this, or is there anything else I should be doing? – gordyr Nov 28 '11 at 13:53
  • If that happens you should probably add a Memcache layer to your application, so that these queries only have to be executed once per x minutes and the Memcache server can simply increment the value in memory. On the scale you're talking about you don't want to use MySQL for this kind of thing - use a cache server instead, with MySQL as a fallback. – Tom van der Woerdt Nov 28 '11 at 13:55
  • I would personally use both a table like that, to track which user voted on what, and two extra columns with the total number of stars and votes to cache that data for quicker access (then you could simply `SELECT stars_count / vote_count AS avg_stars`, and `UPDATE .. SET vote_count = vote_count + 1, stars_count = stars_count + [user vote]` in addition to inserting into the vote table [probably as a trigger, not in the app itself]). – shesek Nov 28 '11 at 14:01
  • That's a cache, which doesn't belong in a database. Simple cache servers such as memcached are a LOT faster. – Tom van der Woerdt Nov 28 '11 at 14:02
  • Thanks Tom, Memcached is already in place... I suppose further optimization would mean moving to InnoDB or something similar for this kind of scale. My understanding of database caching mechanisms is minimal, so I guess this is as good a time as any to learn. Thanks to all who answered; I guess the answer really was as simple as I thought. The answer goes to Tom, though, for answering my comments also. :) Apologies to the rest of you who answered essentially the same. – gordyr Nov 28 '11 at 14:06
  • InnoDB is a lot slower than MyISAM - I'd recommend against it. – Tom van der Woerdt Nov 28 '11 at 14:08
  • How annoyingly confusing; I'm certain I read that for this kind of application InnoDB would be preferable? Would you recommend sticking with what I have then and simply learning good use of the memcache layer we have set up? – gordyr Nov 28 '11 at 14:12
  • InnoDB is good for relational performance, while MyISAM will purely focus on speeds and not care about other tables. When you insert something in an InnoDB table, it first checks all other tables to verify the foreign keys. MyISAM won't do this. And with a table like this, if the results are cached, it's mainly about writing and not reading. For the Memcached layer I recommend using two keys (SONG__VOTES, SONG__SCORE) which you can easily combine with Memcached's incr() and decr() features. If the cache goes missing (or expires) simply fetch it from the DB again. – Tom van der Woerdt Nov 28 '11 at 14:19
  • Fantastic... More help than I could have imagined. Much appreciated Tom. :-) – gordyr Nov 28 '11 at 14:23
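The two-key scheme Tom describes in the comments might be sketched as follows, with a plain dict standing in for Memcached (a real client such as pymemcache exposes incr()/decr() for the same steps); the key names and function names here are assumptions:

```python
# Plain dict simulating a Memcached instance; in production these
# updates would be incr() calls against the cache server, with the
# database as the fallback when a key is missing or expired.
cache = {}

def record_vote(song_id, stars):
    votes_key = f"SONG_{song_id}_VOTES"
    score_key = f"SONG_{song_id}_SCORE"
    # Two counter increments per vote: total votes and total stars.
    cache[votes_key] = cache.get(votes_key, 0) + 1
    cache[score_key] = cache.get(score_key, 0) + stars

def average_rating(song_id):
    votes = cache.get(f"SONG_{song_id}_VOTES", 0)
    score = cache.get(f"SONG_{song_id}_SCORE", 0)
    # On a real cache miss you would recompute from the vote table
    # and repopulate both keys; here we just return None.
    return score / votes if votes else None

record_vote(42, 5)
record_vote(42, 3)
print(average_rating(42))  # 4.0
```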

Your third table should be structured like:

Song ID
User ID
Star Rating

- with a unique index on (Song ID, User ID). (You have to store Song ID and User ID together to be able to tell who has voted - there is no way around this.)

To return an average rating for a given Song, simply

SELECT AVG(`Star Rating`) FROM `Rating Table` WHERE `Song ID` = ?

On an indexed table, selecting an average for a specific Song with under 1000 ratings should give reasonable access times.
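A runnable sketch of this answer's schema and query, with SQLite standing in for MySQL; the upsert that lets a returning user revise an earlier vote is my assumption, not part of the answer:

```python
import sqlite3

# Schema from the answer: one row per (song, user) pair, with a
# unique index preventing duplicate votes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rating_table (
        song_id     INTEGER NOT NULL,
        user_id     INTEGER NOT NULL,
        star_rating INTEGER NOT NULL,
        UNIQUE (song_id, user_id)
    )
""")

def cast_vote(conn, song_id, user_id, stars):
    # ON CONFLICT targets the unique index, so a repeat vote by the
    # same user overwrites their previous rating instead of erroring.
    conn.execute("""
        INSERT INTO rating_table (song_id, user_id, star_rating)
        VALUES (?, ?, ?)
        ON CONFLICT (song_id, user_id)
        DO UPDATE SET star_rating = excluded.star_rating
    """, (song_id, user_id, stars))

cast_vote(conn, 7, 1, 5)
cast_vote(conn, 7, 2, 2)
cast_vote(conn, 7, 1, 3)  # user 1 revises their vote: 5 -> 3

avg = conn.execute(
    "SELECT AVG(star_rating) FROM rating_table WHERE song_id = ?", (7,)
).fetchone()[0]
print(avg)  # 2.5
```

The unique index serves double duty here: it blocks duplicate votes and it is the index the AVG query uses to find all ratings for a song.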

  • Thanks Mark, and especially for answering above and beyond with the query. Despite you giving the same answer, I have awarded it to Tom as he answered my continuing question regarding performance and scale. Hope that's okay. – gordyr Nov 28 '11 at 14:09

Try this:

  • Album | artist id, artwork
  • Artist | overall rating
  • Song | artist id, album, rating
  • Users

That way you can pull up by artist, by song, by album etc.

Meisam Mulla

It depends on how up to date you want the rating to be; a star rating doesn't necessarily need to be right up to the minute. So you can have the

vote table: ([userID, songID], rating)

as Tom van der Woerdt suggested, but you can also add a star rating to each song and recalculate it daily or every few hours if you have the capacity.
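The periodic recalculation could be as simple as one batch UPDATE run from cron every few hours; a sketch with assumed table and column names, using SQLite for illustration:

```python
import sqlite3

# A cached rating column on the song table, rebuilt in bulk from the
# vote table by a scheduled job rather than on every read.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (songID INTEGER PRIMARY KEY, rating REAL);
    CREATE TABLE vote (userID INTEGER, songID INTEGER, rating INTEGER,
                       PRIMARY KEY (userID, songID));
    INSERT INTO song VALUES (1, NULL), (2, NULL);
    INSERT INTO vote VALUES (10, 1, 4), (11, 1, 2), (10, 2, 5);
""")

def recalculate_ratings(conn):
    # One pass over the vote table; reads then hit the cached column
    # directly, which keeps filtering/sorting by rating cheap.
    conn.execute("""
        UPDATE song
        SET rating = (SELECT AVG(v.rating) FROM vote v
                      WHERE v.songID = song.songID)
    """)

recalculate_ratings(conn)
print(conn.execute("SELECT rating FROM song ORDER BY songID").fetchall())
# [(3.0,), (5.0,)]
```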

Jaydee