2

I'm trying to create a Like/Unlike system akin to Facebook's for an existing comments section of a website, and I need help in designing the system.

Currently, every product on the website has a comments section and members can post and like comments. I need to know each member has posted how many comments and each of his comments has received how many likes. Of course, I need to know who liked what comments too (partly so that I can prevent a user from liking a comment more than once) for analytical purposes.

The naive way of implementing a Like system to the current comments module is to create a new table in the database that has foreign keys to the CommentID and UserID. Then for every "like" given to a comment by a user, I would insert a row to this new table with the targeting comment ID and user ID.

While this might work, the massive amount of comments and users is going to cause this table to grow quickly and retrieving records from and doing counts on this huge table will become slow and inefficient. I can index either one of the columns, but I don't know how effective it would be. The website has over a million comments.

I'm using PHP and MySQL. For a system like this with a huge database, how should I designing a Like system so that it is more optimised and stable?

Carven
  • 14,988
  • 29
  • 118
  • 161
  • You may want to design your LIKE system to not be real time. Design the tables the "correct" way, but don't read them real time to get the immediate LIKE counts. Update the counts every few minutes, hours, whatever. – AgRizzo Jun 10 '16 at 14:37
  • @AgRizzo How would the "correct" way of design be like? – Carven Jun 10 '16 at 14:42
  • Your "naive" way - i.e., normalized. You need to know which user liked which comment, so two columns (at a minimum): user_id and comment_id. – AgRizzo Jun 10 '16 at 14:44
  • I think this is too broad and opinion based to be a good fit for SO - answers posted here are already generating more discussion rather than single Q&A pairs. This could be accomplished multiple ways, as OP points out and exploring all the options would be difficult. – HPierce Jun 10 '16 at 14:46
  • 1
    @HPierce I don't quite understand your downvote. There are certainly multiple ways to accomplish this. Most things can be accomplished in different ways. But your own best way to accomplished this can be one of the answers to this question. And answers with different ways to accomplish this can serve as good references for future readers too. – Carven Jun 13 '16 at 13:57

2 Answers2

2

For scalability, do not include the count column in the same table with other things. This is a rare case where "vertical partitioning" is beneficial. Why? The LIKEs/UNLIKEs will come fast and furious. If the code to do the increment/decrement hits a table used for other things (such as the text of the Comment), there will be an unacceptable amount of contention between the two.

This tip is the first of many steps toward being able to scale to Facebook levels. The other tips will come, not from a free forum, but from the team of smart engineers you will have to hire to get to that level. (Hints: Sharding, Buffering, Showing Estimates, etc.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
1

Your main concern will be a lot of counts, so the easy thing to do is to keep a separate count in your comments table.

Then you can create a TRIGGER that increments/decrements the count based on a like/unlike.

That way you only use the big table to figure out if a user already voted.

Evert
  • 93,428
  • 18
  • 118
  • 189
  • So I would have another 2 more tables, one for storing how many likes a comment has and another one for storing how many comments a user has posted? I thought of something like this before but was also considering that, given that the website has about over a million comments and a few hundred thousands users, that Likes table can possible grow to have over tens of millions of rows! So every time when I need to know how who liked a comment, I would still need to run through this massive table? – Carven Jun 10 '16 at 14:37
  • Another thing is, let's say I need to check if a user had already liked a comment before casting his like (to prevent him from liking a comment more than once), I would still need to run throw that giant table, wouldn't I? – Carven Jun 10 '16 at 14:41
  • No you don't need new tables, you can make them part of your comments table. And to prevent a user from commenting/liking twice... you don't need to "run through" the table. Assuming that you mean scanning the entire table with "run through". With a good index this should be pretty fast. Tens of millions is tiny for such a simple dataset. – Evert Jun 10 '16 at 14:51