
I'm working on a Google App Engine (Python) based site that allows for user-generated content, and voting (like/dislike) on that content.

Our designer has, rather nebulously, spec'd that the front page should be a balance between recent content and popular content, probably with the assumption that we would just create a score value that weights likes/dislikes against time since creation. Ultimately, the goals are (1) bad content gets filtered out somewhat quickly, (2) content that continues to be popular stays up longer, and (3) new content has a chance at staying long enough to get enough votes to determine if it's good or bad.

I can easily compute a score based on likes/dislikes. But incorporating the time factor to produce a single score that can be indexed doesn't seem feasible. I would essentially need to reindex all the content every day to adjust its score, which seems cost prohibitive once we have any sizable amount of content. So, I'm at a loss for potential solutions.

I've also suggested something where we time-box it (all time, daily, weekly), but he says users are unlikely to look at the tabs other than the default view. Also, if I filtered based on the last week, I'd need to sort on time, and then the secondary popularity sort would essentially be meaningless since submission times would be virtually unique.

Any suggestions on solutions that I might be overlooking?

Would something like Google's Prediction API or BigQuery be able to handle this better?

Dan McGrath
Clint Doriot

1 Answer


Such a system is often called "frecency", and there are a number of ways to do it. One way is to have votes 'decay' over time. I've implemented this in the past on App Engine by storing a current score and a last-updated timestamp on each post. Any vote first applies an exponential decay to the stored score, based on the time elapsed since the last update, and then stores both the new score and the new timestamp. A background process runs a few times a day to update the score and decay time of any posts that haven't received votes in a while. Thus, a post's score always tends towards 0 unless it consistently receives upvotes.
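As a rough illustration of the decay approach, here is a minimal in-memory sketch. The `Post` class, the function names, and the one-day half-life are all assumptions made for the example, not the implementation described above; on App Engine the two fields would be properties on a datastore entity and the update would run in a transaction.

```python
from dataclasses import dataclass

# Assumed tuning knob: a stored score halves every 24 hours without votes.
HALF_LIFE_SECONDS = 24 * 60 * 60


@dataclass
class Post:
    """Hypothetical stand-in for a datastore entity."""
    score: float = 0.0
    last_updated: float = 0.0  # Unix timestamp in seconds


def decayed_score(post, now):
    """Return the stored score decayed from last_updated up to now."""
    elapsed = max(0.0, now - post.last_updated)
    return post.score * 0.5 ** (elapsed / HALF_LIFE_SECONDS)


def apply_vote(post, vote, now):
    """Decay the stored score to now, add the vote, store both fields."""
    post.score = decayed_score(post, now) + vote
    post.last_updated = now


def refresh(post, now):
    """What a background job could do for posts with no recent votes."""
    apply_vote(post, 0.0, now)
```

Because every post is re-decayed at least a few times a day, the indexed `score` values stay roughly comparable, and the front page can be a plain query sorted by `score` descending. Use `vote=+1.0` for a like and `vote=-1.0` for a dislike.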

Another, even simpler, system is to serial-number posts: each new post gets the next number in the sequence as its initial score. Whenever someone upvotes a post, increment its number, and sort by that number to display. Thus, the natural ordering is by creation order, but votes serve to 'reshuffle' things, putting more upvoted posts ahead of newer but less voted posts.
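The serial-number idea can be sketched in a few lines. The `Board` class and its method names are hypothetical, and everything lives in memory; a real App Engine version would need a transactional or sharded counter to hand out serial numbers.

```python
import itertools


class Board:
    """Hypothetical in-memory model of serial-numbered posts."""

    def __init__(self):
        self._next_serial = itertools.count(1)  # 1, 2, 3, ...
        self.scores = {}  # post_id -> sortable score

    def add_post(self, post_id):
        # A new post starts with the next serial number as its score.
        self.scores[post_id] = next(self._next_serial)

    def upvote(self, post_id, weight=1):
        # Each upvote bumps the post past `weight` newer, unvoted posts.
        self.scores[post_id] += weight

    def front_page(self):
        # Highest score first: newest posts lead unless votes reshuffle them.
        return sorted(self.scores, key=self.scores.get, reverse=True)
```

For example, after adding posts `a`, `b`, `c` in that order, the front page lists `c`, `b`, `a`; three upvotes on `a` raise its score to 4 and move it ahead of `c`. The `weight` parameter is an assumed extension for making a vote count for more or less than one position.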

Nick Johnson
  • So, to expand on the second option: basically, we have a datetime property that is used to sort the content, and that property is updated for new content and any time someone upvotes the content? Or were you suggesting something slightly more complex? – Clint Doriot Jul 17 '15 at 14:48
  • @ClintDoriot No, it needs to be a counter. The first article posted has score 1, the second has score 2, etc. When someone upvotes a post, increment its score, and sort by score to display. – Nick Johnson Jul 17 '15 at 14:52
  • Ah, I see. We could even weight the increment to push upvoted content up the list faster (score +2) or slower (score +0.5). – Clint Doriot Jul 17 '15 at 14:56
  • What about this variation on approach #1: Sort based on a datetime property descending, but modify that date based on votes. Exponentially decrease the amount an upvote adds to that date, based on the original creation date. Would that then eliminate the need for a background task to periodically update older posts? – Clint Doriot Jul 17 '15 at 15:02
  • @ClintDoriot The problem with that is that it adds an explicit dependency on the frequency of posts - you'd need to constantly tweak it to balance the value an upvote or downvote has based on the current traffic level. – Nick Johnson Jul 17 '15 at 15:04