
I'm looking for a way to implement sponsored images in one of my GAE apps.

We've got about 5000 users on the app, and each sponsored image needs to be tracked every time it is viewed and every time somebody clicks on it.

Somebody suggested using multiple counter entities and incrementing a randomly chosen one in order to get past the datastore write limit. But if two views happen at exactly the same time and both try to write to the same counter, the second write will overwrite the first, meaning you lose one view.

At the moment we're creating a new datastore entity for every view and every click, and a scheduled job passes them to a queue that adds up all the views and clicks and saves the totals in a stats entity. This is not very efficient.
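The current approach (one record per event, folded into totals by a scheduled job) boils down to the aggregation step below. This is a plain-Python sketch with hypothetical names: the `defaultdict` stands in for the stats entity, and each event list stands in for the per-view and per-click entities picked up on one scheduled run.

```python
from collections import Counter, defaultdict


def fold_events(stats, events):
    """Add a batch of raw (image_id, kind) events into running totals.

    `stats` must be a defaultdict(Counter) mapping image_id -> Counter of
    event kinds ("view", "click"); it is updated in place and returned.
    """
    for image_id, kind in events:
        stats[image_id][kind] += 1
    return stats


if __name__ == "__main__":
    stats = defaultdict(Counter)
    # First scheduled run: raw event records collected since startup.
    fold_events(stats, [("banner-a", "view"), ("banner-a", "view"), ("banner-b", "view")])
    # Second run: only the records created since the previous run.
    fold_events(stats, [("banner-a", "click"), ("banner-b", "view")])
    print(stats["banner-a"]["view"], stats["banner-a"]["click"], stats["banner-b"]["view"])  # 2 1 2
```

The counting itself is cheap; the inefficiency is in writing and later reading one entity per event.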

Jan Vladimir Mostert
  • Why not use the distributed counter that was suggested? You can spread it across a 20-row counter, which means you'd have to reach 20 hits a second (a lot) before it becomes very unreliable. – MeLight Mar 11 '14 at 13:55
  • Let's say we're well below 20 hits/second for the sake of argument. If only one instance is running, it should work, but as soon as two instances are running, there's a 1/20 chance that both will try to write to the same entry (losing that hit because both increment the same entry at the same time), and more so with more instances. Is there some sort of row-level locking I can use to get past having to sacrifice accuracy? A distributed counter plus entity locking would be the perfect solution. – Jan Vladimir Mostert Mar 11 '14 at 14:19
  • I'm not a probability expert, but I believe the chance of two instances hitting the same counter row out of 20 would be 1/400. As for your question about locking, I'm not aware of such a mechanism (it would be really nice though). – MeLight Mar 11 '14 at 15:19
  • Just thought of it: you can use a queue with a throughput (I don't remember the exact term) of one task at a time. That way you know only one operation is done at a time. – MeLight Mar 11 '14 at 15:22
  • That's a pretty cool idea: put each view on a queue, limit the number of tasks processed per second to a rate below the write rate, and link each queue to one of the distributed counters; that way you'll get exact counts. I'm happy to accept that as an answer. – Jan Vladimir Mostert Mar 11 '14 at 20:25
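A quick check of the collision estimates in these comments, assuming each concurrent write picks one of n shards uniformly at random and independently:

```latex
P(\text{2 writes share a shard}) = \sum_{i=1}^{n} \frac{1}{n}\cdot\frac{1}{n} = \frac{1}{n},
\qquad n = 20 \;\Rightarrow\; \frac{1}{20}

P(\text{some pair of } k \text{ writes shares a shard}) = 1 - \prod_{j=0}^{k-1} \frac{n-j}{n},
\qquad k = 3,\ n = 20 \;\Rightarrow\; 1 - \frac{20 \cdot 19 \cdot 18}{20^3} = 1 - \frac{6840}{8000} = 0.145
```

So 1/20 is the chance that two simultaneous writes land on the same shard, while 1/400 = 1/n² is the chance that both land on one particular shard. A shared shard means contention; whether that loses a hit depends on how the increment is written. If it runs inside a datastore transaction, contention should lead to a retry or an error rather than a silent overwrite.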

1 Answer


Posting this as an answer :)

You can use a queue with a throughput of one task at a time and send the count operations to that queue. That way you know that only one count operation is performed on the counter at a time.

MeLight