How viable is it to use a Long value generated by the GAE Sharded Counter code as a unique Long ID across datacenters?

Why do I need to use the counter value as an ID? GAE generates very long Long values as entity IDs, but my app needs short IDs, like the ones the sharded counter produces at first.

Question: Will the sharded counter at some point return the same value for different requests, so that IDs collide?

Dan McGrath
xkm

1 Answer

It is not viable. The goal of the sharded counter is to keep an eventually consistent count while avoiding contention: increase and decrease operations are spread across different shards. The get_count method sums all the shards to return the total count, but that total cannot be treated as a unique ID. The only guarantee is that it will eventually reflect every increase or decrease operation you performed, so it can return the same value for different requests, even if each request performs an increase first.
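
To make the collision concrete, here is a minimal in-memory sketch. It is not the GAE sample code: `ShardedCounter` and its methods are hypothetical stand-ins that only mimic the split between incrementing a shard and summing the shards. It shows two requests that each increment and then read, and both end up with the same value.

```python
import random


class ShardedCounter:
    """Hypothetical in-memory stand-in for a sharded counter (not the GAE code)."""

    def __init__(self, num_shards=4, seed=0):
        self._shards = [0] * num_shards
        self._rng = random.Random(seed)

    def increment(self):
        # Pick one shard at random and bump it. Nothing is returned,
        # so the caller learns no value from the increment itself.
        index = self._rng.randrange(len(self._shards))
        self._shards[index] += 1

    def get_count(self):
        # A separate read: the sum of all shards at this moment.
        return sum(self._shards)


counter = ShardedCounter()

# Two requests interleave: both increment before either one reads.
counter.increment()          # request A
counter.increment()          # request B
id_a = counter.get_count()   # request A reads the total
id_b = counter.get_count()   # request B reads the total

print(id_a, id_b)  # both requests got the same "ID"
```

Because the increment and the read are two separate steps, nothing ties the value a request reads to the increment it performed, and the chance of this interleaving grows with the number of concurrent clients.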

A similar approach can be designed around sharded pools of reserved IDs. You would have to modify the sharded counter code: instead of sharding the increase or decrease of a counter, the operation would randomly select one of a predefined set of sharded ID pools, return an ID from it, and make sure that ID is never returned again. The pool for each shard can be a number range that fits your particular needs, and you would have to handle the case where a pool runs out of reserved IDs and has to be 'refilled'.
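
The pool idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `ShardedIdPool` class, the block size, and the in-memory state are hypothetical, and the `threading.Lock` stands in for a datastore transaction. In a real datastore version each shard would presumably be its own entity updated in its own transaction, with the shared block allocator touched only on refill.

```python
import random
import threading


class ShardedIdPool:
    """Hypothetical sketch: each shard hands out IDs from its own reserved range."""

    def __init__(self, num_shards=4, block_size=100):
        self._block_size = block_size
        self._lock = threading.Lock()   # stand-in for a datastore transaction
        self._next_block_start = 1      # start of the next unreserved range
        # Each shard is [next_id, end_exclusive] over a disjoint range.
        self._shards = [self._reserve_block() for _ in range(num_shards)]

    def _reserve_block(self):
        # Reserve a fresh, never-used range of block_size IDs.
        start = self._next_block_start
        self._next_block_start += self._block_size
        return [start, start + self._block_size]

    def allocate(self):
        with self._lock:
            shard = random.choice(self._shards)
            if shard[0] >= shard[1]:
                # This shard's pool ran out: refill it with a new range.
                shard[:] = self._reserve_block()
            new_id = shard[0]
            shard[0] += 1   # never hand this ID out again
            return new_id


pool = ShardedIdPool(num_shards=4, block_size=100)
ids = [pool.allocate() for _ in range(1000)]
print(len(ids) == len(set(ids)))  # every allocated ID is distinct
```

The IDs stay short because they come from small consecutive ranges, but they are not handed out in strictly increasing order, since each request picks a shard at random.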

I recommend taking a look at this article for some options for this kind of problem: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

  • You have a point. However, what if the app never calls the decrease operation? Would the count still collide across instances of the app? (Talking about the GAE datastore) – xkm Jan 16 '15 at 07:41
  • The problem is not increasing or decreasing the counters. By design, increasing or decreasing is disconnected from get_count (that design allows high concurrency of counting operations across the different shards), so yes, get_count can return the same number more than once, which makes it unusable for unique IDs. You might not notice this with a low number of operations, but you certainly will as more clients try to get a number from the sharded counter simultaneously. – Alejandro Santamaria Arza Jan 16 '15 at 19:44