
I read today about sharded counters in Google App Engine. The article says that you should expect to max out at about 5 updates per second per entity in the datastore. But it seems to me that this solution doesn't 'scale' unless you have some way of knowing how many updates you are doing per second. For example, you can allocate 10 shards, but you will then start choking at 50 updates per second.
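
For reference, here's the basic scheme as I understand it (my own sketch modeled on the sample code; names like CounterShard are mine):

import java.util.Random;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Transaction;

public class ShardedCounter {
    private static final Random RND = new Random();
    private final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
    private final int numShards;  // fixed up front -- this is what my question is about

    public ShardedCounter(int numShards) {
        this.numShards = numShards;
    }

    // Each write lands on a random shard, so a single shard sees only
    // ~1/numShards of the traffic and stays under the ~5 writes/sec limit.
    public void increment() {
        Key key = KeyFactory.createKey("CounterShard", "shard" + RND.nextInt(numShards));
        Transaction tx = ds.beginTransaction();
        try {
            Entity shard;
            try {
                shard = ds.get(tx, key);
            } catch (EntityNotFoundException e) {
                shard = new Entity(key);       // create the shard on first use
                shard.setProperty("count", 0L);
            }
            shard.setProperty("count", (Long) shard.getProperty("count") + 1);
            ds.put(tx, shard);
            tx.commit();
        } finally {
            if (tx.isActive()) {
                tx.rollback();
            }
        }
    }

    // Reads are cheap: just sum every shard (the total could also be memcached).
    public long getTotal() {
        long total = 0;
        for (Entity shard : ds.prepare(new Query("CounterShard")).asIterable()) {
            total += (Long) shard.getProperty("count");
        }
        return total;
    }
}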

So how do you know how fast the updates are coming, and how do you feed that number back into the number of shards?

My guess is that along with the counter you could keep some record of recent activity, and if you detect a spike you can increase the number of shards. Is that generally how it's done? And if so, why isn't it done in the sample code? (That last question may be unanswerable.) Is it more common practice to monitor website activity and update shard counts as traffic rises, as opposed to doing it automatically in the code?
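
To make that concrete, here's the kind of feedback loop I'm imagining (pure speculation on my part; growShards() and the shard-count lookup are hypothetical placeholders):

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class ShardTuner {
    private static final long SAFE_WRITES_PER_SEC_PER_SHARD = 5;
    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

    // Called on every counter update: bump a per-second memcache bucket and
    // grow the shard count when the rate approaches the safe ceiling.
    public void recordUpdateAndMaybeGrow() {
        String bucket = "update-rate:" + (System.currentTimeMillis() / 1000L);
        Long ratePerSec = cache.increment(bucket, 1, 0L);  // cheap, slightly lossy estimate
        if (ratePerSec != null
                && ratePerSec > SAFE_WRITES_PER_SEC_PER_SHARD * getShardCount()) {
            growShards();
        }
    }

    private int getShardCount() { return 10; }  // placeholder: would read a config entity
    private void growShards() { }               // placeholder: would persist a larger count
}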

Update: What are the practical consequences of having too few shards and choking? Does it simply mean that the website becomes unresponsive, or is it possible to lose counter updates because of timeouts?


As an aside, this question talks about implementing counters without sharding, but one of the answers implies that even memcache needs to be sharded if traffic is high. So this issue of shard allocation and tuning seems to be important.
brainjam
  • It would be interesting to see how many updates per second the memcache approach could handle without sharding. (At the moment I can't seem to find any numbers about how fast you can update a given memcache key like this.) – David Underhill Jun 29 '10 at 23:43
  • I'm just learning about this, but isn't memcache unreliable in the sense that it can go poof anytime? – brainjam Jun 29 '10 at 23:59
  • Yep, memcache values can indeed be evicted at any time. Usually this happens due to memory pressure (though it could happen for other reasons - like memcache servers going down). That's one reason why memcache-based solutions could undercount a little. – David Underhill Jun 30 '10 at 00:08
  • I think the more relevant question is what is the downside if any, of choosing too many shards? slower performance when trying to actually get the current total? – Peter Recore Jun 30 '10 at 02:45
  • @Peter Recore: my understanding is that reading is fast, writing is slow. Also, the counter values are memcached for retrieval (but not update). – brainjam Jun 30 '10 at 02:52

3 Answers


It is clearly simpler to manually monitor your website's popularity and increase the number of shards as needed. I would guess that most sites take this approach. Doing it programmatically would not only be difficult, it sounds like keeping a record of all recent activity and analyzing it to dynamically adjust the number of shards would add an unacceptable amount of overhead.

I would prefer the simpler approach of just erring a little on the high side with the number of shards you choose.

You are correct about the practical consequences of having too few shards. Updating a datastore entity more frequently than it can handle will initially cause some requests to take a long time (while the writes retry). If enough of them pile up, requests will start to fail with timeouts. That will certainly lead to missed counter updates. On the upside, your page will be so slow that users should start leaving, which should relieve the pressure on the datastore :).
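
If losing counts on timeouts really matters to you, one mitigation (an untested sketch on my part; incrementShard() and the handler URL are hypothetical) is to hand a failed increment to the task queue, which retries tasks automatically instead of dropping the update:

import com.google.appengine.api.datastore.DatastoreTimeoutException;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

try {
    incrementShard();  // the usual transactional shard write
} catch (DatastoreTimeoutException e) {
    // Let the task queue retry the increment later rather than losing it
    Queue queue = QueueFactory.getDefaultQueue();
    queue.add(TaskOptions.Builder.withUrl("/tasks/increment"));
}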

David Underhill
  • But but but .. if there have been timeouts, my counters will be wrong. I admit this will not result in any loss of life, but it bothers me just a little bit. Is it just one of those things that we have to live with? – brainjam Jun 29 '10 at 23:56
  • Living with the possibility of a few counter misses might not be so bad. Just try to choose the number of shards to accommodate your maximum expected peak traffic plus some safety margin. The more important missed counts are, the higher your safety margin should be. – David Underhill Jun 30 '10 at 00:03

To address the last part of your question: Your memcache values will not require sharding. A single memcache server can handle tens of thousands of QPS of fetches and updates, so no plausibly large app is going to need to shard its memcache keys.
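
For example, a single memcache key can back the whole counter with no sharding at all (untested sketch; increment() is atomic on the memcache server, and the 0L seeds the value on first use):

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
Long hits = cache.increment("page-hits", 1, 0L);  // atomic server-side increment

Just remember (per the comments above) that memcache values can be evicted at any time, so this can undercount.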

Nick Johnson

Why not add to the number of shards when exceptions begin to occur?

Based on this GAE example:

Transaction tx = ds.beginTransaction();
try {
    // increment the chosen shard
    tx.commit();
} catch (DatastoreFailureException e) {
    // Datastore is struggling to handle the current load: increase/double the shard count
    addShards(getShardCount());
} catch (DatastoreTimeoutException e) {
    // Same treatment for timeouts
    addShards(getShardCount());
} catch (ConcurrentModificationException e) {
    // Contention on the shard entity itself
    addShards(getShardCount());
} finally {
    if (tx.isActive()) {
        tx.rollback();  // don't leave the transaction open if commit failed
    }
}
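
addShards() and getShardCount() are left as an exercise; one possible backing for them (my own sketch, using the same ds and the low-level datastore API as above) is a single config entity that stores the current shard count:

private final Key configKey = KeyFactory.createKey("CounterConfig", "config");

int getShardCount() {
    try {
        return ((Long) ds.get(configKey).getProperty("shardCount")).intValue();
    } catch (EntityNotFoundException e) {
        return 1;  // nothing stored yet: start with a single shard
    }
}

void addShards(int extra) {
    Entity config = new Entity(configKey);
    config.setProperty("shardCount", (long) (getShardCount() + extra));
    ds.put(config);  // last write wins; concurrent growth is harmless here
}

Note that reading the config entity on every request would itself become a hot spot, so in practice you'd cache the shard count in memcache or a static variable.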
DougA