
I'm currently porting some work from MySQL to Google App Engine/Java. I'm using JDO, as well as the lower level java API where required.

I read through the optimization guide about sharding counters: http://code.google.com/appengine/articles/sharding_counters.html

I'm still building the foundation of my app. I know that premature optimization is the root of all evil, but sharding is explicitly documented as a way to avoid contention, so I'm having trouble deciding which way to lean.

So should I shard counters (and other objects that may see frequent writes) by default, or should I go ahead without sharding and add it only as needed?

Dave
  • What are you going to do with those counters? I hope you'll not use them for something like auto increment ids for your entities. – cherouvim Sep 27 '11 at 04:50
  • Thanks for the concern, but no need to worry. I'm going to be counting things like page views and user actions. – Dave Sep 27 '11 at 14:07
  • Sounds good. Maybe use memcache and purge into db every 5 mins with cron? – cherouvim Sep 27 '11 at 14:15
  • Do you really need to count pageviews yourself, direct to the datastore? That's a recipe for scalability issues in any system, and the sort of thing tools like analytics were built for. – Nick Johnson Sep 28 '11 at 01:19
  • The exact situation I'm more worried about right now has to do with voting. I'm not so much worried about the speed of updating as I am about avoiding datastore contention. I'm afraid that if an artifact received a lot of votes simultaneously, contention would cause votes to be dropped. I realize that this might be overkill, but I don't have the benefit of hindsight here, so I'm taking a "better safe than sorry" approach. – Dave Sep 28 '11 at 02:20
  • For pageviews, I will probably end up taking the memcache road as @cherouvim suggested (I realize I don't need 100% accuracy in that regard). Also worth noting: this isn't for professional use... well, at least not yet ;) I'm really just trying to learn a lot, and I'm biasing implementation decisions towards something that might be more complicated than I actually need for the sake of learning about it. Kind of getting restless/bored with work... – Dave Sep 28 '11 at 02:31

2 Answers


The salient meaning of "premature" here is "before the proper time." Designing to avoid limits, when those limits are well understood, is not premature.

Shard your counters.
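To make the idea concrete, here is a minimal in-memory sketch of the sharding pattern, not the datastore code from the linked article. The class name `ShardedCounter`, its methods, and the use of `AtomicLongArray` as a stand-in for shard entities are all assumptions for illustration. In a real App Engine app, each shard would presumably be its own datastore entity, and each increment would update one randomly chosen shard inside a transaction.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical in-memory stand-in for a sharded counter.
 * Each array slot plays the role of one shard entity.
 */
final class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.shards = new AtomicLongArray(numShards);
    }

    /** Picks one shard at random and increments only that shard. */
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(index);
    }

    /** Reads every shard and sums them to get the total. */
    long getCount() {
        long total = 0;
        for (int i = 0; i < shards.length(); i++) {
            total += shards.get(i);
        }
        return total;
    }
}
```

Writes are spread across the shards, so concurrent votes rarely touch the same one. The trade-off is that reading the total has to visit every shard.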

Dave W. Smith

Even with effective sharding, maintaining aggregates can add substantial load to your application. If you need that aggregate and can't afford an approximation, then using a sharded aggregate is not a premature optimization, because there is no next-best alternative. If you don't actually need the counter, then the time it would take to implement it could be better spent elsewhere.

SingleNegationElimination