In my current project we have written some MapReduce jobs whose results could, in some cases, have been calculated at the point the relevant data changed.
I'm curious whether there are well-accepted rules of thumb for the cheapest time to perform an aggregate calculation.
I might start out like this:
- If you have no choice but to pass over all records, and the data changes frequently, then defer the calculation to a batch process.
- If an aggregate is expensive to calculate from scratch but cheap to update incrementally, update it at write time.
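
To make the two cases concrete, here is a rough Python sketch of what I have in mind. The names (`RunningMean`, `batch_median`) are made up for illustration and are not from our project. A mean can be maintained from a running count and total, so it fits the "increment at write time" rule; the median is treated here, for simplicity, as something that needs a pass over all records, so it stands in for the batch case.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Hypothetical aggregate that is cheap to update on every write."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        # O(1) work per write: no need to revisit earlier records.
        self.count += 1
        self.total += value

    def mean(self) -> float:
        # Reading the aggregate is O(1) as well.
        return self.total / self.count if self.count else 0.0


def batch_median(values):
    """Hypothetical batch job: sorts every record to find the median."""
    ordered = sorted(values)  # O(n log n) over the full data set
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

In this sketch `RunningMean.add` would be called from the write path, while `batch_median` would run on a schedule, since rerunning it on every write costs a full sort each time.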