1

I am working in a project where i need to calculate some avg values based on the users interaction on a site.

Now, the amount of records that needs to have their total avg calculated can range from a few to thousands.

My question is, at which threshold would it be wise to store the aggregated data in a seperate table and through a store procedure update that value everytime a new record is generated instead of just calculate it everytime it is neede?

Thanks in advance.

Rasmus
  • 177
  • 9
  • Are you having problems with performance? I'd try to optimize my queries first before doing stuff like this. – jsalonen Jul 29 '13 at 12:43
  • 1
    Your question can't be answered unless you want personal opinions, there's no threshold as such, you engineer things according to need and you compensate for time with extra space etc. Your question rises too many other questions which are also going to be opinion based. – N.B. Jul 29 '13 at 12:44
  • It depends on a lot of specifics, including such things as what are acceptable response times so far as your users are concerned. – Damien_The_Unbeliever Jul 29 '13 at 12:44
  • It would depend greatly on the read/write distribution of the database. Updating multiple tables per write thousands of times per second isn't necessarily good, even if done through a stored proc. – Joachim Isaksson Jul 29 '13 at 12:45
  • Right, don't fix it until there actually is a problem, makes sense – Rasmus Jul 29 '13 at 12:50

2 Answers2

1

You need to weigh the need of current data vs the need of quick data. If you absolutely need current data then you have to live with longer delays in your queries. If you absolutely need your data asap then you will have to deal with older data.

You can time your queries and time the insertion into a separate table and evaluate which seems to best fit your needs.

ApplePie
  • 8,814
  • 5
  • 39
  • 60
1

Dont do it, until you start having performance problems caused by the time it takes to aggregate your data.

Then do it.

If discovering this bottleneck in production is unacceptable, then run the system in a test environment that accurately matches your production environment and load in test data that accurately matches production data. If you hit a performance bottleneck in that environment that is caused by aggregation time, then do it.

Dancrumb
  • 26,597
  • 10
  • 74
  • 130