
Throughout the month, a daily process inserts a large number of rows (~1M) into some database tables; each run takes ~40 minutes, which is fine.
I created some "summary tables" from these inserts so that the data can be queried quickly. This also works fine.

Problem: I keep appending to the summary tables each day, so building today's summary takes about as long as loading the actual data, which is good. However, if data inserted on previous days has since changed (due to updates), I would also need to "recalculate" those earlier days. To handle that, instead of creating only today's summary each day, I would have to change the process to rebuild the summary data from the beginning of the month, which would increase the running time substantially.
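Roughly, the two options look like this (table and column names here are made up, and I'm using MySQL-style date syntax just for illustration):

    -- Daily step: summarise only yesterday's inserts (fast).
    INSERT INTO orders_daily_summary (summary_date, customer_id, total_amount, row_count)
    SELECT order_date, customer_id, SUM(amount), COUNT(*)
    FROM orders
    WHERE order_date = CURRENT_DATE - INTERVAL 1 DAY
    GROUP BY order_date, customer_id;

    -- If rows from earlier in the month were updated, the whole month's summary
    -- would have to be rebuilt instead, which is the expensive part:
    DELETE FROM orders_daily_summary
    WHERE summary_date >= DATE_FORMAT(CURRENT_DATE, '%Y-%m-01');

    INSERT INTO orders_daily_summary (summary_date, customer_id, total_amount, row_count)
    SELECT order_date, customer_id, SUM(amount), COUNT(*)
    FROM orders
    WHERE order_date >= DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')
    GROUP BY order_date, customer_id;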
Is there a standard way to deal with this problem?

Jim

1 Answer


We had a similar problem in our system, which we solved by generating a summary table holding each day's summary.

Whenever an UPDATE or INSERT changes the base tables, the summary table is updated as well. This of course slows down those operations, but it keeps the summary table completely up to date.

This can be done using TRIGGERs, but since all of the writes happen in one place in our code, we simply do it manually inside the same TRANSACTION.
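A minimal sketch of the manual variant, assuming a hypothetical orders base table and an orders_daily_summary table with a unique key on (summary_date, customer_id); the ON DUPLICATE KEY UPDATE syntax assumes MySQL:

    START TRANSACTION;

    -- The normal insert into the base table.
    INSERT INTO orders (order_date, customer_id, amount)
    VALUES ('2024-05-10', 42, 19.99);

    -- Apply the same change to that day's summary row, creating it if needed.
    INSERT INTO orders_daily_summary (summary_date, customer_id, total_amount, row_count)
    VALUES ('2024-05-10', 42, 19.99, 1)
    ON DUPLICATE KEY UPDATE
        total_amount = total_amount + VALUES(total_amount),
        row_count    = row_count + VALUES(row_count);

    COMMIT;

    -- UPDATEs to earlier days are handled the same way: apply the difference
    -- (new amount minus old amount) to that day's summary row in the same transaction.

The TRIGGER variant would run the second statement from an AFTER INSERT / AFTER UPDATE trigger on the base table instead of from application code.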

One advantage of this approach is that there is no need to run a cron job to refresh/create the summary table.

I understand that this may not be applicable/feasible for your situation.

Arth