Consider this table that stores the value of two stock variables A and B at each point in time:
            A   B
    day 1  10   0
    day 2   0  10
    day 3   7   7
    day 4   7   7
We want to answer questions like:
- What was the maximum value achieved by variable A in a given range of days?
- What was the maximum value achieved by the sum of variables A and B in a given range of days?
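For concreteness, here is a minimal sketch of both queries computed directly against the raw table (plain Python; the function names and list layout are just illustrative assumptions):

```python
# Raw per-day values from the table above (index 0 is day 1).
A = [10, 0, 7, 7]
B = [0, 10, 7, 7]

def max_a(lo, hi):
    """Question 1: maximum of A over days lo..hi (inclusive, 0-based)."""
    return max(A[lo:hi + 1])

def max_a_plus_b(lo, hi):
    """Question 2: maximum of A + B over days lo..hi (inclusive, 0-based)."""
    return max(a + b for a, b in zip(A[lo:hi + 1], B[lo:hi + 1]))

print(max_a(0, 3))         # 10 (reached on day 1)
print(max_a_plus_b(0, 3))  # 14 (reached on days 3 and 4)
```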
The actual table, however, might have billions of rows and many variables. To answer such queries faster, we plan to precompute a summary table at a coarser time granularity.
The problem is that naively computing the maximum of A and of B separately at the coarser granularity is not enough to answer the second question. For example:
              Max-A  Max-B
    day 1&2      10     10
    day 3&4       7      7
We have lost the fact that the maximum of A + B (14) is reached on days 3 & 4.
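To make the loss concrete: from the per-bucket Max-A and Max-B alone, the best we can do for the second question is the upper bound Max-A + Max-B, and on this data that bound even points at the wrong bucket. A minimal sketch, with the dictionary layout chosen only for illustration:

```python
# Per-bucket summary: (Max-A, Max-B) for days 1&2 and days 3&4.
summary = {
    "day 1&2": {"max_a": 10, "max_b": 10},
    "day 3&4": {"max_a": 7,  "max_b": 7},
}

# True per-bucket maximum of A + B, computed from the raw table.
true_max_a_plus_b = {"day 1&2": 10, "day 3&4": 14}

for bucket, row in summary.items():
    bound = row["max_a"] + row["max_b"]  # best the summary can offer
    print(bucket, "bound =", bound, "true =", true_max_a_plus_b[bucket])
# day 1&2 bound = 20 true = 10   <- the largest bound...
# day 3&4 bound = 14 true = 14   <- ...is not where the true maximum lives
```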
We could add a new Max-(A+B) column to the summary table. But with many different variables this leads to a combinatorial explosion: we would need one Max column per combination of variables. The summary table might end up being bigger than the original one!
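To put a number on the explosion: one Max column per non-empty combination of n variables means 2^n - 1 columns. A quick sketch (the helper name is made up):

```python
from itertools import combinations

def num_max_columns(n_vars):
    """Max-(subset) columns needed to cover every non-empty subset of n_vars variables."""
    return 2 ** n_vars - 1

# Sanity check against brute-force enumeration for a small n.
subsets = [c for k in range(1, 5) for c in combinations("ABCD", k)]
assert len(subsets) == num_max_columns(4)  # 15 subsets of {A, B, C, D}

print(num_max_columns(2))   # 3 -> Max-A, Max-B, Max-(A+B)
print(num_max_columns(30))  # 1073741823 -> over a billion summary columns
```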
Is there an algorithm or data structure for efficiently storing these kinds of precomputed maximums, in a way that lets us ask questions about arbitrary combinations of variables while avoiding a combinatorial explosion? My guess is that it could assume some regularities in the data and try to exploit them, at the cost of some generality.