Not for any specific case, just a question on how SQL software (e.g. Oracle) implements aggregate/analytic functions in order to help me write more efficient code.
Basically, I'm curious whether Oracle does anything smart about caching computed aggregate/analytic function values if they're referenced multiple times. For example, take the following two queries:
-- Query A
select
PRODUCT,
sum(SALES) as TOTSALES, -- (1)
log(sum(SALES)+1,10) as LOG_TOTSALES -- (2)
from EXAMPLE_TABLE
group by PRODUCT
order by sum(SALES); -- (3)
-- Query B
select
PRODUCT,
TOTSALES,
log(TOTSALES+1,10) as LOG_TOTSALES
from (
select PRODUCT, sum(SALES) as TOTSALES
from EXAMPLE_TABLE
group by PRODUCT
)
order by TOTSALES;
which accomplish the same thing. A naive implementation of Query A would evaluate the sum(SALES)
three times (twice in the select and once in the order by). Query B only evaluates it once in the inner query, and then calculates/orders by the explicit NUM column.
So the question is whether Oracle/MySQL/etc do anything smart under the hood for cases like Query A. I can imagine a case where, when the compiler encounters the first sum(SALES)
, it saves a new identifier. Then when it sees the log(sum(SALES)+1,10)
(for which it does not have an identifier) and tries to parse it, recognizes the sum(SALES)
already computed and looks up the cached value.
The same could work if the same analytic function is invoked > 1 times in a select, e.g.
select
DATE,
SALES as TODAY_SALES,
lag(SALES,1) over (order by DATE) as YEST_SALES,
SALES - (lag(SALES,1) over (order by DATE)) as DIFF_SALES
from DAILY_SALES;
where lag(SALES,1)
doesn't need to be computed twice if it is cached the first time. But I don't know if that would create other headaches in implementation, in which case I should be writing code like Query B to be more efficient.
Thanks!
Edit: replaced count(*)
with sum(SALES)
for a more illustrative example.