I often face the situation where I need to compare aggregated data of different periods from the same source.
I usually deal with it this way:
SELECT
COALESCE(SalesThisYear.StoreId, SalesLastYear.StoreId) StoreId
, SalesThisYear.Sum_Revenue RevenueThisYear
, SalesLastYear.Sum_Revenue RevenueLastYear
FROM
(
SELECT StoreId, SUM(Revenue) Sum_Revenue
FROM Sales
WHERE Date BETWEEN '2017-09-01' AND '2017-09-30'
GROUP BY StoreId
) SalesThisYear
FULL JOIN (
SELECT StoreId, SUM(Revenue) Sum_Revenue
FROM Sales
WHERE Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY StoreId
) SalesLastYear
ON (SalesLastYear.StoreId = SalesThisYear.StoreId)
-- execution time 337 ms
It is not very elegant in my opinion, because it visits the table twice, but it works.
Another similar way to achieve the same is:
SELECT
Sales.StoreId
, SUM(CASE YEAR(Date) WHEN 2017 THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE YEAR(Date) WHEN 2016 THEN Revenue ELSE 0 END) RevenueLastYear
FROM
Sales
WHERE
Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY
StoreId
-- execution time 548 ms
Both solutions performs almost the same on my data set (1,929,419 rows in the selected period, all indexes on their places), the first one a little better in time. And it doesn't matter if I include more periods, the first one is always better on my data set.
This is only a simple example but, sometimes, it involves more than two intervals and even some logic (e.g. compare isoweek/weekday instead of month/day, compare different stores, etc).
Although I already have figured out several ways to achieve the same, I was wondering if there is a clever way to achieve the same. Maybe a more cleaner solution, or a more suitable for big data sets (over a TB).
For example, I suppose the second one is less resource intensive for a big data set, since it does a single Index Scan over the table. The first one, on the other hand, requires two Index Scans and a Merge. If the table is too big to fit in memory, what will happen? Or the first one is always better?