I have the following table, which contains quantities of items per day.
ID  Date        Item   Count
-----------------------------
1   2022-01-01  Milk   10
2   2022-01-11  Milk   20
3   2022-01-12  Milk   10
4   2022-01-15  Milk   12
5   2022-01-16  Milk   10
6   2022-01-02  Bread  20
7   2022-01-03  Bread  22
8   2022-01-05  Bread  24
9   2022-01-08  Bread  20
10  2022-01-12  Bread  10
I want to aggregate (sum, avg, ...) the quantity per item over the trailing 7 days (or 14 or 28 days), counting the current row's date. The expected outcome for a 7-day sum would look like this:
ID  Date        Item   Count  Sum_7d
-------------------------------------
1   2022-01-01  Milk   10     10
2   2022-01-11  Milk   20     20
3   2022-01-12  Milk   10     30
4   2022-01-15  Milk   12     42
5   2022-01-16  Milk   10     52
6   2022-01-02  Bread  20     20
7   2022-01-03  Bread  22     42
8   2022-01-05  Bread  24     66
9   2022-01-08  Bread  20     86
10  2022-01-12  Bread  10     30
My first approach was to use a Redshift window function like this:
SELECT *,
       SUM(Count) OVER (
           PARTITION BY Item
           ORDER BY Date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS Sum_7d
FROM my_table
However, it does not give the expected results: because some dates are missing, a frame of 7 rows can span more than 7 days, and I could not figure out how to put a condition on the time range instead.
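To make the difference concrete, here is a small Python sketch of the two semantics. It is only an illustration on the sample data, not something I run in Redshift, and the function names sum_by_days and sum_by_rows are made up for this example. It assumes the 7-day window includes the current row's date.

```python
from datetime import date, timedelta

# Sample rows from the table above: (id, date, item, count).
ROWS = [
    (1, date(2022, 1, 1), "Milk", 10),
    (2, date(2022, 1, 11), "Milk", 20),
    (3, date(2022, 1, 12), "Milk", 10),
    (4, date(2022, 1, 15), "Milk", 12),
    (5, date(2022, 1, 16), "Milk", 10),
    (6, date(2022, 1, 2), "Bread", 20),
    (7, date(2022, 1, 3), "Bread", 22),
    (8, date(2022, 1, 5), "Bread", 24),
    (9, date(2022, 1, 8), "Bread", 20),
    (10, date(2022, 1, 12), "Bread", 10),
]


def sum_by_days(rows, days=7):
    """What I want: per item, sum over the trailing `days` calendar days
    (current date included), regardless of how many rows fall inside."""
    result = {}
    for row_id, day, item, _ in rows:
        start = day - timedelta(days=days - 1)
        result[row_id] = sum(
            c for _, d, i, c in rows if i == item and start <= d <= day
        )
    return result


def sum_by_rows(rows, preceding=6):
    """What ROWS BETWEEN 6 PRECEDING AND CURRENT ROW does: per item,
    sum the current row and up to `preceding` earlier rows by date order."""
    result = {}
    for item in {r[2] for r in rows}:
        part = sorted((r for r in rows if r[2] == item), key=lambda r: r[1])
        for idx, row in enumerate(part):
            frame = part[max(0, idx - preceding): idx + 1]
            result[row[0]] = sum(r[3] for r in frame)
    return result


if __name__ == "__main__":
    by_days = sum_by_days(ROWS)
    by_rows = sum_by_rows(ROWS)
    for row_id, day, item, count in ROWS:
        print(row_id, day, item, count, by_days[row_id], by_rows[row_id])
```

For example, for Milk on 2022-01-16 the row-based frame sums all five Milk rows (62), while the date-based window skips 2022-01-01 and gives 52.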
My fallback solution is a self-join (effectively a cross product filtered by date range), but that is undesirable because it is inefficient on large tables:
SELECT l.Date, l.Item, l.Count, SUM(r.Count) AS Sum_7d
FROM my_table l,
     my_table r
WHERE l.Date - r.Date < 7
  AND l.Date - r.Date >= 0
  AND l.Item = r.Item
GROUP BY 1, 2, 3
Is there any efficient and concise way to do such an aggregation on date ranges in Redshift?