I am trying to figure out how to query a table (Hive 3) of dates and values with a moving temporal window aggregation. In the example below, I want to collect every two-day window possible (in which non-terminal dates would each be used twice).
Sample data
first_date | value |
---|---|
2020-01-01 | 3 |
2020-01-02 | 4 |
2020-01-03 | 5 |
2020-01-04 | 6 |
Desired output (combining every two-day window)
first_date | total |
---|---|
2020-01-01 | 7 |
2020-01-02 | 9 |
2020-01-03 | 11 |
2020-01-04 | 6 |
I have tried something like this with no luck
select
first_date,
sum(value) over(
partition by first_date
range between first_date and first_date + interval '1' day
) as total
Apparently I am not allowed to use the partition column (dates) in the range clause, which is somewhat inconvenient. I may be able to duplicate the date column to get around this restriction, but there is probably a better way. What else could I try to get this working?
(Also in practice we may have many instances of any single date, so trying something like counting adjacent rows would not be reliable)