I have a large dataset and want to retrieve previous and subsequent observations for small windows. Below is a snippet around a window of interest (2024-09-23 to 2024-09-27).
item date price
A 2024-09-20 $106.59
A 2024-09-23 $91.84
A 2024-09-24 $92.66
A 2024-09-25 $107.87
A 2024-09-26 $98.47
A 2024-09-27 $91.65
A 2024-09-30 $108.10
B 2024-07-29 $120.71
B 2024-08-13 $87.80
B 2024-09-04 $115.80
B 2024-09-26 $94.06
B 2024-09-27 $120.88
In the example above, "A" had a price every weekday, but "B" had infrequent prices. As expected (but not desired), the code below produces NULL values at the beginning and end of the desired window (specified in the WHERE statement).
SELECT item, date, price,
lag(price) OVER (PARTITION BY item ORDER BY date) as prevprice,
lead(price) OVER (PARTITION BY item ORDER BY date) as nextprice
FROM dailytable
WHERE date BETWEEN date'2024-09-23' AND date'2024-09-27'
;
Results - Not Wanted
item date price prevprice nextprice
A 2024-09-23 $91.84 NULL $92.66
A 2024-09-24 $92.66 $91.84 $107.87
A 2024-09-25 $107.87 $92.66 $98.47
A 2024-09-26 $98.47 $107.87 $91.65
A 2024-09-27 $91.65 $98.47 NULL
B 2024-09-26 $94.06 NULL $120.88
B 2024-09-27 $120.88 $94.06 NULL
Obviously, I could run the LEAD() and LAG() over the entire dataset in a subquery, then select from that using the WHERE statement, as below.
SELECT *
FROM
( SELECT item, date, price,
lag(price) OVER (PARTITION BY item ORDER BY date) as prevprice,
lead(price) OVER (PARTITION BY item ORDER BY date) as nextprice
FROM dailytable
)
WHERE date BETWEEN date'2024-09-23' AND date'2024-09-27'
;
Desired Results
item date price prevprice nextprice
A 2024-09-23 $91.84 $106.59 $92.66
A 2024-09-24 $92.66 $91.84 $107.87
A 2024-09-25 $107.87 $92.66 $98.47
A 2024-09-26 $98.47 $107.87 $91.65
A 2024-09-27 $91.65 $98.47 $108.10
B 2024-09-26 $94.06 $115.80 $120.88
B 2024-09-27 $120.88 $94.06 NULL
However, this is very costly and takes a long time. The entire dataset has over 50 years of data, but the required lead and lag periods are never nearly that long.
Is there a way to get the previous and subsequent prices for the window without running a subquery that runs over the entire history of the dataset? Of course, if an item doesn't have a next price, like B in the above example, I don't expect magic; NULL is appropriate.