I am attempting to change query 4 (Average Price for a category) from nexmark benchmark to produce an append only stream of average price for a category, using only flink sql as much as possible.
Query for the original description is available here in the nexmark benchmark repository.
This query produces an updating stream. I would like to instead produce an append only stream.
My approach was
- Do an interval join between auction and bid, group by bid and select the max price. This creates sell price for each item. This produces updates and deletes due to group by.
- Convert [1] to a windowed aggregate, in an attempt to convert [1] to an append only stream. Here I need a time attribute, and I am not sure how to propagate time attribute from [1] here.
To summarize, how can we propagate time attribute from an interval join, to do a window aggregation afterwards?
Here is the sql I used
WITH auction AS (
Select dateTime SystemTime, auction.* from datagen2 where event_type = 1
),
bid AS(
Select dateTime SystemTime, bid.* from datagen2 where event_type = 2
),
--Interval join.
MaxSellPricePerCategory AS (
select
auction.id AuctionId,
auction.category,
max(bid.price) SellPrice,
max(bid.SystemTime) BidSystemTime,
max(bid.dateTime) LastBidTime
from
auction
join
bid
on
auction.id = bid.auction
--Add a condition on rowtime to make this a interval join
and bid.SystemTime between auction.SystemTime and auction.SystemTime + INTERVAL '50' MINUTES
--bid should be during the auction active time.
and bid.dateTime between auction.dateTime and auction.expires
group by
auction.id,
auction.category
),
AverageSellPricePerWindow AS (
select
window_start,
category,
AVG(SellPrice) AverageSellPriceInWindow
from
TABLE(
TUMBLE(TABLE MaxSellPricePerCategory, DESCRIPTOR(BidSystemTime), INTERVAL '1' MINUTES)
)
GROUP BY
window_start,
window_end,
category
)
select * from AverageSellPricePerWindow;
It fails with below
org.apache.flink.table.api.ValidationException: The window function requires the timecol is a time attribute type, but is TIMESTAMP(3).
Other approaches considered
- Use a window join. This requires the two stream's window to align (i.e. it requires equality condition on window_start and window_end). Business logic defines it as an overlapping interval.
- Use an upsert sink. This works, but was not my final goal.