
I am attempting to change query 4 (average price for a category) from the Nexmark benchmark so that it produces an append-only stream of average prices per category, using Flink SQL as much as possible.
The original query and its description are available in the Nexmark benchmark repository.

This query produces an updating stream; I would like to produce an append-only stream instead.
My approach was:

  1. Do an interval join between auction and bid, group by auction, and select the max price. This computes the sell price for each item, but the GROUP BY produces updates and deletes.
  2. Convert [1] into a windowed aggregate, in an attempt to turn [1] into an append-only stream. Here I need a time attribute, and I am not sure how to propagate the time attribute from [1].

To summarize: how can I propagate a time attribute through an interval join so that a window aggregation can be applied afterwards?
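For context, the mechanism Flink SQL offers for chaining windows is `window_time`: a window TVF aggregation emits a `window_time` column that Flink treats as a rowtime attribute, so a second window can consume it (cascading window aggregation). A minimal sketch of that pattern (the table `src` and its rowtime column `ts` are placeholders, not part of the Nexmark schema):

```sql
-- Minimal sketch of cascading window aggregations via window_time.
-- `src` and its rowtime column `ts` are placeholder names.
WITH FirstLevel AS (
    SELECT
        window_start,
        window_end,
        window_time AS rowtime,   -- window_time is itself a rowtime attribute
        MAX(price) AS max_price
    FROM TABLE(TUMBLE(TABLE src, DESCRIPTOR(ts), INTERVAL '1' MINUTE))
    GROUP BY window_start, window_end
)
SELECT window_start, AVG(max_price) AS avg_price
FROM TABLE(TUMBLE(TABLE FirstLevel, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end;
```

The open question is how to reach this pattern when the first step is an interval join followed by a plain GROUP BY, as in the query below.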

Here is the SQL I used:

WITH auction AS (
    SELECT dateTime AS SystemTime, auction.* FROM datagen2 WHERE event_type = 1
),
bid AS (
    SELECT dateTime AS SystemTime, bid.* FROM datagen2 WHERE event_type = 2
),
-- Interval join.
MaxSellPricePerCategory AS (
    SELECT
        auction.id AS AuctionId,
        auction.category,
        MAX(bid.price) AS SellPrice,
        MAX(bid.SystemTime) AS BidSystemTime,
        MAX(bid.dateTime) AS LastBidTime
    FROM auction
    JOIN bid
      ON auction.id = bid.auction
      -- Add a condition on rowtime to make this an interval join.
      AND bid.SystemTime BETWEEN auction.SystemTime AND auction.SystemTime + INTERVAL '50' MINUTES
      -- The bid should fall within the auction's active time.
      AND bid.dateTime BETWEEN auction.dateTime AND auction.expires
    GROUP BY
        auction.id,
        auction.category
),
AverageSellPricePerWindow AS (
    SELECT
        window_start,
        category,
        AVG(SellPrice) AS AverageSellPriceInWindow
    FROM TABLE(
        TUMBLE(TABLE MaxSellPricePerCategory, DESCRIPTOR(BidSystemTime), INTERVAL '1' MINUTES)
    )
    GROUP BY
        window_start,
        window_end,
        category
)
SELECT * FROM AverageSellPricePerWindow;

It fails with the error below:

org.apache.flink.table.api.ValidationException: The window function requires the timecol is a time attribute type, but is TIMESTAMP(3).
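As far as I can tell, the cause is that `MAX(bid.SystemTime)` inside a plain GROUP BY yields an ordinary `TIMESTAMP(3)`: the time attribute does not survive a regular aggregation, only a window TVF's `window_time` does. One possible restructuring (a sketch only, unverified, and note it changes the semantics to the max bid per auction *per minute* rather than over the whole auction lifetime) is to keep the interval join aggregation-free and do the first aggregation as a window TVF, whose `window_time` feeds the second window:

```sql
-- Sketch (unverified): keep the interval join free of GROUP BY so the
-- rowtime attribute bid.SystemTime survives, aggregate per auction inside
-- a tumbling window, then cascade via window_time.
-- Assumes the auction/bid CTEs from the question above.
WITH JoinedBids AS (
    SELECT
        auction.id AS AuctionId,
        auction.category,
        bid.price,
        bid.SystemTime            -- still a rowtime attribute after the interval join
    FROM auction
    JOIN bid
      ON auction.id = bid.auction
     AND bid.SystemTime BETWEEN auction.SystemTime
                            AND auction.SystemTime + INTERVAL '50' MINUTE
     AND bid.dateTime BETWEEN auction.dateTime AND auction.expires
),
MaxSellPricePerAuction AS (
    SELECT
        window_start,
        window_end,
        window_time AS rowtime,   -- window_time is a rowtime attribute
        AuctionId,
        category,
        MAX(price) AS SellPrice
    FROM TABLE(
        TUMBLE(TABLE JoinedBids, DESCRIPTOR(SystemTime), INTERVAL '1' MINUTE)
    )
    GROUP BY window_start, window_end, AuctionId, category
)
SELECT
    window_start,
    category,
    AVG(SellPrice) AS AverageSellPriceInWindow
FROM TABLE(
    TUMBLE(TABLE MaxSellPricePerAuction, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE)
)
GROUP BY window_start, window_end, category;
```

Whether this per-window max is acceptable depends on the business logic, so it is an illustration of time-attribute propagation rather than a drop-in replacement.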

Other approaches considered

  1. Use a window join. This requires the two streams' windows to align (i.e., it requires equality conditions on window_start and window_end), but the business logic defines the join over an overlapping interval.
  2. Use an upsert sink. This works, but it was not my final goal.
Vignesh Chandramohan
  • I think the issue here is not the join, but the GROUP BY in MaxSellPricePerCategory, which doesn't have a window_time. MAX(bid.SystemTime) does not retain the time attribute. – Vignesh Chandramohan Dec 23 '22 at 22:34

0 Answers