
I have the following data given to Flink as a stream

ID  Val eventTime.rowtime
266 25  9000
266 22  10000
266 19  11000
266 18  12000
266 16  13000
266 15  14000
266 14  15000
266 13  16000
266 14  17000
266 15  18000
266 17  19000
266 18  20000
266 18  21000
266 19  22000
266 21  23000
266 21  24000
266 21  25000
266 22  26000
266 21  27000
266 21  28000
266 22  29000
266 24  30000
266 23  31000
266 24  32000
266 25  33000
266 24  34000
266 22  35000
266 23  36000
266 24  37000
266 19  38000

I need to run an SQL match recognize as follows

SELECT ID, sts, ets, intervalValue, valueDescription, intvDuration
FROM RawEvents MATCH_RECOGNIZE (
    PARTITION BY ID
    ORDER BY eventTime
    MEASURES
        A.ID AS id,
        FIRST(A.eventTime) AS sts,
        LAST(A.eventTime) AS ets,
        MAX(A.val) AS intervalValue,
        'max' AS valueDescription,
        TIMESTAMPDIFF(SECOND, FIRST(A.eventTime), LAST(A.eventTime)) AS intvDuration
    AFTER MATCH SKIP TO NEXT ROW
    PATTERN (A+ B)
    DEFINE
        A AS A.val >= 20,
        B AS TRUE
)

I expect the output to include intervals like

(266,1970-01-01 00:00:09.0,1970-01-01 00:00:10.0,25.0,max,1)
(266,1970-01-01 00:00:10.0,1970-01-01 00:00:10.0,22.0,max,0)
(266,1970-01-01 00:00:23.0,1970-01-01 00:00:23.0,22.0,max,0)
(266,1970-01-01 00:00:23.0,1970-01-01 00:00:24.0,22.0,max,0)
...
(266,1970-01-01 00:00:23.0,1970-01-01 00:00:37.0,22.0,max,0)
...
(266,1970-01-01 00:00:37.0,1970-01-01 00:00:37.0,22.0,max,0)

but what I actually get is the first two records only.

Below is my full code to convert the stream into a table and the query result back into a stream:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.getConfig().setAutoWatermarkInterval(10);

DataStream<String> stringStream = env.addSource(new LinearRoadSource("C:\\Work\\Data\\linear.csv"));
DataStream<SpeedEvent> speedStream = stringStream.map(new SpeedMapper()).setParallelism(1);
speedStream = speedStream.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<SpeedEvent>() {
    private long maxTimestampSeen = 0;

    @Override
    public Watermark getCurrentWatermark() {
        return new Watermark(maxTimestampSeen);
    }

    @Override
    public long extractTimestamp(SpeedEvent speedEvent, long previousTimestamp) {
        long ts = speedEvent.getTimestamp();
        // if (speedEvent.getKey().equals("W"))
        maxTimestampSeen = Long.max(maxTimestampSeen, ts);
        return ts;
    }
}).setParallelism(1);
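Since MATCH_RECOGNIZE only emits a match once the watermark has passed its rows, note that the assigner above never advances the watermark beyond the largest event timestamp seen so far. Here is a dependency-free sketch of the same logic (plain Java, no Flink types; the class name is mine) that makes this behavior easy to inspect:

```java
import java.util.List;

// Minimal stand-in for the periodic watermark assigner above:
// the watermark simply tracks the maximum event timestamp seen so far.
public class MaxSeenWatermark {
    private long maxTimestampSeen = 0;

    public long extractTimestamp(long eventTs) {
        maxTimestampSeen = Long.max(maxTimestampSeen, eventTs);
        return eventTs;
    }

    public long currentWatermark() {
        return maxTimestampSeen;
    }

    public static void main(String[] args) {
        MaxSeenWatermark wm = new MaxSeenWatermark();
        // Slightly out-of-order timestamps: the watermark never goes backwards.
        for (long ts : List.of(9000L, 11000L, 10000L, 12000L)) {
            wm.extractTimestamp(ts);
        }
        System.out.println(wm.currentWatermark()); // 12000
    }
}
```

One consequence: a match that ends at the very last timestamp in the input can only be emitted once a watermark strictly beyond it arrives, e.g. the final watermark Flink sends when a finite source finishes.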

TupleTypeInfo<Tuple3<String, Double, Long>> inputTupleInfo = new TupleTypeInfo<>(
        Types.STRING(),
        Types.DOUBLE(),
        Types.LONG()
);

StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env);
tableEnv.registerDataStream("RawEvents",
        keyedStream.map((MapFunction<SpeedEvent, Tuple3<String, Double, Long>>) event ->
                new Tuple3<>(event.getKey(), event.getValue(), event.getTimestamp())).returns(inputTupleInfo),
        "ID, val, eventTime.rowtime"
);

Table intervalResult = tableEnv.sqlQuery(
        "SELECT ID, sts, ets, intervalValue, valueDescription, intvDuration " +
        "FROM RawEvents MATCH_RECOGNIZE ( " +
        "PARTITION BY ID " +
        "ORDER BY eventTime " +
        "MEASURES " +
        "A.ID AS id, " +
        "FIRST(A.eventTime) AS sts, " +
        "LAST(A.eventTime) AS ets, " +
        "MAX(A.val) AS intervalValue, " +
        "'max' AS valueDescription, " +
        "TIMESTAMPDIFF(SECOND, FIRST(A.eventTime), LAST(A.eventTime)) AS intvDuration " +
        "AFTER MATCH SKIP TO NEXT ROW " +
        "PATTERN (A+ B) " +
        "DEFINE " +
        "A AS A.val >= 20, " +
        "B AS TRUE)");

TupleTypeInfo<Tuple6<String, Timestamp, Timestamp, Double, String, Integer>> tupleTypeInterval =
        new TupleTypeInfo<>(
        Types.STRING(),
        Types.SQL_TIMESTAMP(),
        Types.SQL_TIMESTAMP(),
        Types.DOUBLE(),
        Types.STRING(),
        Types.INT()
);

DataStream<Tuple6<String, Timestamp, Timestamp, Double, String, Integer>> queryResultAsStream =
        tableEnv.toAppendStream(intervalResult, tupleTypeInterval);
queryResultAsStream.print();

Is there anything wrong with what I've done, or something I forgot to do?

I am using Flink 1.8.1.

Ahmed Awad
  • First of all, could you update your example to the actual code you are using? In `tableEnv.registerDataStream` you use a `keyedStream` which is not defined anywhere in your example. Secondly, `A+` is a greedy quantifier. It tries to assign as many events as possible. Therefore the output you expect is wrong. There should be 2 output rows with intervals in millis ([9000-10000], [23000-37000]). If you want to use a reluctant quantifier, add a `?` operator. – Dawid Wysakowicz Aug 27 '19 at 07:24
  • @DawidWysakowicz. First thanks for the comment. I am not able yet to edit my question. The missing part just derives a keyed stream by splitting on a key, I don't think this would make a significant difference. For the second point, if that is right, why would I get the results [9000-10000] while I am still with the greedy operator + – Ahmed Awad Aug 27 '19 at 08:03
  • Because 9000 and 10000, have `val >=20` and the third 11000 has `val < 20`, which is your pattern. – Dawid Wysakowicz Aug 27 '19 at 09:23
  • Sorry, I meant I get the result [10000, 10000] in addition to [9000, 10000] @DawidWysakowicz – Ahmed Awad Aug 27 '19 at 10:50
  • Yes, because you have the SKIP_TO_NEXT strategy defined, therefore you can allow for pattern to start at the next event that arrived after the event that started previous pattern. I forgot about that part. Will check the expected results one more time later. – Dawid Wysakowicz Aug 27 '19 at 11:58
  • Ok I checked the expected results once again. They are still wrong but slightly differently ;) The expected results should be ([9000-10000], [10000-10000], [23000-37000], [24000-37000], [25000-37000], ...., [37000-37000]). I am not sure why you don't get the second part of the results (starting at 23000). Could you check if the Watermark properly advances? – Dawid Wysakowicz Aug 29 '19 at 11:20
  • Also it would be super helpful if you could provide a self contained example that I could run. You could post it e.g. on the flink user mailing list. – Dawid Wysakowicz Aug 29 '19 at 11:23
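The expected results given in the comments above can be sanity-checked without Flink. The sketch below (class, method, and field names are mine) simulates only this particular pattern, `A+ B` with a greedy `A+` (`A`: `val >= 20`, `B`: any row) under `AFTER MATCH SKIP TO NEXT ROW`, over the sample data; it is not a general MATCH_RECOGNIZE implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class GreedySkipToNextRow {
    // One simulated match: FIRST(A.eventTime), LAST(A.eventTime), MAX(A.val).
    static class Match {
        final long sts, ets;
        final double max;
        Match(long sts, long ets, double max) { this.sts = sts; this.ets = ets; this.max = max; }
        @Override public String toString() { return "(" + sts + ", " + ets + ", " + max + ")"; }
    }

    static List<Match> simulate(long[] ts, double[] val) {
        List<Match> matches = new ArrayList<>();
        // SKIP TO NEXT ROW: start a fresh match attempt at every row.
        for (int i = 0; i < ts.length; i++) {
            if (val[i] < 20) continue;          // row i cannot start A+
            int j = i;
            double max = val[i];
            // Greedy A+: extend as far as val >= 20 allows.
            while (j + 1 < ts.length && val[j + 1] >= 20) {
                j++;
                max = Math.max(max, val[j]);
            }
            // B is defined as TRUE, so it just needs one more row after A+.
            if (j + 1 < ts.length) {
                matches.add(new Match(ts[i], ts[j], max));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The 30 sample rows from the question: timestamps 9000..38000.
        long[] ts = new long[30];
        for (int i = 0; i < 30; i++) ts[i] = 9000 + 1000L * i;
        double[] val = {25, 22, 19, 18, 16, 15, 14, 13, 14, 15, 17, 18, 18, 19,
                        21, 21, 21, 22, 21, 21, 22, 24, 23, 24, 25, 24, 22, 23, 24, 19};
        for (Match m : simulate(ts, val)) {
            System.out.println(m);
        }
    }
}
```

Running this yields 17 matches: [9000-10000], [10000-10000], then [23000-37000], [24000-37000], …, [37000-37000], which lines up with the intervals listed in the last-but-one comment above.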

0 Answers