I have a use case with two input topics in Kafka. Topic schema: eventName, ingestion_time (used as the watermark), orderType, orderCountry

Data for first topic: {"eventName": "orderCreated", "userId":123, "ingestionTime": "1665042169543", "orderType":"ecommerce","orderCountry": "UK"}

Data for second topic: {"eventName": "orderSucess", "userId":123, "ingestionTime": "1665042189543", "orderType":"ecommerce","orderCountry": "USA"}

I want to get every userId, per orderType and orderCountry, where the user performs the first event but not the second within a 5-minute window. Each user should be reported at most 2 times for a given orderType and orderCountry (i.e. for up to 10 minutes only).

I have unioned the data from both topics, created a view on top of it, and am trying to use Flink CEP SQL to get my output, but I can't figure out how to express it.

SELECT *
FROM union_event_table
    MATCH_RECOGNIZE(
        PARTITION BY orderType, orderCountry
        ORDER BY ingestion_time
        MEASURES
            A.userId AS userId,
            A.orderType AS orderType,
            A.orderCountry AS orderCountry
        ONE ROW PER MATCH
        PATTERN (A not followed B) WITHIN INTERVAL '5' MINUTES
        DEFINE
            A AS A.eventName = 'orderCreated',
            B AS B.eventName = 'orderSucess'
    )

First, I can't figure out what to use in SQL in place of "A not followed B". Second, how can I restrict the output for a userId to a maximum of 2 events per orderType and orderCountry? That is, if a user doesn't perform the second event after the first event in 2 consecutive 5-minute windows, the state for that user should be removed so that I don't get output for that user for the same orderType and orderCountry again.

1 Answer

I don't believe this is possible using MATCH_RECOGNIZE. This could, however, be implemented with the DataStream CEP library by using its capability to send timed out patterns to a side output.

This could also be solved at a lower level by using a KeyedProcessFunction. The long ride alerts exercise from the Apache Flink Training repo is an example of that -- you can jump straight away to the solution if you want.
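To make the KeyedProcessFunction idea concrete, here is a minimal sketch of the per-key state and timer logic written as plain Java, with no Flink dependency. It is a model, not Flink API: the class `MissingFollowUpDetector` and its methods `onEvent` and `advanceWatermark` are hypothetical names, the key is assumed to be userId plus orderType plus orderCountry, and out-of-order or late events are ignored. In a real job the map would be keyed state, and `advanceWatermark` would correspond to event-time timers firing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Plain-Java model of keyed state plus event-time timers (hypothetical, not Flink API). */
class MissingFollowUpDetector {
    private final long windowMs;   // how long to wait for the follow-up event
    private final int maxAlerts;   // how many times a key may be reported

    // key -> { timestamp of the next timer, alerts emitted so far }
    private final Map<String, long[]> pending = new HashMap<>();

    MissingFollowUpDetector(long windowMs, int maxAlerts) {
        this.windowMs = windowMs;
        this.maxAlerts = maxAlerts;
    }

    // Assumed key: one state entry per user, order type and country.
    static String key(long userId, String orderType, String orderCountry) {
        return userId + "|" + orderType + "|" + orderCountry;
    }

    /** Handles one event, in the way processElement would for a keyed stream. */
    void onEvent(String eventName, long userId, String orderType,
                 String orderCountry, long ingestionTime) {
        String k = key(userId, orderType, orderCountry);
        if ("orderCreated".equals(eventName)) {
            // Start waiting; the first timer is due one window after the event.
            pending.putIfAbsent(k, new long[] {ingestionTime + windowMs, 0});
        } else if ("orderSucess".equals(eventName)) {
            // The follow-up arrived, so drop the state and its timer.
            pending.remove(k);
        }
    }

    /** Fires every timer due at or before the watermark and returns the alerted keys. */
    List<String> advanceWatermark(long watermark) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, long[]>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, long[]> entry = it.next();
            long[] state = entry.getValue();
            while (state[1] < maxAlerts && state[0] <= watermark) {
                alerts.add(entry.getKey());
                state[0] += windowMs;  // schedule the next consecutive window
                state[1]++;
            }
            if (state[1] >= maxAlerts) {
                it.remove();           // limit reached: clear the state for this key
            }
        }
        return alerts;
    }
}
```

With `windowMs` set to 5 minutes and `maxAlerts` set to 2, a key that never sees `orderSucess` is reported once per window for two consecutive windows and is then forgotten, which matches the "up to 10 minutes" requirement in the question.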

David Anderson
  • If I drop the requirement to restrict the output and just want all output for A not followed by B, is that possible in SQL? – user9068199 Oct 06 '22 at 10:19
  • The WITHIN clause makes it plausible that it would work, but I don't believe it does. Without the WITHIN clause it becomes completely intractable. – David Anderson Oct 06 '22 at 10:28
  • Is the anti-pattern not available in CEP? Is there no syntax for A not followed by B? – user9068199 Oct 12 '22 at 09:01