0

I have the following events ordered by the time they happened:

e4 -> e2 -> e2 -> e3 -> e10 -> e4

How should I write PATTERN part of MATCH_RECOGNIZE to match record if e2 event happened and then e4 happened(e2 before e4) no matter if there are 0 or more other events between those two?

e4 -> e2 -> e2 -> e3 -> e10 -> e4 - matched
e4 -> e2 -> e4 - matched
e4 -> e4 -> e2 -> e3 - not matched
e2 -> e10 -> e2 -> e5 -> e4 - matched
datahack
  • 477
  • 1
  • 11
  • 32
  • are you wanting to match (greedy) `e2 -> e2 -> e3 -> e10 -> e4` or (minimal) `e2 -> e3 -> e10 -> e4` – Simeon Pilgrim Jan 24 '22 at 22:52
  • Sorry, what is the exact difference between greedy and minimal? – datahack Jan 24 '22 at 22:56
  • well those a standard pattern matching terms, and the example of the two was given in your context. greedy takes all in can, thus both e2 terms are in the match set, where-as minimal only takes/matches what it needs thus only one e2 term is matched. – Simeon Pilgrim Jan 24 '22 at 23:04

1 Answers1

1

so the four sequences, they can be minimal matched with:

WITH data AS (
    SELECT * FROM VALUES 
     (1,'e4',1),(1,'e2',2),(1,'e2',3),(1,'e3',4),(1,'e10',5),(1,'e4',6),
     (2,'e4',1),(2,'e2',2),(2,'e4',3),
     (3,'e4',1),(3,'e4',2),(3,'e2',3),(3,'e3',4),
     (4,'e2',1),(4,'10',2),(4,'e2',3),(4,'e5',4),(4,'e4',5)
)
SELECT * FROM data 
match_recognize(
    partition by column1
    order by column3
    measures
        match_number() as "MATCH_NUMBER",
        match_sequence_number() as msq,
        classifier() as cl
    all rows per match with unmatched rows
    PATTERN (d1 d2* d3)
    DEFINE d1 as column2 = 'e2',
        d2 as column2 NOT IN ('e2','e4'),
        d3 as column2 = 'e4'
)
ORDER BY 1,3;

giving:

COLUMN1 COLUMN2 COLUMN3 MATCH_NUMBER MSQ CL
1 e4 1
1 e2 2
1 e2 3 1 1 D1
1 e3 4 1 2 D2
1 e10 5 1 3 D2
1 e4 6 1 4 D3
2 e4 1
2 e2 2 1 1 D1
2 e4 3 1 2 D3
3 e4 1
3 e4 2
3 e2 3
3 e3 4
4 e2 1
4 10 2
4 e2 3 1 1 D1
4 e5 4 1 2 D2
4 e4 5 1 3 D3

But given you said you want "matches", then maybe you just want the range details, thus:

SELECT * FROM data 
match_recognize(
    partition by column1
    order by column3
    measures
        first_value(column1) as batch,
        first_value(column3) as seq_start,
        last_value(column3) as seq_end,
        match_number() as "MATCH_NUMBER",
        match_sequence_number() as msq,
        classifier() as cl
    one row per match
    PATTERN (d1 d2* d3)
    DEFINE d1 as column2 = 'e2',
        d2 as column2 NOT IN ('e2','e4'),
        d3 as column2 = 'e4'
)
ORDER BY 1,3;

might be what you are after:

COLUMN1 BATCH SEQ_START SEQ_END MATCH_NUMBER MSQ CL
1 1 3 6 1 4 D3
2 2 2 3 1 2 D3
4 4 3 5 1 3 D3
Simeon Pilgrim
  • 22,906
  • 3
  • 32
  • 45