0

I am working on building a query to track the life cycle of an user through the platform via events. The table EVENTS has 3 columns USER_ID, DATE_TIME and EVENT_NAME. Below is a snapshot of the table,

enter image description here

Below is my query,

SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
    MEASURES MIN(IFF(EVENT_NAME = 'registration new', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'registration pending confirm', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your business information', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your personal information', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'qualified', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
  ONE ROW PER MATCH
  PATTERN(STEP_1 ANYTHING* STEP_5)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        STEP_2 AS EVENT_NAME = 'registration pending confirm',
        STEP_3 AS EVENT_NAME = 'your business information',
        STEP_4 AS EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified'
)

My expected result,

enter image description here enter image description here enter image description here

What I am getting right now,

enter image description here enter image description here enter image description here

Below are my requirements/caveats,

  • The timestamp of the following event should be greater than or equal to the timestamp of the previous event (whichever comes first and so that the timestamps are equal or keep increasing for the events through the funnel). A good example of this logic can explained by the difference in the current and expected result i.e. with the values in the REGISTRATION_PENDING_CONFIRM_TIMESTAMP and QUALIFIED_TIMESTAMP column.
  • Not all users have all these 5 events, for example if USER_ID 54321 doesn't have/skips the event 'your personal information' the result must have the data for the rest of the steps (right now if the user doesn't have/skips any event in the funnel no data is returned by the query). I feel that this is because the pattern search fails when the event defined as a measure is missing in the user flow.

The order of events is not consistent in the table so I have defined the events in order in the measures section as per the business/funnel logic

Jude92
  • 167
  • 2
  • 6
  • 20
  • Hey Jude, did you manage to find the solution to this problem, and could you be so nice explaining how you did it? Thnx! – datahack Dec 15 '21 at 12:49

1 Answers1

1

This is not a complete answer, but at least I'm helping here to define the sample data (better than just a screenshot), and introduce the usage of CLASSIFIER:

create or replace temp table events as
select $1 user_id, $2 date_time, $3 event_name
from values(1,'2020-11-26 15:24:00','registration new')
, (1,'2021-04-12 18:00:00','registration new')
, (1,'2020-11-26 15:24:00','registration pending confirm')
, (1,'2021-04-12 18:11:00','registration pending confirm')
, (1,'2021-04-18 15:04:00','your personal information')
, (1,'2021-04-22 13:13:00','your personal information')
, (1,'2021-04-13 10:22:00','qualified')
, (1,'2021-04-22 13:13:00','qualified')
;


SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
 
    MEASURES  classifier as class, MIN(IFF(CLASSIFIER = 'STEP_1', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_2', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_3', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_4', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_5', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
 
  ONE ROW PER MATCH
 -- all rows per match
  PATTERN((step_1 | step_2 | step_3 | step_4 | step_5 | coincidence)*)--(STEP_2 | XX)* (STEP_3 | XXX)* (STEP_4 | XX)* (STEP_5 | XX)*)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        STEP_2 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'registration pending confirm' ,
        STEP_3 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your business information',
        STEP_4 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified'
        , COINCIDENCE AS LAG(DATE_TIME) = DATE_TIME
);
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325