1

I am working on building a query to track the life cycle of an user through the platform via events. The table EVENTS has 3 columns USER_ID, DATE_TIME and EVENT_NAME. Below is a snapshot of the table,

enter image description here

My query should return the below result (the first timestamp for the registered event followed by the immediate/next timestamp of the following log_in event and finally followed by the immediate/next timestamp of the final landing_page event),

enter image description here

Below is my query ,

WITH FIRST_STEP AS
(SELECT 
USER_ID,
MIN(CASE WHEN EVENT_NAME = 'registered' THEN DATE_TIME ELSE NULL END) AS REGISTERED_TIMESTAMP
FROM EVENTS
GROUP BY 1
),
SECOND_STEP AS
(SELECT * FROM EVENTS
WHERE EVENT_NAME = 'log_in'
ORDER BY DATE_TIME
),
THIRD_STEP AS
(SELECT * FROM EVENTS
WHERE EVENT_NAME = 'landing_page'
ORDER BY DATE_TIME
)
SELECT
a.USER_ID,
a.REGISTERED_TIMESTAMP,
(SELECT
CASE WHEN b.DATE_TIME >= a.REGISTRATIONS_TIMESTAMP THEN b.DATE_TIME END AS LOG_IN_TIMESTAMP
FROM SECOND_STEP
LIMIT 1
),
(SELECT
CASE WHEN c.DATE_TIME >= LOG_IN_TIMESTAMP THEN c.DATE_TIME END AS LANDING_PAGE_TIMESTAMP
FROM THIRD_STEP
LIMIT 1
)
FROM FIRST_STEP AS a
LEFT JOIN SECOND_STEP AS b ON a.USER_ID = b.USER_ID
LEFT JOIN THIRD_STEP AS c ON b.USER_ID = c.USER_ID;

Unfortunately I am getting the "SQL compilation error: Unsupported subquery type cannot be evaluated" error when I try to run the query

Jude92
  • 167
  • 2
  • 6
  • 20

1 Answers1

1

This is a perfect use case for MATCH_RECOGNIZE.

The pattern you are looking for is register anything* login anything* landing and the measures are the min(iff(event_name='x', date_time, null)) for each.

Check:

Set the output to one row per match.

Untested sample query:

select *
from data
match_recognize(
    partition by user_id
    order by date_time
    measures min(iff(event_name='register', date_time, null)) as t1
      , min(iff(event_name='log_in', date_time, null)) as t2
      , min(iff(event_name='landing_page', date_time, null)) as t3
    one row per match
    pattern(register anything* login anything* landing)
    define
        register as event_name = 'register'
        , login as event_name = 'log_in'
        , landing as event_name = 'landing_page'
);
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • SELECT * FROM EVENTS MATCH_RECOGNIZE( PARTITION BY USER_ID ORDER BY DATE_TIME MEASURES MIN(IFF(EVENT_NAME='registered', DATE_TIME, NULL)) AS REGISTER_TIMESTAMP, MIN(IFF(EVENT_NAME='log_in', DATE_TIME, NULL)) AS LOG_IN_TIMESTAMP, MIN(IFF(EVENT_NAME='landing_page', DATE_TIME, NULL)) AS LANDING_PAGE_TIMESTAMP ONE ROW PER MATCH PATTERN(STEP_1 STEP_2 STEP_3) DEFINE STEP_1 AS EVENT_NAME='registered', STEP_2 AS EVENT_NAME='log_in', STEP_3 AS EVENT_NAME='landing_page' ); – Jude92 May 20 '21 at 01:31
  • Is there a way that I can modify the PATTERN and DEFINE logic or the MEASURES logic in the query to achieve what I want from my table (to return the first timestamp for the first event, followed by the immediate/next timestamp of the second event, followed by the immediate/next timestamp of the third event and so on in the funnel)? – Jude92 May 20 '21 at 01:50
  • The number of records in my result set keep changing when I run my query each time – Jude92 May 20 '21 at 03:50
  • And how can we deal with missing events in a user's flow? if any any event is missing in the user flow through the funnel the query doesn't return any result as the pattern search fails when the event defined is missing in the user flow – Jude92 May 20 '21 at 04:31
  • Post a new question with sample data and desired results, and you'll get the answers that the questions ask – Felipe Hoffa May 20 '21 at 05:28
  • Thanks! Below is the link to the question, https://stackoverflow.com/questions/67626707/funnel-analytics-using-sql-in-snowflake – Jude92 May 20 '21 at 19:29