0

I need to get sessions where I one sequence of values in specific order.

Now I have this query which returns sessions for each user from raw data

select user_id, page, happened_at 
from db as u1 
where  exists 
  (select 1 from db as u2 
   where u1.user_id = u2.user_id 
   and u2.happened_at < (u1.happened_at + interval '1 hour') )
ORDER BY user_id, happened_at
LIMIT 100

I need to get result set in format which is result of subquery on group of sessions I get by request above.

Subquery conditions is get a session where user opened the page in specific order. E.g. page value can be - start, work, buy, landing, - the user should go trough each page in this order during the session. He\she still can meet any other pages between each of them, but have to go through all this pages in this order during the session.

How is it possible to get the output which meet such condition? How can I get understanding the page value changes in specific order during the session?

Dataset on first query run:

    user_id     page    happened_at     
3,230 start 2017-03-01 15:10
3,230 work 2017-03-01 15:16
3,230 start 2017-03-01 15:16 
3,230 preview  2017-03-01 17:12
3,230 work 2017-03-01 17:12
3,230 buying 2017-03-01 17:13
3,230 landing 2017-03-01 17:51
3,230 smt else 2017-03-01 17:52
3,230 any page 2017-03-01 17:56
3,230 lanidng 2017-03-01 18:03

Output (what I am looking now)

    user_id     page    happened_at     
3,230 start 2017-03-01 15:16 
3,230 preview  2017-03-01 17:12
3,230 work 2017-03-01 17:12
3,230 buying 2017-03-01 17:13
3,230 landing 2017-03-01 17:51

Final result

user_id session_start session_end
3,230   2017-03-01 15:16 2017-03-01 18:03
Abdullah Ilgaz
  • 719
  • 1
  • 17
  • 39
Gleichmut
  • 5,953
  • 5
  • 24
  • 32

1 Answers1

0

use window function for 1st output

select user_id,page,happened_at from 
(
select user_id,page,happened_at, row_number() over(partition by user_id,page order by happened_at desc) rn from table
) t where rn=1

then take max() and min() of this query output

    with t1 as 

     (
    select user_id,page,happened_at from 
        (
        select user_id,page,happened_at, row_number() over(partition by user_id,page order by happened_at desc) rn from table
        ) t where rn=1
    ) select user_id,min(happened_at) session_start,
max(happened_at) as session_end from t1 group by user_id
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
  • Thanks, I have to learn more about 'window'. Here is the issue: session ends after one hour left since user made last action, at the sample there are actions divided by group each of them is session. In your code it doesn't recognise sessions and join all of them into the one. – Gleichmut Sep 21 '18 at 09:15
  • @Gleichmut now its matter of ordering of your event , you can customize ordering of your event like order by (case page when 'start' then 1 when 'preview' then 2 ... end ) as a result your event also ordered – Zaynul Abadin Tuhin Sep 21 '18 at 09:19
  • Could you please expand your thought? What I need is check if there are this values into the page during timeframe of the session and it should preserve the order – Gleichmut Sep 21 '18 at 09:32