0

I'm working with redash and I need to get user rows where for each row delta between date field is less than hour.

In more details: I need a session, user activity where it has some actions where end of the session defined by last action + 1 hour.

users row is <id, action, date>

user_id     page    happened_at     
179,233 rooms.view.step.content  2017-03-01 09:24
179,233 rooms.view.step.content  2017-03-01 09:01
179,233 rooms.student-showcase   2017-03-01 12:02

datediff should help there, but it is not available at redash - redshift.

I'm looking for alternatives. Anyone has thoughts there?

Gleichmut
  • 5,953
  • 5
  • 24
  • 32

1 Answers1

1

Please try this. You may even choose dateadd instead.

select id, action, date 
from users u1 
where exists 
      ( select 1 from users u2 
        where u1.id = u2.id 
        and u2.happened_at < (u1.happened_at + interval '1 hour') 
        and u2.happened_at > u1.happened_at )
union  
select id, action, date 
from users u1 
where exists 
      ( select 1 from users u2 
        where u1.id = u2.id 
        and u2.happened_at > (u1.happened_at + interval '1 hour') 
        and u2.happened_at < u1.happened_at )

By the way, redshift has datediff. Not sure why is it not supported in redash.

nitzien
  • 1,117
  • 9
  • 23
  • Got an error "select" LINE 7: ( select 1 from users as u2 ^. I'm trying to understand what isn't right here – Gleichmut Sep 21 '18 at 06:07
  • Also I learn more about 'select 1 from users u2 ' clause, it might be not a right choice - I need group of rows, not just one. This where clause is just a trigger where search should stop it work – Gleichmut Sep 21 '18 at 06:16
  • Please try now. Changed exist to exists. Also, this will not give one row. It will give group of rows except the last one. May be we can add union all to get last row too. – nitzien Sep 21 '18 at 06:22
  • 1
    Thanks, it gives right dataset (except lost last row). How do offer leverage union here? From my point of view we can add here only extra SELECT clause but even in this case it will break the final output. For now I just dropped extra WHERE clause – Gleichmut Sep 21 '18 at 06:42
  • Please try now. This should work. I am hoping there will be better answer. – nitzien Sep 21 '18 at 07:27
  • Thank you. What I proposed was using ascending order and omit second where clause 'and u2.happened_at > u1.happened_at'. It works and it has better performance\maintainability – Gleichmut Sep 21 '18 at 07:30
  • Removing second one will cause issue in following scenario 179,233 rooms.view.step.content 2017-03-01 09:26 179,233 rooms.view.step.content 2017-03-01 12:48 179,233 rooms.student-showcase 2017-03-01 12:52 179,233 rooms.student-showcase 2017-03-01 12:56 in this case only 2,3 (or 4th row should be in result). But if we remove second clause, it will give 1,2 and 3. – nitzien Sep 21 '18 at 07:33
  • Sorry. No way to put multi line in comments. So, formatting is spoilt. – nitzien Sep 21 '18 at 07:36
  • Where do you get this dataset? It is different from what I provided. Anyway I can't reproduce this issue on my dataset – Gleichmut Sep 21 '18 at 07:39
  • 1
    I just created some additional data points to create possible scenario. :) – nitzien Sep 21 '18 at 08:00
  • 1
    Thanks, I will explore it in details later – Gleichmut Sep 21 '18 at 08:02