-1

I have a query with the following result:

I thought I could use dense_rank to group user segment in a time interval. But it doesn't work.

CUST_ID EVENT_ID SEGMENT_ID SEGMENT_CODE       DATE_FROM              DATE_TO
100      1424      21            A         2011.01.05. 13:03:12   2011.01.06. 23:40:13
100      1566      21            A         2011.01.06. 23:40:13   2011.02.28. 11:48:52
100      1580      21            A         2011.02.28. 11:48:52   2012.04.30. 2:49:53
100      1601      45            Y         2012.04.30. 2:49:53    2012.05.29. 21:12:27
100      1663      45            Y         2012.05.29. 21:12:27   2012.05.30. 11:11:23
100      1710      45            Y         2012.05.30. 11:11:23   2012.08.01. 5:14:36
100      1875      114           H         2012.08.01. 5:14:36    2012.09.01. 20:26:42
100      1880      33            F         2012.09.01. 20:26:42   2012.09.03. 9:12:23
100      1901      21            A         2012.09.03. 9:12:23    2012.09.03. 9:12:23
100      1903      21            A         2012.09.03. 9:12:23    2012.10.25. 17:25:14
100      1966      223           R         2012.10.25. 17:25:14   2013.01.01. 1:12:55
100      2011      223           R         2013.01.01. 1:12:55    3500:12:31. 23:59:59

I'd like something like this:

CUST_ID EVENT_ID SEGMENT_ID SEGMENT_CODE       DATE_FROM              DATE_TO           DENSE_RANK
100      1424      21            A         2011.01.05. 13:03:12   2011.01.06. 23:40:13       1
100      1566      21            A         2011.01.06. 23:40:13   2011.02.28. 11:48:52       1
100      1580      21            A         2011.02.28. 11:48:52   2012.04.30. 2:49:53        1
100      1601      45            Y         2012.04.30. 2:49:53    2012.05.29. 21:12:27       2
100      1663      45            Y         2012.05.29. 21:12:27   2012.05.30. 11:11:23       2
100      1710      45            Y         2012.05.30. 11:11:23   2012.08.01. 5:14:36        2
100      1875      114           H         2012.08.01. 5:14:36    2012.09.01. 20:26:42       3
100      1880      33            F         2012.09.01. 20:26:42   2012.09.03. 9:12:23        4
100      1901      21            A         2012.09.03. 9:12:23    2012.09.03. 9:12:23        5
100      1903      21            A         2012.09.03. 9:12:23    2012.10.25. 17:25:14       5
100      1966      223           R         2012.10.25. 17:25:14   2013.01.01. 1:12:55        6
100      2011      223           R         2013.01.01. 1:12:55    3500:12:31. 23:59:59       6

Do you have any idea how can I solve this problem?

Ben
  • 51,770
  • 36
  • 127
  • 149
Shylock
  • 3
  • 1
  • what query have you tried so far? Something like `DENSE_RANK() OVER (PARTITION BY SEGMENT_ID ORDER BY DATE_FROM) "dense_rank"`? It is not clear what you intend by "user segment in a time interval"... – EmirCalabuch Jun 01 '13 at 12:01
  • What does "doesnt work" mean? What does your query look like? What's wrong with it? Are you getting an error? Incorrect data? What's the logic behind your output? Why is your output identical to your input? – Ben Jun 01 '13 at 12:30
  • Can't explain correct. But I will try. The query show one customer segment lifetime. – Shylock Jun 01 '13 at 12:47
  • It would be easy if one segment only appear one time. But if it isnt true the whole situation change. Like with segment A. If i use row_number or dense_rank and partion by cust_id, segment_id order by valid_to, valid_from. Order every row with segment_id 21 regardless the 21 segment appear two different time in the lifetime – Shylock Jun 01 '13 at 12:56
  • The logic behind my output is i'd like to get the first valid_from and the last valid_to in a segment period. I mean cust_id 100 was in A segment from 2011.01.05. 13:03:12 to 2012.04.30. 2:49:53. That's the first interval. But it was in A segment a second time from 2012.09.03. 9:12:23 to 2012.10.25. 17:25:14. – Shylock Jun 01 '13 at 13:03

1 Answers1

0

Based on your logic, dense_rank won't do. However, you can get a similar effect with lag() and a cumulative sum. The lag looks at the previous date value for the segment id/segment code, if they overlap, then the record is not the start of a grouping. Otherwise it is.

The cumulative sum of the "isstarts" is what you are looking for:

select t.*,
       sum(case when date_from - prev_date_to < 0.00001
                then 0
                else 1
           end) over (partition by segment_id order by SEGMENT_ID, SEGMENT_CODE
                     ) as YourRanking
from (select t.*,
             lag(date_to) over (partition by SEGMENT_ID, SEGMENT_CODE
                                order by date_from
                               ) as prev_date_to
      from t
     ) t

Because this is date/time, the query doesn't use exact equality. Instead it looks for a small difference.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786