2

I have the following data:

Key Stage CreateDate
AAF 0     01-Jan-2018
AAF 0     02-Jan-2018
AAF 0     10-Jan-2018
AAF 20    20-Jan-2018
AAF 40    20-Mar-2018
AAF 0     01-May-2018
AAF 0     10-May-2018
AAF 0     20-May-2018
AAF 30    20-Jun-2018
AAF 0     20-Jul-2018   
AAF 100   20-Jul-2018       

I am basically trying to calculate the days spent at each stage.. I am currently taking the minimum date within each stage, and find the difference between the minimum date of the next stage:

select 
key,
stage,
cast(extract (day from max(next_dt) - min(createddate)) as number) as interval_days
from
(
select 
key,
stage,
createddate
lead(createddate,1) over (partition by  key order by createddate) next_dt
from  oppstages
)
group by key,stage 

As it can be seen, sometimes, the stage progresses from 0-40, but again goes back to 0. So the above logic doesn't work correctly and I am seeing a necessity to group the 0-40 as one category, and anything after 40 as next category and so on (if the stage decreases and restarts with new lesser stage number). The below query gives me the point where probability goes down, but I am not able flag to group the rows further.

select key,
stage,
createddate, 
next_dt,
next_prob,
case when   next_prob < stage  then 1   else 0 end as valid_flag,
from 
(
select 
key,
stage,
createddate,
lead(createddate,1) over (partition by  key order by createddate) next_dt, 
coalesce(lead(stage,1) over (partition by  key order by createddate),101) next_prob, 
from oppstages
) a

I expect this output so that I could group using flag to calculate the days spent at each instance:

Key Stage CreateDate    Flag
AAF 0     01-Jan-2018   1
AAF 0     02-Jan-2018   1
AAF 0     10-Jan-2018   1
AAF 20    20-Jan-2018   1
AAF 40    20-Mar-2018   1
AAF 0     01-May-2018   2
AAF 0     10-May-2018   2
AAF 0     20-May-2018   2
AAF 30    20-Jun-2018   2
AAF 10     20-Jul-2018   3
AAF 100   20-Jul-2018   3

thanks.

CuriP
  • 83
  • 10

2 Answers2

2

You can try to use lag window function get the Stage previous value.

Then use CASE WHEN check PREVAL > STAGE do increase 1.

CREATE TABLE T(
  Key varchar(50),
  Stage int,
  CreateDate date
);



INSERT INTO T VALUES ('AAF',0,TO_DATE('01-01-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',0,TO_DATE('02-01-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',0,TO_DATE('10-01-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',20,TO_DATE('20-01-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',40,TO_DATE('20-03-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',0,TO_DATE('01-05-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',0,TO_DATE('10-05-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',0,TO_DATE('20-05-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',30,TO_DATE('20-06-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',10,TO_DATE('20-07-2018','dd-mm-yyyy'));  
INSERT INTO T VALUES ('AAF',100,TO_DATE('20-07-2018','dd-mm-yyyy'));  

Query 1:

SELECT t1.KEY,
       t1.STAGE,
      (SUM(CASE WHEN PREVAL > STAGE THEN 1 ELSE 0 END) over (partition by Key order by CreateDate) + 1)  Flag
FROM (
  SELECT T.*,lag(Stage) over (partition by Key  order by CreateDate) preVAL
  FROM T 
)t1

Results:

| KEY | STAGE | FLAG |
|-----|-------|------|
| AAF |     0 |    1 |
| AAF |     0 |    1 |
| AAF |     0 |    1 |
| AAF |    20 |    1 |
| AAF |    40 |    1 |
| AAF |     0 |    2 |
| AAF |     0 |    2 |
| AAF |     0 |    2 |
| AAF |    30 |    2 |
| AAF |    10 |    3 |
| AAF |   100 |    3 |
D-Shih
  • 44,943
  • 6
  • 31
  • 51
  • Hi thank you so much for this. Your solution works perfectly for the problem I listed above with an understanding that the stage begins with 0. Apologies for missing a key point.. the stage need not start necessarily from 0, and it just could be anything from that is lower than previous. I have updated the question with that possible scenario. Thank you. – CuriP Sep 18 '18 at 21:10
  • @CuriP Ok I see you can try my edit answer, only change the increase condition – D-Shih Sep 18 '18 at 21:15
1

You have a gaps-and-islands problem. A simple solution uses the difference of row numbers. This defines the group.

select t.*, (seqnum_2 - seqnum_1) as grp
from (select os.*,
             row_number() over (partition by key order by createdate) as seqnum,
             row_number() over (partition by key, stage order by createdate) as seqnum_2
      from oppstages os
     ) os;

What you probably want is an aggregation:

select key, stage, min(createdate), max(createdate),
       lead(min(createdate)) over (partition by key, stage, seqnum - seqnum_2 order by createdate) as next_creatdate
from (select os.*,
             row_number() over (partition by key order by createdate) as seqnum,
             row_number() over (partition by key, stage order by createdate) as seqnum_2
      from oppstages os
     ) os
group by key, stage, (seqnum_2 - seqnum)

I'm not sure what logic you want for the duration, but this should have all the information you need.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786