1

I am trying to create a report and this is my input data.

  Stage  Name         Date
    1     x     12/05/2019 10:00:03
    1     x     12/05/2019 10:05:01
    1     y     12/06/2019 12:00:07
    2     x     12/06/2019 13:12:03
    2     x     12/06/2019 13:23:00
    1     y     12/08/2019 16:00:07
    2     x     12/09/2019 09:17:59  

This is my desired output.

  Stage  Name       DateFrom               DateTo            DateDiff
    1     x     12/05/2019 10:00:03    12/06/2019 12:00:07        1
    1     y     12/06/2019 12:00:07    12/06/2019 13:12:03        0
    2     x     12/06/2019 13:12:03    12/08/2019 16:00:07        2
    1     y     12/08/2019 16:00:07    12/09/2019 09:17:59        1

I cannot use group by clause over stage and name, since it will group the 3rd and 6th rows from my input. I tried joining the table to itself, but I am not getting the desired result. Is this even possible in SQL ? Any ideas would be helpful. I am using Microsoft SQL Server.

GMB
  • 216,147
  • 25
  • 84
  • 135
Xion
  • 318
  • 1
  • 5
  • 19
  • This is a typical "Gaps & Islands" problem where the island partition is defined by the tuple (stage, name). Should be closed as duplicate. – The Impaler Feb 06 '20 at 21:55
  • @TheImpaler, I agree, but I don't have a good canonical gaps & islands question to use as a base. I don't suppose you have one bookmarked? – Eric Brandt Feb 06 '20 at 21:59

2 Answers2

1

This is a variation of the gaps and island problem. You want to group together groups of adjacent rows (ie having the same stage and name); but you want to use the start date of the next group as ending date for the current group.

Here is one way to do it:

select 
    stage,
    name,
    min(date) date_from,
    lead(min(date)) over(order by min(date)) date_to,
    datediff(day, min(date), lead(min(date)) over(order by min(date))) date_diff 
from (
    select
        t.*,
        row_number() over(order by date) rn1,
        row_number() over(partition by stage, name order by date) rn2
    from mytable t
) t
group by stage, name, rn1 - rn2
order by date_from

Demo on DB Fiddle:

stage | name | date_from           | date_to             | datediff
----: | :--- | :------------------ | :------------------ | -------:
    1 | x    | 12/05/2019 10:00:03 | 12/06/2019 12:00:07 |        1
    1 | y    | 12/06/2019 12:00:07 | 12/06/2019 13:12:03 |        0
    2 | x    | 12/06/2019 13:12:03 | 12/08/2019 16:00:07 |        2
    1 | y    | 12/08/2019 16:00:07 | 12/09/2019 09:17:59 |        1
    2 | x    | 12/09/2019 09:17:59 | null                |     null

Note that this does not produce exactly the result that you showed: there is an additional, pending record at the end of the resultset, that represents the "on-going" series of records. If needed, you can filter it out by nesting the query:

select *
from ( 
    select 
        stage,
        name,
        min(date) date_from,
        lead(min(date)) over(order by min(date)) date_to,
        datediff(day, min(date), lead(min(date)) over(order by min(date))) date_diff
    from (
        select
            t.*,
            row_number() over(order by date) rn1,
            row_number() over(partition by stage, name order by date) rn2
        from mytable t
    ) t
    group by stage, name, rn1 - rn2
) t 
where date_to is not null
order by date_from
GMB
  • 216,147
  • 25
  • 84
  • 135
1

This is a variation of the gaps-and-islands problem, but it has a pretty simple solution.

Just keep every row where the previous row has a different stage or name. Then use lead() to get the next date. Here is the basic idea:

select t.stage, t.name, t.date as datefrom
       lead(t.date) over (order by t.date) as dateto,
       datediff(day, t.date, lead(t.date) over (order by t.date)) as diff
from (select t.*,
             lag(date) over (partition by stage, name order by date) as prev_sn_date,
             lag(date) over (order by date) as prev_date
      from t
     ) t
where prev_sn_date <> prev_date or prev_sn_date is null;

If you really want to filter out the last row, you need one more step; I'm not sure if that is desirable.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786