7

Q: How can I rank records based on 1 column's changing value?

I have the following data (https://pastebin.com/vdTb1JRT):

EmployeeID  Date        Onleave
ABH12345    2016-01-01  0
ABH12345    2016-01-02  0
ABH12345    2016-01-03  0
ABH12345    2016-01-04  0
ABH12345    2016-01-05  0
ABH12345    2016-01-06  0
ABH12345    2016-01-07  0
ABH12345    2016-01-08  0
ABH12345    2016-01-09  0
ABH12345    2016-01-10  1
ABH12345    2016-01-11  1
ABH12345    2016-01-12  1
ABH12345    2016-01-13  1
ABH12345    2016-01-14  0
ABH12345    2016-01-15  0
ABH12345    2016-01-16  0
ABH12345    2016-01-17  0

I would like to produce the following results:

 EmployeeID DateValidFrom    DateValidTo     OnLeave
 ABH12345   2016-01-01       2016-01-09      0
 ABH12345   2016-01-10       2016-01-13      1
 ABH12345   2016-01-14       2016-01-17      0

So I'm thinking if I can somehow create a ranked column (like shown below) that increments based on the value in the Onleave column - partitioned by the EmployeeID column.

EmployeeID  Date        Onleave    RankedCol
ABH12345    2016-01-01  0          1
ABH12345    2016-01-02  0          1
ABH12345    2016-01-03  0          1
ABH12345    2016-01-04  0          1
ABH12345    2016-01-05  0          1
ABH12345    2016-01-06  0          1
ABH12345    2016-01-07  0          1
ABH12345    2016-01-08  0          1
ABH12345    2016-01-09  0          1
ABH12345    2016-01-10  1          2
ABH12345    2016-01-11  1          2
ABH12345    2016-01-12  1          2
ABH12345    2016-01-13  1          2
ABH12345    2016-01-14  0          3
ABH12345    2016-01-15  0          3
ABH12345    2016-01-16  0          3
ABH12345    2016-01-17  0          3

Then I would be able to do the following:

SELECT
 [EmployeeID]    = [EmployeeID]
,[DateValidFrom] = MIN([Date])
,[DateValidTo]   = MAX([Date])
,[OnLeave]       = [OnLeave]
FROM table/view/cte/sub-query
GROUP BY 
 [EmployeeID]
,[OnLeave]
,[RankedCol]

Other solutions are very welcome..

Below is the test data :

WITH CTE AS ( SELECT EmployeeID = 'ABH12345', [Date] = CAST(N'2016-01-01' AS Date), [Onleave] = 0
UNION SELECT 'ABH12345', CAST(N'2016-01-02' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-03' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-04' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-05' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-06' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-07' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-08' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-09' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-10' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-11' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-12' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-13' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-14' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-15' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-16' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-17' AS Date), 0
)

SELECT * FROM CTE
Emil Vissing
  • 460
  • 4
  • 19

3 Answers3

3

One more way to do it with lag. Assign groups by getting the previous Onleave value for each employeeid and resetting it when a different value is found.

select employeeid,min(date) as date_from,max(date) as date_to,max(onleave) as onleave
from (select t.*,sum(case when prev_ol=onleave then 0 else 1 end) over(partition by employeeid order by date) as grp
      from (select c.*,lag(onleave,1,onleave) over(partition by employeeid order by date) as prev_ol
            from cte c
           ) t
      ) t
group by employeeid,grp 
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
2

This is an example of a group-and-islands problem. In this case, you can use date arithmetic. The key observation is that subtracting an sequence of integers from the date column identifies the islands of similar values.

As a query, this looks like:

SELECT EmployeeId, MIN([Date]) as DateValidFrom, MAX([Date]) as DateValidTo,
       OnLeave
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY EmployeeId, OnLeave ORDER BY [Date]) as seqnum
      FROM t
     ) t
GROUP BY EmployeeID, DATEADD(day, - seqnum, [Date]), OnLeave;

You can run the subquery, stare at the results, and do the arithmetic to see why this works.

Here is the example.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Interesting.. The output is somewhat still the same from where I started. How would I be able to sum up the result in just 3 rows though? – Emil Vissing Jun 03 '17 at 14:15
2

Here is another, a bit simpler, way to get the desired output - accessing the table only once.

-- sample of data from your question
with t1(EmployeeID, Date1, Onleave) as(
  select 'ABH12345', cast('2016-01-01' as date),  0 union all
  select 'ABH12345', cast('2016-01-02' as date),  0 union all
  select 'ABH12345', cast('2016-01-03' as date),  0 union all
  select 'ABH12345', cast('2016-01-04' as date),  0 union all
  select 'ABH12345', cast('2016-01-05' as date),  0 union all
  select 'ABH12345', cast('2016-01-06' as date),  0 union all
  select 'ABH12345', cast('2016-01-07' as date),  0 union all
  select 'ABH12345', cast('2016-01-08' as date),  0 union all
  select 'ABH12345', cast('2016-01-09' as date),  0 union all
  select 'ABH12345', cast('2016-01-10' as date),  1 union all
  select 'ABH12345', cast('2016-01-11' as date),  1 union all
  select 'ABH12345', cast('2016-01-12' as date),  1 union all
  select 'ABH12345', cast('2016-01-13' as date),  1 union all
  select 'ABH12345', cast('2016-01-14' as date),  0 union all
  select 'ABH12345', cast('2016-01-15' as date),  0 union all
  select 'ABH12345', cast('2016-01-16' as date),  0 union all
  select 'ABH12345', cast('2016-01-17' as date),  0
)
-- actual query
select max(w.employeeid) as employeeid
     , min(w.date1)      as datevalidfrom
     , max(w.date1)      as datevalidto
     , max(w.onleave)    as onleave 
  from (
        select row_number() over(partition by employeeid order by date1) -
               row_number() over(partition by employeeid, onleave order by date1) as grp
             , employeeid
             , date1
             , onleave
          from t1 s
        ) w
group by w.grp
order by employeeid, datevalidfrom

Result:

employeeid datevalidfrom datevalidto onleave
---------- ------------- ----------- -----------
ABH12345   2016-01-01    2016-01-09  0
ABH12345   2016-01-10    2016-01-13  1
ABH12345   2016-01-14    2016-01-17  0
Nick Krasnov
  • 26,886
  • 6
  • 61
  • 78