Get records only if consecutive days per user is 30 or greater

Question

I have the following data being returned from a query. Essentially I am putting this in a temp table so it is now in a temp table that I can query off of(Obviously a lot more data in real life, I am just showing an example):

EmpId      Date
1        2011-01-01
1        2011-01-02
1        2011-01-03
2        2011-02-03
3        2011-03-01
4        2011-03-02
5        2011-01-02

I need to return only EmpId's that have 30 or more consecutive days in the date column. I also need to return the day count for these employees that have 30 or more consecutive days. There could potentially be 2 or more sets of different consecutive days that are 30 or more days. iIn this instance I would like to return multiple rows. So if an employee has a date from 2011-01-01 to 2011-02-20 then return this and the count in one row. Then if this same employee has dates of 2011-05-01 to 2011-07-01 then return this in another row. Essentially all breaks in consecutive days are treated as a seperate record.

Check out this question: http://stackoverflow.com/questions/2725807/how-to-find-n-consecutive-records-in-a-table-using-sql — alf, Aug 11 '11 at 14:48

score 4 · Accepted Answer · edited Sep 24 '13 at 17:36

4

Using DENSE_RANK should do the trick:

;WITH sampledata
    AS (SELECT 1 AS id, DATEADD(day, -0, GETDATE())AS somedate
        UNION ALL SELECT 1, DATEADD(day, -1, GETDATE())
        UNION ALL SELECT 1, DATEADD(day, -2, GETDATE())
        UNION ALL SELECT 1, DATEADD(day, -3, GETDATE())
        UNION ALL SELECT 1, DATEADD(day, -4, GETDATE())
        UNION ALL SELECT 1, DATEADD(day, -5, GETDATE())
        UNION ALL SELECT 1, DATEADD(day, -10, GETDATE())
        UNION ALL SELECT 1, '2011-01-01 00:00:00'
        UNION ALL SELECT 1, '2010-12-31 00:00:00'
        UNION ALL SELECT 1, '2011-02-01 00:00:00'
        UNION ALL SELECT 1, DATEADD(day, -10, GETDATE())
        UNION ALL SELECT 2, DATEADD(day, 0, GETDATE())
        UNION ALL SELECT 2, DATEADD(day, -1, GETDATE())
        UNION ALL SELECT 2, DATEADD(day, -2, GETDATE())
        UNION ALL SELECT 2, DATEADD(day, -6, GETDATE())
        UNION ALL SELECT 3, DATEADD(day, 0, GETDATE())
        UNION ALL SELECT 4, DATEADD(day, 0, GETDATE())
        UNION ALL SELECT 5, DATEADD(day, 0, GETDATE()))
   , ranking
    AS (SELECT *, DENSE_RANK()OVER(PARTITION BY id ORDER BY DATEDIFF(day, 0, somedate)) - DATEDIFF(day, 0, somedate)AS dategroup
          FROM sampledata)
    SELECT id
         , MIN(somedate)AS range_start
         , MAX(somedate)AS range_end
         , DATEDIFF(day, MIN(somedate), MAX(somedate)) + 1 AS consecutive_days
      FROM ranking
     GROUP BY id, dategroup
     --HAVING DATEDIFF(day, MIN(somedate), MAX(somedate)) + 1 >= 30 --change as needed
     ORDER BY id, range_start

edited Sep 24 '13 at 17:36

JustinStolle

4,182
3
37
48

answered Aug 11 '11 at 14:56

Dibstar

2,334
2
24
38

Works for ranges within the same month, but breaks on month transitions. – Andriy M Aug 11 '11 at 15:14
@Andriy M - changed it so that it should work across years and months as well – Dibstar Aug 11 '11 at 15:44
Yup, turns out we've been thinking alike. :) – Andriy M Aug 11 '11 at 15:51
ROW_NUMBER() is probably faster than DENS_RANK(), but with DENSE_RANK() it should be possible to account for duplicate dates. (Only I would then change `order by date` to `order by datediff(day, 0, date)` in the OVER clause.) – Andriy M Aug 11 '11 at 15:53
@Andriy M - Agreed, thought that it was more useful to account for dupes. Have amended the OVER clause to make it a bit tighter – Dibstar Aug 11 '11 at 15:55
For the clause `HAVING DATEDIFF(DD,MIN(date),MAX(date)) > 29` (commented out), it should be `HAVING DATEDIFF(DD,MIN(date),MAX(date)) >= 29` or `HAVING DATEDIFF(DD,MIN(date),MAX(date)) + 1 >= 30` otherwise you'll miss the records with exactly 30 consecutive days. – JustinStolle Sep 23 '13 at 23:22

score 1 · Answer 2 · answered Aug 11 '11 at 14:43

Something like this should do the trick, haven't tested it though.

SELECT 
  a.empid
  , count(*) as consecutive_count
  , min(a.mydate) as startdate
FROM (SELECT * FROM logins ORDER BY mydate) a
INNER JOIN (SELECT * FROM logins ORDER BY mydate) b 
  ON (a.empid = b.empid AND datediff(day,a.mydate,b.mydate) = 1
GROUP BY a.empid, startdate
HAVING consecutive_count > 30

score 0 · Answer 3 · answered Aug 11 '11 at 15:44

This is a good case for a recursive CTE. I stole the data table from @Davin:

with data AS --sample data 
( SELECT 1 as id ,DATEADD(DD,-0,GETDATE()) as date UNION ALL 
SELECT 1 as id ,DATEADD(DD,-1,GETDATE()) as date UNION ALL 
SELECT 1 as id ,DATEADD(DD,-2,GETDATE()) as date UNION ALL 
SELECT 1 as id ,DATEADD(DD,-3,GETDATE()) as date UNION ALL 
SELECT 1 as id ,DATEADD(DD,-4,GETDATE()) as date UNION ALL 
SELECT 1 as id ,DATEADD(DD,-5,GETDATE()) as date UNION ALL 
SELECT 1 as id ,DATEADD(DD,-10,GETDATE()) as date UNION ALL 
SELECT 1 as id ,'2011-01-01 00:00:00.000' as date UNION ALL 
SELECT 1 as id ,'2010-12-31 00:00:00.000' as date UNION ALL 
SELECT 1 as id ,'2011-02-01 00:00:00.000' as date UNION ALL 
SELECT 1 as id ,DATEADD(DD,-10,GETDATE()) as date UNION ALL 
SELECT 2 as id ,DATEADD(DD,0,GETDATE()) as date UNION ALL 
SELECT 2 as id ,DATEADD(DD,-1,GETDATE()) as date UNION ALL 
SELECT 2 as id ,DATEADD(DD,-2,GETDATE()) as date UNION ALL 
SELECT 2 as id ,DATEADD(DD,-6,GETDATE()) as date UNION ALL 
SELECT 3 as id ,DATEADD(DD,0,GETDATE()) as date UNION ALL 
SELECT 4 as id ,DATEADD(DD,0,GETDATE()) as date UNION ALL 
SELECT 5 as id ,DATEADD(DD,0,GETDATE()) as date   ) 
,CTE AS
(
    SELECT id, CAST(date as date) Date, Consec = 1
    FROM data
    UNION ALL
    SELECT t.id, CAST(t.date as DATE) Date, Consec = (c.Consec + 1)
    FROM data T
    INNER JOIN CTE c
        ON T.id = c.id
        AND CAST(t.date as date) = CAST(DATEADD(day, 1, c.date) as date)

)

SELECT id, MAX(consec)
FROM CTE
GROUP BY id
ORDER BY id

Basically this generates a lot of rows per person, and measures how many days in a row each date represents.

Do you know how to copy and paste code from here into SSMS without it just putting it all on one line in SSMS? — mameesh, Aug 11 '11 at 15:46
I don't think this will return mutliple rows if an id has more than one consecutive range of 30+ days... — Dibstar, Aug 11 '11 at 15:51

Andriy M · Answer 4 · 2017-01-26T14:59:32.733

Assuming there are no duplicate dates for the same employee:

;WITH ranged AS (
  SELECT
    EmpId,
    Date,
    RangeId = DATEDIFF(DAY, 0, Date)
            - ROW_NUMBER() OVER (PARTITION BY EmpId ORDER BY Date)
  FROM atable
)
SELECT
  EmpId,
  StartDate = MIN(Date),
  EndDate   = MAX(Date),
  DayCount  = DATEDIFF(DAY, MIN(Date), MAX(Date)) + 1
FROM ranged
GROUP BY EmpId, RangeId
HAVING DATEDIFF(DAY, MIN(Date), MAX(Date)) + 1 >= 30
ORDER BY EmpId, MIN(Date)

DATEDIFF turns the dates into integers (the difference of days between the 0 date (1900-01-01) and Date). If the dates are consecutive, the integers are consecutive too. Using the data sample in the question as an example, the DATEDIFF results will be:

EmpId  Date        DATEDIFF
-----  ----------  --------
1      2011-01-01  40542
1      2011-01-02  40543
1      2011-01-03  40544
2      2011-02-03  40575
3      2011-03-01  40601
4      2011-03-02  40602
5      2011-01-02  40543

Now, if you take each employee's rows, assign row numbers to them in the order of dates, and get the difference between the numeric representations and row numbers, you will find that the difference stays the same for consecutive numbers (and, therefore, consecutive dates). Using a slightly different sample for better illustration, it will look like this:

Date        DATEDIFF  RowNum  RangeId
----------  --------  ------  -------
2011-01-01  40542     1       40541
2011-01-02  40543     2       40541
2011-01-03  40544     3       40541
2011-01-05  40546     4       40542
2011-01-07  40548     5       40543
2011-01-08  40549     6       40543
2011-01-09  40550     7       40543

The specific value of RangeId is not important, only the fact that it remains the same for consecutive dates matters. Based on that fact, you can use it as a grouping criterion to count the dates in the group and get the range bounds.

The above query uses DATEDIFF(DAY, MIN(Date), MAX(Date)) + 1 to count the days, but you could also simply use COUNT(*) instead.

I know this is a very old post, but can you explain how the RangeId works? I don't understand how you are finding a value that can be grouped on using the Row_Number(). — Maderas, Jan 26 '17 at 13:50

Get records only if consecutive days per user is 30 or greater

4 Answers4