6

I have been working on the below but getting no results and the deadline is fast approaching. Also, there are over a million rows as the below. Appreciate your help on the below.

Objective: Group results by MEMBER and build Continuous Coverage Ranges for each Member by combining individual Date Ranges which either overlap or run consecutive to each other with no breaks between the Start & End day of the range.

I have data in the below format:

MemberCode  -----   ClaimID   -----       StartDate   -----       EndDate
00001   -----       012345   -----       2010-01-15   -----       2010-01-20
00001   -----       012350   -----       2010-01-19   -----       2010-01-22
00001   -----       012352   -----       2010-01-20   -----       2010-01-25
00001   -----       012355   -----       2010-01-26   -----       2010-01-30
00002   -----       012357   -----       2010-01-20   -----       2010-01-25
00002   -----       012359   -----       2010-01-30   -----       2010-02-05
00002   -----       012360   -----       2010-02-04   -----       2010-02-15
00003   -----       012365   -----       2010-02-15   -----       2010-02-30

...

In the above the member (00001) is a valid member as there is a continuous date range from 2010-01-15 to 2010-01-30 (with no gaps). Please note that the Claim ID 012355 for this member starts immediately next to the End Date of Claim ID 012352. This is still valid as it forms a continuous range.

However, the member (00002) should be an Invalid member as there is a gap of 5 days between Enddate of Claim ID 012357 and Start Day for Claim ID 012359

What I am trying to do is get a list of ONLY those members who have claims for every single day of the continuous date range (for each member) with no gaps between the MIN(Start-date) and Max(End Date) for each Distinct member. Members who have gaps are discarded.

Thanks in advance.

UPDATE:

I have reached until here. Note: FILLED_DT = Start Date & PresCoverEndDT = End Date

SELECT PresCoverEndDT, FILLED_DT 

FROM 

(

    SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN

    FROM Temp_Claims_PRIOR_STEP_5 T1

    WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

            WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT< T2.PresCoverEndDT 

            AND T1.MBR_KEY = T2.MBR_KEY )

) T1

    JOIN (SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN

        FROM Temp_Claims_PRIOR_STEP_5 T1

        WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

             WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT AND T1.MBR_KEY = T2.MBR_KEY )
) T2

     ON T1.RN - 1 = T2.RN

WHERE   PresCoverEndDT < FILLED_DT 

The above code seems to have error as I am getting only one row and that too it is incorrect. My desired output is only 1 column as below:

Valid_Member_Code

00001

00007

00009

... etc.,

Vijay
  • 83
  • 1
  • 7
  • 2
    Have a look here: http://www.simple-talk.com/sql/t-sql-programming/find-missing-date-ranges-in-sql/ – Chris Gessler Aug 29 '12 at 12:29
  • possible duplicate of [islands and gaps tsql](http://stackoverflow.com/questions/11983853/islands-and-gaps-tsql) – gbn Aug 29 '12 at 12:32
  • similar topic discussed here: http://stackoverflow.com/questions/12088959/merging-unused-timeslots/12089450#12089450 – paul Aug 29 '12 at 12:37
  • @Chris Gessler - Sorry if the following question is naive. However, my requirement was for each distinct member. How do I use the steps in the link to find the date ranges for each distinct member? Thanks. – Vijay Aug 29 '12 at 12:39
  • @Vijay - if you want to find invalid MemberCodes, you can self join and look for an t1.EndDate > t2.StartDate. To find all the date gaps, you'll have to partition your results. Are you using SQL Server by chance? – Chris Gessler Aug 29 '12 at 12:51
  • @Chris - I am using SQL server 2008 R2. – Vijay Aug 29 '12 at 12:56
  • @Vijay - I smell a CTE + ROW_NUMBER OVER PARTITION solution coming... I can't do the work myself right now, but if I have time later and nobody has come up with a solution, I'll see what I can do. – Chris Gessler Aug 29 '12 at 13:00
  • Thanks Chris. I have reached until the below (updating the original post). – Vijay Aug 29 '12 at 13:03
  • Well, the code I had posted above is wrong, I suppose. Not getting the desired output. – Vijay Aug 29 '12 at 13:25

3 Answers3

5

Try this: http://www.sqlfiddle.com/#!3/c3365/20

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

See query progression here: http://www.sqlfiddle.com/#!3/c3365/20


How it works, compare the current end date to its next start date and check the date gap:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |  -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |  -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |   1 |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |   5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |  -1 |

Then check if a member has same count of claims with no gaps to its total claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;

Output:

| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
|          1 |     3 |             3 |
|          2 |     2 |             1 |

Finally, filter them, members with no gaps in their claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

Output:

| MEMBERCODE |
--------------
|          1 |

Do note that you don't need to do COUNT(*) > 1 to detect members with 2 or more claims. Instead of using LEFT JOIN, we uses JOIN, this will automatically discard members who have yet to have a second claim. Here's the version(longer) if you opt to use LEFT JOIN instead(same output as above):

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two ore more claims only

Here's how see data of above query prior to filtering:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE |    GAP |
-----------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |     -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |     -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |      1 |
|          1 | 2010-01-26 | 2010-01-30 |        (null) | (null) |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |      5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |     -1 |
|          2 | 2010-02-04 | 2010-02-15 |        (null) | (null) |
|          3 | 2010-02-15 | 2010-03-02 |        (null) | (null) |

EDIT on requirement clarification:

On your clarification, you wanted to include members who have yet to have second claim too, do this instead: http://sqlfiddle.com/#!3/c3365/22

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0; 

Output:

| MEMBERCODE |
--------------
|          1 |
|          3 |

The technique is to count the member's nextstartdate, if they have no next start date date(i.e. count(nextstartdate) = 0) then they are single claims only and valid too, then just attach this OR condition:

or count(nextstartdate) = 0; 

Actually, the condition below will suffice too, I wanted to make the query more self-documenting though, hence I recommend counting on member's nextstartdate. Here's an alternative condition for counting members who have yet to have a second claim:

or count(*) = 1;

Btw, we also have to change the comparison from this:

sum(case when gap <= 1 then 1 end) = count(*)

to this(as we are using LEFT JOIN now):

sum(case when gap <= 1 then 1 end) = count(gap)
Michael Buen
  • 38,643
  • 9
  • 94
  • 118
  • Thanks a lot Michael! The updated query returned 37052 valid members from a total of 1.67 million claims. The query progression and your explanation of the steps were very educative :-) – Vijay Aug 30 '12 at 07:12
  • Hi Michael, since I also need members who have yet to have a second claim (in other words, members with single claims are valid in my case), I used LEFT JOIN instead of JOIN based on your note at the end, above. However, I am getting 0 results. Wondering if I missed something here? – Vijay Aug 30 '12 at 15:56
  • @Vijay Answer amended based on your clarified requirement ツ – Michael Buen Aug 30 '12 at 23:32
  • Thanks Michael _:-)_ I was trying to figure out why I was getting 0. The modified code did give me the desired output when I randomly checked the members. Also, thanks for the excellent detailed explanation and the query progression! – Vijay Aug 31 '12 at 06:55
1

Try this, it partitions rows by MemberCode and gives them ordinal numbers. Then it compares rows with subsequent num value, if difference between end date of a row and start date of a next row is greater than one day, it's an invalid member:

DECLARE @t TABLE (MemberCode  VARCHAR(100), ClaimID   
    INT,StartDate   DATETIME,EndDate DATETIME)
INSERT @t
VALUES
('00001'   ,       012345   ,        '2010-01-15'   ,       '2010-01-20')
,('00001'   ,       012350   ,       '2010-01-19'   ,       '2010-01-22')
,('00001'   ,       012352   ,       '2010-01-20'   ,       '2010-01-25')
,('00001'   ,       012355   ,       '2010-01-26'   ,       '2010-01-30')
,('00002'   ,       012357   ,       '2010-01-20'   ,       '2010-01-25')
,('00002'   ,       012359   ,       '2010-01-30'   ,       '2010-02-05')
,('00002'   ,       012360   ,       '2010-02-04'   ,       '2010-02-15')
,('00003'   ,       012365   ,       '2010-02-15'   ,       '2010-02-28')
,('00004'   ,       012366   ,       '2010-03-18'   ,       '2010-03-23')
,('00005'   ,       012367   ,       '2010-03-19'   ,       '2010-03-25')
,('00006'   ,       012368   ,       '2010-03-20'   ,       '2010-03-21')

;WITH tbl AS (

    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY MemberCode ORDER BY StartDate) 
                AS num
    FROM    @t
), invalid AS (

    SELECT  tbl.MemberCode
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1
    AND     tbl.MemberCode = _tbl.MemberCode
    WHERE   DATEDIFF(DAY, tbl.EndDate, _tbl.StartDate) > 1  
)

SELECT  MemberCode
FROM    tbl
EXCEPT
SELECT  MemberCode
FROM    invalid
Ivan Golović
  • 8,732
  • 3
  • 25
  • 31
  • Thanks Ivan, I have executed the query on the same table that I executed Michael's code above. However, the results varied drastically. While your code returns 328,256 claims (for 112,077 DISTINCT members), Michael's code returns 37,052 valid members. – Vijay Aug 30 '12 at 07:13
  • Thanks for trying it out, that's an interesting result. If you have any example data where query failed to exclude invalid members please post it, I'm curious to see where it fails. – Ivan Golović Aug 30 '12 at 07:31
  • @Vijay I believe I found one difference between my and Michael's query, my query doesn't exclude certain rows because I believe they are valid, those are rows that have only single occurrence for a given `MemberCode`, I added couple of them to the sample input data; 0004, 0005, 0006. Do you consider such rows as valid or invalid? – Ivan Golović Aug 30 '12 at 07:46
  • yes, a member code with only a single claim is a valid member. – Vijay Aug 30 '12 at 10:34
  • In such case this query will return these members, that may be the reason why it returns more members. – Ivan Golović Aug 30 '12 at 11:27
  • Hi Ivan, thanks for the reply. I think (just wondering about this one though) one more reason for the difference might be that Michael has used "b.rn = a.rn **+ 1**" and you have used "_tbl.num **- 1**". I think since both of your queries Partition by member code and Order by StartDate, one code using **+1** and the other code using **-1** would surely give differential results. Right? – Vijay Sep 03 '12 at 09:39
  • Basically, they both perform self-join, I wrote it like `tbl.num = _tbl.num - 1`. It would be the same if I had written `_tbl.num = tbl.num + 1` which is equivalent to `b.rn = a.rn + 1`. – Ivan Golović Sep 03 '12 at 12:38
0

I think your query gives back false negatives because it only checks the time interval between consecutive rows. In my opinion, it is possible that the gap is compensated by one of the previous lines. Let me give an example:

Row l: 2010-01-01 | 2010-01-31
Row 2: 2010-01-10 | 2010-01-15
Row 3: 2010-01-20 | 2010-01-25

Your code will report a gap between row 2 and row 3, while it is being filled by row 1. Your code will not detect this. You should use the MAX(EndDate) of all previous rows in the DATEDIFF function.

DECLARE @t TABLE (PersonID  VARCHAR(100), StartDate DATETIME, EndDate DATETIME)
INSERT @t VALUES('00001'   ,       '2010-01-01'   ,       '2010-01-17')
INSERT @t VALUES('00001'   ,       '2010-01-19'   ,       '2010-01-22')
INSERT @t VALUES('00001'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00001'   ,       '2010-01-26'   ,       '2010-01-31')
INSERT @t VALUES('00002'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-05')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-15')
INSERT @t VALUES('00003'   ,       '2010-02-15'   ,       '2010-02-28')
INSERT @t VALUES('00004'   ,       '2010-03-18'   ,       '2010-03-23')
INSERT @t VALUES('00005'   ,       '2010-03-19'   ,       '2010-03-25')
INSERT @t VALUES('00006'   ,       '2010-01-01'   ,       '2010-04-20')
INSERT @t VALUES('00006'   ,       '2010-01-20'   ,       '2010-01-21')
INSERT @t VALUES('00006'   ,       '2010-01-25'   ,       '2010-01-26')

;WITH tbl AS (
    SELECT  
        *, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY StartDate) AS num
    FROM    @t
), invalid AS (
    SELECT  tbl.PersonID 
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1 AND tbl.PersonID = _tbl.PersonID
    WHERE DATEDIFF(DAY, (SELECT MAX(tbl3.EndDate) FROM tbl tbl3 WHERE tbl3.num <= tbl.num AND tbl3.PersonID = tbl.PersonID), _tbl.StartDate) > 1  
)

SELECT  PersonID
FROM    tbl
EXCEPT
SELECT  PersonID
FROM    invalid
TomP
  • 1