The environment is SQL Server 2014.
I need to reduce many insurance enrollment detail records (small first/last ranges) into much larger, mutually exclusive (ME) ranges of continuous enrollment.
For clarity, the problem is reduced to sample data sorted by id, first, last. F(n) and L(n) are the first and last values of record n within an id.
Most detail ranges are the typical case:
- adjacent, F(n) = L(n-1) + 1
But the devil is in the details -- welcome to real-world data:
- connected non-adjacent, F(n) <= L(n-1)
- embedded, L(n) <= L(n-1)
- overlap, L(n) > L(n-1)
- disconnected non-adjacent, F(n) > max(prior L) + 1
- a gap defines the boundaries of the consolidated ranges, which are mutually exclusive
- ME(i).last = max of prior L
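Put another way, a detail row starts a new ME range exactly when its first is more than 1 greater than the maximum last of every earlier row for that id. A minimal sketch of just that test, assuming the #Details sample table defined further down (window frame syntax is available in SQL Server 2012 and later, so 2014 should accept it):

-- Sketch only: flag the detail rows where a new ME range must begin,
-- i.e. first > 1 + (max last over all prior rows within the id).
SELECT d.id, d.first, d.last,
       CASE WHEN d.first > 1 + MAX(d.last) OVER (PARTITION BY d.id
                                                 ORDER BY d.first, d.last
                                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                                          AND 1 PRECEDING)
            THEN 1 ELSE 0 END AS me_start   -- first row of an id gets 0 (empty frame returns NULL)
FROM #Details AS d
ORDER BY d.id, d.first, d.last;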
This picture demonstrates most cases
Have
    1      30       60       90      120
    +-------+--------+--------+--------+
 1  +-------+                             (1:30)
 2           +-------+                    (31:60)  adjacent
 3             +--+                       (40:50)  embedded
 4                    +                   (61:61)  adjacent some earlier
 5                    +-+                 (61:65)  adjacent some earlier
 6                    +--+                (61:75)  adjacent some earlier
 7                     +--+               (65:80)  overlap
 8                           +---------+  (85:120) gap, boundaries of ME ranges located
 9                             +-------+  (91:120)
10                                  +--+  (110:120)
Want
    1      30       60       90      120
    +-------+--------+--------+--------+
 1  +----------------------+              (1:80)
 2                           +---------+  (85:120)
There are other unusual cases, such as an embedded range followed by a gap (A/B mark the two consolidated ranges that should result):
.....
 ..
          ....
AAAAA     BBBB
IF OBJECT_ID('tempdb..#Details') IS NOT NULL DROP TABLE #Details;
CREATE TABLE #Details (id int, first int, last int);
insert into #Details (id, first, last) values
    (1,   1,  30),
    (1,  31,  60),
    (1,  40,  50),
    (1,  61,  61),
    (1,  61,  65),
    (1,  61,  75),
    (1,  65,  80),
    (1,  85, 120),
    (1,  91, 120),
    (1, 110, 120);
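To make the embedded-range-followed-by-a-gap case above concrete, rows like these could be added (my own made-up example under a second id, not real data):

-- hypothetical id 2: (2:3) is embedded in (1:10), then (20:25) follows a gap
insert into #Details (id, first, last) values
    (2,  1, 10),
    (2,  2,  3),
    (2, 20, 25);
-- expected ME ranges for id 2: (1:10) and (20:25)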
I read some answers on Stack Overflow and the Refactoring Ranges article, but couldn't make the leap to my data arrangement.
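The closest I have come is a sketch along these lines, adapting the running-max / gaps-and-islands idea from those answers to my columns (a sketch only; I am not certain it covers every odd case above, or how it will behave at the volumes described below):

-- Sketch, not a proven solution.
-- Step 1: flag rows that start a new ME range (no connection to anything earlier in the id).
-- Step 2: a running sum of the flags numbers the ME ranges within each id.
-- Step 3: collapse each numbered group to its min(first) / max(last).
WITH flagged AS
(
    SELECT id, first, last,
           CASE WHEN first > 1 + MAX(last) OVER (PARTITION BY id
                                                 ORDER BY first, last
                                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                                          AND 1 PRECEDING)
                THEN 1 ELSE 0 END AS me_start
    FROM #Details
),
grouped AS
(
    SELECT id, first, last,
           SUM(me_start) OVER (PARTITION BY id
                               ORDER BY first, last
                               ROWS UNBOUNDED PRECEDING) AS me_grp
    FROM flagged
)
SELECT id,
       MIN(first) AS first,   -- ME(i).first
       MAX(last)  AS last     -- ME(i).last = max of L within the group
FROM grouped
GROUP BY id, me_grp
ORDER BY id, first;

For the sample id above I believe this produces (1:80) and (85:120), matching Want, but I do not know whether it is correct for all of the odd cases or whether it will scale.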
--For jpw--
A typical analysis may involve 20,000 ids with around 200 detail records per id. Those cases have been handled by downloading the data to a local machine and processing it (in a cursor-like manner) in a SAS DATA step. The worst case is on the order of 650K ids and 150M detail rows -- too much data for the download approach, and it leads to other resource issues. I believe all details combined might be in the range of 1.2B rows. Regardless, if it can all be done in SQL Server the whole process is simplified.