Aggregating overlapping data

Question

I have a table called dbo.StayData. In this table there are 4 columns the GuestID, StayID, CheckIn, CheckOut. I am trying to write a report that will return 3 columns GuestID, CheckIN, CheckOut. However if a guest has stays that overlap such as CheckOuton one line being the same as CheckIn on a different line or if the CheckIn on one line falls between the CheckIn and CheckOut of a different entry I want to combine those into one entry so I can get a better understanding of when a full trip occurred. below I have a sample of the data on the left and a sample of the desired results on the right. I am open to all ideas. I have thought about trying to make temp tables but I am unsure how to get to where I want to be.

I think this question should be able to answer my question but I don't really understand the code. SQL - Consolidate Overlapping Data

I used the code from the answer in the above question and was able to make it work fine when looking at 1 guest in a temp table. However I am trying to modify that code to look at the whole dbo.StayData table which has 100k's of rows and 10k's of GuestID's. below is the what I came up with after modification.

with temp_positions as  --Select all limits as a single column along with the start / end flag (s / e)
(
    select GuestID,CheckIn as limit, 's' as pos from dbo.StayData
    union
    select GuestID,CheckOut as limit, 'e' as pos from dbo.StayData
)
, ordered_positions as --Rank all limits
(
    select GuestID,limit, pos, RANK() OVER (ORDER BY GuestID,limit) AS Rank
    from temp_positions
)
, collapsed_positions as --Collapse ranges (select the first limit, if s is preceded or followed by e, and the last limit) and rank limits again
(
    select op1.*, RANK() OVER (ORDER BY Op1.GuestID,op1.Rank) AS New_Rank
    from ordered_positions op1
    inner join ordered_positions op2
    on (op1.Rank = op2.Rank and op1.Rank = 1 and op1.pos = 's')
    or (op2.Rank = op1.Rank-1 and op2.pos = 'e' and op1.pos = 's') 
    or (op2.Rank = op1.Rank+1 and op2.pos = 's' and op1.pos = 'e')
    or (op2.Rank = op1.Rank and op1.pos = 'e' and op1.Rank = (select max(Rank) from ordered_positions))
)
, final_positions as --Now each s is followed by e. So, select s limits and corresponding e limits. Rank ranges
(
    select Cp1.GuestID,cp1.limit as cp1_limit, cp2.limit as cp2_limit,  RANK() OVER (ORDER BY cp1.GuestID, cp1.limit) AS Final_Rank
    from collapsed_positions cp1
    inner join collapsed_positions cp2
    on cp1.pos = 's' and cp2.New_Rank = cp1.New_Rank+1
)
--Finally, subtract 1 from Rank to start Range #'s from 0
select fp.GuestID, fp.Final_Rank-1 seq_no, fp.cp1_limit as starty, fp.cp2_limit as endy
from final_positions fp;

score 0 · Accepted Answer · answered Jun 21 '17 at 19:52

So below is the final code I am now using which appears to work. I moved the Subquery up into a join on the collapsed Positions portion, I also added distinct to the final select.

with temp_positions as  --Select all limits as a single column along with the start / end flag (s / e)
(
    select GuestID,CheckIn as limit, 's' as pos from dbo.StayData
    union
    select GuestID,CheckOut as limit, 'e' as pos from dbo.StayData
)
, ordered_positions as --Rank all limits
(
    select GuestID,limit, pos, RANK() OVER (ORDER BY GuestID,limit) AS Rank
    from temp_positions
)
, collapsed_positions as --Collapse ranges (select the first limit, if s is preceded or followed by e, and the last limit) and rank limits again
(
    select op1.*, RANK() OVER (ORDER BY Op1.GuestID,op1.Rank) AS New_Rank
    from ordered_positions op1
    Left Join (Select GuestID, Max(Rank) as Rank from Ordered_Positions Group by GuestID) as OP
    on OP1.Rank = OP.Rank

    inner join ordered_positions op2
    on (op1.Rank = op2.Rank and op1.Rank = 1 and op1.pos = 's')
    or (op2.Rank = op1.Rank-1 and op2.pos = 'e' and op1.pos = 's') 
    or (op2.Rank = op1.Rank+1 and op2.pos = 's' and op1.pos = 'e')
    or (op2.Rank = op1.Rank and op1.pos = 'e' and op1.Rank = OP.Rank)
)
, final_positions as --Now each s is followed by e. So, select s limits and corresponding e limits. Rank ranges
(
    select Cp1.GuestID,cp1.limit as cp1_limit, cp2.limit as cp2_limit,  RANK() OVER (ORDER BY cp1.GuestID, cp1.limit) AS Final_Rank
    from collapsed_positions cp1
    inner join collapsed_positions cp2
    on cp1.pos = 's' and cp2.New_Rank = cp1.New_Rank+1
)
--Finally, subtract 1 from Rank to start Range #'s from 0
select distinct fp.GuestID, fp.Final_Rank-1 seq_no, fp.cp1_limit as starty, fp.cp2_limit as endy
from final_positions fp;

Aggregating overlapping data

1 Answers1