Given two timelines, A and B, where each timeline consist of non-overlapping periods and a score is associated with each period. I want to find all the periods and maximum score for the overlap between the timelines, while keeping the periods that does not have an overlap.
I.e. for the periods that overlap, I want the start, end and maximum score of A and B. For the periods that does not overlap, i still want the start and end, and the associated score.
Periods in each timeline can line up with each other. For cases where the timelines overlap, the score should be consided included in the start and end points.
Visual explanation
An example is shown in the diagram below where with numbers for reference, and the score shown for each interval. A, B are timelines, and R is the resulting timeline i want to calculate.
0 1 2 3 4 5
12345678901234567890123456789012345678901234567890
A = |---------| |--| |--------| |--|
B = |--------|--------| |--------| |--| |--|
R = |---|----|----|---| |--|--|--| |--|--|--| |--||--|
Scores per interval:
A = |---10----| |20| |---30---| |40|
B = |---6----|---12---| |---17---| |33| |50|
R = |6--|10--|12--|12-| |17|20|17| |30|33|30| |40||50|
Example code
DROP TABLE IF EXISTS #TABLE_A
CREATE TABLE #TABLE_A (
timeline varchar(1),
time_start int,
time_end int,
score int
)
DROP TABLE IF EXISTS #TABLE_B
CREATE TABLE #TABLE_B (
timeline varchar(1),
time_start int,
time_end int,
score int
)
DROP TABLE IF EXISTS #TABLE_R
CREATE TABLE #TABLE_R (
time_start int,
time_end int,
score int
)
INSERT INTO #TABLE_A
VALUES
('A', 5, 15, 10)
,('A', 24, 27, 20)
,('A', 32, 41, 30)
,('A', 43, 46, 40)
INSERT INTO #TABLE_B
VALUES
('B', 1, 10, 6)
,('B', 10, 19, 12)
,('B', 21, 30, 17)
,('B', 35, 38, 33)
,('B', 47, 50, 50)
INSERT INTO #TABLE_R
VALUES
(1, 5, 6)
,(5, 10, 10)
,(10, 15, 12)
,(15, 19, 12)
,(21, 24, 17)
,(24, 27, 20)
,(27, 30, 17)
,(32, 35, 30)
,(35, 38, 33)
,(38, 41, 30)
,(43, 46, 40)
,(47, 50, 50)
I have tried to find all the cases where the periods overlap to determine the start and end of the period as well as the score. For scenarios where there is no overlap the period is difficult to select. I have taken the union for all the scenarios, which is quite a lot. I am therefore looking for an elegant solution to split the timelines into periods.
EDIT/Solution
I found the following solution, which does not require creating the entire value range. For this simple example it does not really matter, but for the real world problem i was trying to solve where the times are timestamps it makes a difference. And for large datasets the recursion limit is quicly reached.
The solution creates all the points including intersections, gives the row a number and matches the point by the next in the row. For each pair, the score is then checked to see which is greatest.
WITH points AS (
SELECT time_start AS 'point' FROM #TABLE_A
UNION
SELECT time_end FROM #TABLE_A
UNION
SELECT time_start FROM #TABLE_B
UNION
SELECT time_end FROM #TABLE_B
),
points_row as (
SELECT
point
,ROW_NUMBER() OVER(ORDER BY point) AS row_num
FROM points
),
time_periods AS (
SELECT
s.point AS 'time_start'
,e.point AS 'time_end'
FROM points_row s
LEFT JOIN points_row e on s.row_num + 1 = e.row_num
)
SELECT * FROM (
SELECT
tp.time_start
,tp.time_end
,CASE
WHEN A.score IS NULL AND B.score IS NULL THEN NULL
WHEN A.score IS NULL THEN B.score
WHEN B.score IS NULL THEN A.score
WHEN A.score > B.score THEN A.score else B.score
END AS 'score'
FROM time_periods tp
LEFT JOIN #TABLE_A A ON A.time_start <= tp.time_start AND A.time_end >= tp.time_end
LEFT JOIN #TABLE_B B ON B.time_start <= tp.time_start AND B.time_end >= tp.time_end
) a
WHERE score IS NOT NULL