Maximum number of subsets of overlapping intervals

Question

Given a set of intervals S what is the efficient way to find the number of subsets assigned to each interval from the set S.

say for example

S = (11,75), (14,62), (17,32), (24,48), (31,71), (34,74), (40,97), (41,58)

as for output

(11, 75) => 6 -> (14,62), (17,32), (24,48), (31,71), (34,74), (41,58)  
(14, 62) => 3 -> (17,32), (24,48), (41,58)   
(17, 32) => 0  
(24, 48) => 0  
(31, 71) => 1 -> (41,58)   
(34, 74) => 1 -> (41,58)  
(40, 97) => 1 -> (41,58)  
(41, 58) => 0

Is it possible to get this mapping in o(nlogn) or substantially less than o(n²)?

Is it guaranteed that the inputs are sorted by start-of-interval? — Dave, Mar 17 '21 at 20:05
Do we know anything about the size of the range of values compared to the number of intervals? I.e. max_val - min_val vs num_ranges? — Dave, Mar 17 '21 at 20:07
@Dave it is not guaranteed that inputs are sorted. I was playing with cases and so used this as for example. Also, nothing is known about the range of values with respect to the number of intervals. — High On Math, Mar 17 '21 at 20:22

score 3 · Accepted Answer · answered Mar 17 '21 at 19:52

There seems to be an O(n*log(n)) way to do this. The intuition is that we need some sort of way to organize the intervals in a way where, at the current step, all intervals that the current one could contain have been considered.

The algorithm is as follows:

Sort the intervals by end times in ascending order, and sort tied end times by their start times in descending order.
Iterate over the intervals and maintain a sorted set of all start times seen. The idea is that, when looking at the current interval, all intervals that it could contain have already been examined, and the number of intervals that the current one does contain is simply the number of elements in our built set that have a start time later than the current one.

Walking through the example, we first find that

sortedIntervals = [(17,32), (24,48), (41,58), (14,62), (31,71), (34,74), (11,75),(40,97)]

And let our sorted set of intervals (sorting now by start time) be

examinedIntervals = []

Now let's step through sortedIntervals

Consider (17,32). examinedIntervals is empty, so it doesn't contain anything.
Consider (24, 48). examinedIntervals = [(17, 32)] . Because there are no intervals that start after 24, we have that (24, 48) contains 0 intervals.
Consider (41, 58). examinedIntervals = [(17, 32), (24, 48)]. No intervals have a start time after 41, so (41, 58) contains 0 intervals
Consider (14, 62). examinedIntervals = [(17, 32), (24, 48), (41, 58)]. Now all three intervals have a start time after 14, so (14, 62) contains 3 intervals
Consider (31, 71). examinedIntervals = [(14, 62), (17, 32), (24, 48), (41, 58)]. Only one interval comes after 31, so (31, 71) contains 1 interval
Consider (34, 74). examinedIntervals = [(14, 62), (17, 32), (24, 48), (31, 71), (41, 58)]. One interval comes after 34, so (34, 74) contains 1 interval
Consider (11, 75). examinedIntervals = [(14, 62), (17, 32), (24, 48), (31, 71), (34, 74), (41, 58)], and all 6 intervals have a start time after 14.
Consider (40, 97). examinedIntervals = [(11, 75), (14, 62), (17, 32), (24, 48), (31, 71), (34, 74), (41, 58)]. Only one interval comes after 40, so (40, 97) contains 1 interval.

Summarizing we do indeed get the correct results:

(40, 97) -> 1
(11, 75) -> 6
(34, 74) -> 1
(31, 71) -> 1
(14, 62) -> 3
(41, 58) -> 0
(24, 48) -> 0
(17, 32) -> 0

It can also be verified easily that runtime is O(n*log(n)), assuming the use of an efficient sort and a balanced tree in the second part. The initial sort runs in the given amount of time. The second portion involves n insertions into a binary tree of height O(log(n)), giving a runtime of O(nlog(n)). Because we're summing two steps that run in O(nlog(n)), the overall runtime is O(nlog(n)).

Maximum number of subsets of overlapping intervals

1 Answers1