12

Problem Statement

Input set of n intervals; {[s_1,t_1], [s_2,t_2], ... ,[s_n,t_n]}.

Output pair of intervals; {[s_i,t_i],[s_j,t_j]}, with the maximum overlap among all the interval pairs.

Example

input intervals : {[1, 10], [2, 6], [3,15], [5, 9]}

-> There are possible 6 interval pairs. Among those pairs, [1,10] & [3,15] has the largest possible overlap of 7.

output : {[1,10],[3,15]}

A naive algorithm will be a brute force method where all n intervals get compared to each other, while the current maximum overlap value being tracked. The time complexity would be O(n^2) for this case.

I was able to find many procedures regarding interval trees, maximum number of overlapping intervals and maximum set of non-overlapping intervals, but nothing on this problem. Maybe I would be able to use the ideas given in the above algorithms, but I wasn't able to come up with one.

I spent many hours trying to figure out a nice solution, but I think I need some help at this point.

Any suggestions will help!

Saeid
  • 4,147
  • 7
  • 27
  • 43
user1751434
  • 135
  • 1
  • 1
  • 5

1 Answers1

15

First, sort the intervals: first by left endpoint in increasing order, then — as a secondary criterion — by right endpoint in decreasing order. For the rest of this answer, I'll assume that the intervals are already in sorted order.

Now, there are two possibilities for what the maximum possible overlap might be:

  • it may be between an interval and a later interval that it completely covers.
  • it may be between an interval and the very next interval that it doesn't completely cover.

We can cover both cases in O(n) time by iterating over the intervals, keeping track of the following:

  • the greatest overlap we've seen so far, and the relevant pair of intervals.
  • the latest interval we've seen, call it L, that wasn't completely covered by any of its predecessors. (For this, the key insight is that thanks to the ordering of the intervals, we can easily tell if an interval is completely covered by any of its predecessors — and therefore if we need to update L — by simply checking if it's completely covered by the current L. So we can keep L up-to-date in O(1) time.)

and computing each interval's overlap with L.

So:

result := []
max_overlap := 0
L := sorted_intervals[1]
for interval I in sorted_intervals[2..n]:
    overlap := MIN(L.right, I.right) - I.left
    if overlap >= max_overlap:
        result := [L, I]
        max_overlap := overlap
    if I.right > L.right:
        L := I

So the total cost is the cost of sorting the intervals, which is likely to be O(n log n) time but may be O(n) if you can use bucket-sort or radix-sort or similar.

ruakh
  • 175,680
  • 26
  • 273
  • 307
  • You are a genius. – user1751434 Sep 04 '16 at 05:09
  • 1
    Doesn't works for intervals (1,6),(3,6),(5,8). As per your logic, we will ignore (3,6) since it is covered by its predecessor (1,6). We are left with (1,6),(5,8) , overlap between them =1. This is wrong since max overlap is between (1,6),(3,6) = 3. – user3886907 Aug 20 '17 at 08:49
  • @user3886907: Whoops, you are quite right, thanks! Will fix . . . – ruakh Aug 20 '17 at 18:14
  • I believe this is still not fully correct. Consider (1,6),(2,5),(5,8). This algorithm returns (1,6),(2,5), overlap between them =4. But the right answer is (1,6),(2,5) = 3. – Jonathan Oct 14 '18 at 07:05
  • is this algorithm possible in lesser than linear time? – dixitk13 May 26 '19 at 06:20
  • @dixitk13: Definitely not, because any algorithm for this problem will need to examine all of the intervals. – ruakh May 26 '19 at 09:33
  • how about if you use [augmented interval trees](https://en.wikipedia.org/wiki/Interval_tree#Augmented_tree) the membership check becomes K + log N, where K is the number of overlapping intervals with the query interval, and then finding the max should be an order of log N – dixitk13 May 27 '19 at 16:12
  • @dixitk13: I think you're not taking into account the time to *build* the augmented interval tree? – ruakh May 27 '19 at 16:21
  • building the tree itself will cost another O(n log n) which is similar to sorting it as you mentioned, but should be able to reap the benefits when the query happens. I was just thinking out loud - not trying to devalue anything here. – dixitk13 May 27 '19 at 16:26