Max coverage disjoint intervals

Question

Assume you have k<=10^5 intervals [a_i, b_i] \in [1,10^18] (some of them may overlap), and you need to choose a set of intervals mutually disjoint such that their union is maximal. Not maximum number of disjoint intervals, but the union must cover the most.

Can't try all possible subsets 2^k infeasible. Greedy approaches ordering by a_i ( interval covering algorithm) and ordering by b_i ( maximum number of disjoint intervals algorithm ) didn't work Can't figure out if there is a dynamic program solution. Given the size of the input, I think the solution should be O(k log k) or O(k)

Examples 1. [1,4], [3,5], [5,9], [7, 18] Sol [3,5]u[7,18]

[1,2], [2,6], [3,4], [5,7] Sol [1,2]u[3,4]u[5,7]
[2,30], [25,39], [30,40] Sol [2,30]

What are the number of intervals, the values of a_i, b_i at most? — Ivan Gritsenko, Mar 12 '17 at 19:37
Hint: you can sort them by the left border and compute `dp[i]` = max union that ends with in `i`-th segment. It can be done in `O(k log k)` time if you keep track of events properly (or with a data structure like a segment tree). — kraskevich, Mar 12 '17 at 20:53

snakile · Accepted Answer · 2017-03-13T09:16:31.277

3

The problem can be solved in O(k log(k)).

First sort the intervals by their upper bounds (the b_is). Let I(1), I(2), ..., I(k) be the list of sorted intervals. That is,

b_1 <= b_2 <= ... <= b_k

Denote by w(i) the length of interval I(i). That is,

w(i) = b_i - a_i

Denote by f(i) the total length of the optimal solution among those whose last interval is I(i). That is, the solution corresponding to f(i) is a set which:

contains the interval I(i)
doesn't contain any interval whose upper bound is above b_i
has the maximum cover among the sets of (non-overlapping) intervals satisfying 1+2

Now we are going to compute f(1), f(2), ..., f(k) and return the maximum value of them all. Clearly, the optimal solution corresponds to one of the f(i)s and therefore the maximal f(i) is the optimal solution.

To compute each f(i) we use dynamic programming. We do this by relying on the following recurrence relation:

f(i) = w(i) + max{f(j) | b_j < a_i}

I'll demonstrate the computation with your first input example:

I(1)=[1, 4],  w(1)=3
I(2)=[3, 5],  w(2)=2
I(3)=[5, 9],  w(3)=4
I(4)=[7, 18], w(4)=11

We compute f(i) for i=1, 2, 3, 4:

f(1) = w(1) + max{None} = 3 
    f(1) intervals: {I(1)}

f(2) = w(2) + max{None} = 2 
    f(2) intervals: {I(2)}

f(3) = w(3) + max{f(1)} = 4 + 1 = 5 
    f(3) intervals = {I(1), I(3)}

f(4) = w(4) + max{f(1), f(2)} = 11 + f(1) = 11 + 3 = 14 
    f(4) intervals = {I(1), I(4)}

The maximum f(i) is f(4) which corresponds to the set of intervals {I(1), I(4)}, the optimal solution.

edited Mar 13 '17 at 09:16

answered Mar 12 '17 at 21:11

snakile

52,936
62
169
241

thank you. I used a binary search to find max{f(j) | b_j < a_i} – Moro Silverio Mar 13 '17 at 20:25
1

Everything is ok, your explanation helped me a lot, I have already coded the problem and got Accepted I can submit the code here, it may help other people in the future – Moro Silverio Mar 14 '17 at 03:18
i think, it is not O(k log(k)), finding max{f(j) | b_j < a_i} requires O(k) in worst case – Mukhtar Bimurat Jan 02 '21 at 18:03
@MukhtarBimurat: yes, but to know in which order you should compute the values f (in which order you consider the intervals), I don't see how to do it without having them sorted? – a3nm Feb 26 '22 at 23:13
One calculation is wrong: `f(3) = w(3) + max{f(1)} = 4 + 1 = 5` needs to be `4 + 3 = 7` – perotom May 09 '23 at 08:43
If you try to 3rd example this doesn't work. f(i) is not the maximum. You need to `max` one last time to get the optimal solution. Am I missing something? – perotom May 10 '23 at 06:27

Ivan Gritsenko · Answer 2 · 2017-03-12T21:03:03.943

There seems to be a O(k * log(k)) solution. It can be achieved with segment tree data structure.

We may at first populate some endPos array of segment endings, sort it. Memorise for each of the segments corresponding endPos index. For this let endPosIdx be such array that endPosIdx_j will store an index in endPos where the j-th segment ends.

Next we will introduce a segment tree. It will process the following requests:
1. getMax(i) - get maximum value on the range [0, i].
2. update(i, value) - update maximum at i-th position with value.
i is and index in endPos array. Calling getMax(i) we ask for what maximum cover can we achieve if non of the segments ends after endPos_i. Calling update(i, value) we say that now there exists a cover with length value ending at endPos_i.

Sort all segments in increasing order by their starting position a_j. Process them in that order. The gist is to find the largest cover if we will certainly take current segment in resulting set. Current cover will equal to the sum of the length of current segment and max cover of the segments ending before current. Let j be the index of current segment (they are sorted by start pos). Let i then be such max index that endPos_i ≤ a_j (i may be found from j by binary search). Then we can find

cover_j = length_j + getMax(i)

Next we should update segment tree calling update(endPosIdx_j, cover_j) and proceed to the next segment.

After processing of all the segments the solution can be found by calling getMax(size(endPos)).

Max coverage disjoint intervals

2 Answers2

Linked