13

Suppose I have an interval (a, b) and a number of subintervals {(a_i, b_i)}_i whose union is all of (a, b). Is there an efficient way to choose a minimal-cardinality subset of these subintervals that still covers (a, b)?

Bill the Lizard
Alex Coventry
  • Are you looking for the smallest number of subintervals, or the set of subintervals that has the fewest elements (and hence, the fewest duplicates)? – Bill the Lizard Nov 15 '08 at 22:49

4 Answers

18

A greedy algorithm starting at a or b always gives the optimal solution.

Proof: consider the set S_a of all the subintervals covering a. Clearly, one of them has to belong to the optimal solution. If we replace it with the subinterval (a_max, b_max) from S_a whose right endpoint b_max is maximal in S_a (it reaches furthest to the right), the remaining uncovered interval (b_max, b) is a subset of the interval left uncovered by the optimal solution's first choice, so it can be covered with no more subintervals than the optimal solution uses there. Therefore, a solution built from (a_max, b_max) plus an optimal solution for the remaining interval (b_max, b) is also optimal.

So, just start at a and repeatedly pick the interval that reaches furthest right among those covering the end of the previous interval, until you hit b. I believe that picking the next interval can be done in O(log n) if you store the intervals in an augmented interval tree.
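The greedy step above can be sketched in C++ roughly as follows. This is an illustration under assumptions, not the answerer's code: the names `Interval` and `minimalCover` are made up here, endpoints are taken to be `double`, and instead of an interval tree it sorts by left endpoint and sweeps once, which costs O(n log n) overall. Because the intervals are open, a follow-up interval has to start strictly before the current right end, otherwise that endpoint itself would stay uncovered.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type: an open interval (start, end).
struct Interval {
    double start;
    double end;
};

// Greedy sketch: returns a minimum-size subset of `intervals` covering the
// open interval (a, b). Returns an empty vector when no cover exists
// (and also in the degenerate case b <= a, where nothing is needed).
std::vector<Interval> minimalCover(std::vector<Interval> intervals,
                                   double a, double b) {
    // Sort by left endpoint so one forward sweep visits each interval once.
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& x, const Interval& y) {
                  return x.start < y.start;
              });

    std::vector<Interval> chosen;
    double reach = a;    // everything in (a, reach) is covered so far
    std::size_t i = 0;   // next interval not yet examined
    bool first = true;   // the first pick may start exactly at a

    while (reach < b) {
        // Among intervals that start early enough, keep the one that
        // reaches furthest right. After the first pick the start must be
        // strictly less than `reach` so the point `reach` is covered too.
        const Interval* best = nullptr;
        while (i < intervals.size() &&
               (first ? intervals[i].start <= reach
                      : intervals[i].start < reach)) {
            if (best == nullptr || intervals[i].end > best->end) {
                best = &intervals[i];
            }
            ++i;
        }
        if (best == nullptr || best->end <= reach) {
            return {};   // gap: nothing extends the covered region
        }
        chosen.push_back(*best);
        reach = best->end;
        first = false;
    }
    return chosen;
}
```

Intervals examined in an earlier round never need to be revisited: each of them ends no further right than the interval picked in that round, so it cannot extend the cover later.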

Rafał Dowgird
  • Could you elaborate: "the remaining uncovered interval (b_max, b) will be a subset of the remaining interval from the optimal solution"? – jfs Nov 16 '08 at 23:00
  • @JFS: Suppose that the optimal solution starts with an interval (a_i, b_i) that covers (a, b_i) and leaves (b_i, b) uncovered. From the definition of (a_max, b_max) we have b_max >= b_i, so (b_max, b) is a subset (subinterval) of (b_i, b). – Rafał Dowgird Nov 17 '08 at 07:17
  • "If we replace it with a subinterval (a_max, b_max) from S_a...": Do you mean the sub-interval (a_min, b_max)? And I'm not so sure about (b_max, b) either. – user183037 Apr 05 '12 at 17:57
  • @user183037 The 'max' is just an index for the whole interval. Interval 'foo' is (a_foo, b_foo). (b_max, b) is the remaining interval. – Rafał Dowgird Apr 05 '12 at 18:17
1

Sounds like dynamic programming.

Here's an illustration of the algorithm (assume intervals are in a list sorted by ending time):

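#include <algorithm>
#include <limits>
#include <vector>

struct Interval { int start, end; };

const int INF = std::numeric_limits<int>::max() / 2; // "no cover"; halved so 1 + INF cannot overflow
std::vector<Interval> intervals; // must be sorted by ascending end
int a = 0;                       // left end of the interval to cover

//works backwards from the end; initial call: minCard(intervals.size() - 1, b)
//returns the fewest of intervals[0..current] that cover (a, must_end_after)
int minCard(int current, int must_end_after)
{
    if (must_end_after <= a)
        return 0; //no more intervals needed
    if (current < 0)
        return INF; //doesn't cover (a,b)

    //intervals that merely touch at an endpoint count as contiguous here
    if (intervals[current].end < must_end_after)
        return INF; //sorted by end, so no earlier interval reaches far enough either

    //include current interval or not?
    return std::min( 1 + minCard(current - 1, std::min(must_end_after, intervals[current].start)),
                     minCard(current - 1, must_end_after) );
}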

Without caching, this recursion can take exponential time, so it should also cache (memoise) the result for each (current, must_end_after) pair.
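One way the caching could look is sketched below. It is an assumption-laden illustration rather than the answerer's implementation: `IntInterval` and `minCardMemo` are hypothetical names, endpoints are integers, and the cache is a `std::map` keyed on the (current, must_end_after) pair. Like the recursion above, it treats intervals that merely touch at an endpoint as contiguous; for strictly open intervals the comparison would need tightening.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper type with integer endpoints.
struct IntInterval {
    int start;
    int end;
};

// Memoised version of the backwards recursion. Returns the minimum number
// of intervals needed to cover (a, b), or -1 if no cover exists.
int minCardMemo(std::vector<IntInterval> spans, int a, int b) {
    // Halved so that 1 + INF cannot overflow.
    const int INF = std::numeric_limits<int>::max() / 2;

    // The recursion relies on the list being sorted by ascending end.
    std::sort(spans.begin(), spans.end(),
              [](const IntInterval& x, const IntInterval& y) {
                  return x.end < y.end;
              });

    std::map<std::pair<int, int>, int> cache;

    std::function<int(int, int)> go =
        [&](int current, int mustEndAfter) -> int {
        if (mustEndAfter <= a) {
            return 0;    // everything down to a is already covered
        }
        if (current < 0 || spans[current].end < mustEndAfter) {
            return INF;  // no remaining interval reaches far enough
        }
        const std::pair<int, int> key(current, mustEndAfter);
        const auto it = cache.find(key);
        if (it != cache.end()) {
            return it->second;  // subproblem already solved
        }
        // Either take the current interval or skip it.
        const int withCurrent =
            1 + go(current - 1, std::min(mustEndAfter, spans[current].start));
        const int withoutCurrent = go(current - 1, mustEndAfter);
        const int best = std::min(withCurrent, withoutCurrent);
        cache[key] = best;
        return best;
    };

    const int result = go(static_cast<int>(spans.size()) - 1, b);
    return result >= INF ? -1 : result;
}
```

Since `must_end_after` only ever takes the value b or some interval's start, there are at most O(n^2) distinct cache keys.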

Andrea Olivato
Artelius
  • `minCard()` is intended to get a minimal cardinality but the question asks for a subset with minimal cardinality. – jfs Nov 16 '08 at 22:45
  • It would just be an extension of this algorithm that also keeps track of which subset was used to form that optimum value. – Artelius Nov 17 '08 at 05:00
  • @Artelius What is the complexity of your algorithm? – Sumeet_Jain Mar 19 '16 at 20:05
  • The complexity is O(n^2) in the worst case. For example when all intervals have an identical end point but different start points, all of those start points are going to propagate back to the beginning. The greedy approach is better. – Artelius Mar 21 '16 at 22:10
0

There are two cases to consider:
Case 1: No interval overlaps the finish time of the interval you are looking at. In this case, pick the next interval with the smallest starting time and, among those, the longest finishing time: (a_min, b_max).
Case 2: One or more intervals overlap the last interval you are looking at. In this case the start time doesn't matter, because that region is already covered, so optimize for the finishing time alone: (a, b_max).

Case 1 always picks, as its first interval, an interval that is also first in an optimal set (the proof is the same as the one @RafałDowgird provided).

greybeard
user183037
-2

You mean so that the subintervals still overlap in such a way that (a,b) remains completely covered at all points?

Maybe split the subintervals themselves into basic blocks, each tagged with the subintervals it came from, so that for each basic block you can list the options that cover it, taking into account the other regions each subinterval also covers. Then you can run a search over these sub-subintervals and at least be sure no gaps are left.
You would then need to search efficiently, which would be harder.

You could eliminate any collection of intervals that is entirely covered by a smaller collection, and work the problem after that preprocessing.
Wouldn't a minimal cover for the whole be minimal for at least one half? I'm not sure.

Found a link to a journal but couldn't read it. :(

This would be a hitting set problem, which is NP-hard in general.
Couldn't read this either, but it looks like the opposite kind of problem.
Couldn't read it, but here is another link that mentions splitting intervals up.
Here is an available reference on Randomized Algorithms for Geometric Optimization Problems.
Page 35 of this pdf has a greedy algorithm.
Page 11 of Karp (1972) mentions hitting set and is cited a lot.
Google result. Researching was fun but I have to go now.

Glorfindel
waynecolvin