11

A friend gave me a puzzle that he says can be solved in better than O(n^3) time.

Given a set of n jobs that each have a fixed start time and end time (overlaps are very possible), find the smallest subset of jobs such that every job either is in the subset or overlaps some job in the subset.

I'm pretty sure the optimal solution is to pick the job with the most unmarked overlapping jobs, add it to the solution set, then mark it and everything it overlaps, and repeat until all jobs are marked.
Finding the job with the most unmarked overlappers is a simple scan over an adjacency matrix (O(n^2)), and this has to be redone every time a job is selected, in order to update the marks, making the whole thing O(n^3).
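For concreteness, here's a rough Python sketch of the greedy I have in mind (function name and tie-breaking are my own choices):

```python
def greedy_most_overlaps(jobs):
    """Repeatedly pick the job with the most unmarked overlappers.

    jobs: list of (start, end) tuples. Two jobs overlap when each
    starts strictly before the other ends. Returns picked indices.
    """
    n = len(jobs)
    # Adjacency matrix of overlaps -- the O(n^2) step.
    adj = [[i != j and jobs[i][0] < jobs[j][1] and jobs[j][0] < jobs[i][1]
            for j in range(n)] for i in range(n)]
    marked = [False] * n
    picked = []
    while not all(marked):
        # Rescan to find the job with the most unmarked overlappers
        # (counting the job itself if unmarked) -- O(n^2) per pick,
        # hence O(n^3) overall.
        best = max(range(n), key=lambda i: (not marked[i]) +
                   sum(adj[i][j] and not marked[j] for j in range(n)))
        picked.append(best)
        marked[best] = True
        for j in range(n):
            if adj[best][j]:
                marked[j] = True
    return picked
```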

Is there a better solution?

kwiqsilver
  • Your solution is greedy and is not always correct (I can give examples where it fails), but it can also be implemented with better complexity. – Ivaylo Strandjev Jan 31 '12 at 09:17
  • @izomorphius It is greedy, by intent. But I haven't been able to prove it not optimal. Any idea what the solution with better complexity is? – kwiqsilver Jan 31 '12 at 09:29
  • I know it is greedy, but it is not correct. Here is an example: intervals (0, 2), (1, 4), (3, 6), (5, 8), (7, 10). On the first step, intervals (1, 4), (3, 6), and (5, 8) each cover two others, so you can choose any of them, but the only optimal answer is to choose (1, 4) and (5, 8). There are examples where the interval covering the most intervals is uniquely identified but the best solution still does not include it; they are a bit harder to think of. If you insist, I will try to give one. – Ivaylo Strandjev Jan 31 '12 at 09:48
  • seems similar to **1.2 Selecting the Right Jobs** from [The Algorithm Design Manual](http://www.amazon.com/Algorithm-Design-Manual-Steven-Skiena/dp/1849967202/ref=sr_1_1?ie=UTF8&qid=1328003679&sr=8-1) you can check out google's preview of the book – Nick Dandoulakis Jan 31 '12 at 09:55
  • I did create a graph where that algorithm provided a non-optimal solution, so it's out. But thinking about it a bit more, this looks like a minimal dominating set, which according to Wikipedia is NP-complete, so an O(n^3) or better solution would not be possible. – kwiqsilver Jan 31 '12 at 10:07
  • This is an instance of minimal dominating set, but the fact that we are dealing only with overlapping intervals restricts the problem enough to allow a polynomial-time solution. This is similar to how 2-SAT is a special case of 3-SAT and is in P, while 3-SAT is NP-complete. – interjay Jan 31 '12 at 10:53

2 Answers

12

Let A be the set of jobs which aren't yet overlapped by any job in the output set.

  1. Find the job x in A which has the minimal end time (t).
  2. From all jobs whose start time is less than t: pick the job j with the maximum end time.
  3. Add j to the output set.
  4. Remove all jobs which overlap j from A.
  5. Repeat 1-4 until A is empty.

A simple implementation will run in O(n^2). Using interval trees, it's probably possible to solve it in O(n log n).
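A minimal Python sketch of the O(n^2) version of steps 1–4 (naming is mine; I assume two jobs overlap when each starts strictly before the other ends):

```python
def min_dominating_jobs(jobs):
    """Greedy described above, with plain O(n^2) scans.

    jobs: list of (start, end) tuples. Returns the chosen subset.
    """
    A = set(jobs)          # jobs not yet overlapped by the output
    output = []
    while A:
        # 1. The job x in A with minimal end time has end time t.
        t = min(end for _, end in A)
        # 2. Among ALL jobs starting before t (not just those in A),
        #    pick the one with the maximum end time.
        j = max((job for job in jobs if job[0] < t), key=lambda job: job[1])
        # 3. Add j to the output set.
        output.append(j)
        # 4. Remove everything overlapping j (including j itself) from A.
        A = {job for job in A if not (job[0] < j[1] and j[0] < job[1])}
    return output
```

On the five-interval counterexample from the comments it returns a two-job set, matching the optimum.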

The basic idea behind why it's an optimal solution (not a formal proof): We have to pick one job whose start time is less than t, so that x will be overlapped. If we let S be the set of all jobs whose start time is less than t, it can be shown that j will overlap the same jobs as any job in S, plus possibly more. Since we have to pick one job in S, the best choice is j. We can use this idea to form a proof by induction on the number of jobs.

interjay
  • This algorithm seems to fail with the interval set {[0,2], [1,4], [3,10], [5, 6], [7,8]} (example from [this question](http://stackoverflow.com/q/26170904/535871)). It generates the covering set {[1, 4], [5, 6], [7, 8]} whereas there are two smaller covering sets: {[0, 2], [3, 10]} and {[1, 4], [3, 10]}. – Ted Hopp Oct 02 '14 at 23:18
  • @TedHopp The algorithm works correctly. It generates the set {[1,4], [3,10]}. Note that in step 2, any job can be picked, even if it isn't in A. – interjay Oct 02 '14 at 23:50
1

We can achieve an O(n log n) solution with a dynamic programming approach. In particular, we want S(k), the size of the smallest set that includes the kth job and dominates the first k jobs (ordered by start time). We first add an auxiliary job (∞, ∞); the result is then the DP value for this final job, minus one.

To compute S(k), consider the job p(k) that ends before job k starts but has maximal start time. Note that p is a non-decreasing function of k. S(k) is then one more than the minimum S(i) over jobs i with end(i) > start(p(k)).

We can find this job efficiently by maintaining a min-heap of candidate jobs ordered by S(k). After computing each S(k), we push the job onto the heap. When we need a candidate, we pop jobs from the top of the heap that end too early, until we find a suitable one. This takes O(n log n) in total, since each job is pushed and popped at most once and each heap operation costs O(log n).

The remaining task is to compute the p(k) values efficiently. One way is to iterate over all job starts and ends in increasing time order, keeping track of the latest-starting job seen so far.
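A Python sketch of the whole approach (my reading of the recurrence; I assume jobs overlap when each starts strictly before the other ends):

```python
import heapq

def min_dominating_size(jobs):
    """DP over jobs sorted by start time.

    S(k) = size of the smallest set that contains job k and dominates
    jobs 1..k. Returns S(sentinel) - 1, the size of the optimum.
    """
    INF = float('inf')
    jobs = sorted(jobs) + [(INF, INF)]   # auxiliary job (inf, inf)
    by_end = sorted(jobs[:-1], key=lambda j: j[1])
    heap = []        # lazy min-heap of (S(i), end(i)) pairs
    ptr = 0
    p_start = -INF   # start(p(k)): max start among jobs ending before start(k)
    for start, end in jobs:
        # Sweep computing p(k): absorb jobs that end before start(k).
        while ptr < len(by_end) and by_end[ptr][1] < start:
            p_start = max(p_start, by_end[ptr][0])
            ptr += 1
        if p_start == -INF:
            s_k = 1  # nothing ends before job k, so k alone suffices so far
        else:
            # Discard jobs ending too early to overlap p(k); since p_start
            # only grows, they can never become useful again.
            while heap and heap[0][1] <= p_start:
                heapq.heappop(heap)
            s_k = heap[0][0] + 1   # min S(i) with end(i) > start(p(k))
        heapq.heappush(heap, (s_k, end))
    return s_k - 1   # subtract the sentinel's own contribution
```

On the five-interval example from the comments on the question, this returns 2, matching the greedy answer above.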

Nabb