I'm currently reading "The Algorithm Design Manual" by Skiena (well, beginning to read it).

He poses a problem he calls the "Movie Scheduling Problem":

Problem: Movie Scheduling Problem

Input: A set I of n intervals on the line.

Output: What is the largest subset of mutually non-overlapping intervals which can be selected from I?

Example: (Each dashed line is a movie, you want to find a set with the highest quantity of movies)

                      ----a---
-----b----    -----c---    ---d---
        -----e---  -------f---
            --g--  --h--

The algorithm I thought of to solve it was like this: I could throw out the "worst offender" (intersects with the most other movies) until there are no worst offenders (zero intersections). The only problem I see is that if there is a tie (say two different movies each intersect with 3 other movies) could it matter which one I throw out?

Basically I'm wondering how I go about turning the idea into "math" and how to prove it correct/incorrect.

RBarryYoung
David Crowe
  • Are we trying to make as many movies as we can, or are we trying to fill up as much time as we can? Is it better to have 6 movies that would run for 5 hours, or 5 movies that would run for 6 hours in the same timespan? – Petr Janeček Sep 13 '13 at 00:06
  • We're trying to make the most movies. – David Crowe Sep 13 '13 at 00:07
  • why are there multiple movies on some lines in the input (your example)? would it matter if each movie was on its own line? that simplifies thinking about it, otherwise it gives the impression that some movies are grouped together on the same line. – necromancer Sep 13 '13 at 00:10
  • Please [edit] the title of your question to make it more specific. What possible use could "Is this algorithm correct?" be to future readers when they're searching here? Your question is pretty vague in content as well: "how I go about turning the idea into "math" and how to prove it correct/incorrect" isn't really a specific question that can be answered here. I'm actually leaning toward it being off topic here and more appropriate for [Programmers](http://programmers.stackexchange.com) as more of a theoretical question, as there's no code involved. – Ken White Sep 13 '13 at 00:16
  • It's also a [duplicate](http://stackoverflow.com/q/18412923/2336725). Well, maybe not the exact question. But it's the same problem and book. – Teepeemm Sep 13 '13 at 00:29

3 Answers

The algorithm is incorrect. Let's consider the following example:

Counterexample

           |----F----|       |-----G------| 

        |-------D-------|  |--------E--------|

|-----A------|    |------B------|    |------C-------|

You can see that there is a solution of size at least 3 because you can pick A, B and C.

First, let's count the number of intersections for each interval:

A = 2    [F, D]
B = 4    [D, F, E, G]
C = 2    [E, G]
D = 3    [A, B, F]
E = 3    [B, C, G]
F = 3    [A, B, D]
G = 3    [B, C, E]

Now consider a run of your algorithm. In the first step we delete B, because it intersects the most intervals, and we get:

           |----F----|       |-----G------| 

        |-------D-------|  |--------E--------|

|-----A------|                      |------C-------|

It's easy to see that now from {A, D, F} you can choose only one, because each pair intersects. The same case with {G, E, C}, so after deleting B, you can choose at most one from {A, D, F} and at most one from {G, E, C}, to get the total of 2, which is smaller than the size of {A, B, C}.

The conclusion is that after deleting B, which intersects the most intervals, you can no longer get the maximum number of non-intersecting movies.
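The run above can be replayed mechanically. The following is a minimal sketch, assuming concrete endpoints for A to G that were picked only to reproduce the overlaps in the diagram (the numbers are not from the original post), closed intervals, and a hypothetical helper named `worst_offender` that breaks ties alphabetically:

```python
# Endpoints are assumptions chosen to match the overlaps drawn in the diagram.
INTERVALS = {
    "A": (0, 13), "B": (18, 32), "C": (37, 52),
    "D": (8, 24), "E": (27, 45),
    "F": (11, 21), "G": (29, 42),
}


def overlaps(x, y):
    """Closed intervals overlap when neither one ends before the other starts."""
    return x[0] <= y[1] and y[0] <= x[1]


def conflict_counts(names, intervals):
    """Number of other remaining intervals each remaining interval intersects."""
    return {
        n: sum(1 for m in names if m != n and overlaps(intervals[n], intervals[m]))
        for n in names
    }


def worst_offender(intervals):
    """Repeatedly drop the interval with the most conflicts until none remain."""
    remaining = set(intervals)
    removed = []
    while True:
        counts = conflict_counts(remaining, intervals)
        # Ties are broken alphabetically (an arbitrary choice for this sketch).
        worst = max(sorted(remaining), key=lambda n: counts[n])
        if counts[worst] == 0:
            return remaining, removed
        remaining.remove(worst)
        removed.append(worst)


kept, removed = worst_offender(INTERVALS)
print("removed first:", removed[0])
print("kept:", sorted(kept))
```

With these endpoints the heuristic drops B first and ends with two movies, while A, B and C together form a valid schedule of three.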

Correct solution

The problem is very well known and one solution is to pick the interval which ends first, delete all intervals intersecting with it and continue until there are no intervals to examine. This is an example of a greedy method and you can find or develop a proof that it's correct.
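As a rough illustration of the earliest-end-first rule, here is a short sketch. It assumes intervals are `(start, end)` tuples treated as closed (touching endpoints count as an intersection), and the function name `max_non_overlapping` and the sample endpoints are made up for this example:

```python
def max_non_overlapping(intervals):
    """Greedy selection: repeatedly take the interval that ends first.

    Sorting by end time and skipping anything that starts no later than the
    last chosen end is equivalent to "pick the earliest-ending interval and
    delete everything that intersects it".
    """
    chosen = []
    last_end = None
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if last_end is None or start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# Assumed endpoints for A..G, chosen to match the counterexample diagram.
example = [(0, 13), (18, 32), (37, 52), (8, 24), (27, 45), (11, 21), (29, 42)]
print(max_non_overlapping(example))
```

The usual correctness argument is an exchange argument: any optimal schedule can have its first movie swapped for the earliest-ending one without creating a conflict, and the same reasoning then applies to the rest.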

pkacprzak
  • What if I first remove duplicates: If M1 and M2 have the same intersect set and intersect each other then they are interchangeable-- any solution that includes M1 could include M2 instead and vice versa. No solution could include both. So removing either from the set of intervals I will not affect the ability to find a best solution. IF: intersects(M1) union intersects(M2) = intersects(M1)-M2 = intersects(M2)-M1 THEN: I = (I - M1) – David Crowe Sep 13 '13 at 01:11
  • @DavidCrowe do you see in my counterexample two intervals with the same intersection set? – pkacprzak Sep 13 '13 at 01:14
  • D and F both have [A, B]. E and G both have [B, C]. – David Crowe Sep 13 '13 at 01:16
  • @DavidCrowe Ok, so you can add a new interval H which intersects only with D and B - add it after the end of F and before the end of D. Then again B has the most intersections, and D and F have different intersections. You can do the same with B and C. – pkacprzak Sep 13 '13 at 01:24
  • Ok, if before removing duplicates we remove intervals that are completely contained by other intervals? So the flow would look like: 1. Remove SubIntervals, 2. Remove Duplicates, 3. Remove Worst Offenders – David Crowe Sep 13 '13 at 01:38
  • @DavidCrowe I'll think about it, but tomorrow :) – pkacprzak Sep 13 '13 at 01:43
2

This looks like a dynamic programming problem to me:

Define the following functions:

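sched(t) = size of the best schedule using only movies that start at or after time t
next(t) = set of movies that start next, at or after time t
start(m) = start time of movie m
len(m) = length of movie m

next returns a set because there may be more than one movie that starts at the same time.

Then sched should be defined as follows:

sched(t) = max { 1 + sched(start(m) + len(m)), sched(t+1) } where m in next(t)

This assumes integer times, and sched(t) = 0 when next(t) is empty. Note that the first term has to resume at the end of m, which is start(m) + len(m); that only equals t + len(m) when m starts exactly at t.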

This recursive function selects a movie m from next(t) and compares the largest possible sets that either include or don't include m.

Invoke sched with the time of your first movie and you will get the size of the optimal set. Getting the optimal set itself just requires a little extra logic to remember which movies you select at each invocation.

I think this recursive (as opposed to iterative) algorithm runs in O(n^2) if you use memoization, where n is the number of movies.

I believe it's correct, though I'd have to consult my algorithms textbook to give you an explicit proof. Hopefully it makes intuitive sense why it works.
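One possible way to write this recurrence with memoization is sketched below. It is a variant rather than a literal transcription: it indexes the state by position in the start-sorted movie list instead of by time, so there are only n states. It assumes movies are `(start, length)` pairs with positive lengths and that a movie may begin at the exact moment the previous one ends; the name `max_movies` is made up for this example.

```python
from bisect import bisect_left
from functools import lru_cache


def max_movies(movies):
    """Return the size of the largest set of non-overlapping movies."""
    movies = sorted(movies)
    starts = [start for start, _ in movies]

    @lru_cache(maxsize=None)
    def sched(i):
        # Best schedule using only movies[i:].
        if i == len(movies):
            return 0
        start, length = movies[i]
        # First movie that starts at or after the end of movie i.
        nxt = bisect_left(starts, start + length, lo=i + 1)
        take = 1 + sched(nxt)
        skip = sched(i + 1)
        return max(take, skip)

    return sched(0)


print(max_movies([(0, 3), (2, 3), (5, 2), (6, 1), (7, 2)]))
```

Because each state does one binary search, this version does O(n log n) work after sorting. The recursion depth grows with n, so an iterative loop from the last movie backwards would be safer for large inputs.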

mhess
0
# go through the database and create a 2-D matrix indexed a..h by a..h.  Set each
# element of the matrix to 1 if the row index movie overlaps the column index movie.

# 8x8 matrix of zeros (one row and one column per movie a..h)
mtx = [[0] * 8 for _ in range(8)]

# b <> e
mtx[1][4] = 1
mtx[4][1] = 1

# e <> g
mtx[4][6] = 1
mtx[6][4] = 1

# e <> c
mtx[4][2] = 1
mtx[2][4] = 1

# c <> a
mtx[2][0] = 1
mtx[0][2] = 1

# c <> f
mtx[2][5] = 1
mtx[5][2] = 1

# c <> g
mtx[2][6] = 1
mtx[6][2] = 1

# c <> h
mtx[2][7] = 1
mtx[7][2] = 1

# d <> f
mtx[3][5] = 1
mtx[5][3] = 1

# a <> f
mtx[0][5] = 1
mtx[5][0] = 1

# a <> d
mtx[0][3] = 1
mtx[3][0] = 1

# a <> h
mtx[0][7] = 1
mtx[7][0] = 1

# f <> h
mtx[5][7] = 1
mtx[7][5] = 1

# print out constraints
for line in mtx:
    print(line)

# keep track of which movies are still allowed
allowed = set(range(8))

# loop through in greedy fashion, picking the movie that throws out the
# fewest other still-allowed movies at each step
while allowed:
    best_col = None
    best_lost = None
    # each step, only try movies still allowed
    for col in sorted(allowed):
        # other still-allowed movies eliminated by this selection
        lost = set(row for row in allowed if mtx[row][col] == 1)
        # this was the best of all the allowed choices so far
        if best_lost is None or len(lost) < len(best_lost):
            best_col = col
            best_lost = lost
    # allowed is non-empty here, so there is always a valid selection
    print('watch movie:', chr(best_col + ord('a')))
    for row in sorted(best_lost):
        # now eliminate the other movies you can't now watch
        print('throwing out:', chr(row + ord('a')))
        allowed.remove(row)
    # also throw out this movie from the allowed list (can't watch twice)
    allowed.remove(best_col)

# this is just a greedy algorithm, not guaranteed optimal!
# you could also iterate through all possible combinations of movies
# and simply eliminate all illegal possibilities (brute force search)
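The brute-force search mentioned in the closing comments could look roughly like this. It is only a sketch: the endpoints are assumptions read off the question's diagram by column position, intervals are treated as closed, and `brute_force` is a made-up name. It tries subsets from largest to smallest, so it is exponential in the number of movies and only practical for tiny inputs.

```python
from itertools import combinations

# Assumed endpoints, read off the ASCII diagram in the question by column.
MOVIES = {
    "a": (22, 29), "b": (0, 9), "c": (14, 22), "d": (27, 33),
    "e": (8, 16), "f": (19, 29), "g": (12, 16), "h": (19, 23),
}


def overlaps(x, y):
    """Closed intervals overlap when neither one ends before the other starts."""
    return x[0] <= y[1] and y[0] <= x[1]


def brute_force(movies):
    """Return one largest subset of mutually non-overlapping movies."""
    names = sorted(movies)
    # Try the biggest subsets first; the first legal one found is optimal.
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            legal = all(
                not overlaps(movies[p], movies[q])
                for p, q in combinations(subset, 2)
            )
            if legal:
                return list(subset)
    return []


best = brute_force(MOVIES)
print(best)
```

For these eight movies the search settles on a schedule of four, which gives a reference answer to compare any greedy variant against.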
mikeTronix