In the weighted interval scheduling problem, one has a sequence of intervals {i_1, i_2, ..., i_n} where each interval i_x represents a contiguous range (in my case, a range of non-negative integers; for example i_x = [5,9)). The usual goal is to set the weight of each interval equal to its width, and then determine the subset of non-overlapping intervals whose total weight is a maximum. An excellent solution is given at the link I just provided.

I have implemented the solution in C++, starting with the algorithm provided at the given link (which is written in Python, in a GitHub repository here).

However, the current solution at the link given - and everywhere else I have seen it discussed - only provides a way to capture a single maximal fit. Of course, in some cases there can be multiple maximal fits, each with the same total (globally maximal) weight.

I have implemented a "brute force" approach to capturing all maximal fits, which I describe below.

Before going into the details of that approach: its key problem, and the one I'd like resolved, is that it captures many false positives in addition to the true maximal fits. It is not necessary to delve into the specifics of my brute-force approach if you can just answer the following question:

What is the (or a) most efficient enhancement to the basic O(n log(n)) solution that supports capturing all maximal fits, rather than just one maximal fit? (If anyone can answer how to avoid the false positives, that will also satisfy me.)

I am making no progress on this, and the brute force approach I'm using starts to explode unmanageably in cases where there are more than a few thousand (perhaps fewer) maximal fits.

Thank you!


Details of the brute force approach I am using, only if interested or useful:

There is a single line of code in the existing source code I've linked above that is responsible for the fact that the algorithm selects a single maximal fit, rather than proceeding down a path where it could capture all maximal fits. Click here to see that line of code. Here it is:

if I[j].weight + OPT[p[j]] > OPT[j - 1]:

Notice the > (greater than sign). This line of code successfully guarantees that any interval combination with a higher total weight than any other interval combination for the given sub-problem is kept. By changing > to >=, it is possible to capture scenarios where the current interval set under consideration has an equal total weight to the highest previous total weight, which would make it possible to capture all maximal fits. I wish to capture this scenario, so in my C++ migration I used the >= and, in the case where equality holds, I proceed down both paths in the fork via a recursive function call.
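To make the tie condition concrete, here is a minimal C++ sketch (it is neither the linked Python code nor the function further down) that fills the usual weight table and records every position where accepting and rejecting the current interval give the same weight. The names `Interval` and `tieIndices` are made up for illustration. It assumes half-open intervals of positive width, already sorted by end, with weight equal to width as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real interval type: half-open [start, end).
struct Interval { long start; long end; };

// Returns the 1-based positions j where "accept interval j" and
// "reject interval j" yield the same sub-problem weight.
// Assumes the input is sorted by end and every interval has positive width.
std::vector<std::size_t> tieIndices(const std::vector<Interval>& I) {
    const std::size_t n = I.size();
    std::vector<long> ends;
    ends.reserve(n);
    for (const Interval& iv : I) ends.push_back(iv.end);
    assert(std::is_sorted(ends.begin(), ends.end()));

    std::vector<long> OPT(n + 1, 0);  // OPT[j]: best weight using the first j intervals
    std::vector<std::size_t> ties;
    for (std::size_t j = 1; j <= n; ++j) {
        const Interval& cur = I[j - 1];
        // p = number of intervals that end at or before cur.start
        const std::size_t p = static_cast<std::size_t>(
            std::upper_bound(ends.begin(), ends.end(), cur.start) - ends.begin());
        const long accept = (cur.end - cur.start) + OPT[p];
        const long reject = OPT[j - 1];
        if (accept == reject) ties.push_back(j);
        OPT[j] = std::max(accept, reject);
    }
    return ties;
}
```

For the intervals [0,2), [1,3), [0,10) this reports a single tie at position 2, although the only maximal fit overall is {[0,10)}. A tie inside a sub-problem therefore does not by itself imply a second global maximal fit, which may be one reason why forking on every tie produces extra or repeated results.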

Below is the C++ code for the (critical) function that captures all optimum interval sets (and weights) for each sub-problem (noting that the final solution is obtained at the last index where the sub-problem corresponds to the entire problem).

Please note that OPTs is a list of all potential solutions (maximal interval sets). Each element of OPTs is itself a single complete solution covering all sub-problems: for every sub-problem it holds a set of intervals and the corresponding weight. OPT is used to describe a single such complete solution, i.e. one potential maximal fit together with the intervals used to construct it, one entry for each sub-problem.

For the standard solution of the weighted interval scheduling problem that I've indicated above, the solution obtained is just OPT (a single one, not a list).

The RangeElement type in the code below is simply metadata unrelated to the problem I'm discussing.

RangesVec contains the set of intervals that is the input to the problem (properly sorted by ending value). PreviousIntervalVec corresponds to compute_previous_intervals discussed at the link above.
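As a rough illustration of what such a "previous interval" table can look like, the sketch below computes, for each interval, the number of intervals that end at or before its start. That matches the "index plus one, with 0 meaning none" convention that the code further down appears to use for PreviousIntervalVec, but `Interval` and `computePreviousPlusOne` are hypothetical names, not the actual types or the linked compute_previous_intervals. Intervals are assumed half-open, of positive width, and sorted by end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real interval type: half-open [start, end).
struct Interval { long start; long end; };

// result[i] = (index of the last interval compatible with interval i) + 1,
// or 0 if no earlier interval is compatible.
// Assumes the input is sorted by end and every interval has positive width.
std::vector<std::size_t> computePreviousPlusOne(const std::vector<Interval>& sortedByEnd) {
    std::vector<long> ends;
    ends.reserve(sortedByEnd.size());
    for (const Interval& iv : sortedByEnd) ends.push_back(iv.end);
    assert(std::is_sorted(ends.begin(), ends.end()));

    std::vector<std::size_t> prev;
    prev.reserve(sortedByEnd.size());
    for (const Interval& iv : sortedByEnd) {
        // Binary search: first end strictly greater than iv.start.
        // Everything before that position ends at or before iv.start.
        auto it = std::upper_bound(ends.begin(), ends.end(), iv.start);
        prev.push_back(static_cast<std::size_t>(it - ends.begin()));
    }
    return prev;
}
```

Each lookup is a binary search, so building the whole table costs O(n log n), the same order as the initial sort.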

(Note: For anybody who is looking at the Python code linked above, please note that I think I have found a bug in it related to saving intervals in the maximal set; please see here for a comment about this bug, which I've fixed in my C++ code below.)

Here is my 'brute-force' implementation that captures all maximal fits. My brute force approach also captures some false positives that need to be removed at the end, and I would be satisfied with any answer that gives a most efficient approach to exclude false positives but otherwise uses an algorithm equivalent to the one below.

void CalculateOPTs(std::vector<std::pair<INDEX_TYPE, std::vector<RangeElement const *>>> & OPT, size_t const starting_index = 0)
{
    ++forks;
    for (size_t index = starting_index; index < RangesVec.size(); ++index)
    {

        INDEX_TYPE max_weight_to_be_set_at_current_index {};

        INDEX_TYPE max_weight_previous_index {};
        INDEX_TYPE max_weight_previously_calculated_at_previous_interval {};

        INDEX_TYPE current_index_weight = RangesVec[index]->range.second - RangesVec[index]->range.first;

        if (index > 0)
        {
            max_weight_previous_index = OPT[index - 1].first;
        }

        size_t previous_interval_plus_one = PreviousIntervalVec[index];
        if (previous_interval_plus_one > 0)
        {
            max_weight_previously_calculated_at_previous_interval = OPT[previous_interval_plus_one - 1].first;
        }

        INDEX_TYPE weight_accepting_current_index = current_index_weight + max_weight_previously_calculated_at_previous_interval;
        INDEX_TYPE weight_rejecting_current_index = max_weight_previous_index;

        max_weight_to_be_set_at_current_index = std::max(weight_accepting_current_index, weight_rejecting_current_index);

        //if (false && weight_accepting_current_index == weight_rejecting_current_index)
        if (weight_accepting_current_index == weight_rejecting_current_index)
        {

            // ***************************************************************************************** //
            // Fork!
            // ***************************************************************************************** //

            // ***************************************************************************************** //
            // This is one of the two paths of the fork, accessed by calling the current function recursively
            // ***************************************************************************************** //

            // There are two equal combinations of intervals with an equal weight.
            // Follow the path that *rejects* the interval at the current index.

            if (index == 0)
            {
                // The only way for the previous weight to equal the current weight, given that the current weight cannot be 0,
                // is if previous weight is also not 0, which cannot be the case if index == 0 
                BOOST_THROW_EXCEPTION(std::exception((boost::format("Logic error: Forking a maximal fitting path at index == 0")).str().c_str()));
            }

            std::vector<std::pair<INDEX_TYPE, std::vector<RangeElement const *>>> newOPT = OPT;
            OPTs.emplace_back(newOPT);
            OPTs.back().push_back(std::make_pair(weight_rejecting_current_index, std::vector<RangeElement const *>())); // std::max returns first value if the two values are equal; so here create a fork using the second value
            OPTs.back()[index].second = OPTs.back()[index-1].second; // The current index is being rejected, so the current set of intervals remains the same for this index as for the previous
            CalculateOPTs(OPTs.back(), index + 1);

        }

        // ***************************************************************************************** //
        // If we forked, this is the other path of the fork, which is followed after the first fork, above, exits.
        // If we didn't fork, we proceed straight through here anyways.
        // ***************************************************************************************** //

        OPT.push_back(std::make_pair(max_weight_to_be_set_at_current_index, std::vector<RangeElement const *>()));

        if (max_weight_to_be_set_at_current_index == weight_accepting_current_index)
        {
            // We are accepting the current interval as part of a maximal fitting, so track it.
            //
            // Note: this also works in the forking case that hit the previous "if" block,
            // because this code represents the alternative fork.
            //
            // We here set the intervals associated with the current index
            // equal to the intervals associated with PreviousIntervalVec[index] - 1,
            // and then append the current interval.
            //
            // If there is no preceding interval, then leave the "previous interval"'s
            // contribution empty (from the line just above where an empty vector was added),
            // and just append the current interval (as the first).
            if (previous_interval_plus_one > 0)
            {
                OPT.back().second = OPT[previous_interval_plus_one - 1].second;
            }
            OPT.back().second.push_back(RangesVec[index]); // We are accepting the current interval as part of the maximal set, so add the corresponding interval here
        }
        else
        {
            if (index == 0)
            {
                // If index is 0, we should always accept the current interval, not reject, so we shouldn't be here in that case
                BOOST_THROW_EXCEPTION(std::exception((boost::format("Logic error: Rejecting current interval at index == 0")).str().c_str()));
            }

            // We are rejecting the current interval, so set the intervals associated with this index
            // equal to the intervals associated with the previous index
            OPT.back().second = OPT[index - 1].second;
        }

    }
}
Dan Nissenbaum

1 Answer

When there is an equal-weight optimal subsolution, you need to add the next interval to every subsolution. I don't see this happening in your code. The general form would look like this:

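function go(lastend){
    // optimal is local to each call: a list of solutions that share one cost.
    // It starts as the single empty solution with cost 0.
    optimal = [ {} ]
    optimal.cost = 0

    for(i=0;i<n;i++){
         if(interval[i].start>lastend){
             // best solutions built only from intervals starting after interval[i] ends
             optimalsubs = go(interval[i].end)
             total = optimalsubs.cost + interval[i].cost
             if(total > optimal.cost){
                  // strictly better: replace everything found so far
                  for(os in optimalsubs){
                        os.add(interval[i])
                  }
                  optimal = optimalsubs
                  optimal.cost = total
             }
             else if(total == optimal.cost){
                // equally good: keep the old solutions and add the new ones
                for(os in optimalsubs){
                        os.add(interval[i])
                }
                optimal.append(optimalsubs)
             }
         }

    }
    return optimal
}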
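A different route to the same goal, sketched here in C++ under stated assumptions, is to compute the ordinary weight table once and then walk it backwards, branching only where a choice provably preserves the optimum. This is the common "backtrack over the DP table" technique rather than a transcription of the pseudocode above. `Interval`, `Solution`, `backtrack` and `allMaximalFits` are illustrative names, and intervals are assumed half-open, of positive width, sorted by end, with weight equal to width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Interval { long start; long end; };    // hypothetical type, half-open [start, end)
using Solution = std::vector<std::size_t>;    // 0-based indices into the input

namespace detail {
// Walks the table from position j down to 0. Every branch taken keeps the
// total at OPT[n], so each path ends in exactly one distinct maximal fit.
void backtrack(std::size_t j, const std::vector<Interval>& I,
               const std::vector<long>& OPT, const std::vector<std::size_t>& p,
               Solution& chosen, std::vector<Solution>& out) {
    if (j == 0) {
        out.emplace_back(chosen.rbegin(), chosen.rend());  // ascending index order
        return;
    }
    const long w = I[j - 1].end - I[j - 1].start;
    if (OPT[j - 1] == OPT[j])            // rejecting interval j stays optimal
        backtrack(j - 1, I, OPT, p, chosen, out);
    if (w + OPT[p[j]] == OPT[j]) {       // accepting interval j stays optimal
        chosen.push_back(j - 1);
        backtrack(p[j], I, OPT, p, chosen, out);
        chosen.pop_back();
    }
}
}  // namespace detail

// Returns every maximum-weight set of pairwise non-overlapping intervals.
std::vector<Solution> allMaximalFits(const std::vector<Interval>& I) {
    const std::size_t n = I.size();
    std::vector<long> ends;
    ends.reserve(n);
    for (const Interval& iv : I) ends.push_back(iv.end);
    assert(std::is_sorted(ends.begin(), ends.end()));

    std::vector<long> OPT(n + 1, 0);         // OPT[j]: best weight using first j intervals
    std::vector<std::size_t> p(n + 1, 0);    // p[j]: count of intervals ending <= start of j
    for (std::size_t j = 1; j <= n; ++j) {
        const Interval& cur = I[j - 1];
        p[j] = static_cast<std::size_t>(
            std::upper_bound(ends.begin(), ends.end(), cur.start) - ends.begin());
        OPT[j] = std::max(OPT[j - 1], (cur.end - cur.start) + OPT[p[j]]);
    }

    std::vector<Solution> out;
    Solution chosen;
    detail::backtrack(n, I, OPT, p, chosen, out);
    return out;
}
```

For [0,2), [2,4), [0,4) this yields the two fits {0, 1} and {2}. For [0,2), [1,3), [0,10) it yields only {2}, because the tie between the first two intervals is never reached from the final optimum. The table costs O(n log n) and the walk does work proportional to the total size of the output, which can still be exponential. The recursion depth can reach n, so a very long input might call for an explicit stack instead.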
dfb
  • Thanks! As I'm trying to parse this, it seems that due to the `return` statement being inside the `for` loop, that only the first interval will ever be considered. Perhaps I'm misunderstanding, or perhaps the `return` statement should be outside the `for` loop? My apologies if this is a basic or trivial question. – Dan Nissenbaum Jun 26 '14 at 17:20
  • Also, keep in mind that there could be an exponential number of valid solutions. E.g., if there are two intervals of length 1 starting at i, for each i = 1 to n, there will be 2^n solutions – dfb Jun 26 '14 at 17:22
  • Thanks again. To help me parse this, can you please tell me if the intervals in this algorithm are assumed to be sorted by starting value, or by ending value. (In my code, they're sorted by ending value.) – Dan Nissenbaum Jun 26 '14 at 17:29
  • Note that if the intervals are sorted by ending value, I see a problem with the above code when the starting values of all intervals are the same (only the first interval seems to ever pass through the first `if` condition), and if the intervals are sorted by starting value, I see a problem when the ending values of all intervals are the same (again, only the first interval seems to ever pass through the first `if` condition). Am I right? I can follow up and explain my understanding (or misunderstanding?) of why this is the case if you'd like! Thanks. – Dan Nissenbaum Jun 26 '14 at 18:22
  • They should be sorted by ending value as in your code. Lastend is initially 0 (or -1 if an interval can start at 0), so all the intervals should go through on the first recursive call. If all the start or end values are the same, only one interval can be selected, so this should work. Not sure why you think only the first interval passes – dfb Jun 26 '14 at 19:05
  • I appreciate your looking into it. I stand corrected. What I *mean* to say is that if all starting positions are the same, then clearly the line of code `optimalsubs = go(interval[i].end)` is a no-op (it remains unchanged). Therefore, effectively in this scenario the `for` loop is executed for only one cycle. Let us suppose there are only two intervals with the same starting point. The first interval (`i==0`) will be added to a single solution in `optimal`. Then, the second interval (`i==1`) will be appended, but this seems incorrect as they overlap. Again, thanks for checking. – Dan Nissenbaum Jun 26 '14 at 19:22
  • `optimal` is meant to be a list of possible optimal interval sets; each element is one of those. We append a solution, not an interval, each time. In your example, there will be two elements of `optimal`, which are single-interval solutions for the problem – dfb Jun 26 '14 at 19:53
  • Great, thanks. In the code, I take it that `os` is a full solution (i.e., a set of intervals). Therefore, the line `for(os in optimalsubs) { os.add(interval[i]) }` seems to take any existing full solution, and append a new interval to the end. In the case of my example with two intervals with the same starting point, once the first interval is added as a single-interval solution when `i==0`, it seems to me that when `i==1` that previous solution has the second interval *appended* due to the for loop I've just noted, rather than a *new* solution created with just the second interval added. – Dan Nissenbaum Jun 26 '14 at 20:07
  • optimalsubs is the list of optimal solutions for recursive call. You add the current interval to every optimal subsolution to get a new set of optimal solutions. In your example, optimalsubs is just an empty solution. When `i==0`, optimal becomes a single solution with the first interval. When `i==1`, another solution is added to optimal. – dfb Jun 26 '14 at 20:13
  • I see now - that `optimal` is intended to be a *local* variable (I had assumed it was a global variable). I'll continue to study this. Thanks! – Dan Nissenbaum Jun 26 '14 at 20:19
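Regarding the comment above that the number of maximal fits can be exponential: it is possible to count them without listing them, which gives a cheap way to decide whether enumeration is feasible at all. The following is only a sketch under the same assumptions as the question (half-open intervals of positive width, sorted by end, weight equal to width). `Interval` and `countMaximalFits` are illustrative names, and the 64-bit counter silently wraps around if the true count exceeds its range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Interval { long start; long end; };  // hypothetical type, half-open [start, end)

// Counts the distinct maximum-weight sets of non-overlapping intervals.
// Assumes the input is sorted by end and every interval has positive width.
std::uint64_t countMaximalFits(const std::vector<Interval>& I) {
    const std::size_t n = I.size();
    std::vector<long> ends;
    ends.reserve(n);
    for (const Interval& iv : I) ends.push_back(iv.end);
    assert(std::is_sorted(ends.begin(), ends.end()));

    std::vector<long> OPT(n + 1, 0);           // best weight using the first j intervals
    std::vector<std::uint64_t> cnt(n + 1, 0);  // number of ways to reach OPT[j]
    cnt[0] = 1;                                // the empty selection
    for (std::size_t j = 1; j <= n; ++j) {
        const Interval& cur = I[j - 1];
        const std::size_t p = static_cast<std::size_t>(
            std::upper_bound(ends.begin(), ends.end(), cur.start) - ends.begin());
        const long accept = (cur.end - cur.start) + OPT[p];
        const long reject = OPT[j - 1];
        OPT[j] = std::max(accept, reject);
        if (reject == OPT[j]) cnt[j] += cnt[j - 1];  // optimal fits that skip interval j
        if (accept == OPT[j]) cnt[j] += cnt[p];      // optimal fits that use interval j
    }
    return cnt[n];
}
```

With two unit intervals at each of three starting points, i.e. [1,2), [1,2), [2,3), [2,3), [3,4), [3,4), the count is 8 = 2^3, in line with the 2^n estimate from the comment.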