Computing the smallest positive integer not covered by any of a set of intervals

Question

Someone posted this question here a few weeks ago, but it looked awfully like homework without prior research, and the OP promptly removed it after getting a few downvotes.

The question itself was rather interesting though, and I've been thinking about it for a week without finding a satisfying solution. Hopefully someone can help?

The question is as follows: given a list of N integer intervals, whose bounds can take any values from 0 to N³, find the smallest integer i such that i does not belong to any of the input intervals.

For example, given the list [3,5] [2,8] [0,3] [10,13] (N = 4), the algorithm should return 9.

The simplest solution that I can think of runs in O(n log(n)), and consists of three steps:

1. Sort the intervals by increasing lower bound.
2. If the smallest lower bound is > 0, return 0. Otherwise, repeatedly merge the first interval with the second until the first interval (say [a, b]) does not touch the second (say [c, d]), that is, until b + 1 < c, or until only one interval remains.
3. Return b + 1, where [a, b] is the first interval after merging.

This simple solution runs in O(n log(n)), but the original poster wrote that the algorithm should run in O(n). That's trivial if the intervals are already sorted, but the example that the OP gave included unsorted intervals. I guess it must have something to do with the N³ bound, but I'm not sure what... Hashing? Linear time sorting? Ideas are welcome.

Here is a rough Python implementation of the algorithm described above:

def merge(first, second):
    # Merge two closed integer intervals if they overlap or are adjacent.
    (a, b), (c, d) = first, second
    if c <= b + 1:
        return (a, max(b, d))
    return None

def smallest_available_integer(intervals):
    # Sort a copy in reverse order so that the smallest lower bound is
    # at the end of the list, where push/pop operations are fast
    intervals = sorted(intervals, reverse=True)

    if not intervals or intervals[-1][0] > 0:
        return 0

    while len(intervals) > 1:
        first = intervals.pop()
        second = intervals.pop()

        merged = merge(first, second)
        if merged is not None:
            print("Merged", first, "with", second, " -> ", merged)
            intervals.append(merged)
        else:
            print(first, "cannot be merged with", second)
            return first[1] + 1

    # Everything was merged into a single interval starting at 0
    return intervals[0][1] + 1

print(smallest_available_integer([(3,5), (2,8), (0,3), (10,13)]))

Output:

Merged (0, 3) with (2, 8)  ->  (0, 8)
Merged (0, 8) with (3, 5)  ->  (0, 8)
(0, 8) cannot be merged with (10, 13)
9

There seems to be an error here: `whose bounds can take any values from 0 to N³` (0 instead of 1) – Vincent Oct 10 '13 at 16:06
"Constant time sorting?" There is *no* such thing. Sorting a sequence of `n` values, even in the best possible case, takes `O(n)` time. You have to do at least `n-1` comparisons to check that the elements are sorted. – Bakuriu Oct 10 '13 at 16:20
You can sort N integers in the range (0,N^3) in time O(N) (assuming that each comparison takes constant time), using base-N radix sort. – mrip Oct 10 '13 at 16:22
@Mrip, thanks! I had been thinking about a bucket sort, which would have required assuming a uniform distribution on the input... Awesome answer. – Clément Oct 10 '13 at 23:18

score 8 · Accepted Answer · answered Oct 10 '13 at 17:25

Elaborating on @mrip's comment: you can do this in O(n) time by using the exact algorithm you've described but changing how the sorting algorithm works.

Typically, radix sort uses base 2: in each round, the elements are divided into two buckets according to whether the current bit is 0 or 1. Each round of radix sort takes time O(n), and there is one round per bit of the largest number. Calling that largest number U, this means the time complexity is O(n log U).

However, you can change the base of the radix sort to other bases. Using base b, each round takes time O(n + b), since it takes time O(b) to initialize and iterate over the buckets and O(n) time to distribute elements into the buckets. There are then log_b U rounds. This gives a runtime of O((n + b)log_b U).
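To make the round structure concrete, here is a minimal sketch of a least-significant-digit radix sort with a configurable base. The function name `radix_sort` and the returned round count are my own choices for illustration, and the sketch assumes non-negative integers.

```python
def radix_sort(values, base):
    """LSD radix sort of non-negative integers, using `base` buckets per round.

    Returns the sorted list together with the number of rounds performed.
    """
    if base < 2:
        raise ValueError("base must be at least 2")
    values = list(values)
    largest = max(values, default=0)
    place = 1      # weight of the digit examined in this round (base ** round)
    rounds = 0
    while place <= largest:
        buckets = [[] for _ in range(base)]            # O(b) to initialize
        for v in values:                               # O(n) to distribute
            buckets[(v // place) % base].append(v)
        # O(n + b) to iterate over the buckets and collect the elements;
        # appending keeps each round stable, which LSD radix sort relies on.
        values = [v for bucket in buckets for v in bucket]
        place *= base
        rounds += 1
    return values, rounds


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66], base=10))
```

With base 10 and a largest value of 802, the loop runs once per decimal digit, so three rounds. Raising the base reduces the number of rounds at the price of more buckets per round.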

The trick here is that since the maximum number U = n³, you can set b = n and use a base-n radix sort. The number of rounds is now log_n U = log_n n³ = 3 and each round takes O(n) time, so the total work to sort the numbers is O(n). More generally, you can sort numbers in the range [0, n^k) in time O(kn) for any k. If k is a fixed constant, this is O(n) time.

Combined with your original algorithm, this solves the problem in time O(n).
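As a rough sketch of how the two pieces might fit together, assuming closed intervals given as (lo, hi) tuples with bounds between 0 and n³: the intervals are sorted by lower bound with a base-n radix sort, then scanned once. The names `sort_by_lower_bound` and `smallest_uncovered` are hypothetical, and the scan tracks a running candidate instead of explicitly merging intervals, which gives the same result.

```python
def sort_by_lower_bound(intervals):
    """Stable base-n radix sort of (lo, hi) pairs, keyed on lo only."""
    n = len(intervals)
    base = max(n, 2)          # base 1 would never make progress
    largest = max((lo for lo, _ in intervals), default=0)
    place = 1
    while place <= largest:   # a constant number of rounds when lo <= n**3
        buckets = [[] for _ in range(base)]
        for interval in intervals:
            buckets[(interval[0] // place) % base].append(interval)
        intervals = [iv for bucket in buckets for iv in bucket]
        place *= base
    return intervals


def smallest_uncovered(intervals):
    """Smallest non-negative integer not contained in any closed interval."""
    candidate = 0             # smallest integer not yet known to be covered
    for lo, hi in sort_by_lower_bound(list(intervals)):
        if lo > candidate:    # gap: nothing covers `candidate`
            break
        candidate = max(candidate, hi + 1)
    return candidate


print(smallest_uncovered([(3, 5), (2, 8), (0, 3), (10, 13)]))  # 9
```

Only the lower bounds need to be sorted, because the scan takes the maximum over upper bounds as it goes. If a bound is exactly n³, the base-n representation has four digits, so the sort takes four rounds instead of three, which is still a constant.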

Hope this helps!

Thanks, that's a very pretty solution indeed! – Clément Oct 10 '13 at 23:19

score 0 · Answer 2 · brm · Oct 12 '13 at 09:34

Another idea would be to use the complement of these intervals somehow. Suppose C() gives you the complement of an interval: for example, C([3,5]) would be the integers smaller than 3 and those larger than 5. If the maximum number is N^3, then using arithmetic modulo N^3+1 you could even represent this as another interval [6,(N^3+1)+2].

If you want a number that does not belong to any of the original intervals, this same number should be present in all of the complements of these intervals. It then comes down to writing a function that can calculate the intersection of any two such 'complement intervals'.

I haven't made an implementation of this idea, since my pen and paper drawings indicated that there were more cases to consider when calculating such an intersection than I first imagined. But I think the idea behind this is valid, and it would result in an O(n) algorithm.

EDIT

On further thought, there is a worst case scenario that makes things more complex than I originally imagined.

edited Oct 12 '13 at 09:34

answered Oct 10 '13 at 18:43

If you represent these complements as intervals themselves, to find the intersection of two you have to consider a number of cases to check how they are positioned relative to each other. But this can be done based on their start and end points, so each intersection calculation will take an amount of time that's not dependent on the length of the input. Since you'll just need to process each item in the input list of length N, that should give an O(N) complexity. – brm Oct 11 '13 at 20:35
I now realise that there actually is a case which is more complex than I originally thought. Not entirely sure what this does to the complexity, but it's probably not O(n) anymore (at least not the way I wanted to do the intersection) – brm Oct 12 '13 at 09:36
I think that one of the problems is that the intersection of two complements is not the complement of an interval; so the intersections will grow increasingly complex, and in the end require O(n) comparisons for each intersection, hence O(n^2) in total. – Clément Oct 12 '13 at 21:55
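To illustrate the point in the last comment, here is a small sketch that intersects the complements of two intervals from the question's example. The universe bound `UPPER = 20` is an arbitrary stand-in for N^3, and the helpers `runs` and `complement` are hypothetical names; the explicit sets are only for demonstration and are not an efficient representation.

```python
def runs(numbers):
    """Group a set of integers into maximal runs of consecutive values."""
    result = []
    for x in sorted(numbers):
        if result and x == result[-1][1] + 1:
            result[-1][1] = x          # extend the current run
        else:
            result.append([x, x])      # start a new run
    return [tuple(r) for r in result]


def complement(interval, upper):
    """Integers in [0, upper] that lie outside the closed interval."""
    lo, hi = interval
    return set(range(0, upper + 1)) - set(range(lo, hi + 1))


UPPER = 20  # hypothetical small universe, standing in for N^3
common = complement((3, 5), UPPER) & complement((10, 13), UPPER)
print(runs(common))  # [(0, 2), (6, 9), (14, 20)]
```

Even if the first and last runs are joined by wrapping around modulo UPPER + 1, two separate pieces remain, so the result is no longer the complement of a single interval.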
