Please tell me the efficient algorithm of Range Mex Query

Question

I have a question about this problem.

Question

You are given a sequence a[0], a 1],..., a[N-1], and set of range (l[i], r[i]) (0 <= i <= Q - 1).
Calculate mex(a[l[i]], a[l[i] + 1],..., a[r[i] - 1]) for all (l[i], r[i]).

The function mex is minimum excluded value.
Wikipedia Page of mex function

You can assume that N <= 100000, Q <= 100000, and a[i] <= 100000.
O(N * (r[i] - l[i]) log(r[i] - l[i]) ) algorithm is obvious, but it is not efficient.

My Current Approach

#include <bits/stdc++.h>
using namespace std;
int N, Q, a[100009], l, r;
int main() {
    cin >> N >> Q;
    for(int i = 0; i < N; i++) cin >> a[i];
    for(int i = 0; i < Q; i++) {
        cin >> l >> r;
        set<int> s;
        for(int j = l; j < r; j++) s.insert(a[i]);
        int ret = 0;
        while(s.count(ret)) ret++;
        cout << ret << endl;
    }
    return 0;
}

Please tell me how to solve.

EDIT: O(N^2) is slow. Please tell me more fast algorithm.

Am I right that you know all the queries before processing the first query? I mean, the batch processing algorithm is ok, and you don't need on-line answers — alexeykuzmin0, Jan 13 '17 at 11:15
@Arunmu sqrt-decomposition is ok, and segment tree is ok, if you can solve for segtree. — square1001, Jan 13 '17 at 11:30
You say O(N^2) is slow... are you talking about the evaluation of one range or the evaluation of all ranges? For the latter, O(N^2) might not be that bad actually... — grek40, Jan 13 '17 at 11:58
@grek40 for all queries. But if N<=100000 and Q<=100000, it will be take so much computation time. — square1001, Jan 13 '17 at 11:59
If the constant factors aren't to bad, N^2 = 10 billion doesn't sound like a hell of computation time to me. Most of the time the problem arises when you think you have N^2 but you do something extra that is adding another level of complexity. So __if__ you really have an N^2 solution, you should measure before thinking of the worst. — grek40, Jan 13 '17 at 12:04
A modern processor is capable of some 100 billion IPS (don't take the numbers literally, IPS is a misleading concept... still saying *something* about possible performance. See https://en.wikipedia.org/wiki/Instructions_per_second#Timeline_of_instructions_per_second) — grek40, Jan 13 '17 at 12:10
@grek40 From the olympiad programming experience I can say that for most cases 10^8 to 3*10^8 operations per 2 seconds are a good estimation, so 10^10 may take several minutes. Not bad for one-time calculation, but what if this code should be run frequently? — alexeykuzmin0, Jan 13 '17 at 12:14
May I ask where this problem comes frome? Can you add a link? — Simon Kraemer, Jan 13 '17 at 12:32

score 8 · Accepted Answer · answered Jan 13 '17 at 13:32

Here's an O((Q + N) log N) solution:

Let's iterate over all positions in the array from left to right and store the last occurrences for each value in a segment tree (the segment tree should store the minimum in each node).
After adding the i-th number, we can answer all queries with the right border equal to i.
The answer is the smallest value x such that last[x] < l. We can find by going down the segment tree starting from the root (if the minimum in the left child is smaller than l, we go there. Otherwise, we go to the right child).

That's it.

Here is some pseudocode:

tree = new SegmentTree() // A minimum segment tree with -1 in each position
for i = 0 .. n - 1
    tree.put(a[i], i)
    for all queries with r = i
         ans for this query = tree.findFirstSmaller(l)

The find smaller function goes like this:

int findFirstSmaller(node, value)
    if node.isLeaf()
        return node.position()
    if node.leftChild.minimum < value
        return findFirstSmaller(node.leftChild, value)
    return findFirstSmaller(node.rightChild)

This solution is rather easy to code (all you need is a point update and the findFisrtSmaller function shown above and I'm sure that it's fast enough for the given constraints.

findFirstSmaller(l) takes a single argument, but you are saying return findFirstSmaller(node.leftChild, value) and findFirstSmaller(node, value). What is the value here? node or l is the left part of the range right? — Ganesh Chowdhary Sadanala, Feb 10 '22 at 13:44

alexeykuzmin0 · Answer 2 · 2017-01-13T12:48:30.537

Let's process both our queries and our elements in a left-to-right manner, something like

for (int i = 0; i < N; ++i) {
    // 1. Add a[i] to all internal data structures
    // 2. Calculate answers for all queries q such that r[q] == i
}

Here we have O(N) iterations of this loop and we want to do both update of the data structure and query the answer for suffix of currently processed part in o(N) time.

Let's use the array contains[i][j] which has 1 if suffix starting at the position i contains number j and 0 otherwise. Consider also that we have calculated prefix sums for each contains[i] separately. In this case we could answer each particular suffix query in O(log N) time using binary search: we should just find the first zero in the corresponding contains[l[i]] array which is exactly the first position where the partial sum is equal to index, and not to index + 1. Unfortunately, such arrays would take O(N^2) space and need O(N^2) time for each update.

So, we have to optimize. Let's build a 2-dimensional range tree with "sum query" and "assignment" range operations. In such tree we can query sum on any sub-rectangle and assign the same value to all the elements of any sub-rectangle in O(log^2 N) time, which allows us to do the update in O(log^2 N) time and queries in O(log^3 N) time, giving the time complexity O(Nlog^2 N + Qlog^3 N). The space complexity O((N + Q)log^2 N) (and the same time for initialization of the arrays) is achieved using lazy initialization.

UP: Let's revise how the query works in range trees with "sum". For 1-dimensional tree (to not make this answer too long), it's something like this:

class Tree
{
    int l, r;           // begin and end of the interval represented by this vertex
    int sum;            // already calculated sum
    int overriden;      // value of override or special constant
    Tree *left, *right; // pointers to children
}
// returns sum of the part of this subtree that lies between from and to
int Tree::get(int from, int to)
{
    if (from > r || to < l) // no intersection
    {
        return 0;
    }
    if (l <= from && to <= r) // whole subtree lies within the interval
    {
        return sum;
    }
    if (overriden != NO_OVERRIDE) // should push override to children
    {
        left->overriden = right->overriden = overriden;
        left->sum = right->sum = (r - l) / 2 * overriden;
        overriden = NO_OVERRIDE;
    }
    return left->get(from, to) + right->get(from, to); // split to 2 queries
}

Given that in our particular case all queries to the tree are prefix sum queries, from is always equal to 0, so, one of the calls to children always return a trivial answer (0 or already computed sum). So, instead of doing O(log N) queries to the 2-dimensional tree in the binary search algorithm, we could implement an ad-hoc procedure for search, very similar to this get query. It should first get the value of the left child (which takes O(1) since it's already calculated), then check if the node we're looking for is to the left (this sum is less than number of leafs in the left subtree) and go to the left or to the right based on this information. This approach will further optimize the query to O(log^2 N) time (since it's one tree operation now), giving the resulting complexity of O((N + Q)log^2 N)) both time and space.

Not sure this solution is fast enough for both Q and N up to 10^5, but it may probably be further optimized.

Hope my answer is understandable. Please comment if something is unclear. — alexeykuzmin0, Jan 13 '17 at 12:16
Line-sweep algorithm. I don't have time today, so I'll check tomorrow. — square1001, Jan 13 '17 at 12:58
@square1001 Yes, scanning line + 2-dimensional range tree + binary search or customization of the tree — alexeykuzmin0, Jan 13 '17 at 13:01

Please tell me the efficient algorithm of Range Mex Query

2 Answers2

Linked