How many contiguous subarrays with max. n unique numbers

Question

I found a programming challenge online and was wondering if there is a more efficient solution to this.

The problem: You are given a list of n numbers along with a number X, the maximum number of different numbers that a contiguous sub-array may contain. We need to count all contiguous sub-arrays that contain at most X different numbers.

Input: The first row contains two numbers, n and x: the length of the list and the maximum number of unique numbers allowed in a subarray. The second row contains the n numbers of the list.

Example:

5 2
1 2 3 1 1
ans = 10
explanation: ([1],[2],[3],[1],[1],[1,2],[2,3],[3,1],[1,1],[3,1,1])

My approach: Loop through all subarrays of the list using two nested loops and count the unique numbers in each subarray (using a set). Surely there must be a more efficient way to calculate this? Sorry if this question doesn't belong here; feel free to edit it.
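For reference, the brute-force idea described above might look like the following sketch. The function name `countBruteForce` is hypothetical, and the sketch assumes one set is reused per starting index (rather than rebuilt for every subarray), which gives roughly O(n^2 * log(n)) time. That is still too slow for n up to 5*10^5, but it is handy for checking faster solutions on small inputs.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: counts contiguous subarrays of `a` that contain
// at most `x` distinct values by trying every start index.
long long countBruteForce(const std::vector<int>& a, int x) {
    long long count = 0;
    for (std::size_t start = 0; start < a.size(); ++start) {
        std::set<int> seen;  // distinct values in a[start..end]
        for (std::size_t end = start; end < a.size(); ++end) {
            seen.insert(a[end]);
            // Extending the subarray can only add distinct values,
            // so once the limit is exceeded we can stop for this start.
            if (static_cast<long long>(seen.size()) > x) break;
            ++count;
        }
    }
    return count;
}
```

For the example input, `countBruteForce({1, 2, 3, 1, 1}, 2)` returns 10.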

EDIT: nellex's corrected code that sometimes gives the wrong answer

int main() {
    int n, x;
    cin >> n >> x;

    vector<int> a;
    for (int i = 1; i <= n; i++) {
        int b;
        cin >> b;
        a.push_back(b);
    }

    int ans = 0, end = 1;
    set<int> uniq;
    map<int, int> freq;
    for (int start = 0; start < n; start++) {
        cout << start << " and end=" << end << endl;
        while (uniq.size() <= x && end < n) {
            if (uniq.size() == x && freq[a[end]] == 0) {
                break;
            }
            uniq.insert(a[end]);
            freq[a[end]]++;
            end++;
        }
        cout << "added " << end << " - " << start << " to ans" << endl;
        ans += end - start;
        freq[a[start]]--;
        if (freq[a[start]] == 0) {
            uniq.erase(a[start]);
        }
    }
    cout << ans;
}

EDIT: Constraints for the first test cases:

1≤k≤n≤100

1≤xi≤10

The largest constraints:

1≤k≤n≤5⋅10^5

1≤xi≤10^9

I'm confused. What is the question? Why is x 2 when there are three digits? Why does the answer contain `[1]` three times if we want unique numbers? ... – Thomas Oct 01 '18 at 11:22
In the example, the question is how many of the subarrays of the list [1,2,3,1,1] contain max. 2 unique numbers. Sorry if I was unclear. – Quiti Oct 01 '18 at 11:24
Why do you think what you are doing is inefficient? That said, I removed the C++ tag since this is not a question about C++ in any way. – Ulrich Eckhardt Oct 01 '18 at 13:56
@UlrichEckhardt Because with big test cases (e.g. 10^5) my code exceeds the time limit. – Quiti Oct 01 '18 at 14:26
@Quiti I see you've modified the code a bit incorrectly. If you're indexing the array from 0, then please set `int end = 0` during the initialization. Also, for the given constraints, you need to use `long long` for the answer. – Nilesh Oct 02 '18 at 07:43
@Quiti I've edited the code in my comment to suit your need of indexing the array from 0 and also to suit the given constraints. Sharing the link to the new code: https://ideone.com/v2CdZO – Nilesh Oct 02 '18 at 07:54

Nilesh · Accepted Answer · 2018-10-02T07:51:19.697

2

A sliding-window approach is a better fit for this problem. Using a set and a map, it solves the problem in O(n*log(n)): https://ideone.com/v2CdZO

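#include <iostream>
#include <map>
#include <set>
#include <vector>
using namespace std;

// Assumes 1 <= x <= n, as stated in the constraints.
int main() {
    int n, x;
    cin >> n >> x;

    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];

    int end = 0;        // one past the right edge of the window [start, end)
    long long ans = 0;  // can reach n*(n+1)/2, which does not fit in a 32-bit int

    set<int> uniq;       // distinct values currently in the window
    map<int, int> freq;  // number of occurrences of each value in the window
    for (int start = 0; start < n; start++) {
        // Extend the window while it still holds at most x distinct values.
        while ((int)uniq.size() <= x && end < n) {
            if ((int)uniq.size() == x && freq[a[end]] == 0) {
                break;  // a[end] would be distinct value number x + 1
            }
            uniq.insert(a[end]);
            freq[a[end]]++;
            end++;
        }
        // Every sub-array a[start..j] with start <= j < end is valid.
        ans += end - start;
        // Drop a[start] from the window before moving on to start + 1.
        freq[a[start]]--;
        if (freq[a[start]] == 0) {
            uniq.erase(a[start]);
        }
    }
    cout << ans;
}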

The algorithm works as follows: for every index start, we find the largest sub-array beginning at start whose number of unique elements is <= x. If that sub-array has size S, then exactly S valid sub-arrays begin at index start, one for each length from 1 to S, because every prefix of a valid sub-array is also valid. Since removing a[start] can never increase the number of unique elements, end never has to move backward when start advances.

If we do a dry run for the given example (x = 2, with start indexed from 0 as in the code):

when start = 0, we'll generate the sub-arrays {[1], [1, 2]}
when start = 1, we'll generate the sub-arrays {[2], [2, 3]}
when start = 2, we'll generate the sub-arrays {[3], [3, 1], [3, 1, 1]}
when start = 3, we'll generate the sub-arrays {[1], [1, 1]}
when start = 4, we'll generate the sub-arrays {[1]}
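One detail worth checking is why `ans` is declared `long long`. As a rough bound, assuming a 32-bit `int` and the largest constraint n = 5*10^5, the answer is largest when every sub-array is valid (x at least the number of distinct values in the list):

```latex
\[
\text{ans}_{\max} = \frac{n(n+1)}{2}
= \frac{(5\cdot 10^{5})(5\cdot 10^{5}+1)}{2}
= 125\,000\,250\,000 \approx 1.25\cdot 10^{11}
> 2^{31}-1 \approx 2.15\cdot 10^{9}
\]
```

So a 32-bit counter would overflow on the large test cases, while a 64-bit one has ample room.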

edited Oct 02 '18 at 07:51

answered Oct 01 '18 at 11:48

Nilesh


I'm still wondering, how do you declare int a[n+1]? Visual Studio is saying "expression must have a constant value". Otherwise, thank you. – Quiti Oct 01 '18 at 13:29
I guess it was introduced along with the C++11 standards, not quite sure though. Also, I realized that there was a bug in my code which I've fixed now. You might want to take a look at it again. My apologies =/ – Nilesh Oct 01 '18 at 13:53
Still getting the out of range error, are you sure you fixed it? I really can't seem to figure out where it is coming from... – Quiti Oct 01 '18 at 13:59
And why does the start begin from 1? I like the idea, but the code seems a bit incomplete. – Quiti Oct 01 '18 at 15:06
@Quiti can you share the test case for which the program is failing? – Nilesh Oct 01 '18 at 15:14
5 2 1 2 3 1 1 gives 11, the right answer is 10. – Quiti Oct 01 '18 at 16:10
That's unusual. I tried the case that you mentioned locally and as well on the online compiler and it gives the output as 10. You can check it here https://ideone.com/ITvXCD Can you share the code that you're trying to run? – Nilesh Oct 01 '18 at 21:46
got it working now, but it still can't pass some test cases I'm unable to see. "Wrong answer". Think I'm gonna debug it a little more today. – Quiti Oct 02 '18 at 07:11
Can you give me the constraints on n and x? The largest possible answer for an array of length n is (n*(n+1))/2, so if the range of n is 10^5, consider using `long long` for the answer `ans` variable. Also, if x = 0 is possible, then you might return 0 for such a case right away. Let me know if this helps. – Nilesh Oct 02 '18 at 07:36

גלעד ברקן · Answer 2 · 2018-10-02T09:19:01.830

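We can solve this in O(n) time (expected, if h is a hash table) by keeping two pointers, p_l and p_r, both of which only advance up the array, while updating a frequency count, h[e], for each element we encounter as well as the current number of unique items, k. Each time p_r advances, we move p_l forward until k <= X and then add p_r - p_l to the total, which is the number of valid sub-arrays ending at p_r.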

For example:

5 2
1 2 3 1 1

Let's look at each iteration:

k = 0
h = {}
total = 0
p_r = -1
p_l = -1

1:   p_r = 0
     h = {1:1}
     k = 1
     total = 1

2:   p_r = 1
     h = {1:1, 2:1}
     k = 2
     total = 1 + 2 = 3

3:   p_r = 2
     h = {1:1, 2:1, 3:1}
     k = 3

  => move p_l until k equals X:
     p_l = 0
     h = {1:1-1=0, 2:1, 3:1}
     k = 3 - 1 = 2

     total = 3 + 2 = 5

1:   p_r = 3
     h = {1:1, 2:1, 3:1}
     k = 3

  => move p_l until k equals X:
     p_l = 1
     h = {1:1, 2:1-1=0, 3:1}
     k = 3 - 1 = 2

     total = 5 + 2 = 7

1:   p_r = 4
     h = {1:2, 2:0, 3:1}
     k = 2
     total = 7 + 3 = 10
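The trace above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the function name `countAtMostDistinct` is hypothetical, X is assumed to be non-negative, and `std::unordered_map` is used for h, which gives O(1) updates on average (so O(n) expected overall, not worst case). The variable names p_l, p_r, h, k and total follow the trace.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts contiguous subarrays of `a` with at most
// X distinct values using two pointers. Assumes X >= 0.
long long countAtMostDistinct(const std::vector<int>& a, int X) {
    std::unordered_map<int, int> h;  // frequency of each value in the window
    long long total = 0;
    int k = 0;                       // distinct values in the window
    long long p_l = -1;              // the window is a[p_l + 1 .. p_r]
    const long long n = static_cast<long long>(a.size());
    for (long long p_r = 0; p_r < n; ++p_r) {
        if (h[a[p_r]]++ == 0) ++k;   // a new distinct value entered the window
        while (k > X) {              // advance p_l until the window is valid
            ++p_l;
            if (--h[a[p_l]] == 0) --k;
        }
        total += p_r - p_l;          // valid subarrays that end at p_r
    }
    return total;
}
```

For the example, `countAtMostDistinct({1, 2, 3, 1, 1}, 2)` returns 10, matching the final total in the trace.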
