Number of occurrences of each distinct integer in given ranges for an array

Question

We are given an array of n integers (n <= 1e6) [a0, a1, a2, ..., an-1] (a[i] <= 1e9) and multiple queries. In each query, two integers l and r (0 <= l <= r <= n-1) are given, and we need to return the count of each distinct integer inside this range (l and r inclusive).

The only solution I can come up with is brute force: iterate through the complete range for each query.

d={}
for i in range(l, r+1):
    if arr[i] not in d:
        d[arr[i]]=0
    d[arr[i]]+=1

For example:

Array is [1, 1, 2, 3, 1, 2, 1]

Query 1: l=0, r=6, Output: 4, 2, 1 (4 for four 1's, 2 for two 2's and 1 for one 3)
Query 2: l=3, r=5, Output: 1, 1, 1

Edit: I came up with something like this, but its complexity is still pretty high, I think because of that insert operation.

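#include <unordered_map>
#include <vector>
using namespace std;

typedef long long ll;
#define ff first   // shorthand for pair::first
#define ss second  // shorthand for pair::second

const ll N = 1e6+5;
ll arr[N];
unordered_map< ll, ll > tree[4 * N];  // tree[node]: value -> count within the node's range
int n, q;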

void build (ll node = 1, ll start = 1, ll end = n) {
    if (start == end) {
        tree[node][arr[start]] = 1;
        return;
    }
    ll mid = (start + end) / 2;
    build (2 * node, start, mid);
    build (2 * node + 1, mid + 1, end);
    for (auto& p : tree[2 * node]) {
        ll x = p.ff;
        ll y = p.ss;
        tree[node][x] += y;
    }
    for (auto& p : tree[2 * node + 1]) {
        ll x = p.ff;
        ll y = p.ss;
        tree[node][x] += y;
    }
}

vector< ll > query (ll node, ll l, ll r, ll start = 1, ll end = n) {
    vector< ll > ans;
    if (end < l or start > r) return ans;
    if (start >= l and end <= r) {
        for (auto p : tree[node]) {
            ans.push_back (p.ss);
        }
        return ans;
    }
    ll mid = (start + end) / 2;
    vector< ll > b = query (2 * node, l, r, start, mid);
    ans.insert (ans.end (), b.begin (), b.end ());
    b = query (2 * node + 1, l, r, mid + 1, end);
    ans.insert (ans.end (), b.begin (), b.end ());
    return ans;
}

I tried to use a segment tree but the best I can come up with is the count of distinct integers in the given range but nothing about the count of each distinct integer in the given range. — Farhan Tahir, Jun 14 '19 at 21:08

score 0 · Answer 1 · answered Jun 15 '19 at 00:53

You can use a binary indexed tree (Fenwick tree) as described here. Rather than storing range sums in the nodes, store maps from values to counts for the respective ranges.

Now query the tree with input i to find a map representing the frequency of occurrence of each element in the corresponding index prefix [1..i]. This will require merging O(log n) maps.

Now you can do two queries: one for l-1 and another for r. "Subtract" the former result map from the latter. The map subtraction is entry-wise. I'll let you work out the details.

The time for each query will be O(k log n) where k is the map size. This will be at most the number of distinct elements in the input array.
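The idea above can be sketched roughly as follows in Python. This is only an illustrative sketch, not necessarily what the answer's author had in mind: the function names (build_fenwick, prefix_counts, range_counts) are made up for this example, and the queries use the question's 0-indexed, inclusive l and r while the tree itself is 1-indexed. Note that each element is stored in O(log n) node maps, so memory use may be a concern for n near 1e6.

```python
from collections import Counter

def build_fenwick(arr):
    """Build a 1-indexed Fenwick tree whose nodes are value -> count maps."""
    n = len(arr)
    tree = [Counter() for _ in range(n + 1)]
    for i, value in enumerate(arr, start=1):
        j = i
        while j <= n:
            tree[j][value] += 1   # every node covering position i counts this value
            j += j & -j
    return tree

def prefix_counts(tree, i):
    """Merge the O(log n) node maps that cover the first i elements."""
    merged = Counter()
    while i > 0:
        merged.update(tree[i])    # Counter.update adds counts entry-wise
        i -= i & -i
    return merged

def range_counts(tree, l, r):
    """Counts of each distinct value in arr[l..r] (0-indexed, inclusive)."""
    result = prefix_counts(tree, r + 1)
    result.subtract(prefix_counts(tree, l))   # entry-wise map subtraction
    return {value: count for value, count in result.items() if count > 0}

tree = build_fenwick([1, 1, 2, 3, 1, 2, 1])
print(range_counts(tree, 0, 6))   # counts for the whole array
print(range_counts(tree, 3, 5))   # counts for the sub-range [3, 1, 2]
```

With the example array from the question, the first call gives counts 4, 2 and 1 for the values 1, 2 and 3, and the second gives a count of 1 for each of 3, 1 and 2.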

גלעד ברקן · Answer 2 · 2019-06-15T02:23:10.133

0

It sounds like this might be a candidate for an approach that relies on how we arrange the queries, which requires knowing all of them in advance. Assuming both the number of queries and the length of the input are on the order of n, similarly to this post, we can bucket the queries according to floor(l / sqrt(n)) and sort each bucket by r. Now we have sqrt(n) buckets.

Each bucket's q queries will incur at most O(q * sqrt(n)) changes due to the movements in l and at most O(n) changes due to the gradual change in r (since we sorted each bucket by r, that side of the interval only increases steadily as we process the bucket).

Processing the changes on the right side of all the intervals in one bucket is bounded by O(n), and we have sqrt(n) buckets, so that's O(n * sqrt(n)) for the right side. And since the total number of queries is O(n) (assumed) and each one requires at most O(sqrt(n)) changes on the left side, the changes for the left side are also O(n * sqrt(n)).

Total time complexity would therefore be O(n * sqrt(n) + k), where k is the total number of values output. (The updated data structure could be a hashmap that also allows for iteration on its current store.)
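A rough Python sketch of this bucketing scheme (commonly known as Mo's algorithm) might look like the following. It assumes all queries are available up front; the function name answer_offline and the choice of a plain dict-like store are illustrative assumptions, not part of the answer itself.

```python
from collections import defaultdict
from math import isqrt

def answer_offline(arr, queries):
    """Answer (l, r) queries (0-indexed, inclusive) after reordering them."""
    block = max(1, isqrt(len(arr)))
    # Bucket by floor(l / sqrt(n)); within a bucket, sort by r.
    order = sorted(range(len(queries)),
                   key=lambda i: (queries[i][0] // block, queries[i][1]))
    counts = defaultdict(int)       # value -> count inside the current window
    results = [None] * len(queries)
    cur_l, cur_r = 0, -1            # current window arr[cur_l..cur_r], empty at first

    def add(value):
        counts[value] += 1

    def remove(value):
        counts[value] -= 1
        if counts[value] == 0:
            del counts[value]       # keep only values present in the window

    for qi in order:
        l, r = queries[qi]
        # Grow the window first, then shrink it, so it never becomes invalid.
        while cur_l > l:
            cur_l -= 1
            add(arr[cur_l])
        while cur_r < r:
            cur_r += 1
            add(arr[cur_r])
        while cur_l < l:
            remove(arr[cur_l])
            cur_l += 1
        while cur_r > r:
            remove(arr[cur_r])
            cur_r -= 1
        results[qi] = dict(counts)  # copying the store is the O(k) output cost
    return results

print(answer_offline([1, 1, 2, 3, 1, 2, 1], [(0, 6), (3, 5)]))
```

Results are returned in the original query order, so the reordering is invisible to the caller. Because the window is shared across queries, this sketch does not extend to queries that must be answered one at a time as they arrive.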

edited Jun 15 '19 at 02:23

answered Jun 15 '19 at 02:16

גלעד ברקן

Is it possible to answer queries online? A similar question has been asked on Codeforces (https://codeforces.com/problemset/problem/86/D), and all the solutions do almost what you are describing. – Farhan Tahir Jun 15 '19 at 04:18
@FarhanTahir I don't see how we could use this method with online queries. But isn't the codeforces problem presenting all queries in advance? (There's discussion [here](https://codeforces.com/blog/entry/59346).) – גלעד ברקן Jun 15 '19 at 04:48
Yes but I need to solve the queries online. Right now I am trying to build a segment tree having unordered_map as nodes. I guess it may exceed the memory limit. – Farhan Tahir Jun 15 '19 at 04:52
I edited the question with the segment tree code. Can you think of anything to reduce its complexity, maybe by using lazy propagation? – Farhan Tahir Jun 15 '19 at 06:44
@FarhanTahir I would add to the question description that you need to answer queries online. It seems like an important detail. – גלעד ברקן Jun 15 '19 at 14:14

score -1 · Answer 3 · answered Jun 15 '19 at 05:59

You can use a hash map. Iterate from l to r and store each element as a key and its number of occurrences as the value. Each time you process an element, check whether it already exists in the hash map: if it does, increment its count; otherwise insert it with a count of 1. This takes O(n) time per query in the worst case to count every distinct element in the given range.

NITESH KUMAR

What do you think I did in the python code I wrote first? – Farhan Tahir Jun 15 '19 at 06:48
You did the same as I said; sorry, I didn't read it carefully. I will try to find a better approach. Thank you. – NITESH KUMAR Jun 15 '19 at 11:08
