The problem I'm working on requires processing several queries on an array (the size of the array is less than 10^4, and the largest element is certainly less than 10^9).
A query consists of two integers, and one must find the total number of subarrays that contain these two integers an equal number of times. There may be up to 5 * 10^5 queries.
For instance, given the array [1, 2, 1] and the query 1 2, there are two subarrays with equal counts of 1 and 2, namely [1, 2] and [2, 1].
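To make the statement concrete, here is a minimal brute-force sketch (the function name is mine, purely for illustration) that reproduces the example above. Note that it also counts subarrays in which both values appear zero times as "equal"; the example doesn't disambiguate that edge case, so I'm assuming it counts.

```python
def equal_count_subarrays(arr, x, y):
    """Count subarrays of arr containing x and y equally often (O(n^2) brute force)."""
    total = 0
    n = len(arr)
    for i in range(n):
        cx = cy = 0  # occurrences of x and y in arr[i..j]
        for j in range(i, n):
            if arr[j] == x:
                cx += 1
            elif arr[j] == y:
                cy += 1
            if cx == cy:
                total += 1
    return total

# The example: [1, 2] and [2, 1] are the two qualifying subarrays.
print(equal_count_subarrays([1, 2, 1], 1, 2))  # → 2
```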
My initial approach was to use dynamic programming to construct a map such that memo[i][j] = the number of times the value i appears in the array up to index j. I would use this the same way one uses prefix sums, except that frequencies accumulate instead of sums.
Constructing this map takes O(n^2). For each query, I do O(1) processing per interval and increment the answer. This leads to a complexity of O((q + 1) * n(n - 1) / 2), where q is the number of queries; that is, O(q * n^2), and I also want to emphasize that daunting constant factor.
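For reference, the approach I just described might look like the sketch below (build_memo and answer_query are names I made up for this post). The build loops over every distinct value at every index, hence O(n^2) in the worst case, and each query then scans all O(n^2) intervals with an O(1) prefix-count check per interval.

```python
def build_memo(arr):
    """memo[v][j] = occurrences of v in arr[0..j-1]; O(n * distinct) to build."""
    n = len(arr)
    memo = {v: [0] * (n + 1) for v in set(arr)}
    for j, a in enumerate(arr):
        for v in memo:
            memo[v][j + 1] = memo[v][j] + (1 if a == v else 0)
    return memo

def answer_query(memo, n, x, y):
    """O(n^2) scan over all subarrays, O(1) count comparison per subarray."""
    zeros = [0] * (n + 1)  # in case x or y never occurs in the array
    cx = memo.get(x, zeros)
    cy = memo.get(y, zeros)
    total = 0
    for l in range(n):
        for r in range(l + 1, n + 1):  # subarray arr[l..r-1]
            if cx[r] - cx[l] == cy[r] - cy[l]:
                total += 1
    return total

arr = [1, 2, 1]
memo = build_memo(arr)
print(answer_query(memo, len(arr), 1, 2))  # → 2
```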
After some rearranging, I'm trying to figure out whether there's a way to determine the frequency of each element in every subarray. I strongly suspect this problem involves segment trees, but I've struggled to come up with a proper model, and this was the only idea I had. However, that approach doesn't seem useful here, considering the cost of merging nodes that hold such a large amount of information, not to mention the memory overhead.
How can this be solved efficiently?