1

Let A = {4, 1, 3, 2}

There are 10 contiguous sub-arrays. Let me consider the maximum of all those sub-arrays.

  Sub array   Max 

{4}            4
{1}            1
{3}            3
{2}            2
{4, 1}         4
{1, 3}         3
{3, 2}         3
{4, 1, 3}      4
{1, 3, 2}      3
{4, 1, 3, 2}   4

So:

Max #    Occurrences
 1           1
 2           1
 3           4
 4           4

Question: Is there any method for counting the number of such occurrences in O(n) time? Using dequeue I can find the maximum for all the intervals, but it will take O(n^2 log n) time, which is not feasible for large size array.

Henrik
  • 23,186
  • 6
  • 42
  • 92
user2784016
  • 179
  • 1
  • 7

1 Answers1

3

The algorithm would be to first find the global maximum(s), these are maximum on all the intervals they're in (counting them should be O(N), multiple global maximas will complicate the complexity a bit though). Then you know that the other maximas will not be maximum for an intervall that extends beyond or including the global maximum, so you can then split down the problem into subarrays.

For example with your array you would get first to observe that 4 is the maximum of the array, it is consequently the maximum of all sub-arrays starting at or before 0 and ending at or after 0, that is 4. Then you get the preceding and succeding subarrays {} and {1, 3, 2}. Then you go on by checking {1, 3, 2} (we skip the empty one as it is trivial) there 3 is the maximum and it's the maximum of all subarrays where it's in, namely those which start at or before 1 and end at or after 1, there is four such (2*2), then it continues with the subarrays {1} and {2}.

The complication where we run into multiple maximas have to be considered also. For example if we're at the array {1, 4, 3, 4, 3, 2} you'll find two maximas (two fours), the same applies here the first 4 is maximum for all intervals where it belongs (which here is 10) and the second is maximum in 12, but beware that some of these intervals are the same, you have to decide how to count them (see below). Now the same applies that the other numbers can only be maxima in a subarray where 4 is not in, so we end up with 3 subarrays to consider, namely {1}, {3} and {3, 2}.

When you have multiple maximas in an interval it will not be as non-ambiguos how you want to count them. For example in the interval {4, 3, 4}, the 4 is maximum in the intervals {4} (the first), {4} (the second), {4, 3}, {3, 4} and {4, 3, 4}. All but the latest is quite straight forward, there's one, but how do you want to count the last? Well it's one interval so it's one, but you could also decide to count it twice. When analyzing the array according to the algorithm you will end up in having the first one being maximum in 3 intervals, and the second in 3 intervals which adds up to 6, which corresponds to the case where you count the last interval twice (because it has two 4s).

If you on the other hand want to count the interval only once you could for example use subarrays for counting them. You first construct the maximal subinterval containing each of them alone and the one that contains both. For example in the array {1, 4, 3, 4, 3, 2} you have first the subarrays starting before or at the first 4 and ends at or after the first 4, but before the second (this means you get 4 interval), then you do the corresponding for the second which means intervals starting at or before the second 4 but after the first and ends at or after the second 4 (this means you get 6 intervals), at last you count them containing both that is the interval that starts at or before the first 4 and ends at or after the second (this gives 6 intervals). When we sum that up we end up with 16 intervals.

Another solution could be to pick one of the 4s to be considered the maximum and then handle the remaining 4s in respective subarray. In this example one would maybe choose the second because it's the most middle 4. That would result in counting 12 intervals and then when you handle the left subinterval ({1, 4, 3}) you will there find that that 4 is the maximum in 4 intervals. It will also add up to 16 intervals where 4 is the maximum.

The algorithm does this in O(N log N) time on average and in O(N^2) time in worst case scenario if you're using naive approach to finding the maximum in the subarrays (which is O(N)). However using segment trees you should be able to reduce the maxima finding in the subarray to a O(log N) operation after the (one time) construction of the segment tree (which is O(N)). After the construction of the segment tree the worst case would be O(N log N) and on average O((log N)^2). In the worst case scenario the counting dominates the complexity so it's O(N log N) and on average the building of the segment tree dominates so it's O(N).

skyking
  • 13,817
  • 1
  • 35
  • 57
  • Can u please elaborate your algorithm using the example A = {4, 1, 3, 2, 5, 4, 1} – user2784016 Aug 10 '15 at 06:43
  • Thanks a lot. Now I shall try to implement it. – user2784016 Aug 10 '15 at 06:49
  • 1
    Correct that the first 4 has 10 subintervals and the second has 12. But be careful that 10 + 12 = 21 here. – Stef Aug 10 '15 at 07:28
  • @Stef I think it is even less than `21`, I get `15` I think for this example. You need to remove `1` for all subarrays containing both `4`: `{4, 3, 4}`, `{1, 4, 3, 4}`, `{1, 4, 3, 4, 3}`, `{1, 4, 3, 4, 3, 2}`, `{4, 3, 4, 3}`, `{4, 3, 4, 3, 2}`. – Holt Aug 10 '15 at 07:34
  • BTW, the average complexity is probably `O(N log N)`, but you'll often get close to `N^2` (sorted array for example). – Holt Aug 10 '15 at 07:39
  • @Stef That's a good observation, I've updated the answer to take that into account. @Holt Have you counted correctly? I get it to be `16`. – skyking Aug 10 '15 at 07:53
  • @Holt Yes, you could compare the situation with how quick sort behaves. It relies on that the pivot element is often near the median to achieve `O(log N)` complexity. The average complexity relies on that most of the sequences are unsorted in a way that makes it have `O(log N)` complexity. If you have a scenario where you're likely to have a mostly sorted array you'll have to adopt the algorithm to handle that better (you can't randomize the input as you could for quick sort though). – skyking Aug 10 '15 at 07:57
  • @skyking It's probably `16` since I put only 6 subarrays in my comments, I probably counted one twice by mistake ;) – Holt Aug 10 '15 at 07:57
  • @Holt: It's possible to find the maximum in a given interval in `O(log n)` worst case time (using `O(n)` extra space and precalculation time to set up a binary search tree), so it should be possible to get even the worst case complexity down to `O(n log n)`. – Ilmari Karonen Aug 10 '15 at 08:07
  • @IlmariKaronen How would you find the maximum in `O(log n)` without having to construct a different structure at each step (which would take `O(n)`)? – Holt Aug 10 '15 at 08:10
  • The algorithm to find the maximum would still need to examine all the elements in the array. I'd interpret that comment as it is possible to find the maximum in a subinterval in O(log N) time given that you have done a O(N) operation on the whole (which need to be done only once). A such algorithm could reduce the worst case complexity. – skyking Aug 10 '15 at 08:12
  • 1
    @Holt: There's a data structure for that: I've heard it called a [segment tree](http://wcipeg.com/wiki/Segment_tree) (although, confusingly, it's not quite the same thing as [what Wikipedia calls by that name](https://en.wikipedia.org/wiki/Segment_tree)). You'll need to build it once for the full array, which takes `O(n)` time and space, but each query for the maximum in a given interval then takes `O(n log n)` time in the worst case. – Ilmari Karonen Aug 10 '15 at 08:16
  • @IlmariKaronen Thanks for pointing that out, by using segment trees you should be able to solve this problem in `O(N)` on average an `O(N log N)` in worst case. – skyking Aug 10 '15 at 13:50
  • @IlmariKaronen I implemented an O(n log n) answer to this question using segment trees here: http://stackoverflow.com/a/31945031/1327235 – Juan Lopes Aug 11 '15 at 16:16