7

It's easy to find the most frequently occurring element of an array in O(n). Given that the array is sorted, is there a faster algorithm, say O(log n)?

One Two Three

3 Answers

11

It is impossible. What follows is not a strict proof (strict lower-bound proofs are hard in general), but a reasonable argument.

Assume your array always looks like 1, 2, 3, 4, 5, 6, ..., n. Then replace one number with a copy of its predecessor: 1, 2, 3, 3, 5, ..., n. In the new array, a[i] = i for every i except one position.

To find the most frequent element you must examine the positions and check for the irregularity. There is exactly one irregularity, and looking at any other element of the array tells you nothing about where it is. Thus this problem is no easier than finding the single one in a boolean array that is otherwise all zeroes, which requires linear time.
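The construction above can be made concrete with a small sketch. The helper names `make_instance` and `probe_reveals` are hypothetical, introduced only for illustration; the point is that exactly one position differs from the plain array 1..n, so a probe at any other position returns the same value it would have returned without the duplicate.

```python
def make_instance(n, i):
    """Return [1, 2, ..., n] with the value i + 1 overwritten by i (1 <= i < n)."""
    a = list(range(1, n + 1))
    a[i] = i  # 0-based position i held i + 1; now the value i appears twice
    return a


def probe_reveals(a, k):
    """True if reading a[k] alone distinguishes this array from plain 1..n."""
    return a[k] != k + 1


a = make_instance(8, 3)  # the duplicate sits at 0-based position 3
revealing = [k for k in range(len(a)) if probe_reveals(a, k)]
```

Here `revealing` contains a single index, which is why an algorithm that reads only O(log n) positions can be made to miss it by an adversarial choice of `i`.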

Ivan Smirnov
  • The interviewer told me that there's an O(log n) solution that works by skipping by a delta d instead of 1. E.g., you see a value x at num[i] and skip to num[i+d]; if it's still x, continue, otherwise back up ... something like that. I'm not saying he's 100% right (he could be wrong), but it's fair to assume he knew what he was doing – One Two Three Oct 23 '17 at 17:59
  • @OneTwoThree Interviewers also make mistakes sometimes. Or maybe you misunderstood the problem? Maybe you were asked to find the element that occurs at least half of the time (in which case the problem is trivial). I'm pretty sure my example shows that no O(log n) time algorithm can exist for the problem you described. – Ivan Smirnov Oct 23 '17 at 18:02
  • Ooh, come to think of it. You may be right. Perhaps it was "find the majority... (more than 50%)" ... Assuming that was the question, how'd you solve it? – One Two Three Oct 23 '17 at 18:14
  • @OneTwoThree If we need this kind of majority, look at the element at position n/2 (plus or minus 1, depending on the parity of the size). If you need to check that the element is indeed the majority, do it with binary search. – Ivan Smirnov Oct 23 '17 at 18:16
  • I'm just trying to understand your informal proof here. You said "replace some number with occurrence of a previous number". You then replace 4 (the element at index 3, 0-based) with "the occurrence of a previous number", which should be 1, no? I'm not following why you replace "4" with "3" – One Two Three Oct 23 '17 at 18:19
  • @OneTwoThree Let's play a game. I give you an array of the form `1 2 3 ... n`, but in some place there is `i i i+2` instead of `i i+1 i+2`. You have to solve the problem from the post, that is, find this `i`, as it is the most frequent value. Then I prove that it is impossible. – Ivan Smirnov Oct 23 '17 at 18:32
  • Ok, got it. Thanks. – One Two Three Oct 23 '17 at 18:37
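The majority variant discussed in the comments above can be sketched as follows. This is a minimal sketch, assuming "majority" means strictly more than half of the elements; the function name `majority_in_sorted` is hypothetical. In a sorted array any such element must occupy the middle position, so it is the only candidate, and two binary searches confirm its count in O(log n).

```python
import bisect


def majority_in_sorted(lst):
    """Return the element occurring more than len(lst) / 2 times, or None."""
    n = len(lst)
    if n == 0:
        return None
    # A run longer than n / 2 necessarily covers the middle index.
    candidate = lst[n // 2]
    first = bisect.bisect_left(lst, candidate)   # first index of candidate
    last = bisect.bisect_right(lst, candidate)   # one past its last index
    return candidate if last - first > n // 2 else None
```

If the majority is guaranteed to exist, the verification step can be dropped and the answer is just `lst[n // 2]`.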
3

Not O(log n), but if the number of distinct integers is small, you can solve it in O(m log n), where m is the number of distinct integers.

Note that this solution only pays off when m << n.

The idea is to start from index 0 and find the last occurrence of the same number in the sorted array. This can be done with a binary search that skips ahead by a delta d, as your interviewer said, increasing the delta each time until you overshoot the run, and then backing up to find the last occurrence.

Keep a variable maxCount, initialized to 0. Once you have found endIndex, the last occurrence, the run length is endIndex - startIndex + 1; if it is greater than maxCount, replace maxCount with it. Then repeat the same process starting from endIndex + 1.

As @ivan has mentioned above, this degrades to an O(n) solution if all the numbers are distinct.
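The skipping-by-delta step described above can be sketched like this. It is only one possible reading of the idea, assuming the delta is doubled on every successful skip (often called galloping or exponential search); the names `run_end` and `most_frequent_galloping` are hypothetical, and the sketch uses an exclusive end index, so the run length is simply `end - start`.

```python
import bisect


def run_end(lst, start):
    """Return the index one past the last element equal to lst[start]."""
    n = len(lst)
    value = lst[start]
    step = 1
    # Gallop: double the step while the probed element still equals value.
    while start + step < n and lst[start + step] == value:
        step *= 2
    lo = start + step // 2     # last probed index known to hold value
    hi = min(start + step, n)  # first index known to be past the run, or n
    # Binary search for the exact boundary inside [lo, hi).
    return bisect.bisect_right(lst, value, lo, hi)


def most_frequent_galloping(lst):
    """Return (element, count) for the longest run, or (None, 0) if empty."""
    best, best_count = None, 0
    start = 0
    while start < len(lst):
        end = run_end(lst, start)
        if end - start > best_count:
            best, best_count = lst[start], end - start
        start = end  # jump to the first element of the next run
    return best, best_count
```

Because each run is located in time logarithmic in its own length, the all-distinct case costs a constant per element, which matches the O(n) worst case mentioned above.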

Parijat Purohit
3

This Python code runs in O(m log n) time, based on @Parijat's answer.

import bisect

def most_frequent_in_sorted(lst):
    most_frequent = None
    max_frequency = 0
    n = len(lst)
    idx = 0

    while idx < n:
        # Get the leftmost index holding an element > lst[idx];
        # everything before idx is already processed, so start the search there
        next_leftmost_idx = bisect.bisect_right(lst, lst[idx], idx)

        # Update most frequent element
        cur_frequency = next_leftmost_idx - idx
        if cur_frequency > max_frequency:
            most_frequent = lst[idx]
            max_frequency = cur_frequency

        # Update index to hold next different integer
        idx = next_leftmost_idx

    return most_frequent
MROB