2

The input is an array in which at most one element appears at least 60% of the time. The goal is to determine whether the array has such an element and, if so, to find it. I came up with a divide and conquer function that does this.

from collections import Counter

def CommonElement(a):
    c = Counter(a) 
    return c.most_common(1)  # Returns [(element, frequency)] for the most common element

def func(array):
    if len(array) == 1:
        return array[0]

    mid = len(array)//2

    left_element = func(array[:mid])
    right_element = func(array[mid:])

    if left_element == right_element:
        return right_element

    
    most_common_element = CommonElement(array)

    element_count = most_common_element[0][1] #Getting the frequency of the element
    percent = element_count/len(array)
    if percent >= .6:
        return most_common_element[0][0] #Returning the value of the element
    else:
        return None

array = [10,9,10,10,5,10,10,10,12,42,10,10,44,10,23,10]
print(func(array))  # Correctly prints 10

array = [10,9,10,8,5,10,10,10,12,42,10,12,44,10,23,5]
print(func(array))  # Correctly prints None

This function works, but it runs in O(n log(n)). I want to implement an algorithm that runs in O(n).

The recurrence relation for my algorithm is T(n) = 2T(n/2) + O(n). I think the goal is to eliminate the frequency count at each level of the recursion, which takes O(n). Any thoughts?

Mad Physicist
CrazyCSGuy
  • I would create a histogram. Create a dictionary where the key is your number, and the value is the number of entries. Then you can scan that dictionary to see if any item has more than 60% of the entries. – Tim Roberts Mar 28 '21 at 03:55
  • Partitioning/selection is O(n). Something like introselect to compute the median is guaranteed to yield the right answer because in the sorted array, the correct number takes up 60% of the span. – Mad Physicist Mar 28 '21 at 04:47
  • "Input is an array that has at most one element that appears at least 60% a time." - well, it's not like there's room for *two* elements to appear that often. – user2357112 Mar 28 '21 at 05:24
  • But there is room for 0 elements to appear at least 60%. What I meant to say is the array COULD have such an element. But it's possible that such element doesn't exist. – CrazyCSGuy Mar 28 '21 at 05:29
  • It looks like the whole divide and conquer part of your algorithm isn't doing anything for you - you could remove it entirely, and you'd get ggorlen's answer. – user2357112 Mar 28 '21 at 05:40

3 Answers

2

You can create a frequency counter for all elements in the list one time in O(n). Then, iterate the lookup table and see if any are at least 60% of the elements (in other words, count / len(lst) >= 0.6).

>>> from collections import Counter
>>> L = [4, 2, 3, 4, 4, 4]
>>> Counter(L)
Counter({4: 4, 2: 1, 3: 1})
>>> Counter(L).most_common(1)
[(4, 4)]
>>> item, count = Counter(L).most_common(1)[0]
>>> count / len(L)
0.6666666666666666
>>> count / len(L) >= 0.6
True
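The interactive session above can be wrapped into a single function. This is a minimal sketch; the name `find_supermajority_counter` is hypothetical, and returning `None` for "no such element" follows the question's convention.

```python
from collections import Counter

def find_supermajority_counter(items):
    """Return the element that makes up at least 60% of items, or None."""
    if not items:
        return None
    # One O(n) pass builds the histogram; most_common(1) yields [(element, count)].
    item, count = Counter(items).most_common(1)[0]
    return item if count / len(items) >= 0.6 else None

print(find_supermajority_counter([10, 9, 10, 10, 5, 10, 10, 10, 12, 42, 10, 10, 44, 10, 23, 10]))
print(find_supermajority_counter([10, 9, 10, 8, 5, 10, 10, 10, 12, 42, 10, 12, 44, 10, 23, 5]))
```

On the two arrays from the question this prints `10` and then `None`.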

Divide & conquer is a creative, but inappropriate, approach for this problem.

...or so I thought, but see this answer.

ggorlen
  • In fact, if there is such an element, it would be the one with the highest count, so you would just need to check whether it is at least 60%. – jurez Mar 28 '21 at 04:01
  • Actually, this is a school assignment that's part of Divide and Conquer lesson. So while I would like to depend on Counter entirely, I need to implement my own Divide and Conquer algorithm. – CrazyCSGuy Mar 28 '21 at 04:08
  • Interesting. I don't see a way to do better than O(n). O(log(n)) is impossible. Is your school assignment/professor telling you it's possible to solve this with D&C in O(n)? – ggorlen Mar 28 '21 at 04:10
  • I'm not looking to solve it in O(log n). I found a solution in O(n log n) but can't think of a way to do it in O(n), so I wanted to ask the "internet". I'll ask my professor whether it's even possible, but I think it should be, because the question is "implement in O(n)", not "can you implement in O(n)". – CrazyCSGuy Mar 28 '21 at 04:24
  • You can do it with introselect. The median is guaranteed to be the right answer. – Mad Physicist Mar 28 '21 at 04:47
  • @MadPhysicist nice, that makes sense. Maybe add an answer? – ggorlen Mar 28 '21 at 04:50
  • Just did. I'm on mobile, so no sample implementation. You can figure it out if you know how to write quicksort, or look at Wikipedia or numpy source. – Mad Physicist Mar 28 '21 at 05:11
2

If you are guaranteed to have a list 60% of which is a given number, that number is guaranteed to be the median. To see this intuitively, sort the list. The number in question represents a contiguous window that is 60% of the length of the list. There is no way to place that window so that it doesn't cover the middle.

There are plenty of divide-and-conquer algorithms for finding the median. A common one is called introselect. You can find an implementation in numpy's partition and argpartition functions (it's written in C). The basic idea is to do quicksort, but only recurse into the portion that contains the index you care about. This reduces the running time to O(n): on average for plain quickselect, and in the worst case for introselect, which falls back to median of medians when the pivots turn out badly.

By the way, you could search for any index between 40% and 60% of the length of the list. 50% seems like a reasonable middle ground.

To verify that the median appears at least 60% of the time, run a single loop over the array, counting the number of times the median appears.
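The two steps (select the median, then verify it) might be sketched as follows. Note the assumptions: this uses a simple randomized quickselect rather than introselect, so it is expected O(n) rather than worst-case O(n), and it builds new lists instead of partitioning in place. The names `quickselect` and `find_supermajority_median` are hypothetical.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element (0-based) of values in expected O(n)."""
    values = list(values)  # copy so the caller's list is not modified
    while True:
        pivot = random.choice(values)
        smaller = [v for v in values if v < pivot]
        equal_count = sum(1 for v in values if v == pivot)
        if k < len(smaller):
            # The target index lies among the elements below the pivot.
            values = smaller
        elif k < len(smaller) + equal_count:
            # The target index falls inside the run of pivot-valued elements.
            return pivot
        else:
            # Discard everything <= pivot and shift the index accordingly.
            k -= len(smaller) + equal_count
            values = [v for v in values if v > pivot]

def find_supermajority_median(values):
    """Return the element making up at least 60% of values, or None."""
    if not values:
        return None
    candidate = quickselect(values, len(values) // 2)
    # Verification pass: the median is only a candidate until counted.
    count = sum(1 for v in values if v == candidate)
    return candidate if count / len(values) >= 0.6 else None
```

Only one side of each partition is processed further, so the expected work per round shrinks geometrically, which is where the linear bound comes from. Unlike the hash-based approach, this requires the elements to be orderable.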

Mad Physicist
  • Thanks for the info. I didn't even think about the element being the median. I need to find the median of the array in O(n) and verify in O(n) that it occurs at least 60% of the time. – CrazyCSGuy Mar 28 '21 at 05:32
  • @CrazyCSGuy. Verifying is easy. Just make a single pass, count the number of occurrences. – Mad Physicist Mar 28 '21 at 05:33
-1

There's a pretty simple algorithm for finding the majority element of a collection, if the collection has one:

def majority(l):
    count, candidate = 0, None
    for element in l:
        if count == 0:
            count, candidate = 1, element
        elif element == candidate:
            count += 1
        else:
            count -= 1
    return candidate

This algorithm essentially pairs each element of the input against another element with a different value until all unpaired elements have the same value, then returns that value. If the input has a majority element, the algorithm must return that.

You can compute a candidate with this algorithm, then make another pass through the input and see if that candidate is a 60% supermajority. This works in O(1) space and O(n) time without mutating the input, while hash-based or introselect-based algorithms would need more space or mutate the input. It's also immune to hash collision attacks (unlike Counter and other hash-based approaches) and doesn't require elements to have an order relation (unlike introselect).
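The candidate pass described here is commonly known as the Boyer-Moore majority vote algorithm. A minimal sketch of the full two-pass check might look like this; the names `majority_candidate` and `find_supermajority_vote` are hypothetical, and the input is assumed to be a sequence that can be iterated twice (such as a list).

```python
def majority_candidate(items):
    """First pass: return the only element that could be a strict majority."""
    count, candidate = 0, None
    for element in items:
        if count == 0:
            count, candidate = 1, element
        elif element == candidate:
            count += 1
        else:
            count -= 1
    return candidate

def find_supermajority_vote(items):
    """Return the element making up at least 60% of items, or None."""
    if not items:
        return None
    candidate = majority_candidate(items)
    # Second pass: the candidate is unverified until it is actually counted.
    occurrences = sum(1 for element in items if element == candidate)
    # 5 * occurrences >= 3 * n is occurrences / n >= 0.6 without floating point.
    return candidate if 5 * occurrences >= 3 * len(items) else None
```

Because anything at 60% or more is also a strict majority, the first pass cannot miss it; the second pass is still needed because the candidate is meaningless when no majority exists.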

user2357112