Majority Element in an Array: Divide-and-Conquer in O(N log N)

Question

An array a[] with N elements, repeats allowed, is said to "contain a majority element v" if more than half of its elements equal v. Given the array a[], design an efficient algorithm (running in O(N log N) time and using divide-and-conquer) that checks whether it contains a majority element and, if so, determines it. Assume the only comparison available between elements of the array is equality (a[i] == a[j]), performed in constant time. (Hint: in the algorithm, divide the array a[] into two subarrays a1[] and a2[], each half the size of a[]. If the majority element of a[] is v, then v must also be the majority element of a1[], or of a2[], or of both.)

int main() {

    int a[12] = {5, 9, 3, 13, 5, 21, 5, 7, 17, 12, 5, 6};
    int N = 12, lo = 0, hi = N - 1, mid,i,j;

    mid = lo + (hi - lo) / 2;
    int n1 = mid - lo + 1;
    int n2 =  hi - mid;
    int a1[n1],a2[n2];

    /* Copy data to temp arrays a1[] and a2[] */
    for (i = 0; i < n1; i++)
        a1[i] = a[lo + i];
    for (j = 0; j < n2; j++)
        a2[j] = a[mid+1+j];


    while (i < n1 && j < n2) {

        if(a1[i]==a2[j]){

        }else if(){


        }else{


        }

    }
    return 0;
}

I'm having trouble working out how to proceed: how do I use the equality comparison on the auxiliary arrays to tell whether the majority element is in a1[], in a2[], or in both?

@AlbinPaul It seems the OP is not allowed to sort. He cannot use any comparison other than equality. — kyriakosSt, Feb 02 '18 at 12:54
"If the element in most of a [] is v, then v must be also the element in majority of a1 [], or a2 [] or both" - The converse is not true, however: even if v is the majority in e.g. `a1[]`, it doesn't have to be the majority in `a[]`. — JimmyB, Feb 02 '18 at 13:25

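The point made in the last comment can be checked on a small case. The sketch below uses a hypothetical helper, `is_majority`, which is not from the question and relies only on equality comparisons. It shows a value that is a majority of one half without being a majority of the whole array, which is why a combine step has to recount each half's candidate over the full range.

```python
def is_majority(values, v):
    """Return True if v occupies more than half of values (equality tests only)."""
    matches = sum(1 for x in values if x == v)
    return 2 * matches > len(values)


a = [1, 1, 1, 2, 2, 3]
a1, a2 = a[:3], a[3:]   # the two halves

print(is_majority(a1, 1))  # 1 fills the whole left half
print(is_majority(a2, 2))  # 2 fills two of the three right-half slots
print(is_majority(a, 1))   # 1 fills only 3 of 6 slots overall
print(is_majority(a, 2))   # 2 fills only 2 of 6 slots overall
```
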
score 3 · Accepted Answer · answered Feb 02 '18 at 18:27

Here's a Python implementation that fits the description (sorry, I'm not versed in C but I think it's pretty straightforward code). We can follow the logged return values and indexes for each section that's examined to make sense of how it works.

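# Returns v if v is a majority of arr[low..high] (inclusive);
# otherwise, returns None
def f(arr, low, high):
  # A single element is trivially the majority of its own range
  if low == high:
    return arr[low]

  # Two elements have a majority only if they are equal
  if low + 1 == high:
    return arr[low] if arr[low] == arr[high] else None

  n = high - low + 1
  mid = (low + high) // 2

  # Majority candidates of the left and right halves
  l = f(arr, low, mid)
  r = f(arr, mid + 1, high)

  print('n: ' + str(n) + '; l: ' + str(l) + '; r: ' + str(r) + '; L: ' + str((low, mid)) + '; R: ' + str((mid + 1, high)))

  # Both halves agree (this also covers the case where both are None)
  if l == r:
    return l

  # Count how often each candidate occurs in the whole range
  counts = [0, 0]

  for i in range(low, high + 1):
    if arr[i] == l:
      counts[0] = counts[0] + 1
    if arr[i] == r:
      counts[1] = counts[1] + 1

  # Compare against None explicitly so that falsy values such as 0 still qualify
  if l is not None and counts[0] * 2 > n:
    return l

  if r is not None and counts[1] * 2 > n:
    return r

  return None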

Output:

a = [5, 9, 3, 5, 5, 21, 5, 7, 17, 5, 5, 5]

print(f(a, 0, len(a) - 1))

"""
n: 3; l: None; r: 3; L: (0, 1); R: (2, 2)
n: 3; l: 5; r: 21; L: (3, 4); R: (5, 5)
n: 6; l: None; r: 5; L: (0, 2); R: (3, 5)
n: 3; l: None; r: 17; L: (6, 7); R: (8, 8)
n: 3; l: 5; r: 5; L: (9, 10); R: (11, 11)
n: 6; l: None; r: 5; L: (6, 8); R: (9, 11)
n: 12; l: None; r: 5; L: (0, 5); R: (6, 11)
5
"""

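On the running time: each call makes two recursive calls on halves and then does at most two linear recounts, so the cost follows T(N) = 2T(N/2) + O(N), which gives O(N log N). The sketch below is an instrumented variant, not the answer's exact code. The name `majority_counted` and the `counter` parameter are assumptions added here to tally equality comparisons so they can be checked against the bound 2 * N * ceil(log2(N)).

```python
import math


def majority_counted(arr, low, high, counter):
    """Majority of arr[low..high] or None; counter[0] tallies equality tests."""
    if low == high:
        return arr[low]
    mid = (low + high) // 2
    left = majority_counted(arr, low, mid, counter)
    right = majority_counted(arr, mid + 1, high, counter)
    n = high - low + 1
    # Recount each half's candidate over the whole range
    for candidate in (left, right):
        if candidate is None:
            continue
        matches = 0
        for i in range(low, high + 1):
            counter[0] += 1
            if arr[i] == candidate:
                matches += 1
        if 2 * matches > n:
            return candidate
    return None


a = [5, 9, 3, 5, 5, 21, 5, 7, 17, 5, 5, 5]
counter = [0]
result = majority_counted(a, 0, len(a) - 1, counter)
bound = 2 * len(a) * math.ceil(math.log2(len(a)))
print(result, counter[0], bound)
```

The bound holds because the subarray sizes on each recursion level sum to at most N, and there are about log2(N) levels above the single-element base cases.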
score 2 · Answer 2 · edited Feb 03 '18 at 12:23


I think the function should:

1) Recursively call itself for the first half of the array (returns answer a)

2) Recursively call itself for the second half of the array (returns answer b)

3) Loop through the array and count how many match a/b and return whichever has most matches

Note there is no need to actually copy the array at any stage because it is never modified, just pass in an index for the start and the length of the subarray.
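A minimal sketch of the three steps with the no-copy signature, assuming a start index and a length are passed instead of subarrays. The name `majority_span` is hypothetical. It also assumes the counting step only accepts a candidate that fills more than half of the range, since taking whichever candidate has the most matches could report a value that is not a majority.

```python
def majority_span(arr, start, length):
    """Return the majority element of arr[start:start + length], or None.

    Assumes length >= 1. The array is never copied or modified.
    """
    if length == 1:
        return arr[start]
    half = length // 2
    a = majority_span(arr, start, half)                   # step 1
    b = majority_span(arr, start + half, length - half)   # step 2
    # Step 3: recount each candidate over the whole range
    for candidate in (a, b):
        if candidate is None:
            continue
        matches = sum(1 for i in range(start, start + length)
                      if arr[i] == candidate)
        if 2 * matches > length:
            return candidate
    return None


print(majority_span([1, 1, 1, 2, 2, 3], 0, 6))
print(majority_span([5, 9, 3, 5, 5, 21, 5, 7, 17, 5, 5, 5], 0, 12))
```
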

edited Feb 03 '18 at 12:23 by גלעד ברקן · answered Feb 02 '18 at 12:56 by Peter de Rivaz

I think your description as is would return 1 for input `1, 1, 1, 2, 2, 3`. But 1 is not a majority. – גלעד ברקן Feb 02 '18 at 19:54
I'm downvoting until you fix this :) – גלעד ברקן Feb 03 '18 at 12:23

score 0 · Answer 3 · answered Feb 02 '18 at 22:08

This is probably not the answer you are looking for, but there is an interesting probabilistic approach to this problem. Pick a random position x in the array and count the occurrences of array[x] to check whether it appears more than array.size() / 2 times.

If some element fills more than half of the array, then the chance of landing on one of its positions is > 1/2 in each iteration.

So if you do something like 30 iterations, the chance of selecting the "dominating" element at least once is greater than 1 - (1/2)^30, which is okay for almost every application.
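To put numbers on that estimate, here is a small sketch. It assumes the picks are independent and uniform over positions; `rand() % arraySize` is only approximately uniform. The helper name `failure_bound` is hypothetical. It returns an upper bound on the chance that every pick misses a majority element that does exist.

```python
def failure_bound(iterations):
    """Upper bound on the chance that all random picks miss the majority element."""
    # Each pick misses with probability below 1/2, so k picks all miss
    # with probability below (1/2) ** k.
    return 0.5 ** iterations


for k in (1, 10, 30):
    print(k, failure_bound(k))
```

If the array has no majority element, the loop never reports one, so the method can only err by missing a majority that is present.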

The complexity is O(numberOfIterations * arraySize)

Here is the code (:.

It is in C++, but I bet you can translate it to C without much effort.

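#include <cstdlib>   // srand, rand
#include <ctime>     // time
#include <iostream>
#include <vector>

// Counts how many entries of array equal element (equality comparisons only)
int count(int element, const std::vector<int>& array)
{
    int occurrences = 0;
    for(const int& number : array)
    {
        occurrences += (number == element);
    }
    return occurrences;
}


int main(){

    int arraySize, numberOfIterations;

    srand(static_cast<unsigned>(time(0)));

    std::cin >> arraySize;
    if(arraySize <= 0)
    {
        // An empty array has no majority element, and rand() % 0 is undefined
        std::cout << "No dominating element found" << std::endl;
        return 0;
    }
    std::vector<int> arr(arraySize);

    for(int i = 0; i < arraySize; ++i)
    {
        std::cin >> arr[i];
    }

    std::cin >> numberOfIterations;

    for(int i = 0; i < numberOfIterations; ++i)
    {
        // Pick a random position and count how often its value occurs
        int idx = rand() % arraySize;
        int freq = count(arr[idx], arr);
        if(freq > arraySize / 2)
        {
            std::cout << "The element = " << arr[idx] << " dominates the array provided " << std::endl;
            return 0;
        }
    }
    // Either there is no dominating element or every random pick missed it
    std::cout << "No dominating element found after " << numberOfIterations << " iterations" << std::endl;
    return 0;
}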
