0

Can we find the mode of an array in O(n) time without using additional O(n) space or a hash table? Note that the data is not sorted.

amit
yuvi
  • http://stackoverflow.com/questions/9316352/how-do-i-find-the-mode-of-an-array ? – ShinTakezou Dec 02 '12 at 19:33
  • 1
    @DarthVader - It's [standard terminology](http://en.wikipedia.org/wiki/Mode_%28statistics%29). It means the number that occurs most often. There can be more than one. – Ted Hopp Dec 02 '12 at 19:33
  • @ShinTakezou : Yep. It is for sorted array!! – yuvi Dec 02 '12 at 19:36
  • 1
    @ShinTakezou The linked question is about sorted arrays. – Omri Barel Dec 02 '12 at 19:36
  • let the sort be the first step of the algorithm (in place sort of course) – ShinTakezou Dec 02 '12 at 19:37
  • "without using additional O(n) *space*". – ShinTakezou Dec 02 '12 at 19:38
  • The question is "find mode of an array without hashmap in un sorted array in O(n)". I'm guessing it's time complexity. – Omri Barel Dec 02 '12 at 19:38
  • 1
    yep.. i m sorry.. i meant O(n) time. – yuvi Dec 02 '12 at 19:39
  • that's the title, read the question: if we can find the mode of an array without using Additional O(n) space – ShinTakezou Dec 02 '12 at 19:39
  • 2
    If the array is unsorted and you cannot use hashing of any kind, I think this is impossible. See [this thread](http://stackoverflow.com/questions/4168622/computing-the-mode-most-frequent-element-of-a-set-in-linear-time), for example. – Ted Hopp Dec 02 '12 at 19:40
  • Cool. I wanted to know if a technique similar to the Boyer-Moore majority vote algorithm, described at http://www.cs.utexas.edu/~moore/best-ideas/mjrty/index.html, could be applied here. – yuvi Dec 02 '12 at 19:43
  • Proving that an algorithm with a given set of constraints doesn't exist is notoriously hard but in this case I feel it should be possible. – biziclop Dec 02 '12 at 19:44

3 Answers

1

This problem is no easier than the element distinctness problem (1), so without the additional space its complexity is Theta(n log n) at best. Since it can be solved in Theta(n log n), that bound is indeed tight.

So basically, if you cannot use extra space for a hash table, the best option is to sort and then iterate, which is Theta(n log n).
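As a rough illustration of the sort-and-iterate idea, here is a minimal Python sketch. The function name `mode_by_sorting` is made up for this example, and it assumes a non-empty list of mutually comparable items. Note that Python's built-in `list.sort()` runs in O(n log n) time but may use auxiliary memory internally; an in-place sort such as heapsort would be needed to keep the extra space strictly constant.

```python
def mode_by_sorting(values):
    """Return (mode, count) for a non-empty list; sorts the list in place."""
    values.sort()  # O(n log n); equal elements become adjacent

    best_val, best_cnt = values[0], 0
    run_val, run_cnt = values[0], 0
    for x in values:
        if x == run_val:
            run_cnt += 1              # still inside the current run
        else:
            run_val, run_cnt = x, 1   # a new run of equal elements starts
        if run_cnt > best_cnt:
            best_val, best_cnt = run_val, run_cnt
    return best_val, best_cnt


print(mode_by_sorting([3, 1, 3, 2, 1, 3]))
```

If several values tie for the highest count, this sketch reports the smallest of them, since it only replaces the current best on a strictly longer run.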


(1) Given an algorithm A that solves this problem in O(f(n)), one can run A and then verify with one extra pass whether the resulting element occurs more than once, which solves the element distinctness problem in O(f(n) + n).
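The reduction in the footnote can be sketched in Python as follows. Both `has_duplicate` and `reference_mode` are hypothetical names for this illustration; `reference_mode` is only a stand-in for "an algorithm A" and makes no claim about space or time bounds.

```python
def has_duplicate(items, find_mode):
    """Solve element distinctness using any mode-finding routine.

    find_mode stands in for algorithm A: it takes a non-empty list and
    returns a most frequent element.
    """
    if not items:
        return False
    candidate = find_mode(items)
    # One extra O(n) pass: does the mode occur more than once?
    occurrences = sum(1 for x in items if x == candidate)
    return occurrences > 1


def reference_mode(items):
    """Placeholder for algorithm A (sort a copy, take the longest run)."""
    ordered = sorted(items)
    best, best_cnt, run_cnt = ordered[0], 1, 1
    for prev, cur in zip(ordered, ordered[1:]):
        run_cnt = run_cnt + 1 if cur == prev else 1
        if run_cnt > best_cnt:
            best, best_cnt = cur, run_cnt
    return best


print(has_duplicate([4, 8, 15, 16], reference_mode))
print(has_duplicate([4, 8, 15, 8], reference_mode))
```

If all elements are distinct, the mode occurs exactly once, so the check returns False; otherwise some element repeats and the mode must occur at least twice.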

amit
1

Under the right circumstances, yes. Just for example, if your data is amenable to a radix sort, then you can sort with only constant extra space in linear time, followed by a linear scan through the sorted data to find the mode.

If your data requires comparison-based sorting, then I'm pretty sure O(N log N) is about as good as you can do in the general case.
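One way the radix-sort route could look, as a sketch only: the code below uses an in-place binary MSD radix sort (radix exchange sort), which is one of several possible radix variants and not necessarily the one the answer has in mind. It assumes non-negative integers smaller than `2**bits`; the names `radix_exchange_sort` and `mode_of_sorted` are invented for this example. Time is O(n * bits), which is linear in n for fixed-width keys, and the recursion depth is at most `bits`.

```python
def radix_exchange_sort(a, bits=32):
    """Sort non-negative ints below 2**bits in place, one bit at a time."""
    def sort(lo, hi, bit):
        if hi - lo < 2 or bit < 0:
            return
        mask = 1 << bit
        i, j = lo, hi - 1
        # Partition a[lo:hi]: elements with the bit clear go left, set go right.
        while i <= j:
            if a[i] & mask:
                a[i], a[j] = a[j], a[i]
                j -= 1
            else:
                i += 1
        sort(lo, i, bit - 1)
        sort(i, hi, bit - 1)

    sort(0, len(a), bits - 1)


def mode_of_sorted(a):
    """Linear scan over a non-empty sorted list; returns (mode, count)."""
    best_val, best_cnt = a[0], 0
    run_val, run_cnt = a[0], 0
    for x in a:
        if x == run_val:
            run_cnt += 1
        else:
            run_val, run_cnt = x, 1
        if run_cnt > best_cnt:
            best_val, best_cnt = run_val, run_cnt
    return best_val, best_cnt


data = [5, 3, 5, 1, 3, 5, 0]
radix_exchange_sort(data, bits=3)
print(data, mode_of_sorted(data))
```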

Jerry Coffin
0

Just count the frequencies. This is not O(n) space, it is O(k), with k being the number of possible values in the range. Since k does not depend on n, this is actually constant space.

Time is clearly linear: O(n) for the scan, plus O(k) to initialize the counters.

// init: one counter per possible value, indexed 0 .. k-1
counts = array[k]
for i = 0 to k - 1
    counts[i] = 0

maxCnt = 0
maxVal = vals[0]
for val in vals
    counts[val]++
    if (counts[val] > maxCnt)
        maxCnt = counts[val]
        maxVal = val

The main problem here is that while k may be a constant, it may also be extremely large. However, k could also be small. Regardless, this does properly answer your question, even if it isn't practical.
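For reference, the pseudocode above might translate to Python roughly like this. It is a sketch under the assumption that every value is an integer in the range 0 .. k-1 and that the input is non-empty; the function name `mode_by_counting` is made up for the example.

```python
def mode_by_counting(vals, k):
    """Return (mode, count) for a non-empty list of ints in range(k)."""
    counts = [0] * k          # O(k) space, independent of len(vals)

    max_cnt = 0
    max_val = vals[0]
    for val in vals:          # single O(n) pass
        counts[val] += 1
        if counts[val] > max_cnt:
            max_cnt = counts[val]
            max_val = val
    return max_val, max_cnt


print(mode_by_counting([2, 0, 2, 1], k=3))
```

With 32-bit integers, k would be 2^32, so the counter array alone would hold about four billion entries, which is the practicality concern mentioned above.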

goat
  • I don't understand what you mean by "this is actually constant space". How? IMO it's still linear, only in a different variable, and we can say that k <= n. – Honza Brabec Dec 02 '12 at 21:08
  • Because k doesn't change depending on the size of the input n. And no, `k <= n` is not true. For example, if we're talking about an array of 32-bit ints, then k = 2^32 regardless of n. n could be 5 and k is still enormous. And k remains the same if n is 1 trillion. – goat Dec 02 '12 at 21:17
  • Oh, now I see how you meant it. Basically the same approach as in counting sort. I thought that you'd create new entries in the memory only when you find a new unique item but that would require a hashmap to run in linear time and that is forbidden so my mistake :) Now there is a problem however if the entries are for example strings of arbitrary length. – Honza Brabec Dec 02 '12 at 21:58