
So I found this Google interview algorithm question online. It's really interesting, and I still haven't come up with a good solution. Please have a look and give me a hint or a solution; it would be great if you could write the code in Java :).

"Design an algorithm that, given a list of n elements in an array, finds all the elements that appear more than n/3 times in the list. The algorithm should run in linear time (n >= 0). You are expected to use comparisons and achieve linear time. No hashing/excessive space, and don't use standard linear time deterministic selection algo"

Newbie_code
  • `and don't use standard linear time deterministic selection algo` say what??? – Amir Raminfar Dec 02 '11 at 20:20
  • I am curious to know how one would do this without hashing. Although, does an `int[]` count as hashing? It definitely counts as excessive space. – Amir Raminfar Dec 02 '11 at 20:21
  • I can't think of an exact solution off the bat, but I do believe there is a better-known problem: finding an element that appears more than n/2 times by iterating through the array, using a trick to find the most popular element, and then looking through the array again to check whether it appears enough times. If you repeat that process while ignoring the most popular element, it should solve this problem, as there are at most 2 elements that appear more than n/3 times. – pasha Dec 02 '11 at 20:25
  • Formulated for three elements occurring more than n/4 times, but straightforward to modify: [algorithm description](http://stackoverflow.com/a/8206433/1011995) – Daniel Fischer Dec 02 '11 at 20:55

2 Answers


My solution was inspired by the game Tetris. Solution highlight (dubbed the 'Tetris process'): use three key-value pairs for bookkeeping, where the key is an element and the value is that element's count. In the main loop, we keep at most 3 of the latest distinct elements. When all three keys have non-zero counts, we decrement every count and eliminate the key(s) whose count drops to zero, if any. At the end, there may or may not be some residual elements; these are the survivors of the Tetris process. Note that there can be no more than 2 residual elements, because a decrement step is triggered as soon as three distinct keys are present. If nothing is left, we return null. Otherwise, we loop through the original n elements again, count the occurrences of each residual element, and return those whose count is > n/3.
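Here is a minimal Java sketch of the Tetris process as described above, followed by the verification pass. The class name `TetrisProcess` and the methods `survivors` and `elementsAboveThird` are hypothetical, not from the answer, and this version returns an empty list instead of null when nothing qualifies. Elements are compared only with `equals` (through `List.indexOf` and `Objects.equals`), and the bookkeeping list never holds more than three keys, so the extra space stays O(1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class TetrisProcess {
    // Phase 1: keep at most three distinct keys with counts. Whenever all
    // three slots are filled, decrement every count and drop zero-count keys.
    static <T> List<T> survivors(List<T> xs) {
        List<T> keys = new ArrayList<>();         // never more than 3 keys
        List<Integer> counts = new ArrayList<>(); // counts.get(i) belongs to keys.get(i)
        for (T x : xs) {
            int i = keys.indexOf(x);              // linear scan using equals(), no hashing
            if (i >= 0) {
                counts.set(i, counts.get(i) + 1);
            } else {
                keys.add(x);
                counts.add(1);
            }
            if (keys.size() == 3) {
                // Remove one occurrence of each of the three distinct keys.
                // Iterate backwards so removals don't shift unvisited indices.
                for (int j = 2; j >= 0; j--) {
                    int c = counts.get(j) - 1;
                    if (c == 0) {
                        keys.remove(j);   // remove(int index)
                        counts.remove(j); // remove(int index)
                    } else {
                        counts.set(j, c);
                    }
                }
            }
        }
        return keys; // at most 2 residual elements
    }

    // Phase 2: count each survivor in the original input and keep those > n/3.
    static <T> List<T> elementsAboveThird(List<T> xs) {
        int n = xs.size();
        List<T> result = new ArrayList<>();
        for (T candidate : survivors(xs)) {
            int count = 0;
            for (T x : xs) {
                if (Objects.equals(x, candidate)) count++;
            }
            if (3L * count > n) result.add(candidate); // count > n/3 without rounding
        }
        return result;
    }

    public static void main(String[] args) {
        List<Character> s = new ArrayList<>();
        for (char c : "AADBBBBDABCC".toCharArray()) s.add(c);
        System.out.println(survivors(s));          // [B, C]
        System.out.println(elementsAboveThird(s)); // [B]
    }
}
```

The sample input "AADBBBBDABCC" (also discussed in the comments) shows why the second pass is needed: both B and C survive, but only B occurs more than 12/3 = 4 times.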

Hint of proof: To show the correctness of the above algorithm, note that an element must survive the Tetris process (i.e., remain in the residue) to satisfy the requirement. To see why, let m denote the number of removal operations and r the total count of the residual elements. Each removal discards exactly three occurrences, so n = 3 * m + r. Since r >= 0, we get m <= n/3. Each removal also discards at most one occurrence of any given element, so if an element didn't survive the Tetris process, it can appear at most m <= n/3 times.
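The counting argument can be summarized as a few inequalities. Here m and r are the number of removal operations and the total residual count from the hint, and count(x) is notation introduced here for the number of occurrences of x in the input. The last line is the standard observation (also made in the question's comments) that at most two elements can qualify.

```latex
\begin{aligned}
n &= 3m + r,\quad r \ge 0 &&\Longrightarrow\; m \le \tfrac{n}{3} \\
x &\notin \text{residue} &&\Longrightarrow\; \operatorname{count}(x) \le m \le \tfrac{n}{3} \\
\operatorname{count}(x_1), \operatorname{count}(x_2), \operatorname{count}(x_3) &> \tfrac{n}{3} &&\Longrightarrow\; \textstyle\sum_{k=1}^{3} \operatorname{count}(x_k) > n \quad (\text{impossible})
\end{aligned}
```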

Time complexity O(n), space complexity O(1).

Chivalryman
  • The OP asks to report _all_ elements that occur more than n/3 times in the list. Note that the Tetris algorithm does **not** ensure that **all** residual elements occur more than n/3 times in the list (try it out on the string "AADBBBBDABCC"; residual elements are B and C, but B is the only desired element). So you'll have to go through the list again after the Tetris process, counting the occurrences of each residual element (there can be a maximum of 2 residual elements), and then check if that frequency exceeds n/3. Thankfully the time and space complexities remain unchanged. – Vicky Chijwani Nov 03 '12 at 23:00
  • Aren't your key value pairs technically hashing? The question states no hashing allowed. Otherwise, things would be a bit easier. – Henley Jul 31 '13 at 20:14
  • Still, this is a very creative solution that deserves some credit, not nitpicking :) – Henley Jul 31 '13 at 20:31
  • @HenleyChiu I don't see how this approach is equivalent to hashing. Can you explain more? – Phani Jan 28 '17 at 22:36

Hint: Look at Boyer and Moore's Linear Time Voting Algorithm

Better Hint: Think about solving the majority problem first, that is, finding an element that occurs more than n/2 times. The basic idea of the algorithm is that if we cancel out each occurrence of an element e against an occurrence of some element different from e, then e will survive until the end if it is a majority element.

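    // Returns the only possible majority candidate of a[0..size-1]; requires size >= 1.
    static int findCandidate(int[] a, int size) {
        // Initialize index and count of the current candidate
        int maj_index = 0;
        int count = 1;

        for (int i = 1; i < size; i++) {
            if (a[maj_index] == a[i])
                count++;   // same element: one more vote for the candidate
            else
                count--;   // different element: cancels one vote

            if (count == 0) {
                // candidate fully cancelled: switch to the current element
                maj_index = i;
                count = 1;
            }
        }
        return a[maj_index];
    }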

This algorithm loops through the elements while maintaining a count for the current candidate a[maj_index]. If the next element is the same, it increments the count; if the next element is different, it decrements the count; and if the count reaches 0, it makes the current element the new candidate and resets the count to 1.

Next, you need to check that this element really occurs more than n/2 times, but that can be done in one more pass.
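To make the two passes concrete, here is a small Java sketch that combines the candidate search and the verification pass in one helper. The names `MajorityVote` and `majority` are hypothetical, and returning an `OptionalInt` (empty when no element occurs more than n/2 times) is a design choice made here, not part of the answer.

```java
import java.util.OptionalInt;

class MajorityVote {
    // Returns the element occurring more than n/2 times, or empty if none does.
    static OptionalInt majority(int[] a) {
        if (a.length == 0) return OptionalInt.empty();

        // Pass 1: cancellation vote; the survivor is the only possible majority.
        int candidate = a[0];
        int count = 1;
        for (int i = 1; i < a.length; i++) {
            if (a[i] == candidate) {
                count++;
            } else if (--count == 0) {
                candidate = a[i]; // previous candidate fully cancelled
                count = 1;
            }
        }

        // Pass 2: verify the candidate really occurs more than n/2 times.
        int occurrences = 0;
        for (int x : a) {
            if (x == candidate) occurrences++;
        }
        return 2L * occurrences > a.length ? OptionalInt.of(candidate) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(majority(new int[] {2, 2, 1, 2, 3, 2, 2})); // OptionalInt[2]
        System.out.println(majority(new int[] {1, 2, 3, 1, 2}));       // OptionalInt.empty
    }
}
```

In the second call the first pass still ends on a candidate (2), but the verification pass rejects it because 2 occurs only twice out of five.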

PengOne
  • Good hint. Not sure exactly how to apply it to n/3 yet, but it's definitely a great algorithm to know. Thanks! I'll use this to try to come up with a solution for n/3. But why did this question get closed? I think this is an excellent question to learn from... – Newbie_code Dec 02 '11 at 20:46
  • @Newbie_code I think it was the way you phrased the question. Generally, on SO, you should show some effort in solving the question yourself rather than ask the community to write code for you. I voted to re-open it. – PengOne Dec 02 '11 at 20:47
  • @Newbie_code To get the `n/3` case, just think a bit about how this algorithm works. It's not too hard to generalize it, but I will warn you that the difficult part IMO is that 2 different elements could work. – PengOne Dec 02 '11 at 20:48
  • I posted this question here just because I think it's interesting and hard, and I'd like to share it with everyone else who would like to discuss it and learn how to solve it. It could be easily solved with hashing, but without it and in linear time? Hmm... I'm sure whoever reads your answer will learn something if they didn't know this before. – Newbie_code Dec 02 '11 at 20:53
  • I'm quite familiar with the Majority Voting algorithm, but I'm not sure I see how to adapt it here. The correctness proof hinges on a lemma that says that you can rearrange the elements of the array such that you can pair up the majority element in a way that cancels out everything else. This lemma fails if you're looking for an element that appears a third of the time. Can you be a bit more specific about how you're generalizing the algorithm? – templatetypedef Dec 03 '11 at 19:49
  • @templatetypedef You just have to count the non `e` elements as 0.5 and count `e` as 1.0 to make it work out, though this will only catch one of the two elements. I'll add the details if this gets re-opened. – PengOne Dec 04 '11 at 01:27
  • What about the sequence BAACAABAACAA...? Does A ever become the matched element? Am I misunderstanding the proposal? – jonderry Dec 06 '11 at 00:47
  • One more hint: there are at most 2 such elements. – Jack Feb 08 '12 at 17:40
  • I agree with @templatetypedef. I think Chivalryman's Tetris approach is a _much_ better way to solve this problem, than an adaptation of Majority Voting. – Vicky Chijwani Nov 03 '12 at 23:02
  • For the sequence "ABADC", A is the required element, but the Majority Voting Algorithm wouldn't yield this. I agree with @templatetypedef that Boyer-Moore Voting Algorithm cannot be generalized to solve this problem. – rajatkhanduja Nov 04 '12 at 14:56
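The "ABADC" counterexample is easy to check mechanically. The small Java program below (the class and method names are hypothetical) runs the single-candidate vote, mirroring the answer's findCandidate, on that input: the vote ends on 'C', even though 'A' occurs 2 times, which is more than 5/3.

```java
class VoteCounterexample {
    // Single-candidate majority vote, mirroring findCandidate from the answer.
    static char findCandidate(char[] a) {
        int majIndex = 0;
        int count = 1;
        for (int i = 1; i < a.length; i++) {
            if (a[majIndex] == a[i]) {
                count++;
            } else {
                count--;
            }
            if (count == 0) {
                majIndex = i; // switch candidate to the current element
                count = 1;
            }
        }
        return a[majIndex];
    }

    // Number of times target occurs in a.
    static int occurrences(char[] a, char target) {
        int count = 0;
        for (char c : a) {
            if (c == target) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        char[] a = "ABADC".toCharArray();
        System.out.println(findCandidate(a));    // C
        System.out.println(occurrences(a, 'A')); // 2, and 3 * 2 > 5
    }
}
```

Every element in this input cancels the one before it, so the candidate simply ends up being the last element.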