15

Interview Question:

Edited Below You are given an array. You make 2 heaps out of it, one minheap and the other max heap. Now find the median of the array using these 2 provided heaps in O(nlog n) time.

Corrected Question Numbers are randomly generated and stored into an (expanding) array. How would you keep track of the median?

Solution This problem can be solved using 2 heaps and the median can always be accessed in O(1) time.

herohuyongtao
  • 49,413
  • 29
  • 133
  • 174
EFreak
  • 3,119
  • 3
  • 27
  • 31
  • 3
    I am guessing the actual question is to be able to determine median quickly even after many inserts and somehow got lost in translation. Frankly, this looks like a homework question disguised as an interview question. –  Feb 02 '11 at 18:28
  • 1
    @Moron: Disagree with homework assessment. It's too easy to just copy the homework statement and not introduce "lost in translation" issues. This smells more like an interview question that got lost in translation. – jason Feb 02 '11 at 20:14
  • @Jason: I am not saying EFreak has done this (or hasn't). It could be that the interviewer might have gone through a text book... –  Feb 02 '11 at 20:16
  • If you are allowed to take O(N log N) time, why not just sort the array instead? – finnw Feb 03 '11 at 19:27
  • Guys! I think I messed up with the question :P n @Moron: I don't go to school :) so no question of homework. I'm posting the corrected question now. – EFreak Mar 03 '11 at 18:46
  • @EFreak: So it was as I guessed! The interviewer seems to be going through homework problems :-) –  Mar 03 '11 at 18:58
  • @Efreak: http://stackoverflow.com/questions/2213707/finding-an-appropriate-data-structure –  Mar 03 '11 at 20:01
  • @Moron: That was indeed a nice find. Thanks! – EFreak Mar 05 '11 at 12:29

4 Answers4

13

Here's how you use both heaps. Note that I'm assuming you don't know the number of elements, and this is why we must pop until we pop something from the min heap that is larger than or equal to what we pop from the max heap. Note that we return the average because in the case of a set like {1, 2, 3, 4} the median is actually 2.5 (the average of the two "middle" values). I'm assuming double as the value type, but this can obviously be anything. Here:

double min = minheap.pop();
double max = maxheap.pop();
while(min < max) {
    min = minheap.pop();
    max = maxheap.pop();
}

return (min + max) / 2;

Since popping is O(log n) and we have to pop O(n / 2) values, this is O(n log n).

jason
  • 236,483
  • 35
  • 423
  • 525
  • 1
    @marcog: Did you even bother to think about it? Consider, e.g., `{1, 2, 3}`. Pop gives `min` as `1` and `max` as `3`. Pop again gives `min` as `2` and `max` as `2`. Condition `min < max` fails so we return `(min + max) / 2 = 2`. I challenge you to provide an example where this algorithm fails. – jason Feb 02 '11 at 17:43
  • @Jason Consider 1, 2, 2. Pop gives min 1 max 2. Pop again gives min 2 max 1. Then you return 1.5 when the answer should be 2. (EDIT: I was wrong, the second pop gives min 2 max 2.) – moinudin Feb 02 '11 at 17:45
  • @Jason I fail, sorry I just realised my mistake. Inverting vote. Apologies. – moinudin Feb 02 '11 at 17:48
  • @marcog: No, you don't understand the algorithm. Pop gives `min` as `1` and `max` as `2`. Condition `min < max` is okay, so proceed to next iteration. Pop gives `min` as `2` and `max` as `2` (yes, again, that's how a heap works). Condition `min < max` fails so return `(min + max) / 2 = 2`. – jason Feb 02 '11 at 17:49
  • 1
    Yeah, so this is appropriate when you don't know N, but if you're given an array to begin with then obviously you do. Even then, two heaps is still overkill - you can find N by finding the depth of the tree O(log N) or, if the heap is stored as an array, N is the length of the array. I know this isn't answering the original question, so I'm guess I'm curious as to whether there's a case where using two heaps is the only, or the optimal, solution? – Stewart Murrie Feb 02 '11 at 18:02
  • @britishmutt: I think it's reasonable to assume that you don't know the number of elements otherwise there is no point to using two heaps (you can do it with one if you do know the number of elements). I suspect that something was lost in the translation from the interviewer to the interviewee to being posted on SO. – jason Feb 02 '11 at 18:21
  • @Jason I get that. I still don't see the need for two heaps :) – Stewart Murrie Feb 02 '11 at 18:26
  • Alright, my final thought: maybe if all you are given is a well-encapsulated `Heap` object and the only method you have access to is `getMin()`, so there's no way to inspect the underlying data structure to find N. Then, I guess, two heaps is the only way. I'm going to go look at other questions now :) – Stewart Murrie Feb 02 '11 at 18:32
  • @Jason, if I'm correct your solution shows O(n log n) time complexity for a single median lookup, which btw is what you have by sorting the array and taking the middle element. The question asks for **keeping** the median of an **expanding** array, and I'm not sure I understand how your solution works there (or if it aimed to). Thx. – Savino Sguera Jan 25 '12 at 09:06
  • @britishmutt two heaps IMHO are needed if you have a stream of int in input and you want to keep the median updated. This will take O(n log n) time and O(n) space. – Savino Sguera Jan 25 '12 at 09:09
  • @Savino Sguera: Note that the question was edited over a month after my answer was provided. – jason Jan 25 '12 at 12:00
6

A working implementation in java, using 2 heaps, O(n log n). At any time I keep the two heaps balanced in size (ie. they differ at most by 1, if we entered n elements such that n%2==1). Getting the median is O(1). Adding a new element is O(log n).

public class MedianOfStream {

    private int count;
    private PriorityQueue<Integer> highs, lows;

    public MedianOfStream() {
        highs = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg0.compareTo(arg1);
            }
        });
        lows = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg1.compareTo(arg0);
            }
        });
    }

    private int getMedian() {
        if (count == 0)
            return 0;
        if (lows.size() == highs.size()) {
            return (lows.peek() + highs.peek()) / 2;
        } else if (lows.size() < highs.size()) {
            return highs.peek();
        }
        return lows.peek();
    }

    private void swap(){
        int h = highs.poll();
        int l = lows.poll();
        highs.add(l);
        lows.add(h);
    }

    public int updateMedian(int n) {
        count++;

        if (count == 1)
            lows.add(n);

        else if (count==2) {
            highs.add(n);
            if(highs.peek()<lows.peek()) {
                swap(); // O(log n)
            }
        }

        else {
            if (n > highs.peek()) {
                lows.add(highs.poll()); // O(log n)
                highs.add(n); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
                lows.add(n); // O(log n)
            }
            if(highs.peek()<lows.peek()) {
                swap(); // O(log n)
            }
        }

        // if we added an even # of items,
        // the heaps must be exactly the same size,
        // otherwise we tolerate a 1-off difference

        if (Math.abs(lows.size() - highs.size()) > (count % 2)) {
            if (lows.size() < highs.size()) {
                lows.add(highs.poll()); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
            }
        }

        return getMedian(); // O(1)
    }
}
Savino Sguera
  • 3,522
  • 21
  • 20
3

Popping from a heap is an O(log N) operation, so you can achieve O(N log N) by popping half the elements from one of the heaps and taking the last popped value (you'd have to handle edge cases). This doesn't take advantage of the other heap though.

You can achieve O(N) using the selection algorithm, but the constant factor is very high. The former suggestion is probably better if you already have a heap.

moinudin
  • 134,091
  • 45
  • 190
  • 216
  • yeah .. the question was O(nlog n) itself.. posted it incorrectly. – EFreak Feb 02 '11 at 17:29
  • @EFreak Thought so, answer updated. @Downvoter Please explain – moinudin Feb 02 '11 at 17:33
  • The only reason I can think to use two heaps is to cater for the edge case of N being even, where the median is then defined as the average of the two middle values which you'd get if you removed N/2 from both heaps. Seems like overkill for that purpose though. – Stewart Murrie Feb 02 '11 at 17:36
  • @britishmutt Exactly, that just doubles the runtime without really making the code any simpler. – moinudin Feb 02 '11 at 17:37
  • Solving the problem in O(nlog n) isn't difficult. A simple quicksort and a fetch solves it. Problem is how do I use the 2 heaps to solve it? – EFreak Feb 02 '11 at 17:39
  • @EFreak: "simple" quicksort is O(*n* ²) worst case. Use mergesort. – Fred Foo Feb 02 '11 at 17:42
  • @EFreak Why would you want to use two when one does the job? – moinudin Feb 02 '11 at 17:42
  • @EFreak: Please see my solution: http://stackoverflow.com/questions/4877604/find-the-median-of-an-array-using-a-min-heap-and-a-max-heap/4877804#4877804. – jason Feb 02 '11 at 17:44
  • @marcog: Because typically in interview questions like this the point is to use what you are given. – jason Feb 02 '11 at 17:44
  • @Jason No, you give the best answer. If your interviewer gives you a list of random numbers are you going to use it because he gave it to you? – moinudin Feb 02 '11 at 17:47
  • 1
    @marcog: I'm going to do what the interviewer tells me. Here, according to @EFreak, __find the median of the array using these 2 provided heaps in O(nlog n) time__. There are many points to an interview question. One point is often to test your specific knowledge, often as a way of seeing how much knowledge you do possess. So here, for example, let's see if the interviewee knows about heaps. That will give me some insight into whether or not they know basic data structures and algorithms. A second point is to see if you're a team player, or can follow directions, etc. – jason Feb 02 '11 at 17:51
  • @Jason And I did it using 2 heaps. I just ignored the second. If the interviewer is persistent about getting the worse answer, I'll consider that a sign to turn down any offer. – moinudin Feb 02 '11 at 17:52
  • @marcog: Also, please note that we were not told that we know how many elements there are. If you don't know how many elements there are, you can't pop half of them. This is critical to realizing why you have to use both. – jason Feb 02 '11 at 17:53
  • 2
    @Jason "You are given an array" <- More often than not, that comes with the size. – moinudin Feb 02 '11 at 17:54
  • This is also (IMHO) a reason why asking vague interview questions on SO is a bad idea. You can't get clarification like you would in the interview. – moinudin Feb 02 '11 at 17:56
  • @Larson: Thanks for educating me on that. – EFreak Feb 02 '11 at 17:56
  • @Marcog: May be the interviewer wants to check a no of points.. Even I couldn't find much use of 2 heaps. Thats the reason why I posted. – EFreak Feb 02 '11 at 17:57
  • @marcog: Yes, I agree with your comments about vague interview questions. I think it's fairly reasonable to assume, however, that you don't know the number of elements otherwise there is absolutely no point to having two heaps. – jason Feb 02 '11 at 18:20
  • 1
    @Jason: There is a use of 2-heaps. If you allow future inserts, median finding can be made O(1) (inserts being O(log n)). –  Feb 02 '11 at 18:31
-1

JavaScript solution using two heaps:

function addNewNumber(minHeap, maxHeap, randomNumber) {
  if (maxHeap.size() === minHeap.size()) {
    if (minHeap.peek() && randomNumber > minHeap.peek()) {
      maxHeap.insert(minHeap.remove());
      minHeap.insert(randomNumber);
    } else {
      maxHeap.insert(randomNumber);
    }
  } else {
    if (randomNumber < maxHeap.peek()) {
      minHeap.insert(maxHeap.remove());
      maxHeap.insert(randomNumber);
    } else {
      minHeap.insert(randomNumber);
    }
  }
}

function getMedian(minHeap, maxHeap) {
  if (!maxHeap.size()) {
    return 0;
  }
  if (minHeap.size() === maxHeap.size()) {
    return (minHeap.peek() + maxHeap.peek()) / 2;
  } else {
    return maxHeap.peek();
  }
}
spark
  • 79
  • 1
  • 1