This is an interview question. Design a class that stores integers and provides two operations:

void insert(int k)
int getMedian()

I guess I can use a (balanced) BST so that insert takes O(log N) and getMedian takes O(log N) (for getMedian I would store in each node the number of nodes in its left and right subtrees).
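A minimal sketch of that idea, under some assumptions: the class name `SizedBstMedian` is made up for illustration, the tree is left unbalanced for brevity (so the O(log N) bounds only hold if the input happens to keep it shallow; a real version would use a self-balancing tree), and for an even count it returns the lower of the two middle values.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical sketch: a plain (unbalanced) BST whose nodes store subtree sizes.
class SizedBstMedian {
    struct Node {
        int key;
        std::size_t size = 1;               // number of nodes in this subtree
        std::unique_ptr<Node> left, right;
        explicit Node(int k) : key(k) {}
    };
    std::unique_ptr<Node> root_;

    static std::size_t sizeOf(const std::unique_ptr<Node>& n) {
        return n ? n->size : 0;
    }

public:
    void insert(int k) {
        std::unique_ptr<Node>* cur = &root_;
        while (*cur) {
            ++(*cur)->size;                 // k will end up below this node
            cur = (k < (*cur)->key) ? &(*cur)->left : &(*cur)->right;
        }
        *cur = std::make_unique<Node>(k);
    }

    // Returns the element of 0-based rank (n - 1) / 2, i.e. the lower median.
    int getMedian() const {
        if (!root_) throw std::logic_error("getMedian on empty container");
        std::size_t rank = (root_->size - 1) / 2;
        const Node* cur = root_.get();
        while (true) {
            const std::size_t leftSize = sizeOf(cur->left);
            if (rank < leftSize) {
                cur = cur->left.get();      // target is in the left subtree
            } else if (rank == leftSize) {
                return cur->key;            // exactly leftSize keys precede this node
            } else {
                rank -= leftSize + 1;       // skip the left subtree and this node
                cur = cur->right.get();
            }
        }
    }
};
```

The walk in `getMedian` visits one node per level, so its cost is proportional to the height of the tree.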

Now I wonder whether this is the most efficient solution, or whether there is a better one.

Michael
  • With your scheme you can improve `getMedian` to `O(1)`: just look it up after every insert (which does no harm to the complexity) and store the value. – Steve Jessop Jul 06 '12 at 11:59
  • For an alternative structure, think about priority queueS. – Steve Jessop Jul 06 '12 at 12:00
  • @SteveJessop Could you please elaborate on how to improve `getMedian` to O(1)? – Michael Jul 06 '12 at 12:41
  • I mean that your data structure would have a data member `int currentMedian;`. Immediately after you insert an element into your BST, find the new median, and store that value into `currentMedian` before returning from `insert`. Then you can implement `int getMedian() { return currentMedian; }`, which is `O(1)`. – Steve Jessop Jul 06 '12 at 12:48
  • On further reflection, you can probably do something with a skip list too. That inserts in expected/amortized `O(log(n))`, and you can track the median node (and whether the number of elements is odd or even). Then each time you insert you just need to check whether to move the median one step to the left or right, according to whether you inserted on the left or right of the old median and whether the new size is odd or even. – Steve Jessop Jul 06 '12 at 12:55
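
The median-tracking idea from the last comment can be sketched with `std::multiset` standing in for the skip list. That substitution is made here only for brevity and is not something the comment proposes; the class name `TrackedMedian` is likewise hypothetical. The sketch assumes the lower median is wanted for an even count and relies on `std::multiset` placing a new key after any equal keys (guaranteed since C++11).

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>

// Hypothetical sketch: ordered container plus an iterator that follows the median.
class TrackedMedian {
    std::multiset<int> data_;
    std::multiset<int>::const_iterator mid_;  // element of rank (n - 1) / 2

public:
    TrackedMedian() = default;
    // Copying would leave mid_ pointing into the other object's set.
    TrackedMedian(const TrackedMedian&) = delete;
    TrackedMedian& operator=(const TrackedMedian&) = delete;

    void insert(int k) {
        const std::size_t n = data_.size();   // size before the insertion
        data_.insert(k);                      // existing iterators stay valid
        if (n == 0) {
            mid_ = data_.begin();
            return;
        }
        if (k < *mid_) {
            // Inserted to the left of the old median.
            if (n % 2 == 1) --mid_;
        } else {
            // Inserted to the right (equal keys go after existing ones).
            if (n % 2 == 0) ++mid_;
        }
    }

    int getMedian() const {
        if (data_.empty()) throw std::logic_error("getMedian on empty container");
        return *mid_;                         // O(1): no search needed
    }
};
```

Each insert moves the tracked iterator by at most one step, so `insert` stays logarithmic and `getMedian` is a single dereference.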

4 Answers

You can use two heaps, which we will call Left and Right.
Left is a max-heap.
Right is a min-heap.
Insertion is done like this:

  • If the new element x is smaller than the root of Left, insert x into Left.
  • Otherwise, insert x into Right.
  • If, after the insertion, Left holds more than one element more than Right, call Extract-Max on Left and insert the extracted element into Right.
  • Otherwise, if after the insertion Right holds more elements than Left, call Extract-Min on Right and insert the extracted element into Left.

The median is always the root of Left (with an even number of elements, this is the lower of the two middle values).

So insertion is done in O(lg n) time and getting the median is done in O(1) time.
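
The rules above can be sketched in C++ with `std::priority_queue`. This is an illustration, not a reference implementation: the class name `TwoHeapMedian` is made up, and the sketch assumes that the very first element goes into Left, since the rules do not say what to compare against while Left is empty.

```cpp
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the two-heap scheme described above.
class TwoHeapMedian {
    std::priority_queue<int> left_;  // max-heap holding the smaller half
    std::priority_queue<int, std::vector<int>, std::greater<int>> right_;  // min-heap, larger half

public:
    void insert(int k) {
        // Assumption: an empty Left receives the first element.
        if (left_.empty() || k < left_.top()) {
            left_.push(k);
        } else {
            right_.push(k);
        }

        // Rebalance so that Left has as many elements as Right, or one more.
        if (left_.size() > right_.size() + 1) {
            right_.push(left_.top());
            left_.pop();
        } else if (right_.size() > left_.size()) {
            left_.push(right_.top());
            right_.pop();
        }
    }

    // Root of Left: the middle value, or the lower of the two middle values.
    int getMedian() const {
        if (left_.empty()) throw std::logic_error("getMedian on empty container");
        return left_.top();
    }
};
```

If the mean of the two middle values is wanted for an even count, those two values are `left_.top()` and `right_.top()` whenever both heaps have the same size.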

Avi Cohen
  • Great, my cpp implementation: https://gist.github.com/jonnyhsy/7ec9546a3622cf575b82 – nrek Jan 19 '15 at 09:10
  • To find the Median, place the numbers you are given in value order and find the middle number. Example: find the Median of {13, 23, 11, 16, 15, 10, 26}. The middle number is 15, so the median is 15. (If there are two middle numbers, you average them.) What about the case of 2 middle numbers ??? – shifu Feb 14 '15 at 00:32
  • @user1743538 but sorting takes 'n log n' instead of just 'log n'. – Avi Cohen Feb 14 '15 at 07:06
  • If someone needs an implementation that also enables the remove, I have extended the solution linked by @Amit https://gist.github.com/JernejJerin/a26276d2289878bd7744 – Jernej Jerin May 08 '15 at 13:35
  • @JernejJerin, could you please briefly explain how your removal works? – Hengameh Aug 19 '15 at 01:49
  • @Hengameh The logic goes like this. First we need to check whether the number that we want to remove is greater than the root element of the max heap. If it is, then we try to remove it from the min heap; otherwise we remove it from the max heap. Keep in mind that the min and max heaps are implemented as PriorityQueue, which means that [removal of a specified object is done in linear time](https://docs.oracle.com/javase/8/docs/api/java/util/PriorityQueue.html). If the removal is successful, we need to check the number of elements in each heap and rebalance them if necessary. – Jernej Jerin Aug 20 '15 at 10:49
  • Thanks for clarification. So, removal is O(n). :) – Hengameh Aug 22 '15 at 16:02
  • The idea is right, but a minor issue: the suggested solution assumes an odd number of numbers. The mathematical definition of median also covers the case of an even number of numbers. From Wikipedia: If there are an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values (https://en.wikipedia.org/wiki/Median) – Ron Klein Jan 16 '17 at 12:20
  • How is this solution different from an AVL tree? That too will always be balanced, left <= root <= right. Using this, we'll know that the median will always be the root of the actual tree? – Omkar Kulkarni Feb 09 '22 at 21:33

See this Stack Overflow question for a solution that involves two heaps.

Community
user448810

Would it beat an array of integers that is kept sorted at insertion time, using a sorting algorithm dedicated to integers (http://en.wikipedia.org/wiki/Sorting_algorithm)? If you choose a candidate whose cost is below O(log(n)) and store the values in an array, then getMedian just reads the element at the index equal to half the size, which would be O(1), no? It seems possible to me to do better than log(n) + log(n).

Plus, by being a little more flexible, you can improve performance by changing your sorting algorithm according to the properties of your input (is the input almost sorted or not, and so on).

I am pretty much self-taught in computer science, but that is the way I would do it: simpler is better.
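
A minimal sketch of the sorted-array variant, assuming a `std::vector` kept in ascending order; the class name `SortedArrayMedian` is hypothetical. It uses a binary search to find the insertion point rather than re-sorting, and returns the lower middle element for an even count. Note that inserting into the middle of a vector shifts the later elements, so `insert` is linear in the worst case even though the search is logarithmic.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: keep the values sorted in a contiguous array.
class SortedArrayMedian {
    std::vector<int> data_;  // always in ascending order

public:
    void insert(int k) {
        // Binary search for the insertion point: O(log n) comparisons.
        auto pos = std::upper_bound(data_.begin(), data_.end(), k);
        // Shifting the tail makes the insertion itself O(n) in the worst case.
        data_.insert(pos, k);
    }

    // Middle element, or the lower of the two middle elements: O(1).
    int getMedian() const {
        if (data_.empty()) throw std::logic_error("getMedian on empty container");
        return data_[(data_.size() - 1) / 2];
    }
};
```

For small collections the contiguous layout tends to be cache-friendly, which is one reason this simple version can be competitive in practice.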

user1458574
  • For large data this would suffer from the fact that `insert` is `O(n)`, but if the number of values stored is small this is probably the best way. – Steve Jessop Jul 06 '12 at 12:50
  • Yep, it was hidden in the notes :) (I hoped no one would notice it). In real coding, the best implementation is still context-dependent ^^. So I see theoretical questions about the best implementation as trying to give the minimum of a function as a single number when the function is parametrized. – user1458574 Jul 06 '12 at 13:35

You could consider a self-balancing tree, too. If the tree is fully balanced, then the root node is your median. If the tree is one level deeper on one side, then you just need to know how many nodes there are on the deeper side to pick the correct median.

grdvnl
  • Suppose your maximally-balanced tree has an even number of nodes, then the median is the mean of two values. One of those values is the root node, and the other one is buried `(log n)` levels deep in the tree, since it's either the leftmost node of the right subtree or else the rightmost node of the left subtree. So you do need to track slightly more than just the root node and subtree sizes in order to access the median in `O(1)`, but the root alone is sufficient for `O(log n)`. – Steve Jessop Jul 06 '12 at 16:24
  • @SteveJessop, could you please give an example of a case in which the median is the average of the "leftmost node of the right subtree and the root"? In every sample I try, the median is the average of the rightmost node of the left subtree and the root! – Hengameh Aug 19 '15 at 02:21
  • @Hengameh: consider a tree consisting of two nodes: the root and its right child. Then the right child is the leftmost node of the right subtree, and the median is the mean of that with the root. – Steve Jessop Aug 19 '15 at 13:33