10

This is a common interview question. You have a stream of numbers coming in (let's say more than a million). The numbers are all in the range [0, 999].

Implement a class which supports three methods in O(1):

* insert(int i); 
* getMean(); 
* getMedian(); 

This is my code.

public class FindAverage {

  private int[] store;
  private long size;
  private long total;
  private int highestIndex;
  private int lowestIndex;

  public FindAverage() {
    store  = new int[1000];
    size = 0;
    total = 0;
    highestIndex = Integer.MIN_VALUE;
    lowestIndex = Integer.MAX_VALUE;

  }

  public void insert(int item) throws OutOfRangeException {
    if(item < 0 || item > 999){
      throw new OutOfRangeException();
    }
    store[item] ++;
    size ++;
    total += item;
    highestIndex = Integer.max(highestIndex, item);
    lowestIndex = Integer.min(lowestIndex, item);
  }

  public float getMean(){
    return (float)total/size;
  }

  public float getMedian(){

  }
}

I can't seem to think of a way to get the median in O(1) time. Any help appreciated.

Melissa Stewart
  • Why can't you update median (saved as (value, number among values)) on `insert` just as you do with `total`? – Abstraction Mar 28 '17 at 14:49
  • Given your `store` has a fixed (1000) number of elements, nearly any code you write to compute the median will be O(1). – Paul Hankin Mar 28 '17 at 15:14
  • @PaulHankin it doesn't have a fixed number of elements. You might want to read the question again. – Melissa Stewart Mar 28 '17 at 15:15
  • @Abstraction That's what I'm trying to do, but can't seem to find a way to. – Melissa Stewart Mar 28 '17 at 15:16
  • @MelissaStewart Paul is right, `store` has fixed number of elements (1000). Doesn't matter how many values are inserted. See [my answer](http://stackoverflow.com/a/43073372/5221149). – Andreas Mar 28 '17 at 15:16
  • @MelissaStewart your initialization `store = new int[1000];` guarantees that `store` has a fixed number of elements. You might want to read your code again. – Lew Bloch Mar 28 '17 at 17:39

3 Answers

10

You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.

Simply iterate over the store, summing up the counts until the running total passes half of size. The value you have reached is the median if size is odd. For an even size, take the two middle values and average them.

The loop takes at most 1000 steps (about 500 on average), which is O(1) because the bound doesn't depend on n: performance is unchanged even if n reaches into the billions.

Remember, O(1) doesn't mean instant, or even fast. As Wikipedia puts it:

An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.

In your case, that bound is 1000.
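The scan described above could be sketched roughly as follows. This is only an illustration, not the asker's final code: the class name CountingMedian is made up here, IllegalArgumentException stands in for the question's OutOfRangeException (which is not defined in the post), and the median is returned as a double so that the average of two middle values is not truncated.

```java
// Minimal sketch: median by scanning a fixed-size array of counters.
// CountingMedian is a hypothetical name; values are assumed to lie in [0, 999].
class CountingMedian {
    private final int[] store = new int[1000]; // store[v] = how many times v was inserted
    private long size = 0;                     // total number of inserted values

    void insert(int item) {
        if (item < 0 || item > 999) {
            throw new IllegalArgumentException("value out of range: " + item);
        }
        store[item]++;
        size++;
    }

    double getMedian() {
        if (size == 0) {
            throw new IllegalStateException("no values inserted");
        }
        long lowerRank = (size - 1) / 2; // 0-based rank of the lower middle element
        long upperRank = size / 2;       // equals lowerRank when size is odd
        long seen = 0;                   // number of elements with value <= current value
        int lower = -1;                  // value found at lowerRank, -1 until known
        for (int value = 0; value < store.length; value++) {
            seen += store[value];
            if (lower < 0 && seen > lowerRank) {
                lower = value;
            }
            if (seen > upperRank) {
                return (lower + value) / 2.0;
            }
        }
        throw new AssertionError("counts do not add up to size");
    }
}
```

The loop body runs at most 1000 times no matter how many values have been inserted, which is what makes the call constant time in the sense quoted above.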

Andreas
  • It's nitpicky maybe, but O(1) doesn't mean unrelated to `n`. For example, `for (i = 0; i < n % 10000; i++) putchar('.');` is O(1) since it never outputs more than 10000 items, but the runtime depends on `n` (specifically, `n` modulo 10000). – Paul Hankin Mar 28 '17 at 15:18
  • Maybe I'm not understanding you, but iterating a list doesn't make the algorithm O(1) right. – Melissa Stewart Mar 28 '17 at 15:19
  • @MelissaStewart iterating over a list of bounded size is O(1). Doing 1000 things is a constant amount of work, so O(1). – Paul Hankin Mar 28 '17 at 15:20
  • @PaulHankin ok that helps, I can think of a way to use that. – Melissa Stewart Mar 28 '17 at 15:22
  • "if n reaches into the billions" - the value will be 1000000000/2. So it is bound to n as n in this case is 1000. And we would get linear time, not constant – Alex Vovchuk Jul 20 '20 at 20:47
  • @AlexVovchuk *"n in this case is 1000"* No, `n` is the number of inputs, aka the number of items inserted, aka the value of `size`, not the length of the `store` array, which is always 1000. That is why `getMedian()` has a time complexity of _O(1)_, since the performance of making that call is unrelated to `n`, same as the `insert()` and `getMean()` method performance is unrelated to the number of inputs. – Andreas Jul 20 '20 at 21:45
3

The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.

To implement the median in constant time, you will need two numbers: the median index (i.e. the value of the median) and the number of values you've read that are to the left (or right) of the median. I will just stop here, hoping you will be able to figure out how to continue on your own.

EDIT (as pointed out in the comments): you already have the array with the sorted elements (store) and you know the number of elements to the left of the median (size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.
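One possible way to glue this together is sketched below. It is an interpretation of the hint rather than the answerer's own code: the names RunningMedian, median and below are invented for illustration, and IllegalArgumentException replaces the question's undefined OutOfRangeException. The idea is to keep the lower median value plus the count of stored values strictly below it, and to shift the median by at most one occupied bucket per insert.

```java
// Sketch: track the lower median incrementally while inserting.
// RunningMedian, median and below are hypothetical names; values are assumed in [0, 999].
class RunningMedian {
    private final int[] counts = new int[1000]; // counts[v] = occurrences of v
    private long size = 0;                      // total number of inserted values
    private int median = 0;                     // value at 0-based rank (size - 1) / 2
    private long below = 0;                     // number of stored values < median

    void insert(int item) {
        if (item < 0 || item > 999) {
            throw new IllegalArgumentException("value out of range: " + item);
        }
        counts[item]++;
        size++;
        if (size == 1) {
            median = item;
            return;
        }
        if (item < median) {
            below++;
        }
        long rank = (size - 1) / 2; // rank the tracked median must cover
        if (rank < below) {
            // Median rank now falls in the previous occupied bucket.
            do { median--; } while (counts[median] == 0);
            below -= counts[median];
        } else if (rank >= below + counts[median]) {
            // Median rank now falls in the next occupied bucket.
            below += counts[median];
            do { median++; } while (counts[median] == 0);
        }
    }

    double getMedian() {
        if (size == 0) {
            throw new IllegalStateException("no values inserted");
        }
        if (size % 2 == 1) {
            return median;
        }
        // Even size: the upper middle element has rank size / 2.
        int upper = median;
        if (size / 2 >= below + counts[median]) {
            do { upper++; } while (counts[upper] == 0);
        }
        return (median + upper) / 2.0;
    }
}
```

Stepping to the neighbouring occupied bucket may still skip over empty buckets, so the work per call is bounded by 1000 rather than by a single step; it stays O(1) in the same sense as a full scan, but typically touches far fewer entries.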

Ivaylo Strandjev
2

For the general case, where the range of elements is unlimited, no such data structure can exist under any comparison-based algorithm, since it would allow sorting in O(n).

Proof: Assume such a DS exists; call it D.
Let A be the input array to sort. (Assume A.size() is even for simplicity; that can be relaxed pretty easily by adding a garbage element and discarding it later.)

sort(A):
  ds = new D()
  for each x in A:
    ds.add(x)
  m1 = min(A) - 1
  m2 = max(A) + 1
  for (i=0; i < A.size(); i++):
    ds.add(m1)
  # at this point, ds.median() is smallest element in A
  for (i = 0; i < A.size(); i++):
    yield ds.median()
    # Each two insertions advances median by 1
    ds.add(m2)
    ds.add(m2)

Claim 1: This algorithm runs in O(n).
Proof: add() and median() are each O(1), every iteration performs a constant number of them, and the number of iterations is linear, so the total complexity is linear.

Claim 2: The output is sorted(A).
Proof (guidelines): After inserting m1 n times, the median is the smallest element in A (taking the upper of the two middle elements as the median). Each two insertions of m2 after that advance the median by one item, and since the median advances in sorted order, the total output is sorted.

Since the above algorithm would sort in O(n), which is not possible under the comparison model, such a DS does not exist.

QED.
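To see the reduction at work, here is a small sketch that runs the sort(A) routine against a stand-in median structure. Everything named here (MedianSortDemo, SlowMedianStore, sortViaMedian) is hypothetical, and the stand-in deliberately does not meet the O(1) requirement: its add() is O(n). It only illustrates that the sequence of medians comes out sorted, under the assumption that median() returns the upper of the two middle elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the reduction from sorting to a median structure.
// All names are hypothetical; SlowMedianStore is NOT O(1), it only stands in for D.
class MedianSortDemo {
    static class SlowMedianStore {
        private final List<Integer> sorted = new ArrayList<>();

        void add(int x) {
            int pos = Collections.binarySearch(sorted, x);
            if (pos < 0) {
                pos = -pos - 1; // convert "not found" result to insertion point
            }
            sorted.add(pos, x); // O(n) shift, unlike the hypothetical D
        }

        // Upper median: element at index size / 2 of the sorted contents.
        int median() {
            return sorted.get(sorted.size() / 2);
        }
    }

    // Assumes min(a) - 1 and max(a) + 1 do not overflow int.
    static List<Integer> sortViaMedian(List<Integer> a) {
        List<Integer> out = new ArrayList<>();
        if (a.isEmpty()) {
            return out;
        }
        SlowMedianStore ds = new SlowMedianStore();
        for (int x : a) {
            ds.add(x);
        }
        int m1 = Collections.min(a) - 1;
        int m2 = Collections.max(a) + 1;
        for (int i = 0; i < a.size(); i++) {
            ds.add(m1); // pad below so the median lands on the smallest element of a
        }
        for (int i = 0; i < a.size(); i++) {
            out.add(ds.median());
            ds.add(m2); // two insertions above advance the median by one
            ds.add(m2);
        }
        return out;
    }
}
```

With a genuine O(1) structure in place of SlowMedianStore, the same loop would perform a linear number of constant-time operations, which is the contradiction the proof relies on.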

amit
  • your proof is completely divergent off the scope of the question, but super elegant nonetheless. – Melissa Stewart Mar 28 '17 at 15:43
  • @MelissaStewart Updated the solution to catch wider range of computation models (reduction from sorting instead of from element distinctness). – amit Mar 28 '17 at 16:32
  • @amit The point is that since there are only 1000 possible values, we are NOT in the general case. – btilly Mar 28 '17 at 17:41
  • Why are you talking about sorting? Are you implying that finding the median in O(n) is impossible in the general case? Quickselect yields O(n) average-case median selection. – MarredCheese May 02 '18 at 03:29