
Hello, I am trying to understand how the median of medians algorithm works. In all the examples I've seen so far, the numbers are already divided into groups before the algorithm begins, so I cannot understand how these groups are made. To be more specific, the examples I've studied state that there are, for example, 9 groups of 5 numbers each (45 numbers), or 4 groups of 10 numbers each (40 numbers in all). So what if we have n numbers? Is there a good technique I should follow to find the number of elements each group should have?

JmRag
  • Could you try to clarify the algorithms studied so far? Additionally, if you could put examples etc. in code blocks, it would help. – BlackVegetable Jan 28 '14 at 17:45
  • You divide the whole set of numbers into groups of five: the first five numbers form the first group, the next five form the next group, and so on; the last group may have fewer than five elements. The size of the groups is always 5, hence you end up with `n/5` or `n/5+1` groups. – pepo Jan 28 '14 at 17:50
  • @BlackVegetable I am in a little bit of a hurry now so I will edit the question in a couple of hours to be more specific! – JmRag Jan 28 '14 at 17:53
  • @pepo I know it's not a good technique to post hyperlinks, but I don't want to copy the site's content. http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/ Here the groups are of size three. – JmRag Jan 28 '14 at 17:54
  • It should work with any odd-sized groups (greater than 1, of course). The purpose of those groups is to strip away elements that are surely lower or greater than the median of medians. If you make your groups of size `2k+1`, then in about half of the groups there are at least `k+1` elements (the group median and `k` others) that are not larger than the median of medians, and likewise on the other side. So roughly `n(k+1)/(2(2k+1))` elements can be discarded, which leaves you with at most about `n(3k+1)/(4k+2)` elements for the recursive call. – pepo Jan 28 '14 at 18:11

1 Answer


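Median of medians (MoM) is a recursive algorithm. It exists as a sound way to select a "pivot" for an algorithm like quicksort or quickselect. Thus, it needs to operate within certain time bounds: choosing the pivot must not cost more than the linear-time partitioning step it serves.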

It might be easier to understand if explained as a base case and a recursive case.

The base case is clear enough. If you have fewer than five elements in a list, then you find the median the naive way (for example, by sorting them).

But if your list has at least five elements, you can apply the recursive case. You're going to take successive groups of five elements from your big list, find the median of each group, and add it to a smaller list. (If you have some left over at the end, you can ignore them or treat them as one short final group.)

If this new, smaller list is small enough, you can apply the base case, as described above. Otherwise, you'll go through the "small" list to create another, still smaller list. Lather, rinse, and repeat until you get down to fewer than five elements remaining. The median of those is your estimate of the overall median. So it works with any size of list.
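The base case and the repeated grouping step described above can be sketched in Python. This is only a minimal sketch of the procedure as this answer describes it, not a reference implementation: the names `naive_median` and `approx_median` and the `drop_leftover` flag are made up for illustration, and the lower median is used for even-sized groups. Note that the textbook version of the algorithm finds the exact median of the list of medians by calling the selection routine recursively, rather than by regrouping as done here, so the value below is only an estimate.

```python
def naive_median(values):
    """Base case: sort a small list and take its (lower) median."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def approx_median(values, group_size=5, drop_leftover=False):
    """Estimate the median by repeatedly taking medians of small groups.

    By default a leftover tail is kept as one short final group; pass
    drop_leftover=True to ignore it instead (hypothetical flag).
    """
    current = list(values)
    if not current:
        raise ValueError("need at least one element")

    # Recursive case, written as a loop: shrink the list until the
    # base case (fewer than group_size elements) applies.
    while len(current) >= group_size:
        end = len(current)
        if drop_leftover:
            end -= end % group_size  # keep full groups only
        current = [
            naive_median(current[i:i + group_size])
            for i in range(0, end, group_size)
        ]

    return naive_median(current)


print(approx_median(range(1, 26)))  # 13
```

For the 25 numbers 1..25, the five group medians are 3, 8, 13, 18 and 23, and their median is 13. Nothing about the input size is special: for n elements the first pass simply produces about n/5 medians.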

So how big should "five" be? Well, it turns out that 5 is the smallest group size for which the standard analysis gives a linear worst-case running time. The Wikipedia page for this topic walks through the complexity analysis. Essentially, larger values of "five" get you a better approximation of the median at the cost of more work to find the median of each group. Unfortunately, 3 does not decrease the search space enough per iteration to be a worthwhile choice of "five". And the group size generally needs to be odd, unless you want to spend cycles splitting the difference between the two middle elements.
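The difference between 3 and 5 can be checked with the usual counting argument, assuming the pivot is the true median of the group medians and ignoring rounding and the leftover group. With groups of size `2k+1`, about half of the groups each contribute `k+1` elements that are guaranteed to be on one side of the pivot, so those can be discarded. The sketch below (the function name `recursion_fractions` is invented for this example) computes the fraction of the input handed to each of the two recursive calls. If the two fractions sum to less than 1, the recurrence solves to linear time; if they sum to exactly 1, it only gives O(n log n).

```python
from fractions import Fraction


def recursion_fractions(group_size):
    """Return (medians_fraction, remaining_fraction) for an odd group size.

    medians_fraction: share of n passed to the call that finds the
    median of the group medians.
    remaining_fraction: worst-case share of n left after partitioning
    around that pivot.
    """
    if group_size < 3 or group_size % 2 == 0:
        raise ValueError("group_size must be an odd integer >= 3")
    k = group_size // 2
    medians = Fraction(1, group_size)
    # Half of the groups each contribute k + 1 discardable elements.
    discarded = Fraction(k + 1, 2 * group_size)
    return medians, 1 - discarded


for g in (3, 5, 7):
    medians, remaining = recursion_fractions(g)
    print(g, medians, remaining, medians + remaining)
# 3 1/3 2/3 1
# 5 1/5 7/10 9/10
# 7 1/7 5/7 6/7
```

Groups of 3 leave the total at exactly 1, which is why they are not enough, while groups of 5 bring it down to 9/10.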

Ian