The given problem:
A multiset is a set in which some of the elements occur more than once (e.g. {a, f, b, b, e, c, b, g, a, i, b} is a multiset). The elements are drawn from a totally ordered set. Present an algorithm that, given a multiset as input, finds an element that has the most occurrences in the multiset (e.g. in {a, f, b, b, e, c, b, g, a, c, b}, b has the most occurrences). The algorithm should run in O(n lg(n/M) + n) time, where n is the number of elements in the multiset and M is the highest number of occurrences of an element in the multiset. Note that you do not know the value of M.
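For reference, a straightforward baseline sorts the multiset and scans runs of equal elements. It takes O(n lg n) time, so it does not meet the required bound when M is large, but it is handy for checking a faster solution. A minimal Python sketch; the name mode_by_sorting is hypothetical:

```python
from itertools import groupby


def mode_by_sorting(items):
    """Return (element, count) for an element with the most occurrences.

    Baseline only: sorting costs O(n lg n), which misses the
    O(n lg(n/M) + n) target when M is large.
    """
    if not items:
        raise ValueError("empty multiset")
    best, best_count = None, 0
    # After sorting, equal elements form consecutive runs; measure each run.
    for value, run in groupby(sorted(items)):
        count = sum(1 for _ in run)
        if count > best_count:
            best, best_count = value, count
    return best, best_count


print(mode_by_sorting(list("afbbecbgacb")))  # ('b', 4)
```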
[Hint: Use a divide-and-conquer strategy based on the median of the list. The subproblems generated by the divide-and-conquer strategy cannot be smaller than a ‘certain’ size in order to achieve the given time bound.]
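Here is one possible reading of the hint, as a sketch rather than a confirmed solution. Assumptions: the function names are hypothetical, and randomized quickselect stands in for Select (so linear time per level holds in expectation rather than in the worst case). Each subproblem is split three ways around its median (less than, equal to, greater than), the size of the equal part is a candidate count, and subproblems are processed one depth level at a time so that any subproblem no larger than the best count seen so far can be dropped:

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element (0-based) of a non-empty list.

    Randomized stand-in for Select: expected O(len(items)) time.
    """
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs


def most_frequent(items):
    """Return (element, count) for an element with the most occurrences."""
    if not items:
        raise ValueError("empty multiset")
    best, best_count = None, 0
    level = [list(items)]  # all subproblems at the current depth
    while level:
        next_level = []
        for part in level:
            median = quickselect(part, (len(part) - 1) // 2)
            less = [x for x in part if x < median]
            greater = [x for x in part if x > median]
            equal_count = len(part) - len(less) - len(greater)
            if equal_count > best_count:
                best, best_count = median, equal_count
            next_level.extend([less, greater])
        # All copies of an element stay in one subproblem until that element
        # is chosen as a median, so a subproblem with at most best_count
        # elements cannot hold a more frequent element: drop it.
        level = [p for p in next_level if len(p) > best_count]
    return best, best_count


print(most_frequent(list("afbbecbgacb")))  # ('b', 4)
```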
Our initial solution:
Our idea was to use Moore's majority algorithm to determine whether the multiset contains a majority element (e.g. {a, b, b} has a majority, b). If it does, we output it. Otherwise, we find the median of the list using a given selection algorithm (known as Select) and split the list into two sublists: elements less than or equal to the median, and elements greater than the median. We then check each sublist for a majority element in the same way; if one is present, that is the result.
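Moore's majority vote algorithm finds a candidate in one pass and needs a second pass to confirm it, because the candidate is only guaranteed to be correct when a strict majority (more than half of the elements) exists. A minimal sketch; majority_element is a hypothetical name:

```python
def majority_element(items):
    """Return the element occurring more than len(items)/2 times, or None.

    Moore's majority vote: O(n) time, O(1) extra space.
    """
    candidate, votes = None, 0
    # Pass 1: pair off unequal elements; a strict majority survives.
    for x in items:
        if votes == 0:
            candidate, votes = x, 1
        elif x == candidate:
            votes += 1
        else:
            votes -= 1
    # Pass 2: the survivor is only a candidate, so count it explicitly.
    if votes > 0 and 2 * sum(1 for x in items if x == candidate) > len(items):
        return candidate
    return None


print(majority_element(list("abb")))      # b
print(majority_element(list("abcddef")))  # None
```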
For example, given the multiset {a, b, c, d, d, e, f}:
Step 1: Check for a majority. None is found, so split the list around the median.
Step 2: L1 = {a, b, c, d, d}, L2 = {e, f}. Check each for a majority. None is found, so split both lists again.
Step 3: L11 = {a, b, c}, L12 = {d, d}, L21 = {e}, L22 = {f}. Check each for a majority element. L12 returns d. In this case, d is the element with the most occurrences in the original multiset, and is therefore the answer.
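The splits in this walkthrough can be reproduced with a small helper, assuming the median of a list is its lower median (the element at index (len - 1) // 2 in sorted order). Sorting is used only for readability; a linear-time Select would replace it. split_at_median is a hypothetical name:

```python
def split_at_median(items):
    """Two-way split: (elements <= lower median, elements > lower median)."""
    ordered = sorted(items)  # for clarity only; Select avoids the full sort
    median = ordered[(len(ordered) - 1) // 2]
    return ([x for x in items if x <= median],
            [x for x in items if x > median])


L1, L2 = split_at_median(list("abcddef"))
L11, L12 = split_at_median(L1)
L21, L22 = split_at_median(L2)
print(L1, L2)              # ['a', 'b', 'c', 'd', 'd'] ['e', 'f']
print(L11, L12, L21, L22)  # ['a', 'b', 'c'] ['d', 'd'] ['e'] ['f']
```

Under a strict-majority test, the one-element lists L21 and L22 also report a majority (e and f), so the candidates found in Step 3 still have to be compared by their counts.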
We are unsure whether this algorithm is fast enough, and whether it can be written recursively or instead needs a loop with an explicit termination condition. As the hint says, the subproblems cannot be smaller than a 'certain' size, which we believe to be M (the highest number of occurrences of any element).
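The size cutoff can be checked with a rough count, under these assumptions: each split is at the median, so a subproblem at depth d has at most n/2^d elements; each level costs O(n) in total for Select and partitioning; and the split is three-way, so all copies of an element stay in one subproblem until that element is chosen as a median. Once every subproblem has fewer than M elements, none can still hold all M copies of the most frequent element, so that element has already been found and every remaining subproblem can be discarded. This suggests the best count found so far can serve as the cutoff even though M is not known in advance:

```latex
% A depth-d subproblem has at most n / 2^d elements.
\frac{n}{2^{d}} < M \quad\Longleftrightarrow\quad d > \lg\frac{n}{M}

% At most \lfloor \lg(n/M) \rfloor + 1 levels are processed, each costing O(n):
T(n) = \sum_{d=0}^{\lfloor \lg(n/M) \rfloor} O(n)
     = O\!\left(n \lg\frac{n}{M} + n\right)
```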