I have a dataset comprised of n unsorted tuples representing numbers (let's say specific color codes) and their frequency (number of times of appearance).
I want to find an exact median of the numbers with worst case complexity of O(n).
For example:
dataset: {(5000, 8000), (25000, 4000), (9, 9000)} median: 5000
dataset: {(7000, 4), (23000, 400), (3000, 9000), (2500, 12000), (19000, 350), (500, 9000)....} median: ?
Failed attempts so far:
- "Decompress" the list (so that it looks like this:
{7000, 7000, 7000, 7000, 23000, 23000...}
) and then sort it. Problem is - it takes Ω(nlogn), and probably more since the frequencies can be very large and doesn't have any upper bound. - Try to use QuickSelect over the data. To ensure O(n) time complexity we must guaranty good pivot selection. To do so I thought about Median of medians (supposedly O(n)) with the data - but I couldn't figure out how to do that without decompressing, thus making it potentially more than O(n).
Is there a way to manipulate the tuple list so that it wouldn't be decompressed and still use median of medians or another way to find the median?
End note: I don't want to assume anything about the dataset - amount of tuples, confined range of numbers/frequencies, etc).