I am trying to solve a problem exactly like this: nth smallest number among two databases of size n each using divide and conquer
From what I could figure out, the "comparing medians/median of medians" algorithm would give us the solution? My question is whether I am understanding this correctly.
array 1: [7 8 6 5 3]
array 2: [4 10 1 2 9]
First, find the median for each. we can do this by querying for k=n/2, where n is the size of that array. Being the 3rd smallest element in this case, this gives us 6 for the first array (call this m1
), and 4 for the second array (call this m2
).
Since m1 > m2
, create 2 arrays using the elements that are less than m1 and greater than m2 in that array.
array 1: [5 3]
array 2: [10 9]
^ How would we find the elements that are less than m1 and greater than m2? Would we just take m1 and m2 and compare them with every element in their respective arrays? I know this works when the two arrays are both sorted, but would sorting them first allow us to still get O(log(n)) queries?
I'm assuming we can continue to use our special query (can we?) to get the k=n/2 smallest element (median) for that particular array. If this is the case, we query for k=n/2=1, leaving us with new m1 = 3
, m2 = 9
.
m1 < m2
, so we make 2 arrays using elements that are greater than m1 and less than m2 in that array.
Since there are no elements in array 2 that are less than m2 = 9
, we are only left with one array with one element greater than m1 = 3
.
[5]
<- this is the median
I am also interested in seeing the proof of correctness (that this finds the median) by induction.