7

Given two sorted lists, each containing n real numbers, is there a O(log n) time algorithm to compute the element of rank i (where i coresponds to index in increasing order) in the union of the two lists, assuming the elements of the two lists are distinct?

EDIT: @BEN: This i s what I have been doing , but I am still not getting it.

I have an examples ;

List A : 1, 3, 5, 7 List B : 2, 4, 6, 8

Find rank(i) = 4.

First Step : i/2 = 2; List A now contains is A: 1, 3 List B now contains is B: 2, 4

         compare A[i] to B[i] i.e 

                 A[i] is less;

                 So the lists now become :

                   A: 3 
                   B: 2,4

Second Step: i/2 = 1

         List A now contains A:3
         List B now contains B:2 

         NoW I HAVE LOST THE VALUE 4 which is actually the result ...

I know I am missing some thing , but even after close to a day of thinking I cant just figure this one out...

Eternal Learner
  • 3,800
  • 13
  • 48
  • 78
  • @ Ben :NOPE This isn't homework.. I am gearing up to attend some interviews this summer.. Now you know y :) – Eternal Learner Apr 24 '10 at 19:15
  • possible duplicate of http://stackoverflow.com/questions/2531103/nth-smallest-number-among-two-databases-of-size-n-each-using-divide-and-conquer – outis Apr 24 '10 at 22:22

4 Answers4

8

Yes:

You know the element lies within either index [0,i] of the first list or [0,i] of the second list. Take element i/2 from each list and compare. Proceed by bisection.

I'm not including any code because this problem sounds a lot like homework.

EDIT: Bisection is the method behind binary search. It works like this:

Assume i = 10; (zero-based indexing, we're looking for the 11th element overall).

On the first step, you know the answer is either in list1(0...10) or list2(0...10). Take a = list1(5) and b = list2(5).

If a > b, then there are 5 elements in list1 which come before a, and at least 6 elements in list2 which come before a. So a is an upper bound on the result. Likewise there are 5 elements in list2 which come before b and less than 6 elements in list1 which come before b. So b is an lower bound on the result. Now we know that the result is either in list1(0..5) or list2(5..10). If a < b, then the result is either in list1(5..10) or list2(0..5). And if a == b we have our answer (but the problem said the elements were distinct, therefore a != b).

We just repeat this process, cutting the size of the search space in half at each step. Bisection refers to the fact that we choose the middle element (bisector) out of the range we know includes the result.

So the only difference between this and binary search is that in binary search we compare to a value we're looking for, but here we compare to a value from the other list.

NOTE: this is actually O(log i) which is better (at least no worse than) than O(log n). Furthermore, for small i (perhaps i < 100), it would actually be fewer operations to merge the first i elements (linear search instead of bisection) because that is so much simpler. When you add in cache behavior and data locality, the linear search may well be faster for i up to several thousand.

Also, if i > n then rely on the fact that the result has to be toward the end of either list, your initial candidate range in each list is from ((i-n)..n)

Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
  • +1! And of course the list has to allow `O(1)` random access, but yes, binary search is the key. – polygenelubricants Apr 24 '10 at 03:09
  • Hey , Could you guide me as to how to proceed to do bisection. I am new to it and most of the materials I read on the net relates to finding the root of complex numbers. Not sure how to proceed towards using bisection or how it exactly works. – Eternal Learner Apr 24 '10 at 05:00
1

Here is how you do it.

Let the first list be ListX and the second list be ListY. We need to find the right combination of ListX[x] and ListY[y] where x + y = i. Since x, y, i are natural numbers we can immediately constrain our problem domain to x*y. And by using the equations max(x) = len(ListX) and max(y) = len(ListY) we now have a subset of x*y elements in the form [x, y] that we need to search.

What you will do is order those elements like so [i - max(y), max(y)], [i - max(y) + 1, max(y) - 1], ... , [max(x), i - max(x)]. You will then bisect this list by choosing the middle [x, y] combination. Since the lists are ordered and distinct you can test ListX[x] < ListY[y]. If true then we bisect the upper half our [x, y] combinations or if false then we bisect the lower half. You will keep bisecting until find the right combination.

There are a lot of details I left, but that is the general gist of it. It is indeed O(log(n))!

Edit: As Ben pointed out this actually O(log(i)). If we let n = len(ListX) + len(ListY) then we know that i <= n.

Brian Gideon
  • 47,849
  • 13
  • 107
  • 150
0

When merging two lists, you're going to have to touch every element in both lists. If you don't touch every element, some elements will be left behind. Thus your theoretical lower bound is O(n). So you can't do it that way.

You don't have to sort, since you have two lists that are already sorted, and you can maintain that ordering as part of the merge.

i_am_jorf
  • 53,608
  • 15
  • 131
  • 222
  • 1
    You don't have to complete the merge, so it's `O(i)` not `O(n)`. But you can avoid the merge completely. – Ben Voigt Apr 24 '10 at 03:07
0

edit: oops, I misread the question. I thought given value, you want to find rank, not the other way around. If you want to find rank given value, then this is how to do it in O(log N):

Yes, you can do this in O(log N), if the list allows O(1) random access (i.e. it's an array and not a linked list).

  • Binary search on L1
  • Binary search on L2
  • Sum the indices

You'd have to work out the math, +1, -1, what to do if element isn't found, etc, but that's the idea.

polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
  • I gathered from the question that he was trying to find the value at index i, so he was given an index first, not a value to search for – Phil Apr 24 '10 at 03:03
  • You have it backward (the problem is to find the value given the index, not find the index given the value), but use of binary search is correct. – Ben Voigt Apr 24 '10 at 03:04
  • @Ben, @Phil: you're right, I misread. I'm still keeping up my answer for learning purposes. – polygenelubricants Apr 24 '10 at 03:08