7

In writing some code today, I have happened upon a circumstance that has caused me to write a binary search of a kind I have never seen before. Does this binary search have a name, and is it really a "binary" search?

Motivation

First of all, in order to make the search easier to understand, I will explain the use case that spawned its creation.

Say you have a list of ordered numbers. You are asked to find the index of the number in the list that is closest to x.

int findIndexClosestTo(int x);

The calls to findIndexClosestTo() always follow this rule:

If the last result of findIndexClosestTo() was i, then indices closer to i have greater probability of being the result of the current call to findIndexClosestTo().

In other words, the index we need to find this time is more likely to be closer to the last one we found than further from it.

For an example, imagine a simulated boy that walks left and right on the screen. If we are often querying the index of the boy's location, it is likely he is somewhere near the last place we found him.

Algorithm

Given the case above, we know the last result of findIndexClosestTo() was i (if this is actually the first time the function has been called, i defaults to the middle index of the list, for simplicity, although a separate binary search to find the result of the first call would actually be faster), and the function has been called again. Given the new number x, we follow this algorithm to find its index:

  1. interval = 1;
  2. Is the number we're looking for, x, positioned at i? If so, return i;
  3. If not, determine whether x is above or below i. (Remember, the list is sorted.)
  4. Move interval indices in the direction of x.
  5. If we have found x at our new location, return that location.
  6. Double interval. (i.e. interval *= 2)
  7. If we have passed x, go back interval indices, set interval = 1, go to 4.

Given the probability rule stated above (under the Motivation header), this appears to me to be the most efficient way to find the correct index. Do you know of a faster way?

Community
  • 1
  • 1
Cory Klein
  • 51,188
  • 43
  • 183
  • 243
  • I assume this is actually an array and not a list? Because binary search on a list would be stupid. – Nemo Aug 26 '11 at 16:17
  • 1
    I would assume that the best answer will depend on exactly what the probability distribution is for the position based on i. eg if there is a 99% chance that is is within 3 of i then a very different answer will be useful compared to if it is only 0.001% more likely to be at i than anywhere else. I think the optimal answer would be a probability based distribution such that the binary search picks a spot that gives a 50% chance of the desired item being on each side. So if you can define the probability curve you can probably define a pretty good algorithm. – Chris Aug 26 '11 at 16:52
  • @Chris very good point. If all data points were _nearly_ equal in probability, this would likely be worse than a regular binary search. In my case, the probability appears to decay exponentially the further you get from the last point, in which case, I believe this search is faster. – Cory Klein Aug 26 '11 at 17:05

4 Answers4

5

In the worst case, your algorithm is O((log n)^2).

Suppose you start at 0 (with interval = 1), and the value you seek actually resides at location 2^n - 1.

First you will check 1, 2, 4, 8, ..., 2^(n-1), 2^n. Whoops, that overshoots, so go back to 2^(n-1).

Next you check 2^(n-1)+1, 2^(n-1)+2, ..., 2^(n-1)+2^(n-2), 2^(n-1)+2^(n-1). That last term is 2^n, so whoops, that overshot again. Go back to 2^(n-1) + 2^(n-2).

And so on, until you finally reach 2^(n-1) + 2^(n-2) + ... + 1 == 2^n - 1.

The first overshoot took log n steps. The next took (log n)-1 steps. The next took (log n) - 2 steps. And so on.

So, worst case, you took 1 + 2 + 3 + ... + log n == O((log n)^2) steps.

A better idea, I think, is to switch to traditional binary search once you overshoot the first time. That will preserve the O(log n) worst case performance of the algorithm, while tending to be a little faster when the target really is nearby.

I do not know a name for this algorithm, but I do like it. (By a bizarre coincidence, I could have used it yesterday. Really.)

Nemo
  • 70,042
  • 10
  • 116
  • 153
  • Interpolation it is. And if you are dealing with a large array (n > 1024) the interpolation would generally be better than binary. For n > 10000, this will be faster very comfortably. – Ajeet Ganga Feb 03 '13 at 01:05
4

What you are doing is (IMHO) a version of Interpolation search

In a interpolation search you assume numbers are equally distributed, and then you try to guess the location of a number from first and last number and length of the array.

In your case, you are modifying the interpolation-algo such that you assume the Key is very close to the last number you searched.

Also note that your algo is similar to algo where TCP tries to find the optimal packet size. (dont remember the name :( )

  1. Start slow
    1. Double the interval
    2. if Packet fails restart from the last succeeded packet./ Restart from default packet size.. 3.
Ajeet Ganga
  • 8,353
  • 10
  • 56
  • 79
1

Your routine is typical of interpolation routines. You don't lose much if you call it with random numbers (~ standard binary search), but if you call it with slowly increasing numbers, it won't take long to find the correct index.

This is therefore a sensible default behavior for searching an ordered table for interpolation purposes.

This method is discussed with great length in Numerical Recipes 3rd edition, section 3.1.

Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
0

This is talking off the top of my head, so I've nothing to back it up but gut feeling!

At step 7, if we've passed x, it may be faster to halve interval, and head back towards x - effectively, interval = -(interval / 2), rather than resetting interval to 1.

I'll have to sketch out a few numbers on paper, though...

Edit: Apologies - I'm talking nonsense above: ignore me! (And I'll go away and have a proper think about it this time...)

nihilogist
  • 352
  • 4
  • 12