2

This is the binary search algorithm from Wikipedia. It uses int type indexing. The use of the int type here is necessary because the algorithm sometimes uses the negative index values while searching.

int binary_search(unsigned array[], int length, unsigned needle) {
  int l = 0, r = length - 1, m;
  while (l <= r) {
    m = (l + r) / 2;
    if (array[m] < needle)
      l = m + 1;
    else if (array[m] > needle)
      r = m - 1;
    else
      return m;
  }
  return -1;
}

How to modify this algorithm to use size_t (unsigned int) instead of int?

What is an optimal solution for this?

size_t binary_search(unsigned array[], size_t length, unsigned needle) {
  size_t l = 0, r = length - 1, m;

  ???

  return (size_t) -1;
}
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
NoSkill
  • 718
  • 5
  • 15
  • *How to modify this algorithm to use size_t* -- `m = (l + r) / 2;` -- Possible overflow here: `(l + r)` – PaulMcKenzie Mar 27 '23 at 18:10
  • 1
    By revising the binary search algorithm to avoid `r = m - 1;` – Weather Vane Mar 27 '23 at 18:11
  • @PaulMcKenzie can we handle it? `m = l + (r - l) / 2` – NoSkill Mar 27 '23 at 18:21
  • 2
    @NoSkill Since you tagged C++, and if you have C++ 20, you can use [std::midpoint](https://en.cppreference.com/w/cpp/numeric/midpoint). – PaulMcKenzie Mar 27 '23 at 18:23
  • I don't know why people are concentrating on the possible overflow in `l+r`: on most implementations `array` cannot be large enough for it to happen in type size_t. It is the case if `sizeof(unsigned)>=2`, or if like gcc the compiler does not allow objects that occupy more than half the address space (and size_t matches that), or... – Marc Glisse Mar 28 '23 at 21:05

3 Answers3

2

We, beginners, should help each other.:)

Here you are.

As you specified the language tags C and C++ then the demonstration program below can be run in C and C++. That is there are no special features of C++ except using names bool, false and true that in C can be used if to include header <stdbool.h> Also in C you need to include header <stdio.h> instead of <iostream>

#include <iostream>

size_t binary_search( const int a[], size_t n, int value )
{
    size_t left = 0, right = n, middle = 0;
    bool found = false;

    while ( !found && left != right )
    {
        middle = left + ( right - left ) / 2;

        if ( value < a[middle] )
        {
            right = middle;
        }
        else if ( a[middle] < value )
        {
            left = middle + 1;
        }
        else
        {
            found = true;
        }
    }

    return found ? middle : -1;
}

int main( void )
{
    int a[] = { 1, 3, 5, 7, 9 };
    const size_t N = sizeof( a ) / sizeof( *a );

    for ( int value = 0; value <= a[N - 1] + 1; value++ )
    {
        size_t pos = binary_search( a, N, value );

        if ( pos != -1 )
        {
            printf( "%d is found at %zu.\n", value, pos );
        }
        else
        {
            printf( "%d is not found.\n", value );
        }
    }
}

The program output is

0 is not found.
1 is found at 0.
2 is not found.
3 is found at 1.
4 is not found.
5 is found at 2.
6 is not found.
7 is found at 3.
8 is not found.
9 is found at 4.
10 is not found.

Pay attention to that in general in C++ you will need to write a template function while in C you will write the function similarly to the implementation of bsearch.

Without using the variable found the function can look as

size_t binary_search( const int a[], size_t n, int value )
{
    size_t left = 0, right = n;

    while ( left != right )
    {
        size_t middle = left + ( right - left ) / 2;

        if ( value < a[middle] )
        {
            right = middle;
        }
        else if ( a[middle] < value )
        {
            left = middle + 1;
        }
        else
        {
            return middle;
        }
    }

    return -1;
}
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • Well... this solution sometimes will check the same index twice. Am I right? – NoSkill Mar 27 '23 at 18:45
  • @NoSkill In the nested if statements the expression a[middle] can be used twice if the first if statement evaluates to false. It is the same behavior as the behavior in your code provided in the question. – Vlad from Moscow Mar 27 '23 at 18:48
  • As `right` is out of range (the way I do it too) shouldn't the end condition be `right - left > 1`? – Weather Vane Mar 27 '23 at 19:01
  • 3
    To summarize: this makes `right` an exclusive bound instead of an inclusive one. This is usually how I do it, too. And more generally, I find that representing intervals (such as the range of indices at which the target might appear) in half-open form tends to produce a lot of conveniences. – John Bollinger Mar 27 '23 at 19:03
  • @WeatherVane Why can not right - left be equal to 1? In this case in any case if statements must be executed. – Vlad from Moscow Mar 27 '23 at 19:06
  • @WeatherVane, when `right == left + 1`, index `left` still has to be considered. For example, when the whole array is length 1 (that is, `n == 1`), you still have to check element 0. – John Bollinger Mar 27 '23 at 19:08
  • @VladfromMoscow Ah ok... I understand. So the nested range in this case can be a little bigger (+ one more element) as you remove `-1`. – NoSkill Mar 27 '23 at 19:11
  • @Vlad because at that point `left` is either matches or it does not: there are no other candidates. I check that after the search completes. – Weather Vane Mar 27 '23 at 19:11
  • @JohnBollinger yes, I do that after the search completes. – Weather Vane Mar 27 '23 at 19:13
  • 2
    @WeatherVane If left is equal to right then the range is empty. If right - left is equal to 1 then the range contains one element that must be checked. – Vlad from Moscow Mar 27 '23 at 19:14
0

Using int is to make sure `r = length - 1' is negative if the array is empty. So if you want to use size_t, just add a check in case of empty array.

Zwy
  • 353
  • 1
  • 10
0

There is a classic problem in the posted code:

  • m = (l + r) / 2; may cause an arithmetic overflow for large arrays, invoking undefined behavior. On many architectures, m may get a negative value, causing further undefined behavior when dereferencing array[m]. m should be computed as

        m = l + (r - l) / 2;
    
  • Note that the expression m = (l + r) / 2; would still cause problems for unsigned index types if the array is even larger.

To use index type size_t and thus handle larger arrays, you must also change the return type to size_t and use as a convention a return value of SIZE_MAX to mean not found.

Here is an alternative using size_t index types:

size_t binary_search(const unsigned array[], size_t length, unsigned needle) {
    size_t lo = 0, hi = length;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (array[mid] < needle) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo < length && array[lo] == needle) {
        return lo;
    } else {
        return (size_t)-1;  // SIZE_MAX
    }
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189