Binary Search on Sorted List with Duplicates

Question

To learn divide-and-conquer algorithms, I am implementing a function in Python called binary_search that will get the index of the first occurrence of a number in a non-empty, sorted list (elements of the list are non-decreasing, positive integers). For example, binary_search([1,1,2,2,3,4,4,5,5], 4) == 5, binary_search([1,1,1,1,1], 1) == 0, and binary_search([1,1,2,2,3], 5) == -1, where -1 means the number cannot be found in the list.

Below is my solution. Although the solution below passed all the tests I created manually it failed test cases from a black box unit tester. Could someone let me know what's wrong with the code below?

def find_first_index(A,low,high,key):
    if A[low] == key:
        return low
    if low == high:
        return -1
    mid = low+(high-low)//2
    if A[mid]==key:
        if A[mid-1]==key:
            return find_first_index(A,low,mid-1,key)  
        else:
            return mid
    if key <A[mid]:            
        return find_first_index(A,low,mid-1,key)    
    else: 
        return find_first_index(A, mid+1, high,key)
    
    
def binary_search(keys, number):
    index = find_first_index(A=keys, low=0,high=len(keys)-1,key=number)  
    return(index)

What does `binary_search([], 1)` return and what should it return? — mkrieger1, Jan 05 '22 at 02:07
you forget to include the mid position in your divide: `find_first_index(A,low,mid-1,key)` `find_first_index(A,mid+1,high,key)` — David, Jan 05 '22 at 02:53
Hi David, I did take care of the midpoint. In my solution, first I wanted to find the midpoint of the list. Next, I checked if the midpoint is the first occurrence of `number`. If so, return the current position of the midpoint. Otherwise, if the midpoint were not the first occurrence of `number`, i.e., there exists some element `number` that precedes the midpoint, then I apply the same function recursively on the interval [low,midpoint-1]. — confused_programmer, Jan 05 '22 at 03:47
That convoluted way of calculating `mid` is unnecessary in Python, because Python integers can't overflow. Just use `mid = (high+low) // 2`. — Mark Ransom, Jan 05 '22 at 04:57
Binary search is deceptively simple in theory but hideously hard in practice, see [Are you one of the 10% of programmers who can write a binary search?](https://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/) — Mark Ransom, Jan 05 '22 at 05:04
One problem is that `A[mid-1]==key` would refer to the end of the array if mid is 0. — jodag, Jan 05 '22 at 05:16

Riccardo Bucco · Accepted Answer · 2022-01-05T11:27:58.487

This should work:

def find_first_index(A, low, high, key):
    if A[low] == key:
        return low
    if low == high:
        return -1
    mid = low + (high - low) // 2
    if A[mid] >= key:
        return find_first_index(A, low, mid, key)
    return find_first_index(A, mid + 1, high, key)


def binary_search(keys, number):   
    return find_first_index(keys, 0, len(keys) - 1, number)

Your solution does not work, as you have already realized. For example, it breaks with the following input:

>>> binary_search([1, 5], 0)
...
RecursionError: maximum recursion depth exceeded in comparison

As you can see, the function does not even terminate, there's an infinite recursion going on here. Try to "run" your program on a piece of paper to understand what's going on (or use a debugger), it's very formative.

So, what's the error? The problem is that starting from some function call high < low. In this specific case, in the first function call low == 0 and high == 1. Then mid = 0 (because int(low + (high - low) / 2) == 0). But then you call find_first_index(A, low, mid - 1, key), which is basically find_first_index(A, 0, -1, key). The subsequent call will be exactly the same (because with low == 0 and high == -1 you will have again mid == 0). Therefor, you have an infinite recursion.

A simple solution in this case would be to have

if low >= high:
    return -1

Or just use my previous solution: checking mid - 1 in my opinion is not a good idea, or at least you must be much more careful when doing that.

Binary Search on Sorted List with Duplicates

1 Answers1