1

First time posting here, so apologies in advance if I am not following best practices. My algorithm is supposed to do the following in a sorted array with possible duplicates.

  1. Return -1 if the element does not exist in the array
  2. Return the smallest index where the element is present.

I have written a binary search algorithm for an array without duplicate. This returns a position of the element or -1. Based on blackbox testing, I know that the non-duplicate version of the binary search works. I have then recursively called that function via another function to search from 0 to position-1 to find the first incidence of the element, if any.

I am currently failing a black box test. I am getting a wrong answer error and not a time out error. I have tried most of the corner cases that I could think of and also ran a brute force test with the naive search algorithm and could not find an issue.

I am looking for some guidance on what might be wrong in the implementation rather than an alternate solution.

The format is as follow: Input:

5 #array size

3 4 7 7 8 #array elements need to be sorted

5 #search query array size

3 7 2 8 4 #query elements

Output

0 2 -1 4 1

My code is shown below:

class BinarySearch:
 
 def __init__(self,input_list,query):
        
    self.array=input_list
    self.length=len(input_list)
    self.query=query
    return
 
 def binary_search(self,low,high):
    '''
    Implementing the binary search algorithm with distinct numbers on a 
    sorted input.
    '''
    
    #trivial case
    if (self.query<self.array[low]) or (self.query>self.array[high-1]):
        
        return -1
        
    elif (low>=high-1) and self.array[low]!=self.query:
        
        return -1
    else:
        m=low+int(np.floor((high-low)/2))
        if self.array[low]==self.query:
            return low
        elif (self.array[m-1]>=self.query):
            return self.binary_search(low,m)
        elif self.array[high-1]==self.query:
            return high-1
        else:    
            return self.binary_search(m,high)
    return
        
class DuplicateBinarySearch(BinarySearch):
     
     def __init__(self,input_list,query):
       
        BinarySearch.__init__(self,input_list,query)
     
     def handle_duplicate(self,position):
        '''
        Function handles the duplicate number problem.
        Input: position where query is identified.
        Output: updated earlier position if it exists else return 
        original position.
        '''
        
        if position==-1:
            return -1
        elif position==0:
            return 0
        elif self.array[position-1]!=self.query:
            return position
        else:
            new_position=self.binary_search(0,position)
            if new_position==-1 or new_position>=position:
                return position
            else:
                return self.handle_duplicate(new_position)
     
     def naive_duplicate(self,position):
        
        old_position=position
        if position==-1:
            return -1
        else:
            while position>=0 and self.array[position]==self.query:
                position-=1
            if position==-1:
                return old_position
                
            else:
                return position+1
     


 if __name__ == '__main__':
    num_keys = int(input())
    input_keys = list(map(int, input().split()))
    assert len(input_keys) == num_keys
    

    num_queries = int(input())
    input_queries = list(map(int, input().split()))
    assert len(input_queries) == num_queries
   

    for q in input_queries:
        item=DuplicateBinarySearch(input_keys,q)
        #res=item.handle_duplicate(item.binary_search(0,item.length))
        #res=item.naive_duplicate(item.binary_search(0,item.length))
        #assert res_check==res
        print(item.handle_duplicate(item.binary_search(0,item.length)), end=' ')
        #print(item.naive_duplicate(item.binary_search(0,item.length)), end=' ')

When I run a naive duplicate algorithm, I get a time out error:

Failed case #56/57: time limit exceeded (Time used: 10.00/5.00, memory used: 42201088/536870912.)

When I run the binary search with duplicate algorithm, I get a wrong answer error on a different test case:

Failed case #24/57: Wrong answer

(Time used: 0.11/5.00, memory used: 42106880/536870912.)

The problem statement is as follows:

Problem Statement

Update:

I could make the code work by making the following change but I have not been able to create a test case to see why the code would fail in the first case.

Original binary search function that works with no duplicates but fails an unknown edge case when a handle_duplicate function calls it recursively. I changed the binary search function to the following:

 def binary_search(self,low,high):
        '''
        Implementing the binary search algorithm with distinct numbers on a sorted input.
        '''
        
        #trivial case
        if (low>=high-1) and self.array[low]!=self.query:
            return -1
        
        elif (self.query<self.array[low]) or (self.query>self.array[high-1]):
            return -1
            
        
        else:
            m=low+(high-low)//2
            if self.array[low]==self.query:
                return low
            elif (self.array[m-1]>=self.query):
                return self.binary_search(low,m)
            elif self.array[m]<=self.query:   
                return self.binary_search(m,high)
            elif self.array[high-1]==self.query:
                return high-1
            else: 
                return -1
bbt_wb
  • 11
  • 2
  • The issue may be that the test is not completing within a time limit. Your algorithm of calling binary search again with `low` equal 0 is not optimal, since `low` had already been adjusted in the initial binary search, so it is a waste to start from 0 again. – trincot Jun 15 '22 at 16:00
  • A correctly written binary search should be able to get to the first duplicate right off the bat in the main logic. – Mark Ransom Jun 15 '22 at 16:01
  • Thanks @trincot for responding. I am currently on 16/25 in the test suite and I am getting a wrong answer and not time out error. I see your point though that the code can be made faster but my issue is currently getting a wrong answer. – bbt_wb Jun 15 '22 at 16:03
  • @MarkRansom - That is true. I saw a solution in StackOverflow which shows that implementation you are talking about. I was trying to reuse an existing code that uses binary search without duplicate. I am still curious to understand why mine is not working. – bbt_wb Jun 15 '22 at 16:04
  • When you ask a question about code that produces the wrong output, you should provide the information to make it reproducible for us. – trincot Jun 15 '22 at 16:23
  • @trincot - Sorry this is my first time posting. As I am failing a black box test, I do not have the exact test case other than wrong answer. If there is a timeout error, it would have said timeout. Let me know what information you need, I would be happy to provide the same. I have added the information now. – bbt_wb Jun 15 '22 at 16:29
  • What you can do: provide the link to to the code challenge and quote it here, so all details, requirements, constraints... are there. Test with some other code that has an infinite loop, if indeed that code challenge site gives a different error message when there is a time out. – trincot Jun 15 '22 at 17:15
  • @trincot - Thanks. I am doing this as part of the Coursera course: Algorithmic toolbox. I tried the naive search algorithm. I am failing test case 56/57 with a timeout error. I will update the error messaged in original post. Link to the course: https://www.coursera.org/learn/algorithmic-toolbox/home/week/4 Not sure if that is going to help much. – bbt_wb Jun 15 '22 at 17:35
  • Updated the original post with error messages and a screenshot of problem statement – bbt_wb Jun 15 '22 at 17:42
  • The link needs an account, so this is not going to work for me. I'll have to pass for this question. Sorry. – trincot Jun 15 '22 at 21:25
  • Have you tested your code with an empty list? https://www.reddit.com/r/Jokes/comments/prdi4x/a_software_tester_walks_into_a_bar/ – Mark Ransom Jun 16 '22 at 02:49
  • The problem statement says that there has to be one element. Problem statement attached in OP. – bbt_wb Jun 16 '22 at 18:28
  • Thanks all. I was able to get the code work by playing around with elif and else.However, not sure why it didn't work in first case still. Could not generate a test case. Updated solution added to OP. – bbt_wb Jun 16 '22 at 18:29

1 Answers1

1

Since you are going to implement binary search with recursive, i would suggest you add a variable 'result' which act as returning value and hold intermediate index which equal to target value.

Here is an example:

def binarySearchRecursive(nums, left, right, target, result):

    """
    This is your exit point. 
    If the target is not found, result will be -1 since it won't change from initial value.
    If the target is found, result will be the index of the first occurrence of the target.
    """
    if left > right:
        return result 

    # Overflow prevention
    mid = left + (right - left) // 2

    if nums[mid] == target:
        # We are not sure if this is the first occurrence of the target.
        # So we will store the index to the result now, and keep checking.
        result = mid 
        # Since we are looking for "first occurrence", we discard right half.
        return binarySearchRecursive(nums, left, mid - 1, target, result) 
    elif target < nums[mid]:
        return binarySearchRecursive(nums, left, mid - 1, target, result)
    else:
        return binarySearchRecursive(nums, mid + 1, right, target, result)

if __name__ == '__main__':

    nums = [2,4,4,4,7,7,9]
    target = 4 

    (left, right) = (0, len(nums)-1)
    result = -1 # Initial value
    index = binarySearchRecursive(nums, left, right, target, result)

    if index != -1:
        print(index)
    else:
        print('Not found') 

From your updated version, I still feel the exit point of your function is a little unintuitive.(Your "trivial case" section)

Since the only condition that your searching should stop, is that you have searched all possible section of the list. That is when the range of searching area is 0, there is no element left to be search and check. In implementation, that is when left < right, or high < low, is true.

The 'result' variable, is initialized as -1 when the function first been called from main. And won't change if there is no match find. And after each successful matching, since we can not be sure if it is the first occurrence, we will just store this index into the result. If there are more 'left matching', then the value will be update. If there is not, then the value will be eventually returned. If the target is not in the list, the return will be -1, as its original initialized value.

Arthur Wang
  • 3,118
  • 4
  • 21
  • 29