Binary search stops working when more than 44 elements in an array

Question

This is my first time posting here, so forgive stupid mistakes if such happen.

I am a beginner in C coding (learning right now), and frankly not much of a math genius so the problem I encountered stumps me quite a bit. I found a solution to it by the way of testing different things, but honestly I don't exactly understand why my solution does work, and the code before the solution applied doesn't.

The code is a binary search for a given number in a sorted array created by using seeded srand48, and sorted with a linear sort.

To frame the situation. Code before the solution did work well until array reached 45 elements, then it stopped finding the number.

Before the solution code:

bool search(int value, int values[], int n)
{
    int first, middle, last;

    first = 0;
    last = n - 1;
    middle = (first + last) / 2;

    if (n > 0)
    {
        while (first < last)
        {
            if (values[middle] == value)
            {
                return true;
            }
            else if (values[middle] < value)
            {
                first = middle + 1;
            }
            else
            {
                last = middle - 1;
            }
            middle = (first + last) / 2;
        }
    }
    else
    {
        return false;
    }
    return 0;
}

The solution I found was just removing + and - 1 from the new middle value assignment. After I did this it works perfectly fine. What I would like to know for my own future reference and learning is why exactly the first code does not work, and why did that solution fix it. (I might somehow subconsciously understand how this works and that is why I came up with the fix, but I can't consciously figure out the logic behind it.)

You should step through line-by-line with a debugger. Or try it out on paper. — Oliver Charlesworth, Jun 25 '17 at 13:18
Possible duplicate of [Where is the mistake in my code to perform Binary Search?](https://stackoverflow.com/questions/38258457/where-is-the-mistake-in-my-code-to-perform-binary-search) — Matt Timmermans, Jun 25 '17 at 13:22
Don't check for equality. Just run the binary search until first and last converge. This will fix your error, and also be slightly faster. — Malcolm McLean, Jun 25 '17 at 13:32
In order to understand what exactly is wrong with your solution, check it on border cases: length 0, length 1, length 2, element being searched is the first/last one. — n. m. could be an AI, Jun 25 '17 at 13:34
If `n` is 1 and you have only 1 element in your array that matches the number you are looking for, this solution will fail. — nglee, Jun 25 '17 at 13:35
@Coldspeed we don't need 100 variants of "my binary search is broken" on SO. We need two or three correct implementations with explanation. — Matt Timmermans, Jun 25 '17 at 13:35
https://stackoverflow.com/a/44726934/3205016 you can check this @Bartek — Mohiuddin Sumon, Jun 25 '17 at 13:37
@Md.MohiuddinAhmed: the code in reference is badly broken... — chqrlie, Jun 25 '17 at 14:23
@MattTimmermans: This version is functional with subtle bugs that deserve a precise analysis. Other duplicates deserve to be closed much more than this one, notably this one: http://stackoverflow.com/a/44726934/3205016 — chqrlie, Jun 25 '17 at 14:25
I think that @MattTimmermans didn't fully understand what I was exactly asking for (it wasn't really debugging my code, but rather an 'in words' explanation on why my way of thinking is wrong in the first case and why the change I made fixed part of the problem) I might have just worded it wrongly (I am not native English), but nevertheless my questions were answered, so the thread my as well stay on hold. — Bartek Gańcza, Jun 25 '17 at 15:06

chqrlie · Accepted Answer · 2017-06-25T14:55:32.530

There are some subtle problems in your code:

It does not work if the array has a single element: it will always return 0. This problem is actually causing your observed behavior: you test while (first < last) but first and last are both included in the range to search. You should either use while (first <= last) or change the algorithm to exclude last, which I favor, as you no longer need to special case the empty array with this solution.
It can return false or 0 in case of failure, which is OK in C but different in javascript. You should remove the else clause.
The computation for middle has a potential arithmetic overflow if n is larger than half the maximum value if int and the value is large enough. This is actually a classic bug that remained in hiding for many years in many standard C libraries. Instead of middle = (first + last) / 2, you should use:
```
middle = first + (last - first) / 2;
```

Here is a corrected and simplified version:

bool search(int value, int values[], int n) {
    int first = 0, last = n;

    while (first < last) {
        int middle = first + (last - first) / 2;
        if (values[middle] == value) {
            return true;
        }
        if (values[middle] < value) {
            first = middle + 1;
        } else {
            last = middle;
        }
    }
    return false;
}

Note that the middle value where value was found is not necessarily the smallest index where is appears if there are duplicates in the array. It is not a problem here as you are only interested in the boolean result, but the algorithm must altered if you want to find the smallest index.

Matt Timmermans posted a better algorithm in another question that is both more efficient and has this property:

bool search(int value, int values[], int n) {
    int pos = 0;
    int limit = n;

    while (pos < limit) {
        int middle = pos + ((limit - pos) >> 1);

        if (values[middle] < value)
            pos = middle + 1;
        else
            limit = middle;
    }
    return pos < n && values[pos] == value;
}

As to your final question: why does removing the + 1 and -1 from your code fix the problem? Your fix does not fully correct the problem: your changes have this effect:

first = middle; is just suboptimal, the search will take marginally longer but no other effect.
last = middle; fixes the problem because the loop assumes last to be excluded, which is consistent with last = middle;.
The remaining problem is the initial value for last, which should be last = n;.
With last = n - 1;, you will still fail to locate the value if it is at the end of the array without duplicates, including if the array has only one element.

Nice. How do you exclude `last`? Is it done by setting it to one index beyond the array's end? — cs95, Jun 25 '17 at 13:41
`last` is excluded by design: its intial value is `n`, the number of elements in the array, hence excluded, and updated value is `middle` if `values[middle] > value`, hence excluded too. If `first >= last`, the range is empty, the value cannot be found. — chqrlie, Jun 25 '17 at 13:43
@BartekGańcza: your question was put on hold as a potential duplicate, yet the problems with your implementation have been addressed in my answer. I hope you find it useful and other readers might learn a thing or two from it. — chqrlie, Jun 25 '17 at 14:12
@chqrlie: I have accepted the answer as it corrects the faulty code I posted and does tremendously help me in learning more optimal ways of handling code like this, but it does not directly answer my question (correction, it does answer half of my question :D) :) I stated that I have removed the fault (minus the ability to handle some rare cases) and I just wanted to hear an "in words" explanation on why my solution works, since I honestly did not exactly understood it myself :D Thank you nonetheless. — Bartek Gańcza, Jun 25 '17 at 14:42
@BartekGańcza: Your fix does not fully correct the problem: your changes have this effect: `first = middle;` is just suboptimal, the search will take marginally longer but no other effect. `last = middle;` fixes the problem because the loop assumes `last` to be excluded, which is consistent with `last = middle;`. The remaining problem is the initial value for `last`, which should be `last = n;`. With `last = n - 1;`, you will still fail to locate the value if it is at the end of the array without duplicates, including if the array has only one element. — chqrlie, Jun 25 '17 at 14:53
@chqrlie You are right I should have said I fixed part of the problem :) Nevertheless your solution is not only fully fixing it, but is also heaps and bounds better in design which is a good lesson. The explanation in comment also explains the rest that I wanted to know, I think that I fully understand the logic behind it right now. Thank you. I must say, it is difficult when you have problems with quickly grasping mathematical notations meaning. It definitely hampers the ability to understand why certain things have to be noted in a certain way. — Bartek Gańcza, Jun 25 '17 at 15:01

Binary search stops working when more than 44 elements in an array

1 Answers1