I am looking primarily for a theoretical answer, ideally one addressing my misunderstanding (below) about binary search algorithms. Practical examples are welcome as well.
I have a complex function where I want to do a binary search on that function to find the value of n
, so that computed_goal
matches a given goal
, where goal
is a previously-given value I am seeking to match. The algorithm below illustrates this.
l = 0, h = 4000; // low, and high bounds for binary search
while (l < h)
{
n = (l + h) / 2;
x = vars / n; // vars represents a more complex computation, x depends on n
computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal depends on n
if (computed_goal < goal)
l = n + 1;
else
h = n - 1;
if (abs(computed_goal - goal) / goal < 0.01) // check for relative error within 1%
return n; // successfully matched goal value!
else
return 0; // iteration failed to match goal to desired precision
}
Now, I think you may already be seeing a problem if you are familiar with binary search algorithm, but I'm not super-verse on them. My problem does not always present itself but sometimes it is as follows:
Say my function gives me the following values for these n
:
n = 2998: computed_goal = 967.8732568 (-2.1267432 = computed_goal - goal)
n = 2999: computed_goal = 968.6512540 (-1.3487460)
n = 3000: computed_goal = 969.4294677 (-0.5705323)
n = 3001: computed_goal = 970.2078977 (+0.2078977) <= This is the optimal solution
n = 3002: computed_goal = 970.9865442 (+0.9865442)
Now, suppose l = 2987
, and h = 3015
. This makes n = 3001
. My goal
is to match value of 970. Note that n = 3001 is the optimal solution because |computed_goal - goal|
is the smallest in magnitude and it is the closest to the goal
However, my algorithm looks at it and goes computed_goal = 970.2078977 < goal = 970
= False, and sets new h
to be n - 1
= 3000, thus excluding the optimal solution from any future iterations. It so happens that my algorithm finally incorrectly settles in on n = 2999
, two steps away from optimal solution, here's a listing for reference:
l h n computed_goal goal
2987 3015 = 3001, 970.207897725643 < 970
2987 3000 = 2993, 963.9865161108 < 970
2994 3000 = 2997, 967.095475944351 < 970
2998 3000 = 2999, 968.651254012028 < 970
Misunderstanding
I understand that I am checking but then excluding the optimal solution and because I'm not backtracking back to it, it gets lost and excluded...
I was told before that you should (always) exclude the endpoints that you've already checked, because you've already checked them, so it does not make sense to include them in future search bounds. So like in this case I've checked 3001, and in any future iterations I exclude that value from further checks. I am guessing that is incorrect, and I am guessing I must include them(?) Can you help me with theory part of this?
Attempt to fix. I should note as well that when I change my if statement to include end points, like so:
if (computed_goal < goal)
l = n;
else
h = n;
Algorithm keeps executing indefinitely in an infinite look with l = 3000
and h = 3001
. I am not sure how to go around that. I could change my while to while(l + 1 < h)
, which will break the infinite loop, but it looks off to me, and I am concerned that I may be introducing a different error into the code, and I don't have enough insight to say if I should be using the l + 1 < h
loop condition.
Function Sketch
Code
function matchGoal(int $goal = 970): int
{
$l = 0;
$h = 3900;
while ($l < $h)
{
$n = intdiv($h + $l, 2);
$x = 8321100 / $n;
$factor = 5.113233695 * pow($n / 3000, 2);
$computed_goal = (((-8.16965E-10 * $x + -3.15457E-06) * $x + 0.008437163) * $x + 207.9079475) * $factor - 0.076934117;
if ($computed_goal < $goal)
$l = $n + 1;
else
$h = $n - 1;
}
$relative_error = 100 * abs($computed_goal - $goal) / $goal;
return ($relative_error < 1) ? $n : 0;
}
// outputs 2999, when 3001 is optimal
print matchGoal() . PHP_EOL;