what's the difference between mid=(beg+end)/2 and mid=beg+(end-beg)/2 in binary search?

Question

It is a problem from C++ primer fifth edition problem 3.26, I don't know the difference between them ? May be the second one can avoid overflow.

It even happened to the best: Jon Bentley's binary search in Programming Pearls contained that overflow bug. — TemplateRex, Jan 08 '14 at 16:27
Possible duplicate of [Why prefer start + (end - start) / 2 over (start + end) / 2 when calculating the middle of an array?](http://stackoverflow.com/questions/38688028/why-prefer-start-end-start-2-over-start-end-2-when-calculating-the) — Philipp Wendler, Aug 02 '16 at 14:47

Mike Seymour · Answer 1 · 2014-01-08T15:24:11.970

17

May be the second one can avoid overflow.

Exactly. There's no guarantee that beg+end is representable; but in the second case the intermediate values, as well as the expected result, are no larger than end, so there is no danger of overflow.

The second form can also be used for affine types like pointers and other random-access iterators, which can be subtracted to give a distance, but not added together.

edited Jan 08 '14 at 15:24

answered Jan 08 '14 at 14:58

Mike Seymour

249,747
28
448
644

Amusingly, the space of pointers (and random access iterators) to a range is an affine space: so affine combinations of random access iterators (like taking their average) is a logically valid thing to do. We just lack affine combination primitives in C++. – Yakk - Adam Nevraumont Jan 08 '14 at 15:13
@Yakk: Thanks, I was wracking my brains to remember the word "affine". It's been too long since I studied pure maths. – Mike Seymour Jan 08 '14 at 15:25
It is a fine thing to remember. – Yakk - Adam Nevraumont Jan 08 '14 at 15:38

score 1 · Answer 2 · answered Jan 08 '14 at 15:05

1

In general case the both expressions are invalid. For example the first expression is invalid because there is no such operation as + for pointers or iterators. The second expression is invalid in case when non-random access iterators are used. For example when bidirectional iterators are used.

So the correct construction in C++ will look the following way

mid = std::next( beg, std::distance( beg, end ) / 2 );

answered Jan 08 '14 at 15:05

Vlad from Moscow

301,070
26
186
335

When would you want to run a binary search on a non-random access iterator? – John Bartholomew Jan 08 '14 at 16:12
It is not only me. But the C++ Standard declares std::binary_search with forward iterators template bool binary_search(ForwardIterator first, ForwardIterator last, const T& value); – Vlad from Moscow Jan 08 '14 at 16:15
1

I suppose it makes sense if the comparison function is very expensive. Usually it doesn't make sense though, since you lose the O(log n) bound on running time. – John Bartholomew Jan 08 '14 at 16:20

score 1 · Answer 3 · answered Jan 08 '14 at 17:09

If we consider the two lines in a more generic setting, not related to binary search, the following observations can be made:

You are correct that the problem the second form tries to avoid is overflow, attempting to represent a number that is larger than the maximum representable number.

There is no restriction on how large the individual numbers beg and end are, so potentially they can both be larger than half of the maximum representable number. Adding them means that the intermediate result (beg+end) can overflow.

The second solution seems to eliminate the risk of overflowing, but introduces another one. If the values are signed values, their difference can again overflow (or underflow, depending on their signs). Unsigned values have no problem.

There is another solution which you didn't post:

mid = beg/2 + end/2

This solves every problem with overflow and underflow, but introduces a new problem, of precision loss. If working with integer values, division by 2 can give a result off by 0.5, adding these together, means that mid can be off by 1:

mid = 3/2 + 5/2; // mid is 3, instead of the 4 expected

Working with floating point values has other precision problems.

Getting back to the problem at hand, the binary search, it's easy to see that beg and end are unsigned values, so the second solution will always give the correct result.

flintjr2 · Answer 4 · 2017-06-25T23:21:21.950

1

The answer is in the book:

"Because the iterator returned from end does not denote an element, it may not be incremented or dereferenced."

Graphically it makes sense as an asymmetric range, [begin, off-the-end) or half-open range.

From Accelerated C++, page 28, Koenig.

edited Jun 25 '17 at 23:21

answered Jun 25 '17 at 23:12

flintjr2

21
3

what's the difference between mid=(beg+end)/2 and mid=beg+(end-beg)/2 in binary search?

4 Answers4

Linked