As we know, code like this will generate a warning:

for (int i = 0; i < v.size(); ++i)

The solution is something like `auto i = 0u;`, `decltype(v.size())` or `std::vector<int>::size_type`, but pretend that we're forced to compare a signed and an unsigned value. The compiler will implicitly convert the int to unsigned int (the actual type doesn't matter). Using an explicit cast, `static_cast<unsigned int>(i)`, makes the warning go away, but this is bad because it only does the same thing the compiler would have done and silences an important warning!

The better solution is:

if ((i < 0) || (static_cast<unsigned int>(i) < v.size()))
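
For illustration, here is roughly what such a compiler-generated check could look like when written out by hand as a helper (safe_less is a made-up name, not a standard facility; this is just a sketch of the idea):

#include <type_traits>

// Compare a signed value to an unsigned value with the mathematical meaning:
// any negative value compares less than any unsigned value.
template <class S, class U>
bool safe_less(S s, U u)
{
    static_assert(std::is_signed<S>::value && std::is_unsigned<U>::value,
                  "safe_less expects (signed, unsigned)");
    return s < 0 || static_cast<typename std::make_unsigned<S>::type>(s) < u;
}

The loop condition would then read safe_less(i, v.size()), which compiles cleanly and gives the mathematically expected result even for negative i.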

Understandably, C is "closer to the metal" and, as a consequence, less safe. But in C++, there's no excuse for this. As C++ and C diverge (as they have been doing for many years), hundreds of improvements to C++ have increased safety. I highly doubt a change like this would hurt performance either.

Is there a reason why compilers don't do this automatically?

N.B.: this DOES happen in the real world. See Vulnerability Note VU#159523:

This vulnerability in Adobe Flash arises because Flash passes a signed integer to calloc(). An attacker has control over this integer and can send negative numbers. Because calloc() takes size_t, which is unsigned, the negative number is converted to a very large number, which is generally too big to allocate, and as a result calloc() returns NULL causing the vulnerability to exist.
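
The conversion at the heart of that report is easy to reproduce. This is not Flash's code, just a minimal sketch of the same mistake:

#include <cstdio>
#include <cstdlib>

int main()
{
    int count = -1;  // stand-in for an attacker-controlled value
    // count is implicitly converted to size_t: -1 becomes SIZE_MAX, the
    // allocation is far too large, and calloc returns a null pointer.
    void* p = std::calloc(count, 1);
    std::printf("calloc returned %p\n", p);
    std::free(p);
}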

  • Often you can write code so that this issue doesn't occur. Use iterators, range-based for loops, standard algorithms or, if you have to count, use `size_t`. – Neil Kirk Oct 31 '15 at 00:21
  • "I highly doubt a change like this would hurt performance" - if it results in more instructions, it will hurt performance. Even if the execution of those instructions overlap, cache is still finite. Conditions were *much* worse when these rules were first written in stone. – Mark Ransom Oct 31 '15 at 00:25
  • Re. the vulnerability: it sounds like the real problem is that Flash doesn't check the result of calloc? – M.M Oct 31 '15 at 01:14
  • [Why C compiler cannot do signed/unsigned comparisons in an intuitive way](https://stackoverflow.com/q/14485655/995714) – phuclv Sep 18 '19 at 04:11

2 Answers

An important goal of C++ is compatibility, and there's a ton of code that would not compile if signed/unsigned mixing was a fatal error. See Stroustrup’s C++ Design Goals in 1986.

Your proposal also adds a comparison that is not present in the source.

Arguably, with C++11 this case is already made safer if you use range-based for and auto:

for (auto i : v)
Chuck Walbourn
  • Does the code I proposed make a fatal error? I'm just saying replace the current behavior with the safer behavior, not make the warning an error. – user5508973 Oct 31 '15 at 00:19
  • Your code is already not safe if you are building with 64-bit where `size_t` is 64-bit and `unsigned int` is 32-bit... say Windows x64. – Chuck Walbourn Oct 31 '15 at 00:20
  • Range-based for loops are syntactic sugar for a very specific form of loop. They don't make signed and unsigned comparisons safer in general. – user5508973 Oct 31 '15 at 00:21
  • Good point, I added a note that the actual type doesn't matter. – user5508973 Oct 31 '15 at 00:22
  • @user5508973 No it doesn't but it allows you to avoid those comparisons in many cases. – Neil Kirk Oct 31 '15 at 00:23
  • @ChuckWalbourn if the compiler were doing the casting, it would cast to the promoted integer size no matter how big `int` is, so that's not an issue. – Mark Ransom Oct 31 '15 at 00:28

The current C++ rules for implicit conversions make unsigned integer types dangerous to use as number types. The canonical example is that string("hello").size() < -5 is guaranteed to be true, which is silly – but not just silly. The time wasted tracking down subtle bugs involving such conversions, not to mention the time wasted trying to select perfectly matching types, shows that it's much more serious than silly behavior in a made-up example; it's about as serious as it gets in programming.
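
A quick way to see it (a sketch; the comparison triggers exactly the warning discussed in the question, and prints true because -5 is converted to a huge size_t value first):

#include <iostream>
#include <string>

int main()
{
    std::cout << std::boolalpha
              << (std::string("hello").size() < -5) << "\n";  // prints true
}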

A bad solution is to accommodate sloppy programmers by adding extra comparisons where these programmers directly compare signed to unsigned. It's bad because C++ is founded on the idea of catering to competent programmers who know what they're doing and want as little as possible between their code and the resulting machine code. Comparing signed to unsigned has simple well-defined effects in current C++, just not the effects that e.g. a Java programmer may expect.

A good solution is instead for the programmer to improve, to express directly what he or she wants, and not rely on magic invisible fix-ups.

For example, one good way to express the intent is to cast the result of size operations to a signed type, e.g. ptrdiff_t, or when that full generality isn't needed, just plain int.

And one good way to support that is to define an n_items function like this:

#include <cstddef>   // std::ptrdiff_t

using Size = std::ptrdiff_t;

// Signed item count for any container with a size() member.
template< class Container >
auto n_items( Container const& c )
    -> Size
{ return static_cast<Size>( c.size() ); }

At the cost of not supporting e.g. std::list, one can make it work for raw arrays without a specialization, by using end(c) - begin(c). But I prefer to have a separate specialization(1). Like this:

template< class Item, Size n >
auto n_items( Item (&)[n] )
    -> Size
{ return n; }
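
With those in place, a counting loop can use a signed index throughout; a hypothetical usage sketch:

#include <vector>

void example( std::vector<int> const& v )
{
    for( Size i = 0; i < n_items( v ); ++i )
    {
        // ... use v[i] here; i is signed, so expressions like i - 1 behave sanely.
    }
}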

Disclaimer: off the cuff code, not touched by compiler's hands.


1) Well, technically it's an overload, not a specialization. But I follow the terminological lead of the former secretary of the standardization committee Pete Becker. I.e., let's use plain descriptive language where we can, and only resort to formal terms when that's necessary.

Cheers and hth. - Alf
  • Current C++ rules make signed integers unsafe for any purpose, due to the ridiculous undefined behavior rules. I've been adding `u` in as many places as possible in my code in order to avoid these problems. – Myria Oct 31 '15 at 00:53
  • @Myria: Using unsigned everywhere substitutes a number of grave problems for the very marginal problems of UB with signed overflow. In practice that UB is only a problem with respect to optimization for certain compilers (I use plural but I have first-hand knowledge of only one). A good solution is then to use the appropriate compiler options to make that compiler behave in a more practical way, where (if at all) necessary. – Cheers and hth. - Alf Oct 31 '15 at 00:58