5

By "intuitive" I mean given

int a = -1;
unsigned int b = 3;

the expression (a < b) should evaluate to 1.

There are a number of questions on Stack Overflow already asking why, in this or that particular case, the C compiler complains about a signed/unsigned comparison. The answers boil down to the integer conversion rules and such. Yet there does not seem to be a rationale for why the compiler has to be so exceptionally dumb when comparing signed and unsigned integers. Using the declarations above, why is an expression like

(a < b)

not automatically substituted with

(a < 0 || (unsigned int)a < b)

if there is no single machine instruction to do it properly?
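
For concreteness, here is a minimal sketch of what such an "intuitive" comparison looks like when written out by hand (the helper name and the wrapping main() are purely illustrative, not anything the standard or libc provides):

#include <stdio.h>

/* Hypothetical helper: compare a signed and an unsigned int the
   "mathematical" way: any negative value is smaller than any
   unsigned value. */
static int less_signed_unsigned(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}

int main(void)
{
    int a = -1;
    unsigned int b = 3;

    printf("%d\n", a < b);                      /* 0: a is converted to UINT_MAX */
    printf("%d\n", less_signed_unsigned(a, b)); /* 1: the "intuitive" result */
    return 0;
}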

Now, there have been some comments on previous questions in the vein of "if you have to mix signed and unsigned integers, there is something wrong with your program". I would not buy that, since libc itself makes it impossible to live in a signed-only or unsigned-only world (e.g. the sprintf() family of functions returns int as the number of bytes written, send() returns ssize_t, and so on).

I also don't think I can buy the idea, expressed in the comments below, that the implicit conversion of a signed integer to unsigned in a comparison (the (d - '0' < 10U) "idiom") bestows some additional power on the C programmer compared to an explicit cast ((unsigned int)(d - '0') < 10U). But sure enough it opens up wide opportunities to screw up.

And yes, I'm happy that the compiler warns me that it cannot do it (unfortunately, only if I ask it explicitly). The question is: why can't it? Usually there are good reasons behind a standard's rules, so I'm wondering if there are any here.

ayurchen
  • 386
  • 3
  • 9
  • 7
    I want to upvote, but at the same time I want to close this... – David G Jan 23 '13 at 17:27
  • 2
    libc is one of the worst libraries ever designed and written, so. – Cat Plus Plus Jan 23 '13 at 17:28
  • 3
    It's simple, the compiler doesn't *have to* complain, it just thinks it's doing you a favor. – Luchian Grigore Jan 23 '13 at 17:29
  • Also, signed numbers might not be in a representation where the comparison you're proposing has the semantics you want. – Cat Plus Plus Jan 23 '13 at 17:29
  • @StoryTeller Did you read the question? – Alex Chamberlain Jan 23 '13 at 17:30
  • Like Luchian said, the compiler doesn't even care about your silly little program. Write assembly code yourself if it'll make you happy. – StoryTeller - Unslander Monica Jan 23 '13 at 17:31
  • 8
    Having spent a day with a colleague trying to understand why the code was not working as expected, when it turns out that the math which was done with a mix of signed and unsigned was the fault, I wish we'd had that sort of warning. Unfortunately, it cuts both ways... – Mats Petersson Jan 23 '13 at 17:31
  • This reminds me of the old saw about how C gives all the flexibility of ASM with all the legibility of ASM – tletnes Jan 23 '13 at 17:31
  • @AlexChamberlain, I did. And I was so caught up on the conversion that I overreacted. – StoryTeller - Unslander Monica Jan 23 '13 at 17:33
  • Bottom line seems to be, "Proving correctness is harder than you think." – John Jan 23 '13 at 17:35
  • 1
    The main reason why it is not done is that this is how it worked in C, and if you change the semantics of those comparisons you will break existing C programs. C programmers were just meant to be very well aware of what they were doing; after all, they were working on a language which is pretty close to the hardware level. Now, I believe a C++ compiler issues a warning because it believes you are not likely to be one of those well-aware C programmers. This said, backwards compatibility is probably the main reason why C++ is not what we (or at least I) would like it to be. – Andy Prowl Jan 23 '13 at 17:35

4 Answers

6

The automatic replacement cannot be made because it differs from C semantics and would horribly break programs that use the conversion correctly. For example:

if (d - '0' < 10U)  // false if d is not a digit

would become true for ASCII space and many other characters with your proposed replacement.
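
To make that concrete, here is a small sketch (assuming ASCII, where ' ' is 32 and '0' is 48) contrasting the current semantics with the proposed substitution:

#include <stdio.h>

int main(void)
{
    char d = ' ';        /* ASCII 32, so d - '0' is -16 */
    int diff = d - '0';

    /* Current C semantics: -16 is converted to a huge unsigned value,
       so the test is false, which is exactly what the idiom wants. */
    printf("%d\n", diff < 10U);

    /* Proposed substitution: diff < 0 is true, so the whole expression
       is true and a space would be accepted as a digit. */
    printf("%d\n", diff < 0 || (unsigned int)diff < 10U);
    return 0;
}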

By the way, I believe this question is partly a duplicate of:

Would it break the language or existing code if we'd add safe signed/unsigned compares to C/C++?

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • Thanks, it is partly a duplicate, and I understand that NOW it would break many programs. The question is: why was this in the standard from the very beginning? – ayurchen Jan 23 '13 at 19:28
  • That's hard to answer; it's more of a history question. I suspect it's because most of the design of C took the path of simplicity. Also, the way C does it is more powerful: you can express stronger things (like the above conditional) with a single comparison operator, and if you need your version, it's easy to add it explicitly. On the other hand, if your version were added implicitly, you'd have to write the above as `(d>='0' && d<='9')` and rely on the optimizer to convert this to a single unsigned range comparison (or perhaps just add some casts to do it). – R.. GitHub STOP HELPING ICE Jan 23 '13 at 19:36
  • And could you please elaborate a bit on your code example? I'm not sure I understand its significance. Why should it be false for anything but digits? I see that with the current conversion rules it is a cool hack, but is it the sole reason to **require** signed/unsigned comparisons to work completely non-intuitively? – ayurchen Jan 23 '13 at 19:37
  • 1
    Indeed, about that idiom: `((unsigned int)(d - '0') < 10U)` would do it in one operation as well. – ayurchen Jan 23 '13 at 19:55
1

In this case I'm sure it once again falls back to C (and C++) not making you pay for features you don't need. If the default behavior is satisfactory, you simply write the obvious code. If it's not sufficient for your needs, you write the two-part expression yourself, and only then pay the extra price. If the compiler always did what you suggest, you might end up paying a performance penalty even though the actual range of values used in your program could never cause any problems.

Some compilers then provide a convenience/correctness warning to let you know you've entered the territory where values of different signedness are being compared.

Mark B
  • 95,107
  • 10
  • 109
  • 188
  • Well, the thing is that when I'm completely sure my signed integer is guaranteed to be non-negative here, I can always cast it to an unsigned type if I want an optimization. However, requiring me to write this two-part comparison manually means inviting me to make a mistake. But, I guess, back then the convenience of the programmer was not a concern ;) – ayurchen Jan 23 '13 at 19:50
1

The rules for the usual arithmetic conversions apply to the operands of almost all binary operators. They are a unified framework for dealing with a mix of integral types of different size and signedness in operations that (at least at the machine level) require equal types. The rules were designed to make the implementation as simple and efficient as possible on common computer architectures. In particular, conversion between signed and unsigned int is generally a no-op on two's complement architectures, and a comparison remains a single instruction, either signed or unsigned.

An exception like the one you suggest would have been possible for the very special case of comparisons between signed and unsigned types. The cost would have been an irregularity in the rules for dealing with expression operands and a more complicated implementation: a mixed signed/unsigned comparison could no longer map onto a single machine instruction.

The designers of C chose not to do so. Changing that decision now would break lots of existing code for limited benefit: you would still encounter the usual arithmetic conversions with other operators, so you must be aware of them anyway.

Compilers warn (or can be made to warn) about conversions that may have surprising results, so that you are not surprised by an unintended mix of integers of differing signedness or size. Use casts to express exactly how you want this to be evaluated - that gets rid of the warnings and helps the next reader of your code.
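
As a rough illustration of "use casts to express exactly how you want this to be evaluated" (the variable names are made up for the example):

#include <stdio.h>

int main(void)
{
    int n = -5;
    unsigned int len = 10;

    /* Implicit conversion: n becomes a huge unsigned value, the test is
       false, and a compiler run with -Wsign-compare will warn here. */
    if (n < len)
        puts("implicit: n is smaller");

    /* Explicit casts state the intended interpretation and silence
       the warning. */
    if (n < (int)len)              /* compare as signed: true here */
        puts("as signed: n is smaller");

    if ((unsigned int)n < len)     /* compare as unsigned: false here */
        puts("as unsigned: n is smaller");
    return 0;
}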

JoergB
  • 4,383
  • 21
  • 19
  • I think the real issue is that unsigned values are used both to represent numbers and members of abstract algebraic rings (informally, wrapping ranges of integers). Ring members aren't really numbers, but algebraically-valid conversions exist from numbers to ring members. This justifies the principle that "unsignedness" takes precedence over signedness. If there were different types for numbers and abstract algebraic ring members, then compilers could define operators so that when mixed-type operations were allowed they would behave like numbers when they should and likewise group members. – supercat Jan 25 '14 at 04:23
0

If I'm not mistaken, it's only a warning, and it can therefore be disregarded.

The problem is the range of the integer variants.

While a 32-bit signed integer can hold values from -2147483648 to 2147483647, a 32-bit unsigned integer can range from 0 to 4294967295.

That means that if you compare a signed integer to an unsigned integer, it may produce incorrect results, because internally the sign is represented by the MSB of the integer.

An example:

You have the number -1 and the number 3,000,000,000. Which one is larger? Clearly the second one, you may say... but for the computer, the -1 is actually larger, because 'as unsigned' (which would be required to evaluate the large number correctly), -1 is reinterpreted as the maximum value (4294967295).

Conversely, if both are treated as signed, the large number becomes a rather large negative number, because it's beyond the range of a signed integer.

That's why the compiler outputs this warning. While the actual error case is rather rare, it still MAY happen. And that's just what the compiler warns you of... that something unexpected may happen when comparing two integers of differing signedness.
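
A minimal demonstration, assuming a platform with 32-bit int (the values mentioned in the comments rely on that assumption):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int small = -1;
    unsigned int big = 3000000000U;

    /* Usual arithmetic conversions: small is converted to unsigned int
       (4294967295 with a 32-bit int), so it compares greater than big. */
    printf("small > big: %d\n", small > big);
    printf("small as unsigned: %u\n", (unsigned int)small);
    printf("UINT_MAX:          %u\n", UINT_MAX);
    return 0;
}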

Refugnic Eternium
  • 4,089
  • 1
  • 15
  • 24