5

Recently, during a refactoring session, I was looking over some code I wrote and noticed several things:

  1. I had functions that used unsigned char to enforce values in the interval [0-255].
  2. Other functions used int or long data types, with if statements inside the functions to silently clamp the values to valid ranges (sketched just below this list).
  3. Values stored in classes or passed as function arguments that had an unknown upper bound but a known, definite, non-negative lower bound were declared as an unsigned data type (unsigned int or unsigned long, depending on whether the upper bound could exceed 4,000,000,000).
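
For concreteness, the silent clamping in (2) looked roughly like this (a sketch; the function name and range are made up):

long set_brightness(long requested) {
    const long kMin = 0, kMax = 255;        // hypothetical valid range
    if (requested < kMin) requested = kMin; // clamp silently, no error
    if (requested > kMax) requested = kMax;
    return requested;
}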

The inconsistency is unnerving. Is this a good practice that I should continue? Should I rethink the logic and stick to using int or long with appropriate non-notifying clamping?

A note on the use of "appropriate": There are cases where I use signed data types and throw notifying exceptions when the values go out of range, but these are reserved for divide-by-zero and constructors.

Casey
  • 10,297
  • 11
  • 59
  • 88
  • This is a very controversial subject... – Jesse Good May 12 '12 at 21:31
  • @JesseGood: Oh joy. Can I expect a style war? (or speedy deletion?) – Casey May 12 '12 at 21:33
  • I say yes. The point of types is to have the compiler enforce constraints, and if your constraint is that an integer not be negative, don't use a signed type. *That said*, C++'s type system is very weak and doesn't have all the power and safety you need, so someone will chime in with an example of unsigned integers behaving unintuitively. That's fine, and it's up to you to decide if you want to learn those corner-cases and stick to unsigned types, or just use signed types and check for negative yourself. I prefer the former, on the basis that you're going to need to learn the language anyway. – GManNickG May 12 '12 at 21:39
  • @Casey: Well, for example Google C++ style guide says: `In particular, do not use unsigned types to say a number will never be negative.` (although some people criticize the style guide), but I would be on the side that says yes. – Jesse Good May 12 '12 at 21:49
  • 2
    @JesseGood: You would be right that some people criticize Google's guide. If you define "some" to be "Everyone who writes modern C++". It's recommendations are worse than junk. – Puppy May 12 '12 at 21:58

4 Answers

6

In C and C++, signed and unsigned integer types have certain specific characteristics.

Signed types have a lower bound far below zero and an upper bound far above it, and operations that exceed those bounds have undefined behavior (or implementation-defined behavior in the case of conversions).

Unsigned types have a lower bound of zero and an upper bound far from zero, and operations that exceed those bounds quietly wrap around.

Often what you really want is a particular range of values with some particular behavior when operations exceed those bounds (saturation, signaling an error, etc.). Neither signed nor unsigned types are entirely suitable for such requirements. And operations that mix signed and unsigned types can be confusing; the rules for such operations are defined by the language, but they're not always obvious.
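
As a minimal illustration of one such rule (the usual arithmetic conversions turn the signed operand unsigned before comparing):

#include <iostream>

int main() {
    int i = -1;
    unsigned int u = 1;
    // i is converted to unsigned int and becomes UINT_MAX, so the
    // comparison is false even though -1 < 1 mathematically.
    std::cout << (i < u ? "true" : "false") << '\n';  // prints "false"
}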

Unsigned types can be problematic because the lower bound is zero, so operations with reasonable values (nowhere near the upper bound) can behave in unexpected ways. For example, this:

for (unsigned int u = 10; u >= 0; u--) {
    // u >= 0 is always true for an unsigned type: when u reaches 0,
    // the decrement wraps it around to UINT_MAX instead of -1.
    // ...
}

is an infinite loop.

One approach is to use signed types for everything that doesn't absolutely require an unsigned representation, choosing a type wide enough to hold the values you need. This avoids problems with signed/unsigned mixed operations. Java, for example, enforces this approach by not having unsigned types at all. (Personally, I think that decision was overkill, but I can see the advantages of it.)
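
A sketch of that style (the function name is made up): the width comes from choosing a 64-bit signed type, and the non-negativity requirement becomes an explicit, visible check:

#include <cstdint>

// Hypothetical: a count that is logically non-negative, held in a wide
// signed type so mixed arithmetic stays signed and predictable.
void reserve_items(std::int64_t count) {
    if (count < 0) {
        // handle the error however the codebase prefers
        return;
    }
    // ... proceed; count is known to fit in [0, 2^63 - 1]
}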

Another approach is to use unsigned types for values that logically cannot be negative, and be very careful with expressions that might underflow or that mix signed and unsigned types.
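
With that approach, the usual repair for the loop above is to decrement inside the condition, so the test happens before the value can wrap (one common idiom among several):

// Visits u = 10, 9, ..., 0 and then stops: when u is 0 the test
// u-- > 0 fails before the wrapped value is ever used in the body.
for (unsigned int u = 11; u-- > 0; ) {
    // ...
}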

(Yet another is to define your own types with exactly the behavior you want, but that has costs.)
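
As a rough sketch of that third option (the class name is hypothetical, and a production version would need far more care), a small type can centralize the range policy:

#include <stdexcept>

// Hypothetical bounded integer: rejects out-of-range values up front
// instead of wrapping or silently clamping.
template <long Min, long Max>
class bounded {
    long value_;
public:
    explicit bounded(long v) : value_(v) {
        if (v < Min || v > Max)
            throw std::out_of_range("bounded: value outside range");
    }
    long value() const { return value_; }
};

// Usage: bounded<0, 255> level(300); // throws std::out_of_range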

As John Sallay's answer says, consistency is probably more important than which particular approach you take.

I wish I could give a "this way is right, that way is wrong" answer, but there really isn't one.

Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
3

The biggest benefit of unsigned is that it documents in your code that the values can never be negative.

It doesn't really buy you any safety, as going outside the range of an unsigned is usually unintentional and can cause just as much frustration as if it were signed.

I had functions that used unsigned char to enforce values in the interval [0-255].

If you're relying on the wraparound, then use uint8_t, as unsigned char could possibly be more than 8 bits.
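
For example (a sketch; uint8_t lives in <cstdint> in C++11, or <stdint.h> in C, and exists only on platforms that have an 8-bit type):

#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t x = 250;
    x += 10;  // wraps modulo 256: (250 + 10) % 256 == 4
    std::cout << static_cast<int>(x) << '\n';  // prints 4; the cast is
                                               // needed because uint8_t
                                               // typically streams as a char
}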

Other functions used int or long data types, with if statements inside the functions to silently clamp the values to valid ranges.

Is this really the correct behavior?

Values stored in classes or passed as function arguments that had an unknown upper bound but a known, definite, non-negative lower bound were declared as an unsigned data type (unsigned int or unsigned long, depending on whether the upper bound could exceed 4,000,000,000).

Where did you get an upper bound of 4,000,000,000 from? The actual bounds are INT_MIN and INT_MAX (you can also query them with std::numeric_limits). In C++11 you can use decltype to pick a type that fits, which you can wrap into a template/macro:

decltype(4000000000) x; // x can hold at least 4000000000
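
A quick sketch of the std::numeric_limits alternative (C++11; headers <limits> and <iostream>):

#include <iostream>
#include <limits>

int main() {
    decltype(4000000000) x = 4000000000;  // the literal's own type
    std::cout << std::numeric_limits<decltype(x)>::max() << '\n';
}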
Pubby
  • 51,882
  • 13
  • 139
  • 180
  • "Where did you get an upper bound of 4,000,000,000?" Binary addition of 32 bits set to all ones. It's really 2^32 = 4,294,967,295 but I always round to the nearest left-most significant digit for safety. – Casey May 12 '12 at 21:55
  • @Casey Which is an implementation defined number. It's a magic number if you intend to write portable code. – Pubby May 12 '12 at 22:05
2

I would probably argue that consistency is most important. If you pick one way and do it right then it will be easy for someone else to understand what you are doing at a later point in time. On the note of doing it right, there are several issues to think about.

First, it is common, when checking whether an integer variable n is in a valid range, say 0 to N, to write:

if ( n >= 0 && n <= N ) ...

This comparison only makes sense if n is signed. If n is unsigned, the first test is always true: n can never be less than 0, because a negative value converted to unsigned wraps around to a large positive one. You could rewrite the above if as just:

if ( n <= N ) ...

If someone isn't used to seeing this, they might be confused and think you did it wrong.
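
A related idiom (a sketch, and a matter of taste) turns the same behavior to advantage, collapsing the two-sided signed check into one comparison:

// If n is negative, converting it to unsigned yields a value larger
// than INT_MAX; so as long as 0 <= N <= INT_MAX, one comparison
// covers both "n >= 0" and "n <= N".
bool in_range(int n, int N) {
    return static_cast<unsigned int>(n) <= static_cast<unsigned int>(N);
}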

Second, I would keep in mind that there is no guarantee of type size for integers in C++. Thus, if you want something to be bounded by 255, an unsigned char may not do the trick. If the variable has a specific meaning, then it may be valuable to use a typedef to show that. For example, size_t is an unsigned type wide enough to represent the size of any object, which means that you can use it with arrays and not have to worry about being on a 32- or 64-bit machine. I try to use such typedefs whenever possible because they clearly communicate why I am using the type (size_t because I'm accessing an array).
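
A small sketch of that convention in use:

#include <cstddef>

// std::size_t matches what sizeof and array indexing naturally produce,
// so the index and the bound agree on width and signedness.
double sum(const double* data, std::size_t count) {
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}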

Third is the issue of wraparound again: what do you want to happen with an invalid number? In the case of an unsigned char, if you use the type itself to bound the data, then you won't be able to tell whether a value over 255 was entered. That may or may not be a problem.
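
One way around that (a sketch; the function name is made up) is to accept a wider type, validate, and only then narrow to the storage type:

#include <stdexcept>

// Hypothetical setter: take a wide type so out-of-range input is still
// observable, validate it, then narrow.
void set_level(int requested, unsigned char& level) {
    if (requested < 0 || requested > 255)
        throw std::out_of_range("level must be in [0, 255]");
    level = static_cast<unsigned char>(requested);
}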

John Sallay
  • 241
  • 3
  • 8
0

This is a subjective issue but I'll give you my take.

Personally, if there isn't a type designated for the operation I am trying to carry out (e.g. std::size_t for sizes and indexing, uintXX_t for specific bit depths, etc.), then I default to unsigned unless I need to use negative values.

So it isn't a case of using unsigned to enforce positive values; rather, I have to select signedness explicitly.

Beyond that, if you are worried about boundaries, then you need to do your own bounds checking to ensure that you aren't overflowing, as sketched below.
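
For instance, a minimal pre-check for unsigned addition (a sketch; UINT_MAX comes from <climits>):

#include <climits>

// Writes the sum only when a + b would not wrap around.
bool checked_add(unsigned int a, unsigned int b, unsigned int& out) {
    if (a > UINT_MAX - b)
        return false;  // a + b would exceed UINT_MAX and wrap
    out = a + b;
    return true;
}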

But as I said, more often than not your data type will be decided by context, via the return types of the functions you apply it to.

111111
  • 15,686
  • 6
  • 47
  • 62
  • In which header(s) are the uintXX_t types (and the like) defined? I've always wanted to use them but I kept getting "so-and-so is not defined" errors. – Casey May 13 '12 at 02:16
  • Disregard previous. They're in the <cstdint> or <stdint.h> headers. VS2010 (actually, Visual Studio in general) does not like programmers to use them (the headers, not the types). It causes the compiler to explode. – Casey May 13 '12 at 02:26
  • @Casey: They were added to C++ in the C++11 standard in the header cstdint. If you are back on C++03 still, then you may not have access to them. However, if that is the case, they are also in Boost. – David Stone Nov 18 '12 at 17:17
  • I've since moved to VS2012, VS2013 and VS2015 and the `uintXX_t` and similar types have been added back in without error. – Casey Aug 12 '15 at 00:13