Am I nuts or did I find an issue with C's design?

Question

Ok, so fgetc() returns an unsigned char converted to int, or EOF at end of file (or on a read error). What if you're reading a config file into a char array, and your implementation's char is signed? C99 says that when a value that can't be represented is converted to a signed type, not only is the result implementation-defined, but the implementation can instead choose to raise a signal!
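A minimal sketch of the fgetc behaviour described above, assuming a hosted implementation where tmpfile() succeeds; read_back_byte is a hypothetical helper, not part of any library. A byte with the high bit set comes back from fgetc as a positive int, and EOF only shows up once the data runs out:

```c
#include <stdio.h>

/* Hypothetical helper: writes `byte` to a temporary binary file, reads it
   back with fgetc, and stores what the following fgetc call returns in
   *next. Returns -2 if no temporary file could be created. */
int read_back_byte(unsigned char byte, int *next)
{
    FILE *f = tmpfile();   /* tmpfile opens the file in "wb+" mode */
    if (f == NULL)
        return -2;

    fputc(byte, f);
    rewind(f);

    int c = fgetc(f);      /* the byte as unsigned char, converted to int */
    *next = fgetc(f);      /* nothing left to read, so this is EOF */
    fclose(f);
    return c;
}
```

For example, read_back_byte(0xE9, &next) should return 233 and leave next equal to EOF; the trouble in the question only begins when that 233 is converted to a signed char.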

6.3.1.3 Signed and unsigned integers

When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
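For contrast, the unsigned branch of the quoted rule is fully specified. A small sketch (to_uchar is a hypothetical name) showing the modular reduction it describes; the signed branch has no equivalent guarantee, which is why nothing here converts to signed char:

```c
#include <limits.h>

/* Converting any int to unsigned char is well-defined: the value is
   reduced modulo UCHAR_MAX + 1 (from <limits.h>), exactly as the
   second paragraph of 6.3.1.3 describes. */
unsigned char to_uchar(int value)
{
    return (unsigned char)value;
}
```

So to_uchar(-1) yields UCHAR_MAX and to_uchar(UCHAR_MAX + 1) yields 0 on every conforming implementation, whether plain char is signed or not.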

Constructs like this are very common:

int c;
size_t i = 0;
char arr[1024];

/* Read until EOF or until arr is full (leaving room for the '\0'). */
for (; i < sizeof arr - 1 && (c = getc(Descriptor)) != EOF; ++i)
{
        arr[i] = (char)c;
}
arr[i] = '\0';

The explicit cast doesn't help either: a cast is just another conversion, so if char is signed and c holds a value greater than CHAR_MAX, (char)c is implementation-defined (or raises a signal) in exactly the same way.
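One way to avoid the conversion in the loop entirely, sketched under the assumption that the caller only needs the raw bytes in arr: store each byte through an unsigned char pointer into the same array. The int to unsigned char conversion is the well-defined unsigned case from 6.3.1.3, and character types may be used to access the bytes of any object. read_bytes is a hypothetical name, and the bound is checked before getc so no byte is read and then dropped.

```c
#include <stdio.h>

/* Hypothetical helper: copies bytes from `in` into arr (at most size - 1
   of them), NUL-terminates, and returns the number of bytes stored.
   Each byte is written through an unsigned char lvalue, so no
   int -> signed char conversion ever happens. */
size_t read_bytes(FILE *in, char *arr, size_t size)
{
    unsigned char *out = (unsigned char *)arr;
    size_t i = 0;
    int c;

    if (size == 0)
        return 0;

    /* Check the bound first so a byte is never read and then discarded. */
    while (i < size - 1 && (c = getc(in)) != EOF)
        out[i++] = (unsigned char)c;

    out[i] = '\0';
    return i;
}
```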

I find it extremely unlikely that I have found a problem that thousands of programmers have missed over the years, especially given the ubiquity of the construct above.

There seems to be a real possibility that binary data read this way could cause problems, since some of its bytes may have values that don't fit in a signed char. I've never seen a version of the construct above that actually handles this used in practice.
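Another option, when the goal is simply to load a whole config file, is fread: the standard describes fread as storing what it reads in an array of unsigned char exactly overlaying the destination object (C99 7.19.8.1), so no int to char value conversion takes place. A sketch, with read_config as a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: reads up to size - 1 bytes of `in` into arr and
   NUL-terminates the result. fread copies the bytes as unsigned char
   into arr, so a signed char type never has to represent 0x80..0xFF
   as a value conversion. Returns the number of bytes read. */
size_t read_config(FILE *in, char *arr, size_t size)
{
    if (size == 0)
        return 0;

    size_t n = fread(arr, 1, size - 1, in);
    arr[n] = '\0';
    return n;
}
```

The returned length matters for binary input, since the data may itself contain '\0' bytes.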

Have I actually found a flaw in the C standard, or noticed something that thousands of other programmers have missed and failed to handle in their error checking?

Why would char be signed? There's no use for a negative character value, and ASCII (which is the character set usually assumed for chars) spans only the values 0-127. — keshlam, Jan 14 '14 at 23:33
(char)c just takes the last 8 bits of the int, so I'm not sure where the problem arises; even if chars are signed, there are no negative chars. You don't even need the type conversion: arr[i] = c; is all you need. — tesseract, Jan 14 '14 at 23:33
The purpose of the C-standard is not to tie everyone's hands in hardware implementation ... thus you get ambiguities where things that are not well defined will be implementation defined. — Jason, Jan 14 '14 at 23:33
It just is sometimes. x86, for example has char signed by default. PowerPC is unsigned. — Subsentient, Jan 14 '14 at 23:34
@tesseract: If someone accidentally inserted some binary into a text file, it potentially could be used to cause a signal to be raised. — Subsentient, Jan 14 '14 at 23:51

