The title says it all. I'm using GCC 4.7.1 (bundled with CodeBlocks) and I faced a strange issue. Consider this:

#include <stdio.h>

int main(void) {
    unsigned char a = 0, b = 0, c = 0;
    scanf("%hhu", &a);
    printf("a = %hhu, b = %hhu, c = %hhu\n", a, b, c);
    scanf("%hhu", &b);
    printf("a = %hhu, b = %hhu, c = %hhu\n", a, b, c);
    scanf("%hhu", &c);
    printf("a = %hhu, b = %hhu, c = %hhu\n", a, b, c);
    return 0;
}

For inputs 1, 2 and 3, this outputs

a = 1, b = 0, c = 0
a = 0, b = 2, c = 0
a = 0, b = 0, c = 3

If, however, I declare a, b and c as global variables, it works as expected. Why is this happening?

Thank you in advance

Other details:

I'm running 64-bit Windows 8. I also tried with -std=c99 and the problem persists.

Further research

Testing this code

#include <stdio.h>
#include <string.h>

void printArray(unsigned char *a, int n) {
    while(n--)
        printf("%hhu ", *(a++));
    printf("\n");
}

int main() {
    unsigned char array[8];
    memset(array, 255, 8);
    printArray(array, 8);
    scanf("%hhu", array);
    printArray(array, 8);
    return 0;
}

shows that scanf is interpreting "%hhu" as "%u": it simply ignores the "hh" and writes a full 4-byte unsigned int. The output of the code with input 1 is:

255 255 255 255 255 255 255 255
1 0 0 0 255 255 255 255
Daniel Castro
  • Is printing a `char` with a specifier for `unsigned char` UB? –  Apr 05 '13 at 03:11
  • @Armin char is unsigned by default, isn't it? – Daniel Castro Apr 05 '13 at 03:15
  • No, it depends on the platform. I can't reproduce your results. –  Apr 05 '13 at 03:18
  • @Armin What other details would be useful? I'm using Windows, and I have not touched the default arguments for GCC in CodeBlocks. BTW, explicitly declaring the variables as unsigned char does not solve my problem :S – Daniel Castro Apr 05 '13 at 03:21
  • @DanielCastro char is signed by default on most platforms; you have to declare it explicitly as unsigned char. – Kinjal Patel Apr 05 '13 at 04:12
  • @KinjalPatel The problem still happens with unsigned char. @alk Yes, that question is talking about the same problem, but the answers don't explain why it's happening. In fact, the answers don't solve the problem. – Daniel Castro Apr 05 '13 at 17:36
  • Although the subject is similar to the proposed duplicate, the discussion here is better than the discussion in the duplicate; in particular, the answer here highlights that the MSVC runtime is C89 and not C99, so using a C99 notation doesn't work reliably. – Jonathan Leffler Apr 05 '13 at 23:05

1 Answer

The important detail is that you're using Windows, and presumably an outdated or non-conforming C environment (compiler and standard library). MSVCRT only supports C89 (and even then, not entirely correctly); in particular, there was no "hh" modifier in C89, and it's probably interpreting "hh" the same as "h" (i.e. short).
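
One way to sidestep the question of how the runtime treats "hh" is to avoid the modifier altogether: read into an unsigned int with "%u" (which C89 already defines) and narrow the value afterwards. The following is only a minimal sketch under that assumption; `parse_uchar` is a hypothetical helper name, not something from the question, and it uses `sscanf` on a string so that it can be tried without typing input.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): parse one unsigned
 * decimal value from `text` into an unsigned char without using "%hhu".
 * Returns 1 on success, 0 if nothing could be parsed or the value does
 * not fit in an unsigned char. `*out` is left untouched on failure. */
static int parse_uchar(const char *text, unsigned char *out)
{
    unsigned int tmp = 0;

    /* "%u" expects an unsigned int *, which is exactly what we pass,
     * so the write cannot spill into neighbouring objects. */
    if (sscanf(text, "%u", &tmp) != 1)
        return 0;

    /* Reject values that would be truncated by the narrowing below. */
    if (tmp > UCHAR_MAX)
        return 0;

    *out = (unsigned char)tmp;
    return 1;
}
```

With this, `parse_uchar("2", &b)` would store 2 in `b` and leave adjacent variables alone; the same pattern works with `scanf("%u", &tmp)` when reading from standard input.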

R.. GitHub STOP HELPING ICE
  • Changing all %hhu to %hu did print exactly as OP posted. –  Apr 05 '13 at 03:25
  • This is more an issue of the standard library than the compiler. If OP's gcc is using MSVCRT (which will be the case if it's "mingw") then this is definitely the issue. If you want to do modern C on Windows, the only viable option at present is Cygwin. – R.. GitHub STOP HELPING ICE Apr 05 '13 at 03:26
  • I think the CodeBlocks package uses MinGW. I have no problem with switching or updating my environment; I'm just curious about this behavior. Should I assume it's a bug? – Daniel Castro Apr 05 '13 at 03:30
  • @DanielCastro: It's just that the Microsoft C standard library that provides your `scanf()` only supports C89, which doesn't include the `hh` modifier. – caf Apr 05 '13 at 03:35
  • @caf But then why does using %hu still overwrite the other two variables? If it's writing a two-byte integer (on my machine), I would expect it to overwrite at most one of the other variables. Also, that doesn't explain why declaring these variables as globals works. – Daniel Castro Apr 05 '13 at 03:38
  • If the program were doing `scanf("%hu", &a);`, which is presumably how MSVCRT is treating it, then it has invoked undefined behavior. Anything can happen and you're not entitled to any explanation of how or why it happened. By the way, if the language is C89, not C99, then "%hhu" does not have any defined meaning, and thus passing it as a conversion specifier to `scanf` results in UB for that reason. – R.. GitHub STOP HELPING ICE Apr 05 '13 at 03:41
  • Further, the `"%hu"` conversion specification expects a pointer to an `unsigned short`; when you pass it a `char *`, you can be reasonably confident that an extra byte will be modified. You might be better convinced if you created an array of char: `char buffer[4] = { 0xAA, 0xBB, 0xCC, 0xDD };`, passed `&buffer[2]` to `scanf()`, and printed out all the bytes of `buffer` before and after the call to `scanf()`. You'll see that one of the other bytes is 0. – Jonathan Leffler Apr 05 '13 at 04:38
  • @R..: actually, MinGW-w64 has `scanf` that handles `%hhu` fine. –  Apr 05 '13 at 13:08
  • @JonathanLeffler But here it is overwriting the other two variables (it's writing at least 3 bytes in each case). It's probably interpreting "%hhu" as "%u". Try this code http://ideone.com/WSqbDr (not in ideone). OTOH, I'm guessing it works fine with global variables because they're not being allocated in adjacent memory positions. Am I correct? The local variables, on the other hand, are located on the stack, so they occupy adjacent positions. – Daniel Castro Apr 05 '13 at 18:14
  • I don't have access to Windows, so I can't test in an environment where this could be an issue. What I'd do is what I showed, possibly with a bigger buffer (`char buffer[16]`) initialized uniformly to some value (`memset(buffer, 0xAA, sizeof(buffer))`), and then call `scanf("%hhu", &buffer[8]);`. Then you can dump the entire buffer and see which bytes have been modified. That will tell you reasonably precisely what is going on for any single conversion. You then have to infer how the damage is occurring in your variables. Normally, global vs local doesn't affect layout much, but it depends. – Jonathan Leffler Apr 05 '13 at 18:39
  • @DanielCastro: The output you've shown doesn't necessarily mean that more than two bytes are being modified at a time. It could be that: the first `scanf()` writes 1 to `a` and 0 to the byte preceding `a`; the second `scanf()` writes 2 to `b` and 0 to `a`; and the third `scanf()` writes `3` to `c` and `0` to `b`. It's also likely that when the variables are statically allocated (as globals are) then each is being aligned to a 4-byte boundary, which hides the problem. – caf Apr 05 '13 at 22:43
  • @caf Yes, I know that could happen with the output posted in the OP. I was not clear in that comment, but I was talking about the code I linked (ideone.com/WSqbDr). Anyway, I just tested with an array as Jonathan suggested, and it is working just like %u. Thanks for pointing out the alignment for the globals part – Daniel Castro Apr 06 '13 at 00:09
  • Just to clarify: I think I got some tests wrong before. Using %hu with unsigned char*, though UB, overwrites two bytes, as one would expect. %hhu is interpreted as %u by MSVCRT. – Daniel Castro Apr 12 '13 at 02:40
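
The buffer test suggested in the comments above can be sketched roughly as follows. This is only an illustration under a few assumptions: `bytes_touched_by_hhu` is a made-up helper name, the input comes from a string via `sscanf` rather than from the keyboard, and the 16-byte size, the 0xAA fill and the offset 8 are the values proposed in the comment.

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 16
#define FILL     0xAA

/* Hypothetical diagnostic (names are illustrative): fill a buffer with a
 * known pattern, let "%hhu" write at offset 8, then count how many bytes
 * no longer hold the fill value. Returns -1 if the conversion failed.
 * Note: an input whose bytes happen to equal FILL (170) would go
 * unnoticed, so small values such as "1" are a better probe. */
static int bytes_touched_by_hhu(const char *text)
{
    unsigned char buffer[BUF_SIZE];
    int i, changed = 0;

    memset(buffer, FILL, sizeof buffer);

    /* On a C99-conforming runtime this stores exactly one byte. A runtime
     * that treats "%hhu" like "%u" would store sizeof(unsigned int) bytes,
     * which still fits inside the buffer from offset 8. */
    if (sscanf(text, "%hhu", &buffer[8]) != 1)
        return -1;

    for (i = 0; i < BUF_SIZE; i++)
        if (buffer[i] != FILL)
            changed++;

    return changed;
}
```

On a conforming library, `bytes_touched_by_hhu("1")` should report a single changed byte; a result of 4 would be consistent with the `1 0 0 0` pattern shown in the question.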