Why is sscanf behaving like this when converting hex strings to number?

Question

I have written a piece of code that I am using to research the behavior of different libraries and functions. While doing so, I stumbled upon some strange behavior with sscanf.

I have a piece of code that reads an input into a buffer, then tries to put that value into a numeric variable.

When I call sscanf from main using the input buffer and the format specifier %x, I get a garbage value if the input string is shorter than the buffer. Say I enter 0xff: I get an arbitrarily large, seemingly random number every time. But when I pass that buffer to a function, all calls to sscanf result in 255 (0xff), as I expect, regardless of the mismatch between type and format specifier.

My question is, why does this happen in the function main but not in the function test?

This is the code:

#include <stdio.h>

int test(char *buf){
    unsigned short num;
    unsigned int num2;
    unsigned long long num3;
    sscanf(buf, "%x", &num);
    sscanf(buf, "%x", &num2);
    sscanf(buf, "%x", &num3);
    printf("%x", num);
    printf("%x", num2);
    printf("%x", num3);
    return 0;
}

void main(){
    char buf[16];
    unsigned long long num;
    printf("%s","Please enter the magic number:");
    fgets(buf, sizeof(buf),stdin);
    sscanf(buf, "%x", &num);
    printf("%x\n", num);
    test(&buf);

}

I expect the behavior to be cohesive; all calls should fail, or all calls should succeed, but this is not the case.

I have tried to read the documentation and do experiments with different types, format specifiers, and so on. This behavior is present across all numeric types.

I have tried compiling on different platforms; gcc on Linux and msvc on Windows behave the same.

I also disassembled the binary to see if the call to sscanf differs between main() and test(), but that assembly is identical. It loads the pointer to the buffer into a register and pushes that register onto the stack, and calls sscanf.

Now just to be clear: this happens consistently. num in main is never equal to num, num2 or num3 in test, but num, num2 and num3 are always equal to each other. I would expect this to cause undefined behavior and not be consistent. This is the output every time I run it:

./main
Please enter the magic number: 0xff
0xaf23af23423 <--- different every time
0xff  <--- never different
0xff  <--- never different
0xff  <--- never different

My current reasoning is that in one instance sscanf is interpreting more bytes than in the other. It seems to keep evaluating the entire buffer and is affected by residual data in memory.

I know I can make it behave correctly either by filling the buffer, with the last byte being a newline, or by using the correct format specifier to match the pointer type ("%llx" for main in this case). So that is not what I am wondering about; I made that error on purpose.

I am wondering why using the wrong format specifier works in one case but not in the other consistently when the code runs.

How would you know whether or not the calls failed? You never looked at the result codes. — user4581301, Nov 08 '22 at 20:12
Do not tag both C and C++ except for questions about differences or interactions between the two languages. Since the code shown uses a C header, I am deleting the C++ tag. If you want a C++ answer, delete the C tag and add C++. — Eric Postpischil, Nov 08 '22 at 20:12
You should be getting some errors given that `test` is not declared before `main`. — Chris, Nov 08 '22 at 20:12
Wrong format specifier. The `sscanf(buf, "%x", &num);` should be `sscanf(buf, "%llx", &num);`. Not just that, but as `if(sscanf(buf, "%llx", &num) == 1) test(&buf);` — Weather Vane, Nov 08 '22 at 20:13
Undefined behavior = anything can happen, including appearing to work consistently. — Ry-, Nov 08 '22 at 20:15
I know it fails because the number that ends up in num in main is wrong, while the number that ends up in all three num variables in test is correct when using the same input. The ordering was a copy-paste issue; the code compiles and runs. — Espen, Nov 08 '22 at 20:15
[`s`]`scanf` input directives convey not just the expected form of the input, but also the type of the variable, if any, in which that input is to be stored. `%x` says that the type of the variable is *exactly* `unsigned int`. If you pass a pointer to a variable of any other type then the behavior is undefined. — John Bollinger, Nov 08 '22 at 20:16
But it can't be totally undefined, because the call to test works every time, and the call in main works none of the time. — Espen, Nov 08 '22 at 20:16
That is one feature of undefined behaviour: working some of the time, or apparently in all the tests you make. — Weather Vane, Nov 08 '22 at 20:17
The question is why does calling it wrong work in one case, but not the other. — Espen, Nov 08 '22 at 20:17
Undefined simply means that the Standard does not say what will happen. What happens may be predictable, reproducible, and utterly logical. But it could also drop a nuke on your cat. — user4581301, Nov 08 '22 at 20:17
@Espen, "undefined" does not mean "random". It means that the language spec does not define the result, and therefore you cannot predict the result based on the code and input alone. Nor is it safe to assume that the result will be consistent, but neither is it safe to assume that it will be *in*consistent. — John Bollinger, Nov 08 '22 at 20:18
@Espen I understand your confusion but take a moment to wrap your head around what *all* these comments have been telling you repeatedly — tijko, Nov 08 '22 at 20:20
I am not asking about the definition of undefined behavior or why the language spec doesn't define it. I am asking what in the implementation is causing it to appear to behave in one case, but not the other, consistently. — Espen, Nov 08 '22 at 20:20
@Espen that is exactly what they've been explaining. Hence why I recommended to take a few moments to re-read and *think* about it. I'm not being harsh because we've all been there but there are only so many times people are willing to repeat something. — tijko, Nov 08 '22 at 20:22
Note that C isn't the most helpful of languages. It prizes speed and efficiency over just about everything else, including issuing meaningful diagnostics for runtime errors. The checking required to detect what's gone wrong in order to issue a diagnostic has a cost, and that cost would have to be paid by everyone. In general if something has a cost, C doesn't do it unless you specifically ask for it or it is expressly called out in the documentation. — user4581301, Nov 08 '22 at 20:23
@Espen, we cannot possibly answer a "what in the implementation?" question when you haven't specified an implementation. Generally speaking, however, such questions are rarely answered definitively here. That's in part because we're not usually inclined to go trawling through the relevant code to figure it out, if that's even possible, and in part because it's not very interesting. — John Bollinger, Nov 08 '22 at 20:23
Unless you need to *prove* that you really did solve a tricky bug (in a live application) it isn't worth pursuing. Suppose you jump a red light? Is it worth analysing why you escaped being squashed? — Weather Vane, Nov 08 '22 at 20:24
@Espen to be more explicit, without posting the source code of the GNU libc function (which in itself would be very tedious): it's because the code does not have any guarantees on what will be in the memory when you call a print statement or whatever else you plan to do with it. — tijko, Nov 08 '22 at 20:24
Note: Some compilers will warn you of mistakes like this if coaxed: https://godbolt.org/z/d1MKjKoax Turn the warnings on and turn them up loud, because as you've seen, figuring what went wrong based on the running of the program can be tricky. Also never ignore the return codes. C doesn't throw exceptions or the like, so if you don't look at and handle the result code, you're in for some debugging. — user4581301, Nov 08 '22 at 20:32
I am not asking what the correct format specifier is, I have used the wrong one on purpose. I am curiously wondering why it doesn't trigger undefined behavior in the case of the test function. — Espen, Nov 08 '22 at 20:33
It did trigger undefined behaviour. But you got what you expected, so you didn't realize it. Nasty stuff, UB. — user4581301, Nov 08 '22 at 20:34
"I expect the behavior to be cohesive" - that's the main problem here. Undefined behaviour means anything can happen ; expect the unexpected . — M.M, Nov 08 '22 at 20:45
Does this answer your question? [Reading long int using scanf](https://stackoverflow.com/questions/2852390/reading-long-int-using-scanf) — autistic, Nov 08 '22 at 21:20
I suppose we're expected to believe you know enough about the subject to know which tags are best, but not enough to tell that this question (and the answers) is littered across the internet as one of the most frequently asked questions in C... As engineers of the digital world, I feel it's our duty to explain to people that it's far more computationally feasible to do prior research than it is to later deduplicate the thousands of identical questions on the internet. Let's strive to make Q&A [a 1NF store](https://en.m.wikipedia.org/wiki/First_normal_form)... — autistic, Nov 08 '22 at 21:27
Oh, I see, you didn't add _those tags_... Well, I'll leave the remark about 1NF anyway, because this FAQ dates to before StackOverflow existed, and there's no way you'll ask any of the other FAQs if you just anticipate this and read them all at once. — autistic, Nov 08 '22 at 21:31
Again, to clarify: I didn't ask what the correct specifier was, nor did I ask why I got a compiler warning. I know that this is undefined behavior. The code I wrote was written wrong on purpose to observe the undefined behavior. I expected it to be nonsensical, but it worked in 3 of 4 cases every time I ran the program. I wondered why it gave the illusion of working consistently when doing a function call rather than in main. The answer I was looking for is the last paragraph of @Eric Postpischil's answer. None of the other answers I found on StackOverflow or in other places gave that answer. — Espen, Nov 08 '22 at 21:58
You're printing the value of `num` before you call sscanf, so you're just getting the value in uninitialized memory that happens to be there... — Chris Dodd, Nov 09 '22 at 00:59
Yeah, that was a mistake in transferring the code to SO. It has the print call after sscanf. — Espen, Nov 09 '22 at 14:17

score 0 · Accepted Answer · edited Nov 08 '22 at 21:10

sscanf with %x should be used only with the address of an unsigned int. When an address of another object is passed, the behavior is not defined by the C standard.

With a pointer to a wider object, the additional bytes in the object may hold other values (possibly leftover from when the startup code prepared the process and called main). With a pointer to a narrower object, sscanf may write bytes outside of the object. With compiler optimization, a variety of additional behaviors are possible. These various possibilities may manifest as large numbers, corruption in data, program crashes, or other behaviors.

Additionally, printing with incorrect conversion specifiers is not defined by the C standard, and can cause errors in printf attempting to process the arguments passed to it.

Use %hx to scan into an unsigned short. Use %lx to scan into an unsigned long. Use %llx to scan into an unsigned long long. Also use those conversion specifiers when printing their corresponding types.

"My question is, why does this happen in the function main but not in the function test?"

One possibility is the startup code used a little stack space while setting up the process, and this left some non-zero data in the bytes that were later used for num in main. The bytes lower on the stack held zero values, and these bytes were later used for num3 in test.

edited Nov 08 '22 at 21:10

Steve Summit

answered Nov 08 '22 at 20:18

Eric Postpischil

@Espen: So? That is entirely consistent with what I wrote. The `num` in `main` has leftover values in its bytes, and those values were not all changed by `sscanf`, so it appears to have a larger value. The narrower `num` and `num2` in `test` were completely overwritten by `sscanf`, so they have the values put there by `sscanf`. The wider `num3` in `test` has zeros in its bytes, so it does not appear to have any larger value. – Eric Postpischil Nov 08 '22 at 20:34
You are correct! That is what I was looking for. The call in main clears out the garbage, making test behave. Thanks! – Espen Nov 08 '22 at 20:38
@Espen: `main` does not clear the “garbage.” The stack space used for `num3` in `test` simply contained zeros, likely because zero-initialized pages were allocated for the process and the startup code did not alter those bytes. – Eric Postpischil Nov 08 '22 at 20:43
I see, thank you for clearing that up. This was truly what I was looking for. – Espen Nov 08 '22 at 21:25
@Espen I've read over your comments in the other answer about ESL, which makes much more sense now. You came off originally as very ambiguous and I'm not sure you got it, but I made this comment above: **the code does not have any guarantees on what will be in the memory when you call a print statement or whatever else you plan to do with it**. That's what I meant. As an exercise I enjoy learning about this, but in general do not ask the community these things, as they are generally not received well unless you have a purpose. If you wanted to specifically know what was changing the memory and how – tijko Nov 09 '22 at 02:13

Vlad from Moscow · Answer 2 · 2022-11-08T20:35:06.683

0

The argument expression in this call

test(&buf);

has the type char ( * )[16] but the function expects an argument of the type char *

int test(char *buf){

There is no implicit conversion between these pointer types.

You need to call the function like

test( buf );

Also it seems there is a typo

printf("%s","Please enter the magic number:");
printf("%x\n", num);

The variable num is not initialized.

In this call

unsigned long long num;
//...
sscanf(buf, "%x", &num);

you are using the third argument of the type unsigned long long int * but the conversion specification "%x" expects an argument of the type unsigned int *. So the call has undefined behavior.

You need to write

sscanf(buf, "%llx", &num);

The same problem exists for the variable num of type unsigned short

unsigned short num;
//...
sscanf(buf, "%x", &num);

You have to write

sscanf(buf, "%hx", &num);

You need to use the same length modifiers in calls to printf

printf("%hx", num);
printf("%x", num2);
printf("%llx", num3);

Here is a demonstration program.

#include <stdio.h>

int main( void )
{
    char buf[] = "0xff\n";
    unsigned short num;
    unsigned int num2;
    unsigned long long num3;

    sscanf( buf, "%hx", &num );
    sscanf( buf, "%x", &num2 );
    sscanf( buf, "%llx", &num3 );

    printf( "%hx\n", num );
    printf( "%x\n", num2 );
    printf( "%llx\n", num3 );
}

The program output is

ff
ff
ff

edited Nov 08 '22 at 20:35

answered Nov 08 '22 at 20:23

Vlad from Moscow

I am using the wrong specifiers on purpose, I am wondering why it doesn't trigger undefined behavior in the test function, but does in the main function. – Espen Nov 08 '22 at 20:34
@Espen In both cases you have undefined behavior. – Vlad from Moscow Nov 08 '22 at 20:35
@Espen Also it seems there is a typo in main printf("%s","Please enter the magic number:"); printf("%x\n", num); The variable num is not initialized. – Vlad from Moscow Nov 08 '22 at 20:37
Ah! thanks, never type fast, when editing while reading comments! :) – Espen Nov 08 '22 at 20:44
@Espen It's probably clear by now, but (a) using the wrong specifiers on purpose is a supremely risky proposition, quite likely to mislead as much as it educates, and (b) "undefined behavior" is *not* something that you either get or don't get, deterministically, when you do or don't do something wrong. An *error message* is something you either get or don't get, deterministically, when you do or don't do something wrong. But undefined behavior is the wrong stuff that doesn't cause error messages. When you have undefined behavior, results tend to be nondeterministic and totally confusing. – Steve Summit Nov 08 '22 at 21:23
Exactly that was why I was asking the question. I expected it to behave unpredictably, but it didn't, hence the question. English isn't my first language, which is why I wasn't able to explain it clearly enough. The answer I got from @Eric Postpischil answers what I was asking. The pages allocated for the program are zero-initialized giving clear stack space for the values in the test function, while in main there is non-zero data in the space occupied by the num var. – Espen Nov 08 '22 at 21:35
