When sscanf() or another function from the scanf family is given a sequence of digits whose converted value exceeds the maximum value of the target integer type,

  • should the conversion be considered to have failed?
  • is the behavior defined at all?
chqrlie
  • 2
    The question is more subtle: parsing the subject sequence has defined behavior as per `strtol()`, but storing the resulting value into an `int` has implementation-defined behavior if the value is too large. Furthermore, if the library function tests for overflow, which it should, should this overflow be considered a conversion failure resulting in a short count or not? – chqrlie Jun 24 '17 at 20:18
  • It looks like we do have potential undefined behavior! `scanf()` finally goes down the drain. – chqrlie Jun 24 '17 at 20:21
  • FWIW in MSVC this `sscanf` returns `1` with `n` as `-1`. When everything is changed to `unsigned` it is the same, that is `n` is `UINT_MAX`. – Weather Vane Jun 24 '17 at 20:22
  • @chqrlie: It does; I was wrong in my comment and the initial version of my answer. Please read the new version. – too honest for this site Jun 24 '17 at 20:24
  • 5
    @WeatherVane: Empirical tests are a bad reference for the behaviour of C code. – too honest for this site Jun 24 '17 at 20:24
  • 4
    @Olaf that's probably the reason for the FWIW ... –  Jun 24 '17 at 20:25
  • @Olaf yes: just an observation. There is nothing like "hands on" and "RTM" ~ I did both. More generally to your comment: false positives can be misleading, but true negatives are conclusive. – Weather Vane Jun 24 '17 at 20:25
  • @WeatherVane: In C both give the same information for UB: none. You cannot rely on either. – too honest for this site Jun 24 '17 at 20:29
  • @Olaf see my edit: true negatives are conclusive. – Weather Vane Jun 24 '17 at 20:30
  • @WeatherVane: Not really. It could return that value on 99% of all runs and any other value on the rest. And on other systems, maybe even with a different libc version, nasal daemons could appear. – too honest for this site Jun 24 '17 at 20:32
  • @Olaf true negatives **are** conclusive of bad behaviour. – Weather Vane Jun 24 '17 at 20:36
  • Sure it's defined: 'YOU'RE FIRED, GET OUT'. – ThingyWotsit Jun 24 '17 at 20:38
  • @WeatherVane: Ah, got you now. Yes, in a way. But (this is to beginners, as I know you know:) it is not guaranteed to show the same behaviour on all systems. A different environment might yield a false positive. – too honest for this site Jun 24 '17 at 20:38
  • @Olaf I do not need any lecture on UB, but thank you for your concern. – Weather Vane Jun 24 '17 at 20:39
  • @WeatherVane "this is to beginners, as I know you know:" -was that not clear enough as disclaimer? We both are a bit thin-skinned about that aren't we ;-) – too honest for this site Jun 24 '17 at 20:40
  • 1
    That was just your get-out. – Weather Vane Jun 24 '17 at 20:41
  • The behavior is defined if and only if `INT_MAX > 123456789123456789123456789`. That would require `int` to be at least 88 bits, which is very unlikely. You might want to state explicitly your assumption that the value exceeds `INT_MAX`. – Keith Thompson Jun 24 '17 at 20:49
  • 2
    @KeithThompson: the title leaves some room for defined behavior, but the body of the question is unambiguous: *given a sequence of digits whose converted value exceeds the maximum value for the integer type*. – chqrlie Jun 24 '17 at 20:51
  • 3
    @KeithThompson: The behavior is indeed defined iff `INT_MAX >= 123456789123456789123456789`. The `>=` is more accurate, although I believe `INT_MAX` must be a power of 2 minus 1. – chqrlie Jun 24 '17 at 21:02
  • @chqrlie: Out-quibbled again! – Keith Thompson Jun 24 '17 at 23:20

1 Answer

From the standard, 7.21.6.2p10 ((f)scanf, applies to the whole family):

… If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

Looks like another reason to be very cautious with the scanf family. The strtoXX functions have fully defined behaviour: for out-of-range input they return LONG_MAX etc. and set errno to ERANGE. So if you need exact information, tokenise the input manually and use these functions for conversion. Another benefit: better error handling.
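A minimal sketch of that approach, assuming the target type is `int` and the input is decimal. The helper name `parse_int` and its return codes are hypothetical, chosen only for illustration; the only library call used is `strtol()`.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not a standard function): parse a decimal int from s.
   Returns  0 on success and stores the value in *out,
           -1 if no digits could be converted,
           -2 if the value cannot be represented in an int.
   If end is not NULL, *end receives the position where parsing stopped. */
static int parse_int(const char *s, int *out, char **end)
{
    char *p;
    long v;

    errno = 0;                  /* strtol only sets errno on error */
    v = strtol(s, &p, 10);
    if (end != NULL)
        *end = p;
    if (p == s)
        return -1;              /* no conversion was performed */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -2;              /* out of range for long or for int */
    *out = (int)v;              /* safe: v is known to fit in an int */
    return 0;
}
```

Unlike `sscanf("%d")`, an input such as `"123456789123456789123456789"` is reported here as a range error (return value `-2` in this sketch) instead of invoking undefined behaviour, and the caller can inspect `*end` to detect trailing garbage.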

too honest for this site
  • 5
    However, consider "Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined." – cpplearner Jun 24 '17 at 20:16
  • @cpplearner: Re-read my answer, I was wrong with my initial conclusion looking at the wrong place. It is very clear. (believe it or not, I changed that before reading your comment) – too honest for this site Jun 24 '17 at 20:19
  • 5
    @Olaf 7.22.1.4p8 "*If the correct value is outside the range of representable values [...] the value of the macro `ERANGE` is stored in `errno`.*". So you can differentiate. `strtol()` is really *fully defined*. –  Jun 24 '17 at 20:30
  • @FelixPalmen: Stupid me, I read this when I wrote the initial version of my answer, but forgot when I added my remark. Sorry, I'm a bit distracted currently. Thanks for the reminder. – too honest for this site Jun 24 '17 at 20:35
  • me too, I first cited the *wrong* paragraph (concerning the floating point family). I think it's a perfect answer now. –  Jun 24 '17 at 20:39
  • 4
    Great answer. I read the C11 spec before posting the question and I missed this simple punchline... `scanf()` has so many shortcomings and is misused in so many ways, it really should be avoided. – chqrlie Jun 24 '17 at 20:57
  • @chqrlie: Indeed. See the edit-history, I was on the completely wrong track, too. These functions need a major overhaul. – too honest for this site Jun 24 '17 at 21:00
  • @Olaf: All that would be necessary would be to define a standard macro which indicates whether an implementation offers certain guarantees beyond those by the standard. Just as a program that tests certain macros would be entitled to assume that floating-point division by zero will be processed as defined behavior (yielding NaN), likewise a program that tests other macros would be entitled to assume that corner cases of sscanf will be processed in the defined fashion indicated thereby. All any implementation would need to do to uphold the new standard would be to define a macro. – supercat Jul 04 '17 at 16:50
  • @supercat: You are welcome to write a libc implementation which guarantees that. But what use would it have if you have to provide a different way for other environments anyway? We don't discuss hypothetical situations or a wish-list (I have quite a long one for the language itself), but the status quo. You can already test for a specific implementation/library; that should be sufficient. – too honest for this site Jul 04 '17 at 17:10
  • @Olaf: If the Standard defined such flags, then implementations wishing to run the widest range of programs could look to see how much code requires which features, and thus determine which ones would provide most "bang for the buck". Further, many implementations are designed in ways that inherently offer behavioral guarantees that would be useful if documented. Most implementations will react to an out-of-range number in one of a few ways. If a program's target implementation defines it in a way that meets that program's requirements, and the program isn't expected to be... – supercat Jul 05 '17 at 15:29
  • ...widely ported, code which checks for a suitable implementation and uses `sscanf` would likely be easier to write, read, and verify than one which uses custom logic. If such code needs to be ported to some other platform whose `sscanf` isn't suitable, custom logic could be written at that time, but if it never ends up being ported to such a platform, why waste the effort? – supercat Jul 05 '17 at 15:31
  • @supercat: You miss the idea of C being a compact and widely usable language for low-level problems. The standard cannot specify all such flags in advance anyway, so there is little use for such a flag. For the optional features of C itself, there are such macros. Maybe you think C++ is "C with classes", though. You could not be more wrong. That said: this is not the place for discussion. Please take this before the C standard's working group. I will not further participate here. – too honest for this site Jul 05 '17 at 15:35