
The standard C library function atoi is documented in ISO 9899:2011 as:

7.22.1 Numeric conversion functions

1 The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined.

...

7.22.1.2 The atoi, atol, and atoll functions

Synopsis

#include <stdlib.h>
int atoi(const char *nptr);
long int atol(const char *nptr);
long long int atoll(const char *nptr);

Description

2 The atoi, atol, and atoll functions convert the initial portion of the string pointed to by nptr to int, long int, and long long int representation, respectively. Except for the behavior on error, they are equivalent to

atoi: (int)strtol(nptr, (char **)NULL, 10)
atol: strtol(nptr, (char **)NULL, 10)
atoll: strtoll(nptr, (char **)NULL, 10)

Returns

3 The atoi, atol, and atoll functions return the converted value.
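The equivalence clause can be checked on inputs whose value is in range for int. The sketch below uses a hypothetical helper, atoi_matches_strtol, to compare atoi with the quoted strtol expression. It deliberately avoids the disputed no-digits case and out-of-range values, because those are exactly the inputs this question is about.

```c
#include <stdlib.h>

/* Hypothetical helper: returns nonzero if atoi(s) agrees with the
   strtol-based expression quoted in C11 7.22.1.2 p2. Only meaningful
   for inputs whose value fits in an int; for out-of-range input the
   behavior of atoi is undefined. */
int atoi_matches_strtol(const char *s)
{
    return atoi(s) == (int)strtol(s, (char **)NULL, 10);
}
```

Leading whitespace and an optional sign are handled the same way by both forms, for example "  -17" and "+8".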

What is the intended behaviour when the string pointed to by nptr cannot be parsed as an integer? The following four opinions seem to exist:

  • No conversion is performed and zero is returned. This is what some references document.

  • Behaviour is like that of strtol except that errno might not be set. This emerges from taking “Except for the behavior on error” as a reference to §7.22.1 ¶1.

  • Behaviour is unspecified. This is what POSIX says:

    The call atoi(str) shall be equivalent to:

    (int) strtol(str, (char **)NULL, 10)
    

    except that the handling of errors may differ. If the value cannot be represented, the behavior is undefined.

    Furthermore, the section Application Usage states:

    The atoi() function is subsumed by strtol() but is retained because it is used extensively in existing code. If the number is not known to be in range, strtol() should be used because atoi() is not required to perform any error checking.

    Note that POSIX claims that the specification is aligned to ISO 9899:1999 (which contains the same language as ISO 9899:2011 as far as I'm concerned):

    The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2008 defers to the ISO C standard.

    According to my local POSIX committee member, this is the historical behaviour of UNIX.

  • Behaviour is undefined. This interpretation arises because §7.22.1.2 ¶2 never explicitly says what happens on error. Behaviour that is neither defined nor explicitly implementation defined or unspecified is undefined.

Which of these interpretations is correct? Please try to refer to authoritative documentation.

fuz
  • `If the value of the result cannot be represented, the behavior is undefined.` – cpplearner Jul 15 '16 at 12:45
  • @cpplearner That sentence applies when the number you try to parse is out of range for the result type. It does not apply when the input cannot be parsed at all, since then no value exists. – fuz Jul 15 '16 at 12:56
  • If "behavior on error" includes what happens when there are no digits to convert, then it's undefined. If that is not an "error" then the return value is 0. But if knowing the precise behavior is so important, why not just use the better-documented function `strtol` instead? (Or wrap `strtol` in a function of your own if you want the return type to be `int` instead of `long int`?) – David K Jul 15 '16 at 13:35
  • @DavidK Because I want to know the answer. – fuz Jul 15 '16 at 13:38
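David K's suggestion of wrapping strtol in a function that returns int could look roughly like the sketch below. The name parse_int and its rules are assumptions for illustration, not a standard API: it rejects input with no digits, input with trailing characters, and values that do not fit in an int, reporting failure instead of guessing.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical wrapper: parse a base-10 int with strtol. Returns true
   and stores the value in *out only if at least one digit was
   converted, nothing follows the digits, and the value fits in int. */
bool parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)               /* no conversion: no digits found */
        return false;
    if (*end != '\0')           /* trailing characters, e.g. "123xyz" */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;           /* out of range for long or for int */
    *out = (int)v;
    return true;
}
```

Whether trailing whitespace such as "123 " should be accepted is a design choice; this sketch rejects it.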

1 Answer


What is the intended behaviour when the string pointed to by nptr cannot be parsed as an integer?

To be clear, this question applies to

// Case 1
value = atoi("");
value = atoi("  ");
value = atoi("wxyz");

and not the following:

// Case 2
// NULL does not point to a string
value = atoi(NULL);
// Convert the initial portion, yet has following junk
value = atoi("123xyz");
value = atoi("123 ");

And possibly the following, depending on what is meant by "integer":

// Case 3
// Can be parsed as an _integer_, yet overflows an `int`.
value = atoi("12345678901234567890123456789012345678901234567890");
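The difference between case 1 and case 2 is whether the "initial portion" is empty. A minimal sketch, using a hypothetical helper consumed, measures how many characters strtol converts via its end pointer. The NULL call from case 2 is omitted because passing NULL is undefined regardless of this question.

```c
#include <stdlib.h>

/* Hypothetical helper: number of characters strtol consumed from s
   when converting in base 10, i.e. the length of the "initial portion"
   that atoi is described as converting. 0 means no conversion was
   performed (the end pointer was set back to s). */
size_t consumed(const char *s)
{
    char *end;
    (void)strtol(s, &end, 10);
    return (size_t)(end - s);
}
```

Every case 1 string consumes nothing, while "123xyz" and "123 " both consume the three digits and leave the rest unconverted.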

The "non-Case 2" behavior of ato*() depends on the meaning of error in

The atoi, atol, and atoll functions convert the initial portion of the string pointed to by nptr to int, long int, and long long int representation, respectively. Except for the behavior on error, they are equivalent to
atoi: (int)strtol(nptr, (char **)NULL, 10)
...
C11dr §7.22.1.2 2


Certainly error includes case 3: "If the correct value is outside the range of representable values". In this case strto*() (though maybe not ato*()) sets errno, defined in <errno.h>. Since the equivalence in the ato*() specification excludes the behavior on this error (overflow), and nothing else defines it, the result is UB per

Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. C11dr §4 2
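For contrast, strtol does define its overflow behavior: it returns LONG_MAX or LONG_MIN and stores ERANGE in errno. The sketch below, built around a hypothetical helper strtol_overflows, checks the case 3 input that way; atoi offers no comparable guarantee.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: returns true if strtol reports that the decimal
   value in s is outside the range of long. In that case strtol returns
   LONG_MAX or LONG_MIN and sets errno to ERANGE (C11 7.22.1.4 p8). */
bool strtol_overflows(const char *s, long *result)
{
    errno = 0;                      /* strtol never clears errno itself */
    *result = strtol(s, NULL, 10);
    return errno == ERANGE;
}
```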


For case 1, the behavior of strto*() is well defined and is not specified to affect errno. The spec goes into detail (§7.22.1.4 4) and calls these "no conversion", not an error. So it can be asserted that the case 1 strto*() behavior is not an error but a "no conversion". Thus per ...

"If no conversion could be performed, zero is returned." C11dr §7.22.1.4 8

... atoi("") must return 0.
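The "no conversion" outcome this argument relies on is observable through strtol's end pointer. The sketch below uses a hypothetical helper, strtol_checked, to return the value along with a flag saying whether any digits were consumed. It also shows what atoi cannot: "0" and "wxyz" both yield 0, and only the end pointer tells them apart.

```c
#include <stdlib.h>

/* Hypothetical helper: converts s with strtol and reports via
   *converted whether any digits were consumed. For case 1 inputs
   strtol returns 0 and stores s itself in the end pointer
   (C11 7.22.1.4 p7 and p8). */
long strtol_checked(const char *s, int *converted)
{
    char *end;
    long v = strtol(s, &end, 10);
    *converted = (end != s);
    return v;
}
```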

chux - Reinstate Monica
  • Interesting interpretation. – fuz Jul 15 '16 at 14:45
  • @FUZxxl I was going to add an opinion, but didn't: When in doubt, consider history. C89 was written to avoid obsoleting most of the code base. Subsequent versions very slowly tightened behavior definitions. Could pre-C89 `atoi("")` code return something other than 0, or fault? Yes, I think so. Would any compiler in use from 2016 onwards do something other than return 0? I very much doubt it. I suspect the spec is just fuzzy enough here to have allowed all early `atoi()` code, yet was tweaked to imply: "Of course a 'no conversion' should return 0 for `atoi()`." In any case, it is best to use `strto*()`. – chux - Reinstate Monica Jul 15 '16 at 15:23
  • @chux: From 1989 to roughly 2000, the fraction of implementations that were likely to do anything weird would have dropped to almost none. Since then, however, it has become fashionable to use UB as an excuse to throw laws of time and causality out the window, so unless or until there is a standard for a sane core dialect of C, the amount of code required to prevent bizarre behaviors will increase. – supercat Jul 15 '16 at 17:38
  • @supercat Agree. A "core dialect" with `ato*()/strto*()` would help. Something that has no UB in its spec. – chux - Reinstate Monica Jul 15 '16 at 17:55
  • @chux: Most things which are UB could more sensibly be described as choosing in Unspecified fashion from among an Implementation-Defined set of possible behaviors which may be specified as precisely or as broadly as the implementer sees fit (if it isn't practical to make any meaningful behavioral guarantees in a certain situation, an implementation should be allowed to say that, but saying that needlessly would be a sign of a poor-quality implementation). On the other hand, the only real difference between the Standard classifying a behavior that way vs UB is that the former would... – supercat Jul 15 '16 at 18:25
  • ...imply that good quality implementations *should* define behaviors; since the authors of the Standard for whatever reason seem to deliberately avoid saying much about what should be expected from good quality implementations, however, and since the only difference between the aforementioned classification of behavior and UB would be to regard some implementations as inferior, the authors of the Standard saw no need for such distinction. – supercat Jul 15 '16 at 18:30
  • @fuz As the accept was over a year old, was there something lacking in the answer? How may it be improved for you? – chux - Reinstate Monica Sep 15 '17 at 14:22
  • @chux The way the answer is written is very confusing (especially how “error” is used to mean both the specific error and errors in general) and I was waiting for a better answer to come along, especially since it all boils down to “undefined behaviour is undefined behaviour,” leaving me a bit unsatisfied; why did they choose to omit behaviour instead of explicitly specifying behaviour as undefined? – fuz Sep 15 '17 at 14:32
  • @fuz "why did they choose to omit behaviour instead of ..." `atoi()` is an old function. The specifications in the first standard, C89, were purposefully broad so as to accommodate, not invalidate, existing compilers. Consider that with pre-standard `atoi()`, an overflow might return 1) INT_MAX, 2) a "mod" value, 3) a run-time error, 4) 0, or 5) something else. With the "Except for the behavior on error ..." specification, all these compilers were compliant. IOW, C89 did not confine `atoi()` much. That was left for `strtol()`. – chux - Reinstate Monica Sep 15 '17 at 14:47