
If a scanf family function fails to match the current specifier, is it permitted to write to the storage where it would have stored the value on success?

On my system the code below outputs 213 twice, but is that guaranteed?

The language in the standard (C99 or C11) does not seem to clearly specify that the original value should remain unchanged (whether it was indeterminate or not).

#include <stdio.h>

int main()
{
    int d = 213;

    // matching failure
    sscanf("foo", "%d", &d);
    printf("%d\n", d);

    // input failure
    sscanf("", "%d", &d);
    printf("%d\n", d);
}
M.M
  • I suggest watching the return value from `sscanf`, also consider `set_invalid_parameter_handler` if you're using VC. – Dai Sep 07 '14 at 22:14
  • I wonder if there is any need to define something here. Can you come up with an example where this may be useful (or necessary to know for an implementor of a libc)? I don't object to a pure language-lawyer question, I'm just curious about the context of the question... – mafso Sep 07 '14 at 22:42
  • @mafso sometimes people use the idiom of setting default values, then calling `scanf`, and then expecting that if the read failed, the previously set default will still be usable – M.M Sep 07 '14 at 23:06
  • @MattMcNabb: I think in general that should be a valid idiom, but as I mentioned, `%c` is problematic... – R.. GitHub STOP HELPING ICE Sep 07 '14 at 23:14

2 Answers

The relevant part of the C11 standard is (7.21.6.2, for fscanf):

7 A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:

8 […]

9 An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.

10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. […]

To me, the words “steps” and “If the length of the input item is zero, the execution of the directive fails” indicate that if the input does not match a specifier in the format, interpretation stops before any assignment for that specifier has occurred.


On the other hand, paragraph 4, which precedes the ones quoted, makes it clear that specifiers up to the failing one are assigned, again using language appropriate for ordered sequences of events:

4 The fscanf function executes each directive of the format in turn. When all directives have been executed, or if a directive fails (as detailed below), the function returns.

Pascal Cuoq
  • In at least one case, I think `scanf` almost **has to** modify the pointed-to storage even when the directive fails: for the `%c` directive, which requires exactly `n` characters, where `n` is the requested field width, rather than just up-to-`n`-characters. In this case, if `n` is huge and the file is not seekable, there is no way to know in advance before storing to the destination buffer whether the directive will succeed or fail. However otherwise I agree with your first interpretation. This is probably a defect that should be raised with the committee... – R.. GitHub STOP HELPING ICE Sep 07 '14 at 22:43
  • @R.. Are you saying that the standard actually mandates not touching the arguments for `%c` and so on, on a matching/read failure? (And that this is a defect, as it is close to impossible to accomplish.) To me, the standard seems to define nothing in that case but the return value. – mafso Sep 07 '14 at 23:06
  • @mafso: I'm saying it's not clear that the standard permits this to happen, despite there being no other way to implement it. Note that for `%s` and `%[`, matching failure only happens if no characters are read, in which case there is no reason for the implementation to have written anything into the buffer. But the expected form for `%c` has an exact number of characters (if it were allowed to read fewer, the caller would have no way to determine how many were successfully read), so a matching failure occurs if EOF is hit early. – R.. GitHub STOP HELPING ICE Sep 07 '14 at 23:11
  • @R.. An early EOF on `%c` (with a maximum field width) modifies the value (as expected), and my glibc returns success in this case. `ideone.com`'s libc does too. I'm not sure how to read the standard regarding this (whether the maximum field width denotes a maximum or an exact width), but I think the return value (if not negative) tells you exactly how many variables were read. Your concern that input validation may become hard this way also applies to trailing ordinary characters (non-conversion specifications), where you need to use `%n` to check. – mafso Sep 13 '14 at 18:31
  • @mafso: glibc is known to behave incorrectly in this case. It's part of [bug #12701](https://sourceware.org/bugzilla/show_bug.cgi?id=12701). The relevant standard text is 7.21.6.2 The fscanf function, under paragraph 11: "c Matches a sequence of characters of exactly the number specified by the field width (1 if no field width is present in the directive)." – R.. GitHub STOP HELPING ICE Sep 13 '14 at 19:06

Judging from ISO/IEC 9899:2011 §7.21.6.2 The fscanf function:

¶10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

In the larger context, this seems to mean that the assignment to the target variable only occurs after the conversion is successful. For numeric types, that makes sense and is readily achievable. For string types, it is not so clear cut, but it should work the same way (the text quoted does state that the assignment only occurs if there is no matching failure or input failure). However, if there is an encoding error part way through a string (%s or %30c or %[a-z]), it would not be surprising to find that the first part of the string is changed even though the conversion as a whole failed. This could probably be regarded as a bug. Stimulating the bug accurately might be hard; for example, it might require UTF-8 input and an invalid byte such as 0xC0 or 0xF5 in the input stream.

Jonathan Leffler
  • Encoding errors cannot occur; these directives read **bytes**, not multibyte characters. However, `%c` can fail as a result of hitting EOF before the expected number of bytes have been read. The others succeed as long as they read at least one byte, so I see no way any sane implementation of those could modify the buffer on failure. – R.. GitHub STOP HELPING ICE Sep 07 '14 at 23:16
  • @R..: File the DR with the C Standard committee. ISO/IEC 9899:2011 §7.21.6.2 ¶6 _A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If any of those characters differ from the ones composing the directive, the directive fails and the differing and subsequent characters remain unread. Similarly, if end-of-file, an encoding error, or a read error prevents a character from being read, the directive fails._ – Jonathan Leffler Sep 07 '14 at 23:21