
fscanf() specifies the "%n" directive as a means to write "the number of characters read from the input stream so far by this call to the fscanf function" (C11dr §7.21.6.2 ¶12).

Let us call this number: ncount.


The "%n" directive may be preceded by length modifiers hh, h, l, ll, j and others. Examples:

FILE *stream = stdin;
int n_i;
fscanf(stream, "%*s%n", &n_i);    // save as int
signed char n_hh;
fscanf(stream, "%*s%hhn", &n_hh); // save as signed char
long long n_ll;
fscanf(stream, "%*s%lln", &n_ll); // save as long long

What is the type or the minimal expected range of ncount?
What happens, or should happen, when "the number of characters read from the input stream" is large?

My findings:
The C spec appears quiet on the definition of the minimal range/type of ncount. ncount is usually saved via "%n" which specifies an int destination though not an int source.

By experimentation, ncount appears to be treated like an int or long on my platform - no real surprise there. (My int/long/long long are 4/4/8 bytes.) When saving ncount to a long long, the value saved does not exceed INT_MAX/LONG_MAX. ncount could have been unsigned for twice the usable range when assigned to long long, yet, this is an extreme corner and perhaps not considered by implementors.

My tests below showed no extended range of ncount past an int range, even when saved as a long long.

My interest stemmed from using "%*[^\n]%lln" to determine an (extreme) line length.


Implementation notes:

GNU C11 (GCC) version 6.4.0 (i686-pc-cygwin) compiled by GNU C version 6.4.0, GMP version 6.1.2, MPFR version 3.1.5-p10, MPC version 1.0.3, isl version 0.14 or 0.13

glibc 2.26 released.

Intel Xeon W3530, 64-bit OS (Windows 7)


Test code

#include <limits.h>
#include <stdio.h>
#include <string.h>

int print(FILE *stream, long long size, int ch) {
  char buf[4096];
  memset(buf, ch, sizeof buf);
  while (size > 0) {
    size_t len = size < (long long) sizeof buf ? (size_t) size : sizeof buf;
    size_t y = fwrite(buf, 1, len, stream);
    if (len != y) {
      perror("fwrite");  // the failing call here is fwrite(), not printf()
      return 1;
    }
    size -= len;
  }
  return 0;
}

int scan(FILE *stream) {
  rewind(stream);
  long long n = -42;
  int cnt = fscanf(stream, "%*s%lln", &n);
  printf("cnt:%d n:%lld ", cnt, n);
  return cnt != 0;
}

int testf(long long n) {
  printf("%10lld ", n);
  FILE *f = fopen("x.txt", "w+b");
  if (f == NULL) {
    perror("fopen");
    return 1;
  }
  if (print(f, n, 'x')) {
    perror("print");
    fclose(f);
    return 2;
  }
  if (scan(f)) {
    perror("scan");
    fclose(f);
    return 3;
  }
  fclose(f);
  puts("OK");
  fflush(stdout);
  return 0;
}

int main(void) {
  printf("%d %ld %lld\n", INT_MAX, LONG_MAX, LLONG_MAX);
  testf(1000);
  testf(1000000);
  testf(INT_MAX);
  testf(INT_MAX + 1LL);
  testf(UINT_MAX);
  testf(UINT_MAX + 1LL);
  testf(1);
  return 0;
}

Test output

2147483647 2147483647 9223372036854775807

File length      Reported bytes read
      1000 cnt:0 n:1000 OK
   1000000 cnt:0 n:1000000 OK
2147483647 cnt:0 n:2147483647 OK
2147483648 cnt:0 n:-2147483648 OK  // implies ncount is internally an `int/long`
4294967295 cnt:0 n:-1 OK
4294967296 cnt:0 n:-1088421888 OK  // This `n` value may not be consistent. -1 also seen
         1 cnt:0 n:1 OK

[Edit]

With some runs of testf(UINT_MAX + 1LL);, I received other inconsistent results like '4294967296 cnt:0 n:1239482368 OK'. Hmmmm.

Sample fscanf() support source code uses an int for ncount.

chux - Reinstate Monica
  • I am conflicted between voting up for an interesting question and not voting up because, if your input is exotic, you should be writing your own parsing and not using `fscanf`. Nonetheless, I expect the answer is C 2011 5.2.4.1: “The implementation shall be able to translate and execute at least one program…” Your program is not that one. – Eric Postpischil Dec 12 '17 at 21:02
  • I agree with your analysis of the standard. It does not specify a bound on how large an `ncount` must be supported. It does specify that the `ll` (and `z` and `t`) length specifiers can appear in an `n` directive, and of course you're right that that describes the size of the receiving variable, not the source count. – John Bollinger Dec 12 '17 at 21:04
  • @EricPostpischil Use of `fscanf()` does not know prior to reading if the input is exotic. The issue is not that the input is exotic, but that `"%*s%n"` is not resilient to exotic input. For example, `"%1000s%n"` would work with exotic input - as long as `ncount` is at least 10 bit. – chux - Reinstate Monica Dec 12 '17 at 21:11
  • This is exactly why I don't understand why the type expected by `%n` is not a `size_t`. I suppose legacy is the main reason. – Stargateur Dec 12 '17 at 21:12
  • The standard also specifies that the behavior is undefined (for all conversions) if the result of the conversion cannot be represented in the destination object. Inasmuch as I don't see any other qualification on how large an `ncount` is accommodated, I'm inclined to regard it as an implementation flaw if the reported count is bounded by limits smaller than the capacity of the destination object's type. – John Bollinger Dec 12 '17 at 21:12
  • @Stargateur Thanks for the edit that added a missing `include`. Yet the cast of `long long` to `size_t` changed functionality with my 32-bit `size_t` as it truncated `4294967296`. Code amended to not warn and work for sizes up to `LLONG_MAX`. – chux - Reinstate Monica Dec 13 '17 at 16:19
  • @chux My bad, sorry. I didn't cast the right operand. I tested with my 64-bit `size_t` ^^'. (But that explains a little why the result is -1 on your system: scanf must use a `size_t` type to hold the count internally. It surprises me that your `size_t` is 32-bit on a 64-bit system. Windows != ♥) – Stargateur Dec 13 '17 at 16:28
  • @Stargateur What was your result with `testf(UINT_MAX + 1LL);`? (Assuming `unsigned` is 32-bit) BTW: I am now sure my compiler/library used a 32-bit `int` for `ncount`. – chux - Reinstate Monica Dec 13 '17 at 16:31
  • @chux I get correct output (not -1 or garbage) with both versions (the current code and my wrong edit) on my 64-bit Arch Linux (`unsigned` is 32-bit) https://pastebin.com/3Py5w5cr. – Stargateur Dec 13 '17 at 16:50
  • @Stargateur Thanx. To a "quality of implementation", your library is better than mine. OTOH, it might be impractical for your platform to test its limit. I'll assume it is good for at least about `INT64_MAX` - which should handle extreme code for perhaps 30-40 years given [Moore's law](https://en.wikipedia.org/wiki/Moore%27s_law). ;-) – chux - Reinstate Monica Dec 13 '17 at 16:59

1 Answer


What is the type or the minimal expected range of ncount?

The standard does not specify any particular minimum. It flatly says

The corresponding argument shall be a pointer to signed integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function.

(C2011, 7.21.6.2/12)

This leaves no room for a conforming implementation to store a different number in the destination variable, except inasmuch as the standard specifies for all conversions, including %n, that

if the result of the conversion cannot be represented in the [destination] object, the behavior is undefined.

(C2011 7.21.6.2/10)

What happens, or should happen, when "the number of characters read from the input stream" is large?

If the pointer corresponding to the %n directive is correctly typed for the directive's length specifier (or lack thereof), and if the true count of characters read up to that point by that scanf() call can be represented in an object of that type, then the true count should in fact be stored. Otherwise, the behavior is undefined.

John Bollinger