
So I ran a static code analyzer over some C code, and one thing that surprised me was a warning about:

int val;
scanf("%d", &val);

which said that for large enough input this may result in a segfault. And sure enough, this can actually happen. Now the fix is simple enough (specify a width; after all, we know how many digits a valid integer can have at most, depending on the architecture), but what I'm wondering about is WHY this happens in the first place, and why it isn't regarded as a bug in libc (and a simple one to fix at that)?
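
For reference, a quick sketch of the kind of fix I mean (the width of 10 is my assumption for a 32-bit int; pick it to match your platform):

#include <stdio.h>

int main(void) {
    int val;
    /* The field width caps how many characters scanf consumes for
     * this conversion, so no unbounded digit sequence gets buffered.
     * 10 digits assumes a 32-bit int (INT_MAX is 10 digits long);
     * adjust per architecture. */
    if (scanf("%10d", &val) == 1)
        printf("%d\n", val);
    return 0;
}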

Now I assume there's some reason for this behavior in the first place that I'm missing?

Edit: Ok, since the question doesn't seem to be so clear-cut, a bit more explanation: no, the code analyzer doesn't warn about scanf in general, but specifically about scanf reading a number without a specified width.

So here's a minimal working example:

#include <stdlib.h>
#include <stdio.h>

int main() {
    int val;
    scanf("%d", &val);
    printf("Number not large enough.\n");
    return 0;
}

We can get a segfault by sending a gigantic number (using e.g. Python):

import subprocess
cmd = "./test"
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, shell=True)
p.communicate("9"*50000000000000)
# program will segfault, if not make number larger
Voo

3 Answers


If the static analyzer is cppcheck, then it is warning about it because of a bug in glibc, which has since been fixed: http://sources.redhat.com/bugzilla/show_bug.cgi?id=13138


Edited, since I missed the fact that you fed this to a static code analyzer.

If the format %d matches the size of int, what overflows should not be what is written into val through the pointer, since that is always an int. Try passing a pointer to long int and see if the analyzer still gives the warning. Then try changing %d to %ld, keeping the long int pointer, and see if the warning is given again.

I suppose the standard says something about %d and the type it needs. Maybe the analyzer is worried that on some systems int could be shorter than what %d implies? That would sound odd to me.
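
Something like this sketch of the two experiments I mean (the names are mine, and the mismatch in the first call is deliberate, just to see what the analyzer reports):

#include <stdio.h>

int main(void) {
    long int lval;

    /* Experiment 1: %d with a pointer to long int -- a deliberate
     * mismatch (the cast only silences the compiler's format check),
     * to see whether the analyzer still warns. */
    scanf("%d", (int *)&lval);

    /* Experiment 2: %ld with the same long int pointer -- matching
     * format and type; is the warning given again? */
    scanf("%ld", &lval);

    return 0;
}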


Running your example compiled with gcc (and with Python 2.6.6), I obtain

Traceback (most recent call last):
  File "./feed.py", line 4, in <module>
    p.communicate("9"*50000000000000)
OverflowError: cannot fit 'long' into an index-sized integer
Number not large enough.

Then I tried running this instead:

perl -e 'print "1"x6000000000000000;' |./test

and modified the C part to write

printf("%d Number not large enough.\n", val);

I obtain as output

5513204 Number not large enough.

where the number changes at every run... never a segfault... this GNU scanf implementation is safe, though the resulting number is wrong...

ShinTakezou
  • While that could be the case (although AFAIK the standard defines %d as int-sized), the analyzer is warning about a different problem. I hope the added example makes it clearer. – Voo Jul 02 '11 at 12:46
  • it seems to me that it is complaining about scanf in general, as an intrinsically "dangerous" function (as `gets` would be), though the problem depends on the implementation. As said in another comment, scanf should not try to fill a fixed-size internal buffer with input without checking whether it would overflow! Rather, it should stop converting once a number grows bigger than INT_MAX and store INT_MAX (consuming the rest of the input until a non-digit), or use the EOVERFLOW error (but that is POSIX.1) or maybe ERANGE (C99) or whatever, and exit. – ShinTakezou Jul 02 '11 at 13:07
  • Interesting, I ran the program on some older (well, ancient I fear; not my choice) system with glibc 2.9. I just tried it in my usual dev environment and with cygwin, and they give the same error as yours. So it seems this was indeed a bug in glibc, but fixed in newer versions. Serves me right for not double-checking with the newest version. Surprising that such a bug existed for so long, but at least I don't have to worry about it on any modern system. – Voo Jul 02 '11 at 13:25
  • afaik current glibc v is 2.14 (http://www.gnu.org/s/libc/#CurrentStatus) maybe you meant the gcc version (last should be 4.3.6), and I suppose gcc 2.9 "ships" with a very old glibc... :) – ShinTakezou Jul 02 '11 at 13:37
  • No I really meant glibc 2.9 as in 2.09 - interestingly they really do call it 2.9 and not 2.09, which is a bit counterintuitive I agree ;) – Voo Jul 02 '11 at 18:55

The first step in processing an integer is to isolate the sequence of digits. If that sequence is longer than expected, it may overflow a fixed-length buffer, leading to a segmentation fault.

You can achieve a similar effect with doubles. Pushed to extremes, you can write 1 followed by one thousand zeroes, and an exponent of -1000 (net value is 1). Actually, when I was testing this a few years ago, Solaris handled 1000 digits with aplomb; it was at a little over 1024 that it ran into trouble.
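
For instance, such a test input can be built like this (my own sketch; whether the library copes is, as noted, implementation-dependent):

#include <stdio.h>
#include <string.h>

int main(void) {
    /* Build "1" followed by 1000 zeros and exponent -1000 (value 1.0):
     * 1 digit + 1000 zeros + "E-1000" + NUL = 1008 bytes. */
    char buf[1008];
    buf[0] = '1';
    memset(buf + 1, '0', 1000);
    strcpy(buf + 1001, "E-1000");
    double d;
    if (sscanf(buf, "%lf", &d) == 1)
        printf("%g\n", d);   /* prints 1 on implementations that cope */
    return 0;
}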

So, there is an element of QoI (quality of implementation). There is also an element of 'to follow the C standard, scanf() cannot stop reading before it comes across a non-digit'. These are conflicting goals.
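
To illustrate the compromise, here is a rough sketch (my own, not glibc's actual code) of how an implementation could convert digit by digit with no fixed-size buffer at all, clamping at INT_MAX/INT_MIN on overflow while still consuming the whole digit sequence as the standard requires:

#include <limits.h>
#include <stdio.h>

/* Sketch only: read an optionally signed decimal integer the way a
 * %d conversion could be done without any fixed-size buffer. On
 * overflow, clamp to INT_MAX/INT_MIN but keep consuming digits so
 * the whole sequence is matched. Returns 1 on success, 0 on
 * matching failure. */
static int read_int(FILE *fp, int *out) {
    int c = getc(fp);
    int neg = 0;
    if (c == '-' || c == '+') {
        neg = (c == '-');
        c = getc(fp);
    }
    if (c < '0' || c > '9') {
        if (c != EOF) ungetc(c, fp);
        return 0;
    }
    int val = 0, overflow = 0;
    while (c >= '0' && c <= '9') {
        int d = c - '0';
        if (val > (INT_MAX - d) / 10)   /* next step would overflow */
            overflow = 1;
        else
            val = val * 10 + d;
        c = getc(fp);                   /* consume the digit either way */
    }
    if (c != EOF) ungetc(c, fp);
    *out = overflow ? (neg ? INT_MIN : INT_MAX)
                    : (neg ? -val : val);
    return 1;
}

int main(void) {
    int val;
    if (read_int(stdin, &val))
        printf("%d\n", val);
    return 0;
}

Fed the gigantic input from the question, this simply clamps to INT_MAX instead of overflowing any buffer.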

Jonathan Leffler
  • since it tries to read all the input into a buffer whose size is limited, without checking for buffer overflow (maybe to be faster)? And shouldn't it be implemented so that, since the format is %d, it stops filling the buffer once it reaches a value that would overflow the format, though it continues to consume digits just to be in line with the standard? – ShinTakezou Jul 02 '11 at 08:35
  • IIUC, `scanf` can just return whatever it wants for out-of-range input. Glibc actually returns INT_MAX. So, you can just have a buffer large enough for the largest in-range input and discard every following digit. – ninjalj Jul 02 '11 at 08:37
  • Yeah, as I understand it ninjalj is right: overflow behavior is undefined in C, so scanf can return whatever it wants in such a case (and presumably that's why a segfault is fine too?). But I just find this behavior extremely strange: there's no reason to keep the whole string in a buffer just to read it. – Voo Jul 02 '11 at 12:39