16

Today, I took a short "C++ skills test" from Elance.com. One question was the following:

What is the security vulnerability of the following line of code:

printf("%s", argv[1]);

Option 1: Format String

Option 2: Stack Overflow <-- This was marked by Elance as the correct answer

The user was provided 10 seconds to answer this question after an initial few seconds of seeing the question (or automatically fail the question). (There were also two other clearly irrelevant answers that were not marked as the correct answer by Elance.)

I was looking for buffer overrun or buffer overflow as an option.

I instinctively did not like the answer stack overflow, because in my 10 seconds I mentally used what I believe is the standard definition of "Stack Overflow":

In software, a stack overflow occurs when the stack pointer exceeds the stack bound. The call stack may consist of a limited amount of address space, often determined at the start of the program ...

According to this definition of "Stack Overflow", a buffer overrun is entirely possible without a stack overflow; it is a stack overflow only if the program attempts to write outside the calling program's total stack allocation (whether due to a buffer overrun, or whether it would otherwise be a legitimate write, such as allocating memory for stack-based variables an excessive number of times).

My 10-second instinct told me that "buffer overrun" is a more accurate description of the problematic line of code, above - because often (in my experience) there are sufficient null characters ('\0') peppered through garbage data in RAM to often avoid an actual stack overflow in cases like this, but a buffer overrun in the implementation seems reasonably possible or even likely. (But the possibility that printf reads garbage here might assume that argc == 1, such that there was no user-provided argv[1]; if argv[1] is present, perhaps one can assume it's likely that the calling function has not inserted NULL's. It was not stated in the problem whether argv[1] was present.)

Because I imagined that there could be a buffer overrun problem here, even without a stack overflow, I answered Format String, because simply by passing a different format string such as "%.8s", the problem can be mostly avoided, so it seemed like an overall more generic, and therefore better, answer.

My answer was marked as wrong. The correct answer was marked as "Stack Overflow".

It now occurs to me that perhaps if one assumes that argv[1] is present, that the only possible buffer overrun is a stack overflow, in which case, stack overflow might in fact be the correct answer. However, even in this case, would it not be considered odd to call this a stack overflow? Isn't buffer overflow a better way to describe this problem, even assuming argv[1] is present? And, if argv[1] is not present, isn't it pretty much incorrect to state that the problem is stack overflow, rather than the more accurate buffer overrun?

I would like the opinion of professionals on this site: Is "stack overflow" the proper way to define the memory safety problem with the above line of code? Or, rather, is "buffer overflow" or "buffer overrun" clearly a better way to describe the problem? Finally, given the two options provided for the question's answer (above), is the answer ambiguous, or is "stack overflow" (or "format string") clearly the better answer?


Tangential Comments regarding the Elance test (Not related to the question in this posting)

None of the Elance "C++ skills test" questions pertained to any C++-specific features such as classes, templates, anything in the STL, or any aspect of polymorphism. Every question was a down-and-out, straight-from-C question.

Because there were many (at least 3) other questions in Elance's so-called "C++ skills test" that were unarguably wrong (such as this question: given sizeof(int) == sizeof(int*) and sizeof(int) == 4, then in the code int *a, *b; a=b; b++; b-a;, what is b-a, with the correct answer listed as 4, rather than the actual correct answer of 1), and given the fact that there were no C++-specific questions on the test, I have contacted Elance and plan to seriously pursue their problematic test with the organization. However, for the question discussed in this posting, I am uncertain whether the question/answers are problematic.

Dan Nissenbaum
  • 13,558
  • 21
  • 105
  • 181
  • So where is the rest of the code?, `printf("%s", argv[1]);` may be well defined. Are there certain assumptions about the program we aren't aware of? – this Jun 15 '14 at 18:56
  • Agree, I would not call that a stack overflow. – aschepler Jun 15 '14 at 18:58
  • 1
    @self. Absolutely nothing. In the Elance test, the user was literally provided only the single line of code exactly as I provided in this posting - with no context provided in the question whatsoever, only the single question *what is the security vulnerability of the following line of code*, with four multiple-choice options provided. (This, in a "C++ Skills Test", to boot.) – Dan Nissenbaum Jun 15 '14 at 18:58
  • Maybe a NULL pointer dereference ? – wildplasser Jun 15 '14 at 19:01
  • In `c` this could only be a null pointer dereference. ( c++ same? ) – this Jun 15 '14 at 19:02
  • 4
    What were those "irrelevant answers"? Were they in fact highly relevant? – Kerrek SB Jun 15 '14 at 19:02
  • @wildplasser - would that be under the assumption that `argv[1]` is not present? – Dan Nissenbaum Jun 15 '14 at 19:02
  • Argv[1] is "present", but it can be NULL – wildplasser Jun 15 '14 at 19:03
  • @KerreckSB - I don't remember specifically. They were unrelated junk answers such as "function overloading"; they were clearly not related to the correct answer in any way, and they were also not indicated as the correct answer by Elance. – Dan Nissenbaum Jun 15 '14 at 19:04
  • 3
    `argv[1]` could be anything. There might not be any arguments passed to the program, in which case this points "somewhere in space" (or maybe is NULL). In general, printing a string with an unknown number of characters (basically, without `nul` termination) ought not to result in a stack overflow since there isn't very much "stack" used to do the printing - I believe. Only the argument (pointer to string) gets pushed on the stack - it is not clear that this would cause stack overflow (unless you got here after exhausting the stack - this would be "the straw that breaks the camel's back then). – Floris Jun 15 '14 at 19:17
  • 7
    I'd be lucky NOT to work there... if they are that incompetent. You can do better. – Marius Jun 15 '14 at 19:17
  • 2
    As mentioned above, it's neither a classic format string vulnerability (in which the attack controls the format string, which is not the case here), nor a classical stack-based buffer overflow (in which case the attacker can do an arbitrary write of a certain number of bytes at a fixed relative offset of a stack frame). Both answers are wrong. Not sure what the correct answer would be, in fact I don't think this is particularly interesting from an exploitability standpoint. It's a potential nill pointer dereference, but probably one without consequences except for a crash. – Niklas B. Jun 15 '14 at 19:23
  • In ring0 or on some embedded systems with no memory protection it might lead to memory disclosure, but that's somewhat far-fetched because you *probably* won't find a `printf` or even `argv` there – Niklas B. Jun 15 '14 at 19:27
  • "because often (in my experience) there are sufficient NULL's peppered through garbage data in RAM to often avoid an actual stack overflow in cases like this" -- `NULL` is a null *pointer* constant; you mean null *characters* (`'\0'`) – Keith Thompson Jun 15 '14 at 19:30
  • 1
    The function prints to the `stdout` and it uses the standard (or user-provided) buffering, and the user input is not used as a formatting string, so IMVHO neither stack overflow, nor buffer overrun or format string vulnerability arises here. The only problem I can see is that `printf()` may improperly handle NULL argument and cause access violation instead of printing '(null)' or something like that. – CiaPan Jun 15 '14 at 19:33

3 Answers3

6

There is no potential stack overflow here.

The standard guarantees that argc is non-negative, which means it can be 0. If argc is positive, argv[0] through argv[argc-1] are pointers to strings.

If argc == 0, then argv[1] is not merely a null pointer -- it doesn't exist at all. In that case, argv[1] attempts to access a nonexistent array element. (argv[1] is equivalent to *(argv+1); the pointer addition is permitted, but the dereference has undefined behavior.) Note that in this case the program name, which would otherwise be accessible via argv[0] is unavailable.

If argc==1, then argv[1] == NULL. Evaluating argv[1] is perfectly valid, but it yields a null pointer. Passing a null pointer to printf with a "%s" option has undefined behavior. I suppose you could call this a format string problem, but the real problem is using a null pointer when a non-null pointer to a string is required.

If argc >= 2, then argv[1] is guaranteed to point to a string, printf("%s", argv[1]) will simply print the characters of that string, up to but not including the terminating '\0' (which is guaranteed to be exist).

There is still a potential vulnerability in that case. Quoting N1570 7.21.6.1 paragraph 15:

The number of characters that can be produced by any single conversion shall be at least 4095.

(N1570 is a draft of the C standard; C++ refers to the C standard for portions of its standard library.)

Which means that an implementation may limit the number of characters produced by the printf call. In practice, there's probably no reason to impose a fixed limit; printf can simply print characters, one at a time, until it reaches the end of the string. But in principle, if strlen(argv[1]) > 4095, and if the current implementation imposes such a limit, then the behavior could be undefined.

Still, this isn't what I'd call a "stack overflow" -- particularly since the C++ standard does not use the word "stack" (except for a couple of brief references to "stack unwinding").

Most of these problems can be avoided by checking first:

if (argc >= 2) {
    printf("%s", argv[1]);
}

or, if you're feeling paranoid:

if (argc >= 2 && argv[1] != NULL) {
    printf("%s", argv[1]);
}
Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
  • 6
    The C standard does not use the word “stack”, but security experts use the words “stack overflow”, with a precise meaning. Just because the C standard does not use the word does not mean that the concept does not exist. (And this is a shift in thinking that I still have trouble with, and I am not the only one to have difficulties with the security mindset considering how difficult it was to get the answer I was looking for when I asked http://stackoverflow.com/questions/23480542/ , or perhaps it is my fault again for not explaining it well). – Pascal Cuoq Jun 15 '14 at 19:49
  • 1
    I don't think that the pointer addition in `*(argv+1)` has undefined behavior, because it is “one-past”. Dereferencing it is clearly undefined behavior. – Pascal Cuoq Jun 15 '14 at 19:51
  • 1
    @Jefffrey: Yes, so if `argc == 0`, then `argv[0]` is guaranteed to be `0`. What does that say about `argv[1]`? – Keith Thompson Jun 15 '14 at 19:56
  • 1
    @Jefffrey: Yes, fixed. (It could be tricky, though. If `argc == 0`, then that would seem to imply a zero-length array -- but C doesn't have zero-length arrays.) – Keith Thompson Jun 15 '14 at 19:58
  • 1
    `argv` has length `argc + 1`, because it must contain a terminating null pointer in all circumstances. The problem of it being 0-sized does not arise. – Pascal Cuoq Jun 15 '14 at 20:00
  • 1
    @KeithThompson C++ doesn't have zero-length arrays (C-style arrays) either. Except for `std::array` of course. – Shoe Jun 15 '14 at 20:07
  • @Jefffrey: Right. I should pay more attention to tags. – Keith Thompson Jun 15 '14 at 21:09
4

On a Unix system argv[1] can be an invalid memory access in and of itself (case argc==0), a pointer to a well-formed string (argc >= 2), or NULL (argc == 1).

The problem with printf("%s", argv[1]); is using a pointer (argv[1]) without having checked that it was valid. Anything that happens later is only a secondary consequence. The problem is the failure to validate that argv[1] is what is intended before using it. It might fall under the very general CWE20: Improper Input Validation. It is misleading to call it a buffer overflow or a stack overflow.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • 2
    But the test question gives no assumption about prior test for the argument validity, so one can't tell if that `printf` sole is or is not incorrect/vulnerable. – CiaPan Jun 15 '14 at 19:36
  • I assume you mean `argc == 1`. – Dan Nissenbaum Jun 15 '14 at 19:37
  • @DanNissenbaum Yes, fixed. – Pascal Cuoq Jun 15 '14 at 19:39
  • Quick question: If `argc == 1`, doesn't that mean that `argv` points to an array of `char *`'s of size 1 (and that this array of `char *`'s is not NULL-terminated)? – Dan Nissenbaum Jun 15 '14 at 19:39
  • 1
    @DanNissenbaum `argv` is null-terminated, which is why Pascal wrote that in the case of `argc == 1`, we have `argv[1] == NULL`. Indeed for the case `argc == 0` (is that even possible?), `argv[1]` is uninitialized, which might be the most problematic case – Niklas B. Jun 15 '14 at 19:40
  • @DanNissenbaum `argv` is always NULL-terminated. The property `argv[argv]==0` always holds at the time the program's `main` is invoked. – Pascal Cuoq Jun 15 '14 at 19:41
  • @NiklasB. Anyone can call `execve` with only an array containing NULL in place of `argc`. A fun exercise I recommend in the blog post I linked to is to find a privileged program that can be made to do funny things because it assumes that `argc >= 1` (and thus `argv[0] != NULL`). Note that when `argc==0`, `argv[1]` is already an out-of-bound access in itself. It's not so much that the memory location is uninitialized, it's that it doesn't exist! – Pascal Cuoq Jun 15 '14 at 19:57
  • @PascalCuoq That's interesting. Still I guess in practice "doesn't exist" and "uninitialized" have similar consequences, in particular that the memory location could have any kind of unpredictable (and potentially attacker-controlled) value, if you are unlucky enough that it does not generate an access violation – Niklas B. Jun 15 '14 at 20:00
3

C++ standard response

As far as the language is concerned there can be the following cases:

  1. argc < 2
  2. argc >= 2

In the first case, printf("%s", argv[1]) is simply undefined behavior.

In the second case, the program is well-formed (as from argv[0] to argv[argc-1] are guaranteed to be valid null terminated strings:

§3.6.1/2 [basic.start.main]

In the latter form, for purposes of exposition, the first function parameter is called argc and the second function parameter is called argv, where argc shall be the number of arguments passed to the program from the environment in which the program is run. If argc is nonzero these arguments shall be supplied in argv[0] through argv[argc-1] as pointers to the initial characters of null-terminated multibyte strings (ntmbs s) (17.5.2.1.4.2) and argv[0] shall be the pointer to the initial character of a ntmbs that represents the name used to invoke the program or "". The value of argc shall be non-negative. The value of argv[argc] shall be 0. [ Note: It is recommended that any further (optional) parameters be addedafterargv. —endnote]

(emphasis mine).

Why stack overflow is awfully imprecise

Given that no other informations were given (as compiler or architecture), the answer "Stack Overflow" is simply imprecise. The C++ standard does not try to defined what a "stack" is, and therefore "stack overflow" means almost nothing to the C++ standard.

The standard reasons in terms of an abstract machine with a guaranteed memory model.

What really happens

In the case in which argc < 2, nobody knows what happens. The standard does not guaranteed nor specify anything. In the case in which argc >= 2 the program is well-defined.

Community
  • 1
  • 1
Shoe
  • 74,840
  • 36
  • 166
  • 272
  • Both invalid format strings and classic buffer overruns lead to undefined behaviour as well, so I don't see what this answer adds to the case – Niklas B. Jun 15 '14 at 19:44
  • If `argc == 1`, then `argv[1]` is not UB (but dereferencing it is). – Pascal Cuoq Jun 15 '14 at 19:44
  • In fact "stack overflow" in the security world has a very precise meaning. But I would prefer the term "stack-based buffer overrun" to discriminate it from the ordinary programming error where you just run out of stack space, which usually has no security-related consequences and just leads to a crash. Although I agree that there is no inherent buffer overrun visible in the given piece of code. – Niklas B. Jun 15 '14 at 19:58
  • @NiklasB. I agree. As far as the person taking the test knows, that piece of code could be run on a system that does not have a stack and simply explodes when UB occurs. :) – Shoe Jun 15 '14 at 20:04