20

Suppose on my platform sizeof(int)==sizeof(void*) and I have this code:

printf( "%p", rand() );

Will this be undefined behavior because of passing a value that is not a valid pointer in place of %p?

sharptooth
  • 4
    The standard does state *If a conversion specification is invalid, the behavior is undefined*. – cnicutar Jul 27 '12 at 12:57
  • I guess this boils down to a question of how special the pointers really are. I'm very curious to see a good answer. – Sergey Kalinichenko Jul 27 '12 at 12:58
  • "`p` The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner." Well a `void*` can't be dereferenced anyway, so it doesn't have to be dereferenceable, but I'd think it's implementation defined. – BoBTFish Jul 27 '12 at 13:01
  • Even if this was valid in theory, it would still require a cast: `printf( "%p", reinterpret_cast<void*>(rand()) );` – MSalters Jul 27 '12 at 13:52
  • @MSalters: you mean `printf("%p", (void *) rand());`. Notice the question is tagged as `C`. – ninjalj Jul 27 '12 at 14:09
  • @ninjalj The question has `c++` and `reinterpret-cast` tags? EDIT: just checked the question edit history. Sorry ninjalj, the tags were changed and then changed back. – BoBTFish Jul 27 '12 at 14:12

4 Answers

20

To expand upon @larsman's answer (which says that since you violated a constraint, the behavior is undefined), here's an actual C implementation where sizeof(int) == sizeof(void*), yet the code is not equivalent to printf( "%p", (void*)rand() );

The Motorola 68000 processor has 16 registers which are used for general computation, but they are not equivalent. Eight of them (named a0 through a7) are used for accessing memory (address registers) and the other eight (d0 through d7) are used for arithmetic (data registers). A valid calling convention for this architecture would be

  1. Pass the first two integer parameters in d0 and d1; pass the rest on the stack.
  2. Pass the first two pointer parameters in a0 and a1; pass the rest on the stack.
  3. Pass all other types on the stack, regardless of size.
  4. Parameters passed on the stack are pushed right-to-left regardless of type.
  5. Stack-based parameters are aligned on 4-byte boundaries.

This is a perfectly legal calling convention, similar to calling conventions used by many modern processors.

For example, to call the function void foo(int i, void *p), you would pass i in d0 and p in a0.

Note that to call the function void bar(void *p, int i), you would also pass i in d0 and p in a0.

Under these rules, printf("%p", rand()) would pass the format string in a0 and the random number parameter in d0. On the other hand, printf("%p", (void*)rand()) would pass the format string in a0 and the random pointer parameter in a1.

The va_list structure would look like this:

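struct va_list {
    int d0;                 /* data registers as they were on entry */
    int d1;
    void *a0;               /* address registers as they were on entry */
    void *a1;
    char *stackParameters;  /* next stack-based variadic parameter */
    int intsUsed;           /* integer parameters consumed so far */
    int pointersUsed;       /* pointer parameters consumed so far */
};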

The first four members are initialized with the values the corresponding registers held on entry. stackParameters points to the first stack-based parameter passed via the ..., and intsUsed and pointersUsed are initialized to the number of named parameters that are integers and pointers, respectively.

The va_arg macro is a compiler intrinsic which generates different code based on the expected parameter type.

  • If the parameter type is a pointer, then va_arg(ap, T) expands to (T)get_pointer_arg(&ap).
  • If the parameter type is an integer, then va_arg(ap, T) expands to (T)get_integer_arg(&ap).
  • If the parameter type is something else, then va_arg(ap, T) expands to *(T*)get_other_arg(&ap, sizeof(T)).

The get_pointer_arg function goes like this:

void *get_pointer_arg(va_list *ap)
{
    void *p;
    switch (ap->pointersUsed++) {
    case 0: p = ap->a0; break;
    case 1: p = ap->a1; break;
    /* third and later pointers were passed on the stack */
    default: p = *(void**)get_other_arg(ap, sizeof(p)); break;
    }
    return p;
}

The get_integer_arg function goes like this:

int get_integer_arg(va_list *ap)
{
    int i;
    switch (ap->intsUsed++) {
    case 0: i = ap->d0; break;
    case 1: i = ap->d1; break;
    /* third and later integers were passed on the stack */
    default: i = *(int*)get_other_arg(ap, sizeof(i)); break;
    }
    return i;
}

And the get_other_arg function goes like this:

void *get_other_arg(va_list *ap, size_t size)
{
    void *p = ap->stackParameters;
    ap->stackParameters += ((size + 3) & ~3);
    return p;
}

As noted earlier, calling printf("%p", rand()) would pass the format string in a0 and the random integer in d0. But when the printf function executes, it will see the %p format and perform a va_arg(ap, void*), which will use get_pointer_arg and read the parameter from a1 instead of d0. Since a1 was not initialized, it contains garbage. The random number you generated is ignored.

Taking the example further, if you had printf("%p %i %s", rand(), 0, "hello"); this would be called as follows:

  • a0 = address of format string (first pointer parameter)
  • a1 = address of string "hello" (second pointer parameter)
  • d0 = random number (first integer parameter)
  • d1 = 0 (second integer parameter)

When the printf function executes, it reads the format string from a0 as expected. When it sees the %p it will retrieve the pointer from a1 and print it, so you get the address of the string "hello". Then it will see the %i and retrieve the parameter from d0, so it prints a random number. Finally, it sees the %s and retrieves the parameter from the stack. But you didn't pass any parameters on the stack! This will read undefined stack garbage, which will most likely crash your program when it tries to print it as if it were a string pointer.

Raymond Chen
13

C standard, 7.21.6.1, The fprintf function, states just

p The argument shall be a pointer to void.

By Appendix J.2, this is a constraint, and violating a constraint causes UB.

(Below is my previous reasoning why this should be UB, which was too complicated.)

That paragraph does not describe how the void* is retrieved from the ..., but the only way that the C standard itself offers for this purpose is 7.16.1.1, The va_arg macro, which warns us that

if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined

If you read 6.2.7, Compatible type and composite type, then there's no hint that void* and int should be compatible, regardless of their size. So, I'd say that since va_arg is the only way to implement printf in standard C, the behavior is undefined.

Fred Foo
  • Don't know where to look in the standard, but isn't storing a pointer in a large enough integer type and then storing it back again guaranteed to be safe? And as the integer type in discussion is not bigger than the pointer, converting the integer to a pointer should be safe, no? – BoBTFish Jul 27 '12 at 13:14
  • @BoBTFish: 7.16.1.1 explicitly gives UB for non-compatible types, and "big enough" is not a sufficient condition for two types to be compatible. E.g., by 6.2.5, "`char` [shall] have the same range, representation, and behavior as either `signed char` or `unsigned char`", but "[i]rrespective of the choice made, `char` is a separate type from the other two and is not compatible with either." – Fred Foo Jul 27 '12 at 13:21
  • 1
    The language specification does not need to describe how the `void*` is retrieved from the `...`. The fact that you passed an `int` when the specification says "shall be a pointer to `void`" means that you violated a constraint and have triggered UB. – Raymond Chen Jul 27 '12 at 13:48
  • `va_args` aside, converting between integers and pointers *is* allowed. From my c11 final draft, `6.3.2.3.5`:"An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined..." ("previously specified" is null pointer constant). The type `intptr_t` is specifically provided as an integer type that *will* work correctly. Still doesn't answer the question about printing an invalid pointer though. But implementation defined is not undefined. – BoBTFish Jul 27 '12 at 13:57
  • @RaymondChen: good point. Looked up the reference for that and shortened the answer. – Fred Foo Jul 27 '12 at 14:02
  • @BoBTFish: yes, converting is allowed, but in a `...` there's no possibility for a conversion because the arguments don't have types to convert *to* until `va_arg` is called. – Fred Foo Jul 27 '12 at 14:03
5

Yes, it's undefined. From C++11, 3.7.4.2/4:

The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined.

with a footnote:

On some implementations, it causes a system-generated runtime fault.

Mike Seymour
  • But what is "using" a pointer value? Doesn't that involve dereferencing? – Fred Foo Jul 27 '12 at 13:06
  • 2
    @larsmans: No, dereferencing it is called "dereferencing", not "using". Using the value of an object means that it appears in an expression where an _rvalue_ is required; for example, as a function argument. – Mike Seymour Jul 27 '12 at 13:15
-2

%p is just an output format specifier for printf. It doesn't need to dereference or validate the pointer in any way, although some compilers issue a warning if the argument's type is not a pointer:

#include <stdio.h>

int main(void)
{
    int t = 5;
    printf("%p\n", t);  /* int passed where %p expects void* */
    return 0;
}

Compilation warning:

warning: format ‘%p’ expects argument of type ‘void*’, but argument 2 has type ‘int’ [-Wformat]

Outputs:

0x5
  • 1
    `%p` is also used to control how the type is extracted from the va_args at the other side. Extracting from `va_args` wrongly is undefined behaviour. – Flexo Jul 27 '12 at 13:44