
I have the following C code:

#include <stdint.h>
#include <stdio.h>

int i;
uint64_t a[] = { (uint64_t)&i, (uint64_t)&i + 0x8000000000000000 };

int main() {
    printf("%p %llx %llx\n", &i, a[0], a[1]);
}

If I compile this (as C or as C++) with Microsoft Visual Studio Community 2015 and then run it, the output is similar to the following:

013E9154 13e9154 13e9154

It seems that the code + 0x8000000000000000, which I expected to set the high bit of a[1], has been silently ignored.

However, if I move the initialization of a inside main, the output is what I would expect:

00179154 179154 8000000000179154
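
The run-time variant described above can be sketched as follows. This is only an illustration: `init_a` is a hypothetical helper name that does not appear in my original program, and the extra cast through `uintptr_t` is an assumption about the cleanest way to convert the pointer.

```c
#include <assert.h>
#include <stdint.h>

int i;
uint64_t a[2];  /* zero-initialized statically, filled in at run time */

/* Hypothetical helper: performs the addition in ordinary code, so the
 * static-initializer machinery of the compiler and linker is not involved. */
void init_a(void) {
    /* The pointer must fit in 64 bits for the conversion to be lossless. */
    assert(sizeof(uintptr_t) <= sizeof(uint64_t));

    a[0] = (uint64_t)(uintptr_t)&i;
    /* Unsigned 64-bit addition: flips only the top bit of a[0]. */
    a[1] = a[0] + 0x8000000000000000ULL;
}
```

Calling `init_a()` at the start of `main`, before `a` is read, gives `a[1]` the high-bit-set value I expected from the global initializer.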

With a global, why is the addition being silently ignored? Should the attempted addition actually set the high bit of a[1] or should it cause a compiler error?

Interestingly, if + 0x8000000000000000 in the above code is replaced by | 0x8000000000000000, I get "error C2099: initializer is not a constant".

Edit: A similar issue can occur even in the absence of casts. Compiled for x64, the following code prints the same value (e.g. 000000013FB8D180) three times:

#include <stdio.h>

int i;
int * a[] = { &i, &i + 0x100000000 };

int main() {
    printf("%p %p %p\n", &i, a[0], a[1]);
}
user200783
  • I'm going to bet that the object module ABI format used in MS-Windows uses only a 32 bit offset for a symbol reference, and the compiler is not smart enough to realize this limitation, and just blindly spits it out. – Sam Varshavchik Oct 07 '16 at 13:03
  • @GillBates - I have now confirmed that the same thing happens when compiled as C++. – user200783 Oct 07 '16 at 13:04
  • Looks like a compiler bug to me. – Jabberwocky Oct 07 '16 at 13:07
  • Why was this question downvoted? What happens here is that the value after the `+` is silently transformed into a 32-bit integer, for whatever reason, by stripping off the upper 32 bits. `+ 0x80000000` works fine. – Jabberwocky Oct 07 '16 at 13:08
  • the `0x8000000000000000` constant should actually be tagged as unsigned long long, i.e. `0x8000000000000000ULL` – tofro Oct 07 '16 at 13:11
  • @tofro maybe, but it doesn't help either. – Jabberwocky Oct 07 '16 at 13:12
  • well, actually the compiler should warn that this is an integer constant out of range – tofro Oct 07 '16 at 13:13
  • Looks like a bug. – Eugene Sh. Oct 07 '16 at 13:24
  • I can reproduce this on my machine. This is very strange. – NathanOliver Oct 07 '16 at 13:24
  • What about some different constants? – Eugene Sh. Oct 07 '16 at 13:26
  • Reproducible with Visual Studio 2015 using either the 32-bit compiler or the 64-bit compiler. To me, it's a compiler bug. – Jabberwocky Oct 07 '16 at 13:26
  • @MichaelWalz _"the value after the + is silently transformed into a 32 bit integer"_. In fact, it seems the values on _both sides_ of the `+` are truncated to 32-bits - for example, using `+ 0x80000000` and compiling for x64: `000000013FBBC180 13fbbc180 bfbbc180`. – user200783 Oct 07 '16 at 13:26
  • @user200783 yes, you are right. – Jabberwocky Oct 07 '16 at 13:27
  • Maybe write it as 0x8000000000000000ULL – Sven Nilsson Oct 07 '16 at 13:29
  • A qualified guess: the VS standard C library has crappy standard compliance. It does not implement all of C99 properly. You can get similar weird bugs when using Mingw - which in turn uses Microsoft's library. With Mingw I get "unknown conversion type character 'l' in format", suggesting that it doesn't know what `%ll` means. Now if I correct the format string to what it should be, namely `printf("%p %" PRIx64 " %" PRIx64 "\n", (void*)&i, a[0], a[1]);` then I get a nonsense warning "ISO C does not support the 'I64' ms_printf length modifier". Bad, non-compliant library seems to be the reason. – Lundin Oct 07 '16 at 14:09
  • @SvenNilsson Doesn't matter, C will implicitly pick a "large enough" type. See the hex constants table in 6.4.4.1. – Lundin Oct 07 '16 at 14:12
  • The fact that changing `+` to `|` makes the compiler error out hints at the nature of the problem. Relocation. Relocation can only handle adding/subtracting constants to/from addresses. For one reason or another it appears to be done in 32 bits and not 64 bits. There definitely are no 64-bit relocations in 32-bit PE executables. Btw, in 64-bit mode most memory operands (in instructions) still contain 32-bit offsets. – Alexey Frunze Oct 07 '16 at 14:25
  • Could this be a case of "wrap"? `(uint64_t)&i + 0x8000000000000000` will try to add `0x8000000000000000 * sizeof(int)` to `&i`, i.e. 4 times the constant value, i.e. `<<2`, which shifts out the 8. – Paul Ogilvie Oct 07 '16 at 14:53
  • I'd consider this a bug in VC. GCC produces the expected output. – alk Oct 07 '16 at 14:54
  • I tried on VC2008 and indeed, the static initialization fails and the dynamic initialization is correct: definitely a bug. – Paul Ogilvie Oct 07 '16 at 15:06

2 Answers


The initializer

(uint64_t)&i + 0x8000000000000000

isn't a valid constant expression in C. It is neither an arithmetic constant expression, which allows only integer constants, floating constants, enumeration constants, character constants, and sizeof expressions as operands, nor an address constant, which doesn't allow casts to integer types.

That said, I'd expect Visual Studio to generate "error C2099: initializer is not a constant" like it does with | 0x8000000000000000.

I'm not sure about C++, though.

nwellnhof
  • Thanks. If the initializer _"isn't a valid constant expression in C"_, is there a valid way to obtain the same value? i.e. to get "the value of `&i` but with the top bit set"? – user200783 Oct 07 '16 at 13:45
  • @user200783 With 64-bit pointers, `(char*)&i + 0x8000000000000000` should work. – nwellnhof Oct 07 '16 at 13:51
  • Why is it not a constant expression? According to 6.6/3: "Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated." This expression contains none of those operators. – Lundin Oct 07 '16 at 13:56
  • @Lundin This isn't the only constraint on constant expressions. See 6.6/7-9. – nwellnhof Oct 07 '16 at 13:58
  • @nwellnhof Actually, as far as I can tell, that section is exactly what makes this a constant expression: "an address constant for a complete object type plus or minus an integer constant expression". – Lundin Oct 07 '16 at 14:01
  • @Lundin AFAIU, only pointer casts are allowed in address constants. – nwellnhof Oct 07 '16 at 14:12
  • 6.6/9 further below defines an address constant as: "An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type." – Lundin Oct 07 '16 at 14:16
  • @Lundin But the following sentence reads: *The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant.* IMO, this excludes all other operators. – nwellnhof Oct 07 '16 at 14:20
  • But that part makes it clear that `(uint64_t)&i` is an address constant. And as previously cited, "an address constant for a complete object type plus or minus an integer constant expression" is a valid constant expression. So I believe your answer is incorrect. – Lundin Oct 07 '16 at 14:22
  • @Lundin AFAIU, a "pointer cast" is a cast from one pointer type to another, not to an arithmetic type. – nwellnhof Oct 07 '16 at 14:41

None of the initializers used in

uint64_t a[] = { (uint64_t)&i, (uint64_t)&i + 0x8000000000000000 };

are eligible constant expressions. The strict definition of a constant expression in C does not allow casting pointer values to integer types, even if the pointer value satisfies the requirements for an address constant. This means that, formally, (uint64_t)&i is already illegal in this context.

However, this compiler apparently accepts (uint64_t)&i in this context as an extension.

Given that, the fact that it complains when + is replaced with the | operator is probably rooted directly in the language specification:

6.6 Constant expressions

7 More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:

— an arithmetic constant expression,

— a null pointer constant,

— an address constant, or

— an address constant for an object type plus or minus an integer constant expression.

Again, this is not an exact match, since the above wording allows adding a fixed offset to address constants only, but for a compiler that accepts (uint64_t)&i as a constant expression in this context it wouldn't be unusual to continue to apply the "plus or minus" restriction. The ability to add something to (or subtract something from) an address constant in C is defined by the capabilities of loaders that perform address relocation at load time. Loaders can add or subtract, but they cannot perform bitwise operations on addresses.
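
As a rough illustration of the "plus or minus" rule quoted above, the sketch below shows static initializers that stay within 6.6 paragraph 7. The names `table`, `first`, `fourth`, `sixth` and `index_in_table` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

int table[8];

/* An address constant: accepted as a static initializer. */
int *first = &table[0];

/* Address constant plus an integer constant expression. */
int *fourth = &table[0] + 3;

/* Address constant minus an integer constant expression. */
int *sixth = &table[7] - 2;

/* A bitwise operation on a converted address is not one of the listed
 * forms, so standard C does not require a compiler to accept it:
 *     uintptr_t tagged = (uintptr_t)&table[0] | 1;
 */

/* Returns the index of p within table; p must point into table. */
ptrdiff_t index_in_table(const int *p) {
    assert(p >= table && p <= table + 8);
    return p - table;
}
```

Each accepted form is something a relocating loader can resolve by adding a fixed offset to the final address of `table`, which is the point of the restriction.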

And, finally, the fact that it has no effect at run time is apparently caused by the limitations of the loader, which is responsible for implementing C-style initialization of statics at startup time.
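
The outputs quoted in the question and in its comments are consistent with the sum being cut down to its low 32 bits. The sketch below is only a model of that observation, not a description of what the compiler or loader actually does; `truncated_sum` and `check_reported_values` are hypothetical names, and the 32-bit truncation itself is an assumption taken from the comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: add the offset, then keep only the low 32 bits. */
uint64_t truncated_sum(uint64_t address, uint64_t offset) {
    return (uint32_t)(address + offset);
}

/* Re-checks the model against the values reported in the thread. */
void check_reported_values(void) {
    /* 32-bit build, + 0x8000000000000000: a[1] printed as 13e9154. */
    assert(truncated_sum(0x013E9154ULL, 0x8000000000000000ULL) == 0x013E9154ULL);

    /* x64 build, + 0x80000000: a[1] printed as bfbbc180. */
    assert(truncated_sum(0x13FBBC180ULL, 0x80000000ULL) == 0xBFBBC180ULL);
}
```

Under this model the high bit of 0x8000000000000000 lies entirely outside the 32 bits that survive, which would explain why the addition appears to be ignored.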

AnT stands with Russia
  • I wonder if VS behaves correctly if you cast to an integer after adding the offset: `(uint64_t) (&i + 0x80000...)`. That way you're using address constants while you're doing arithmetic. – Haldean Brown Oct 07 '16 at 16:36