
The example below should crash when calling look_back_1() or look_back_2(). Reason: negating an unsigned variable should yield a result that is still unsigned.

#include <stdio.h>


int look_back_1(int *arr, unsigned int nmElems, unsigned long dist)
{
    int *elem = arr + nmElems;
    elem += -dist;
    return (*elem);
}


int look_back_2(int *arr, unsigned int nmElems, unsigned int dist)
{
    int *elem = arr + nmElems;
    elem += -dist;
    return (*elem);
}


int main(int argc, char **argv)
{
    int arr[100] = { 0, };
    printf("1. %d\n", look_back_1(arr, 100, 1)); // expected to crash, but works
    printf("2. %d\n", look_back_2(arr, 100, 1)); // crashes
}

With GCC 4.5, the program crashes in each function call because of the out-of-bounds array access. The compiler emits the NEG opcode in both cases.

With GCC 6.1 or Clang, the program crashes only when calling the unsigned int version. Both compilers emit the SUB opcode for the unsigned long version, which avoids the crash.

Are they allowed to do so?

too honest for this site
Tal
  • 9
    There is no such thing as "correctly crashing". - what you have is undefined behaviour. –  Jul 19 '17 at 13:01
  • 2
    The compiler should never crash and it is unlikely that it does. What do you mean by "crashes"? The addition shouldn't fail. However, you are using the function to access out of bounds of the array. That can lead to the program crashing when run. Print the addresses to see why disaster ensues. – Jonathan Leffler Jul 19 '17 at 13:02
  • Is there an 'off-by-1' issue? Without compiling the code I can't tell, but both versions look like they are trying to access arr[99], which should be available to either function. If you fill arr with the value of each position (e.g. arr[0]=0, arr[50]=50, arr[99]=99) does it print the correct value? – Neil Jul 19 '17 at 13:04
  • thanks Neil, I've corrected the question. – Tal Jul 19 '17 at 13:05
  • You might want to look up the definition of the word "**undefined**", like in "undefined behaviour". Your problem is **not** the negation! Also don't spam tags. One language per question, C **or** C++. – too honest for this site Jul 19 '17 at 13:05
  • 1
    For fullest understanding, it might be important to know whether your code is compiled as 64-bit. Are you using a 64-bit OS? Does the compilation command-line have "-m32"? What is the size of `unsigned long` and `unsigned int`? You can [edit] your question to mention these details. – anatolyg Jul 19 '17 at 13:05
  • 1
    I'm wondering whether you understand what unary `operator-` does to `unsigned` values. It is well-defined and does not create a negative value. – MSalters Jul 19 '17 at 13:07
  • [Pointer arithmetic and integral promotion](https://stackoverflow.com/q/20649734/440558) is a related question. – Some programmer dude Jul 19 '17 at 13:08
  • @anatolyg, I'm talking about x64 – Tal Jul 19 '17 at 13:10
  • @Olaf, different compilers emit different results. Both Old-GCC & VC++ will use NEG on both functions. – Tal Jul 19 '17 at 13:14
  • 1
    Are you asking if the compiler is allowed to create a program that doesn't crash? If so, the answer is always yes. Undefined behavior can do anything, including not crashing. – interjay Jul 19 '17 at 13:15
  • @Tal: That is completely unrelated to my comment. – too honest for this site Jul 19 '17 at 13:18
  • 1
    "loop_back_1() will treat the unsigned long 'dist' value as a signed long value.". No, it won't. -dist still has the type unsigned long. – Art Jul 19 '17 at 13:41
  • At least I'm no longer wondering. – MSalters Jul 19 '17 at 13:44
  • @Art, see https://godbolt.org/g/GbNysd – Tal Jul 19 '17 at 13:45
  • @Tal see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf I don't care what code some specific compiler generates on undefined behavior. – Art Jul 19 '17 at 13:54
  • @Tal Besides, that's a perfectly valid instruction for generating an unsigned long. – Art Jul 19 '17 at 13:55
  • Rolled back. You are not allowed to edit a question if that leaves an answer without context! Read [ask] and take the [tour] if you are not aware of site-rules. – too honest for this site Jul 19 '17 at 14:21

2 Answers

4

[Edit] This is the answer to a previous version of the question, which showed the problem in action when calling these functions with argument dist==1

-(unsigned long)1 is well-defined and wraps around. It's just ULONG_MAX. For the same reason, -(unsigned int)1 is UINT_MAX.
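The wrap-around can be checked directly. The following is a minimal sketch; the helper names negate_ul and negate_ui are made up for illustration and do not appear in the question.

```c
#include <limits.h>

/* Hypothetical helper: unary minus on unsigned long.
   The result is reduced modulo ULONG_MAX + 1, so it is never negative. */
unsigned long negate_ul(unsigned long x)
{
    return -x;
}

/* Hypothetical helper: the same for unsigned int, reduced modulo UINT_MAX + 1. */
unsigned int negate_ui(unsigned int x)
{
    return -x;
}
```

Calling negate_ul(1UL) yields ULONG_MAX and negate_ui(1U) yields UINT_MAX, regardless of how wide the two types are on the target.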

Pointer arithmetic outside array bounds causes Undefined Behavior, so it's perfectly reasonable for GCC to just ignore the possibility. It can treat a pointer on x64 as just a 64-bit integer with wrap-around, for instance. Adding a 64-bit ULONG_MAX (scaled by the element size) to a 64-bit pointer with wrap-around just moves the pointer back by one element; that's how wrap-around works. Adding a 32-bit UINT_MAX points nowhere near your int[100].
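The arithmetic can be modelled with plain integers, which avoids performing the undefined pointer addition itself. This is only a sketch of the "pointer as a 64-bit integer" model described above; model_add and the idea of passing the address as a uint64_t are assumptions for illustration, not what any compiler is required to do.

```c
#include <stdint.h>

/* Hypothetical model of "pointer + count" on a flat 64-bit address space:
   the count is scaled by the element size and everything wraps modulo 2^64. */
uint64_t model_add(uint64_t addr, uint64_t count, uint64_t elem_size)
{
    return addr + count * elem_size;
}
```

With 4-byte elements, model_add(end, UINT64_MAX, 4) gives end - 4, i.e. one element back, whereas model_add(end, UINT32_MAX, 4) gives an address roughly 16 GiB past end.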

So, the behavior you see is one completely valid consequence of Undefined Behavior. It is, however, totally unreliable. An optimizer may know that you can't add more than the maximum number of elements permitted in an array (which for 4-byte ints on a 64-bit platform would be 2^62), and make assumptions from there on.
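For comparison, a variant that stays within defined behavior could index backwards from the end and validate the distance first. This is only a sketch: look_back_checked, its size_t parameters, and the 0 / -1 return convention are assumptions and not part of the question.

```c
#include <stddef.h>

/* Hypothetical bounds-checked look-back.
   Stores arr[nmElems - dist] in *out and returns 0 when 1 <= dist <= nmElems;
   returns -1 without touching *out otherwise. */
int look_back_checked(const int *arr, size_t nmElems, size_t dist, int *out)
{
    if (dist == 0 || dist > nmElems) {
        return -1; /* would read at or past the end, or before the start */
    }
    *out = arr[nmElems - dist]; /* index is in [0, nmElems - 1] */
    return 0;
}
```

Because the subtraction happens on the index after the range check, no negated unsigned value is ever added to a pointer, so the result does not depend on the width of the distance type.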

MSalters
  • Please ignore the undefined example I've supplied. can you explain what is undefined in both functions when compiled as imported function? godbolt.org/g/gxNN1W – Tal Jul 19 '17 at 13:22
  • @Tal adding UINT_MAX to a pointer is undefined unless that pointer points to an array with UINT_MAX elements. Which your pointer doesn't. – Art Jul 19 '17 at 13:23
  • @Tal: That question doesn't make sense. "Undefined Behavior" is a term in the C++ Standard. "Imported function" is not. In general, the Standard defines the term in relation to whole programs given specific inputs, although there is a subset of programs that will exhibit Undefined Behavior regardless of input. – MSalters Jul 19 '17 at 13:25
  • I hope it's clearer now that I'm not talking about "undefined behaviour" – Tal Jul 19 '17 at 13:49
  • 1
    @tal it still is undefined behavior. You are adding X to a pointer to an array with Y elements where X > Y + 1. That is undefined behavior. C11 standard §6.5.6 point 8. – Art Jul 19 '17 at 13:58
  • @Art, Thanks, now I got it – Tal Jul 19 '17 at 14:00
  • @Tal: To notice the difference in the two functions, you must use arguments that cause Undefined Behavior. Calling the function with valid arguments (e.g. `~0UL`) will not lead to a noticeable difference. – MSalters Jul 19 '17 at 14:01
  • @Art Can you explain why int * behaves differently than char* ? See: https://godbolt.org/g/SzB89A – Tal Jul 20 '17 at 08:32
0

Looking at your "godbolt" disassembly, the difference is quite easy to explain. You're compiling for a platform which is natively 64-bit, with a 32-bit unsigned int and a 64-bit unsigned long. That is to say that math is natively modulo 2^64, which exactly matches the behavior of unsigned long. But for unsigned int, one extra instruction is needed. This is a subtle MOV instruction from a 32-bit register to itself (!). The reason for this instruction? It clears the upper 32 bits of the 64-bit result, which is what you need for the "modulo 2^32" behavior.
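The effect of that zero-extension can be expressed in C with fixed-width types. This is a sketch only; offset_from_u32 and offset_from_u64 are hypothetical names, and uint32_t / uint64_t stand in for unsigned int and unsigned long on the x64 target discussed here.

```c
#include <stdint.h>

/* Hypothetical: negate a 32-bit distance, then widen it to 64 bits.
   The negation is reduced modulo 2^32, so the upper 32 bits end up zero. */
uint64_t offset_from_u32(uint32_t dist)
{
    return (uint64_t)(uint32_t)(-dist);
}

/* Hypothetical: negate a 64-bit distance; reduced modulo 2^64, no widening needed. */
uint64_t offset_from_u64(uint64_t dist)
{
    return -dist;
}
```

For dist == 1 the first function returns 0x00000000FFFFFFFF and the second returns 0xFFFFFFFFFFFFFFFF; only the second acts like "minus one" once it is added to a 64-bit address with wrap-around.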

This is efficient, and quite smart. It may give unexpected results for code exhibiting Undefined Behavior, but you shouldn't have expectations for those cases anyway.

MSalters