9

I am always reading that pointer arithmetic is defined as long as you don't leave the bounds of the array. I am not sure I completely understand what this means and I was a little worried. Hence this question.

Suppose I start with a pointer to the beginning of an array:

    int *p = (int*) malloc(4 * sizeof(int));

Now I create two new pointers that lie outside the bounds of the array:

    int *q = p + 10;
    int *r = p - 2;

Now the pointers q-10, q-9, ..., r+2, r+3, and so on all lie inside the bounds of the array. Are they valid? For example, is r[3] guaranteed to give the same result as p[1]?

I have done some testing and it works. But I want to know if this is covered by the usual C specifications. Specifically, I am using Visual Studio 2010, Windows, and I am programming in native C (not C++). Am I covered?

Mysticial
a06e

3 Answers

9

What you're doing works on the implementation you're using, as well as most popular implementations, but it's not conforming C. As chris cited,

§6.5.6/8: If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined

The fact that it's undefined will probably become increasingly important in the future, with more advanced static analysis allowing compilers to turn this kind of code into fatal errors without incurring runtime cost.

By the way, the historical reason for subtracting pointers not within the same array being undefined is segmented memory (think 16-bit x86; those familiar with it will want to think of the "large" memory model). While pointers might involve a segment and offset component, a compiler could do the arithmetic just on the offset component to avoid runtime cost. This makes arithmetic between pointers not in the same segment invalid since the "high part" of the difference is lost.

R.. GitHub STOP HELPING ICE
  • In addition to issues regarding x86-style segmentation, it may also be helpful for compilers, especially in some "troubleshooting" scenarios, to have each "pointer" actually hold three addresses: the start and end of the allocated region of which the pointed-to object is a part, as well as a pointer to the object itself, and to use such information to trap code which would perform invalid pointer computations. Trapping erroneous pointer computations when they occur can often greatly facilitate the tracking down of pointer-related bugs, and the Standard doesn't want to forbid that. – supercat Jun 24 '15 at 20:28
  • @supercat: Absolutely. Advanced pointer representations aimed at helping to diagnose/catch usage errors, or even make the system provably memory-safe (this is possible!), are great reasons for maintaining the C language's restrictions on pointer arithmetic. – R.. GitHub STOP HELPING ICE Jun 24 '15 at 22:22
  • IMHO, the proper thing for the Standard to do in many such cases would be to define __STDC_* macros which would indicate what guarantees the compiler will or won't provide as presently configured. Since many compilers have options that can guarantee behaviors in cases not required by the Standard, being able to precede existing code that relies upon such semantics with `#if (__STDC_GUARANTEES && !__STDC_DIRECT_LINEAR_POINTERS) #error This code requires direct linear pointer semantics. #endif` would ensure that moving to a new C17 compiler would not cause the code to behave erroneously. – supercat Jun 26 '15 at 16:05
  • It may be that the only way to make the code *usable* with a new compiler would be to rewrite it so as not to require such semantics, but the need for a rewrite would become apparent when it arose. If within the useful lifetime of the code it never becomes necessary to use it with a compiler that can't support such semantics, rewriting the code for compatibility with such compilers would impose huge costs for negative benefit (since any mistakes in the rewrite could add bugs to code whose behavior would otherwise have been correct when evaluated using the specified semantics). – supercat Jun 26 '15 at 16:07
  • @supercat: There's already a way to get compile-time errors when the application requires munging pointers as integers and the implementation doesn't support it: the cast to `uintptr_t` is an error because the (optional) type `uintptr_t` is not defined. Of course there's a theoretical case where the type/conversion is defined but not a flat linear mapping, so if that would break your application too you need to deal with it in some other way... – R.. GitHub STOP HELPING ICE Jun 26 '15 at 18:23
  • Although I don't think 8086 compilers happened to declare a type named `uintptr_t`, there's no reason they couldn't have done so. On most x86 compilers not using mixed-memory-mode code, if `ptr` is an `int*`, (ptr-1)+1 would likely yield `ptr` when given any pointer to a valid object, but if `ptr` points to the start of a segment, `ptr-1` would compare greater than `ptr`. I wouldn't call 8086 a "theoretical" architecture. It wasn't an especially good fit for C, since one of the keys to writing efficient 8086 code is to use 16-bit segment-only pointers and C has no such concept, but... – supercat Jun 26 '15 at 20:10
  • ...I'd say the concept behind the segment:offset pointers is having a reappearance in some JVM implementations (which will access byte 123 of an object whose reference is stored as a 32-bit value 45678 by computing 45678*8+123). In any case, if one has existing code that works on implementations where, given `char *x,*y`, the expression `x+(y-x);` will always yield `y`, allowing such code to specify that it should only compile on platforms which guarantee such semantics would seem better than blindly hoping nobody tries to migrate the code without rewriting it. – supercat Jun 26 '15 at 20:20
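The compile-time check mentioned in the comments above can be made explicit. This is only a sketch: `round_trip` is a hypothetical helper name, and the guard relies on `<stdint.h>` defining `UINTPTR_MAX` only when the optional type `uintptr_t` exists. As noted in the comments, this detects whether the type is available, not whether the pointer-to-integer mapping is flat and linear.

```c
#include <assert.h>
#include <stdint.h>

/* UINTPTR_MAX is only defined when the optional type uintptr_t exists,
   so the build stops here on implementations that lack it. */
#ifndef UINTPTR_MAX
#error "This code requires uintptr_t."
#endif

/* Hypothetical helper: converts a valid object pointer to uintptr_t and
   back. The standard guarantees that the result compares equal to the
   original; it does NOT promise anything about the integer's layout. */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    void *back = (void *)bits;

    assert(back == p);
    return back;
}
```

Calling `round_trip(&x)` for some object `x` should hand back a pointer equal to `&x`; doing arithmetic on `bits` and converting the result back is where implementation-specific assumptions would begin.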
5

According to the C11 standard, §6.5.6/8 (I put in the first part for context):

When an expression that has integer type is added to or subtracted from a pointer
...
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Therefore, a result that is outside of the array and not one past the end is undefined behaviour.
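As a rough illustration of where the line falls, here is a sketch that stays inside the rule. The function name `sum_ints` is made up for this example. For an array of `n` elements, every pointer from `a` up to and including `a + n` may be formed, but `a + n` itself must not be dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n ints, using the one-past-the-end pointer
   as the loop bound. */
int sum_ints(const int *a, size_t n)
{
    const int *end;
    const int *it;
    int total = 0;

    assert(a != NULL);
    end = a + n;               /* one past the last element: valid to form */
    for (it = a; it != end; ++it)
        total += *it;          /* it always points to a real element here */
    return total;              /* end is only compared, never dereferenced */
}

/* By contrast, merely evaluating a + n + 1 or a - 1 would already be
   undefined, even if the result were never dereferenced. */
```

With `int v[4] = {1, 2, 3, 4};`, the call `sum_ints(v, 4)` evaluates `v + 4` as the bound, which is fine. The question's `p + 10` and `p - 2` have no such allowance.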

chris
  • Which is funny because `p + 2 - 2` is defined, but `p - 2 + 2` is undefined. So much for associativity... – Mysticial Aug 24 '12 at 03:52
  • @Mysticial: Indeed. By the way, this brings up an easy-to-make error. Code like `ptr + index - base_index` is wrong and invokes UB if `index` goes outside the bounds of the pointed-to array. The alternate forms `ptr + (index - base_index)` or `&ptr[index-base_index]` are usually what you need. I've made this mistake a number of times myself. – R.. GitHub STOP HELPING ICE Aug 24 '12 at 03:55
  • @Mysticial Exactly! That's why I thought it was weird. – a06e Aug 24 '12 at 03:56
  • @R.. But has this mistake ever resulted in an actual error? Or just, say, a warning? – a06e Aug 24 '12 at 03:57
  • @becko I routinely break this rule when I do manual loop-unrolling. But the pointers that I pass in are typically a small portion of a much larger memory section. – Mysticial Aug 24 '12 at 04:00
  • Not that exact issue, but a similar one has led to a major error. I was mistakenly adding a moderately over-large offset to a base pointer to get an "end pointer", then looping as long as the pointer was less than the end pointer. The code worked fine on i386-linux running native on 32-bit processors (where pointers in the 3-4gb range never occur) and on x86_64 processors, but failed in i386 code running on a 64-bit kernel, since the stack got put very close to 0xffffffff and the addition overflowed. The result was a bad crash that was difficult to track down. – R.. GitHub STOP HELPING ICE Aug 24 '12 at 04:00
  • Here's a link to the fix for the bug I was talking about: http://git.musl-libc.org/cgi-bin/cgit.cgi?url=musl/commit/&id=914949d321448bd2189bdcbce794dbae2c8ed16e – R.. GitHub STOP HELPING ICE Aug 24 '12 at 04:04
  • +1, but I am marking @R.. answer, because it speaks of the general implementation I am using, which was my actual question. – a06e Aug 24 '12 at 04:04
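The `ptr + (index - base_index)` form from the comments above can be sketched as follows. The helper name `element_at` and its parameters are invented for illustration. The point is that the subtraction happens on integers first, so only one pointer addition is performed and its result stays inside the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the element the caller numbers
   `index`, where the first element of the array is numbered `base_index`.
   Assumes index - base_index does not overflow long. */
int *element_at(int *ptr, size_t len, long index, long base_index)
{
    long offset = index - base_index;   /* plain integer arithmetic */

    assert(offset >= 0 && (size_t)offset < len);
    return ptr + offset;                /* single in-bounds pointer addition */
}

/* ptr + index - base_index parses as (ptr + index) - base_index, so the
   intermediate ptr + index may already be out of bounds. */
```

Applied to the question, the element that `r[3]` was meant to reach with `r = p - 2` is `element_at(p, 4, 3, 2)`, which is `p + 1`, and no out-of-bounds pointer is ever formed.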
-1

"Yes" the conditions you mentioned are covered in specifications.


    int *r = p - 2; 

r is outside bounds of array p, the evaluation results in allocation of position to r, 2 int positions behind/before the address of p.

`r[3]` is simply the "4th" int position after the address of r
rohank