74

Suppose you have an array:

int array[SIZE];

or

int *array = new(int[SIZE]);

Does C or C++ guarantee that array < array + SIZE, and if so where?

I understand that regardless of the language spec, many operating systems guarantee this property by reserving the top of the virtual address space for the kernel. My question is whether this is also guaranteed by the language, rather than just by the vast majority of implementations.

As an example, suppose an OS kernel lives in low memory and sometimes gives the highest page of virtual memory out to user processes in response to mmap requests for anonymous memory. If malloc or ::operator new[] directly calls mmap for the allocation of a huge array, and the end of the array abuts the top of the virtual address space such that array + SIZE wraps around to zero, does this amount to a non-compliant implementation of the language?

Clarification

Note that the question is not asking about array+(SIZE-1), which is the address of the last element of the array. That one is guaranteed to be greater than array. The question is about a pointer one past the end of an array, or also p+1 when p is a pointer to a non-array object (which the section of the standard pointed to by the selected answer makes clear is treated the same way).

Stack Overflow has asked me to clarify why this question is not the same as this one. The other question asks how to implement total ordering of pointers. It essentially boils down to how a library could implement std::less such that it works even for pointers to differently allocated objects, which the standard says can only be compared for equality, not for greater-than or less-than.

In contrast, my question was about whether one past the end of an array is always guaranteed to be greater than the array. Whether the answer to my question is yes or no doesn't actually change how you would implement std::less, so the other question doesn't seem relevant. If it's illegal to compare to one past the end of an array, then std::less could simply exhibit undefined behavior in this case. (Also, typically the standard library is implemented by the same people as the compiler, and so is free to take advantage of properties of the particular compiler.)

John Kugelman
user3188445
  • 15
    Who said that a pointer has to be an actual memory address? – Mad Physicist Mar 02 '21 at 06:29
  • AFAIK, a C program could be compiled for some hypothetical processor having just two words of memory, and then your hypothesis is false – Basile Starynkevitch Mar 02 '21 at 06:30
  • The example is just an example of why this might not be the case. The question about pointer arithmetic and comparison obviously stands for even weirder architectures. – user3188445 Mar 02 '21 at 06:32
  • 1
    Does this answer your question? [How can pointers be totally ordered?](https://stackoverflow.com/questions/13380063/how-can-pointers-be-totally-ordered) – 273K Mar 02 '21 at 06:38
  • 4
    @S.M. That's about ordering pointers to different objects. This question is just about pointers within the same array. – Barmar Mar 02 '21 at 06:40
  • 10
    @user3188445 From the non-authoritative but generally reliable [cppreference](https://en.cppreference.com/w/cpp/language/operator_comparison), in C++ "*If one pointer points to an element of an array, or to a subobject of the element of the array, and another pointer points one past the last element of the array, the latter pointer compares greater*". – dxiv Mar 02 '21 at 06:41
  • @Barmar Read the duplicated question carefully. Pointers of the same object/array are also mentioned. – 273K Mar 02 '21 at 06:42
  • @dxiv Yes, that link is very useful. Also nice to know that this works even for non-arrays, so for any pointer to an actual object, `p < p+1`. – user3188445 Mar 02 '21 at 06:43
  • 20
    Luckily it is guaranteed, because otherwise an awful lot of code out there would be broken. It's very common to see `for (int *p = array; p < array + SIZE; p++) do_stuff(*p);` – Nate Eldredge Mar 02 '21 at 06:44
  • @user3188445: any object can be considered as array of 1 element, so `&obj < &obj + 1`. – Jarod42 Mar 02 '21 at 08:35
  • But pointer arithmetic is only guaranteed within the range from `std::begin(arr)` to `std::end(arr)`, so `arr < arr+SIZE + 1` is not necessarily true (it is even UB). – Jarod42 Mar 02 '21 at 08:38
  • @Jarod42 pointer value contains virtual memory page number in descriptor table(it has a long real name) so `&something + 1` may inc this number to an absent page – Алексей Неудачин Mar 02 '21 at 08:44
  • @АлексейНеудачин `&obj + 1` is a valid but not dereferenceable pointer. – Caleth Mar 02 '21 at 08:51
  • @Caleth depends on what you mean. If page descriptor table doesn't even have a page with this number is it valid or not? – Алексей Неудачин Mar 02 '21 at 08:55
  • @АлексейНеудачин I mean the C++ standard defines what is and what isn't a valid pointer, and promises that you can compare valid pointers. On such a platform the implementation would have to deal with that possibility in the definition of `<` – Caleth Mar 02 '21 at 08:57
  • @Caleth it's a windows platform – Алексей Неудачин Mar 02 '21 at 09:03
  • 7
    @АлексейНеудачин The standard doesn't mention virtual addresses, or pages, or descriptor tables. If it says that `&obj < &obj + 1` must be true, then any compiler that doesn't do that (for any reason) is bugged. And, in practice, `<` comparison shouldn't read from the addresses being compared, so the pointer being invalid doesn't matter. – HolyBlackCat Mar 02 '21 at 17:55

6 Answers

80

Yes. From section 6.5.8, paragraph 5 of the C standard:

If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P.

The expression array is P. The expression array + SIZE - 1 points to the last element of array, which is Q. Thus:

array + SIZE = array + SIZE - 1 + 1 = Q + 1 > P = array
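
As a minimal sketch of the derivation above (the helper names end_is_greater and sum are hypothetical, introduced only for illustration), the following C++ checks Q + 1 > P for a built-in array and uses the one-past-the-end pointer as a loop bound:

```cpp
#include <cstddef>

// Hypothetical helper: true if the one-past-the-end pointer compares
// greater than the pointer to the first element of the array.
template <std::size_t N>
bool end_is_greater(const int (&array)[N]) {
    const int *first = array;          // P: points to element 0
    const int *last = array + N - 1;   // Q: points to the last element
    return last + 1 > first;           // Q + 1 > P
}

// Hypothetical helper: sums the elements, using array + N only as a
// loop bound. The one-past-the-end pointer is never dereferenced.
template <std::size_t N>
int sum(const int (&array)[N]) {
    int total = 0;
    for (const int *p = array; p < array + N; ++p) {
        total += *p;
    }
    return total;
}
```

Both helpers can be called from any function with a built-in int array as the argument. The loop relies on exactly the guarantee quoted above.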

tstanisl
  • 4
    Does this imply that you cannot create an implementation that puts an array at the top of the address space? Because (array= ((int*)0xFFFFFFFC))+ 1 might be 0x00000000? (32-bit address space, 4-byte int example) – Wyck Mar 02 '21 at 15:32
  • @Wyck, I guess that a compiler allowing the creation of such an object would be non-compliant with the latest C standards. – tstanisl Mar 02 '21 at 15:37
  • 7
    @Wyck, you might not be able to put anything at the top position of the address space, if I'm reading cppreference.com correctly: ["a pointer to an object that is not an element of an array is treated as if it were pointing to an element of an array with one element"](https://en.cppreference.com/w/c/language/operator_comparison) – ilkkachu Mar 02 '21 at 16:06
  • 7
    @ilkkachu: Objects whose address is not taken could be placed at the top of address space or at whatever physical address would match a null pointer's representation. Since most non-trivial programs will have at least two objects whose address is not taken, a requirement that any objects whose address is taken have to go elsewhere doesn't reduce the amount of practically useful storage. – supercat Mar 02 '21 at 16:12
  • 1
    @Wyck If the compiler cannot guarantee that no object ends at the top of the address space, then it would need to ensure that an all-zeros pointer is greater than all other pointers. This is of course possible, but on most architectures would make implementing pointer comparisons more expensive, so I doubt any compiler would do that. – user3188445 Mar 02 '21 at 17:44
  • I do not see much sense in all this stuff, to be fair. Whether a committee of some guys accepts that compiler or not, I do not care – Алексей Неудачин Mar 02 '21 at 18:41
  • 11
    @Wyck - it doesn't prohibit such an implementation, _as long as it ensures that `<` is consistent with it_. – Toby Speight Mar 02 '21 at 20:15
  • 4
    @Wyck : You seem to be conflating the runtime values of the variables `array`, `SIZE`, `P`, and `Q` with actual virtual memory addresses. Sure, having pointers contain bit patterns identical to virtual memory addresses is an easy implementation, but it is not mandatory. As a concrete example, a pointer to a (16-bit) word at an odd address on an MC68000 cannot be directly dereferenced since [non-byte dereferencing an odd address on that architecture throws exceptions](http://mrjester.hapisan.com/04_MC68/Sect01Part06/Index.html). – Eric Towers Mar 02 '21 at 22:54
  • 3
    @EricTowers how is that a counterexample? A conforming C implementation just wouldn't allow objects to be created on odd addresses. We call that *alignment*. – user253751 Mar 03 '21 at 14:51
  • The standard does not require that the physical memory layout is first => last, last => first, or any other conceivable layout. Only that the math carried out on pointers behaves as though the layout is first => last. – jwdonahue Mar 03 '21 at 17:53
  • @Wyck, I don't think the standard says anything about memory address space. It only defines an abstract machine that behaves _as if_ pointers behave like linear addresses. I am fairly certain the standard leaves physical memory details to the system designers and compiler implementers. – jwdonahue Mar 03 '21 at 18:15
  • @user253751 : No conforming C implementation is required to enforce alignment and an implementation that enforces alignment makes driver writing stupidly difficult. – Eric Towers Mar 03 '21 at 20:27
  • 2
    @EricTowers: A conforming implementation is required to ensure that it allocates objects with an alignment that will be compatible with whatever means it uses to access them. If e.g. a hardware platform has a 32-bit load instruction that works with arbitrarily-aligned addresses, and a load-multiple instruction that only works with 32-bit-aligned addresses, an implementation could at its leisure either ensure that 32-bit objects are aligned and use both kinds of instructions to access them, or use only the former kind of instruction but then place objects with arbitrary alignment. – supercat Mar 03 '21 at 22:53
  • This is also why you can make the idiomatic C foreach loop: `for(int *a = arr; a < (&arr)[1]; ++a) printf("%d\n", *a);` – Steve Cox Mar 04 '21 at 09:42
  • @EricTowers They are *allowed* to enforce alignment, no? So, there is nothing that stops C from being implemented on a platform that enforces alignment. No problem whatsoever. Actually the vast majority of CPU architectures enforce alignment. – user253751 Mar 04 '21 at 11:05
  • @Wyck - You're conflating arrays and pointers. In the spec, an "array" is defined as a sequence of objects allocated somewhere in memory. The spec quoted here effectively says the array must be allocated such that there is at least one more byte between the end of the array and the end of the address space, meaning that `&array[_countof(array)] >= &array[0]` is guaranteed true. Even a new'ed object effectively points to one of these fundamental structures and thus, if you do `foo = new Foo[foo_count];`, then `&foo[foo_count] >= &foo[0]` or `foo+foo_count >= foo` are guaranteed true. – Aiken Drum Mar 09 '21 at 03:00
  • @tstanisl I think, assuming you agree, that adding something along the lines of what I just said in my comment above might help to explain what the spec is actually saying and trying to enforce, but in more concrete terms that might help readers understand it better. I'd edit your answer myself but I'm kind of out of the loop on the current spec and I don't want to say something I'm not 100% sure I have right. – Aiken Drum Mar 09 '21 at 03:05
  • @AikenDrum, to my understanding of the C spec, there is no requirement on any extra byte after an array. All it says is that the implementation must guarantee that "foo+foo_count > foo" if foo was created in a standard way (array decl, new, malloc(), etc). The details of how it is enforced are the compiler's business. The allocator *may* select `&foo` in such a way that `(uintptr_t)&foo + foo_count > (uintptr_t)&foo` but there are other means. The OP asked only to point to the wording in the C spec. IMO, the answer addresses that. – tstanisl Mar 09 '21 at 07:07
  • @tstanisl - The OP did NOT ask "only to point to the wording of the C spec." Your answer is extremely terse and does very little to help all of the people beyond the OP who will eventually come to stack and look for answers to this question, because when people come to stack and ask a duplicate question, it gets closed and redirected to the existing question. Never mind, I'll make the edit myself if you can't be bothered. – Aiken Drum Mar 09 '21 at 08:42
22

C requires this. Section 6.5.8 para 5 says:

pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values

I'm sure there's something analogous in the C++ specification.

This requirement effectively prevents allocating objects that wrap around the address space on common hardware, because it would be impractical to implement all the bookkeeping necessary to implement the relational operator efficiently.

Barmar
  • 5
    Note, that `array + SIZE` does not point to any element of `array`. It points to element just after the last one. – tstanisl Mar 02 '21 at 06:41
  • I think arrays are defined to their size + 1, but I'm not sure how to look that up. – Neil Mar 02 '21 at 06:43
  • I didn't notice that the question was specifically about the pointer just past the end of the array, I thought it was about any subscript. @tstanisl – Barmar Mar 02 '21 at 06:44
  • 19
    @Neil You're allowed to form a pointer just past the end, but not allowed to dereference it. – Barmar Mar 02 '21 at 06:45
  • This behaviour is apparently documented in C90 Sec. 6.3.6, as referred to in the [c-faq](http://c-faq.com/aryptr/non0based.html), but I think it's 6.4.6 in C99. – Neil Mar 02 '21 at 07:19
  • You link to the correct section of the language spec, but you quote the wrong part. The language about Q+1 exactly answers my question. – user3188445 Mar 02 '21 at 07:24
  • @user3188445 I realize that, see my above comment. – Barmar Mar 02 '21 at 07:24
  • 4
    The “bookkeeping necessary to implement the relational operator efficiently” is trivial: p < q, p = q, and p > q are equivalent to p−q < 0, p−q = 0, and p−q > 0, where p−q is computed in the width of the address space bits. As long as every supported object is less than half the size of the address space, p−q must fall in the right region. – Eric Postpischil Mar 02 '21 at 11:58
  • @EricPostpischil Good point. I recall similar logic used in TCP sequence number validation. It also depends on the maximum window size being less than half the sequence number space. – Barmar Mar 02 '21 at 12:02
  • Except for an implementation defined mapping between the C abstract machine and the physical hardware, I do not recall anything in the spec that requires pointer values bear any relation to memory addresses. The fact they do on some machines, is a matter of convenience for the system/compiler implementers. Arrays are a C abstraction that can be mapped in any conceivable way to whatever physical storage is available. Consider how a compiler might utilize a holographic cube for storage? `p++` will likely not result in a linear computation. – jwdonahue Mar 03 '21 at 18:08
  • @jwdonahue Nothing requires it except efficiency of implementation on traditional hardware. – Barmar Mar 03 '21 at 19:15
  • 1
    @jwdonahue I'm well familiar with non-traditional implementations, I used C on Lisp Machines. – Barmar Mar 03 '21 at 19:16
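
The signed-difference comparison described in Eric Postpischil's comment above can be sketched roughly as follows. This is only a model, not how any particular compiler implements `<`: the function name wrapping_less is hypothetical, and a "pointer" is assumed to be a bare 32-bit address that may wrap around, with every object smaller than half the address space.

```cpp
#include <cstdint>

// Hypothetical model: p and q are 32-bit addresses into the same object.
// p < q is decided by the sign of p - q, computed modulo 2^32.
bool wrapping_less(std::uint32_t p, std::uint32_t q) {
    std::uint32_t diff = p - q;           // wraps around modulo 2^32
    return (diff & 0x80000000u) != 0;     // sign bit set means p - q < 0
}
```

Under this model, a 4-byte int at address 0xFFFFFFFC has its one-past-the-end address wrap to 0x00000000, yet wrapping_less(0xFFFFFFFC, 0x00000000) still reports that the array start compares less than its end.
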
13

The guarantee does not hold for the case int *array = new(int[SIZE]); when SIZE is zero.

The result of new int[0] is required to be a valid pointer that can have 0 added to it, but array == array + SIZE in this case, and a strictly less-than test will yield false.
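
A small sketch of this edge case, assuming a typical hosted implementation (the helper name begin_less_than_end is hypothetical):

```cpp
#include <cstddef>

// Hypothetical helper: allocates `size` ints and reports whether
// array < array + size holds for that allocation.
bool begin_less_than_end(std::size_t size) {
    int *array = new int[size];           // size may be zero
    bool less = array < array + size;     // for size == 0 this is array < array
    delete[] array;
    return less;
}
```

For a size of 0 the helper returns false, because the two pointers compare equal. For any positive size it returns true.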

M.M
8

This is defined in C++, from 7.6.6.4 (p139 of current C++23 draft):

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

(4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5) the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 <= i − j <= n.

(4.3) — Otherwise, the behavior is undefined.

Note that 4.2 explicitly has "<= n", not "< n". It's undefined for any value larger than size(), but is defined for size().

The ordering of array elements is defined in 7.6.9 (p141):

(4.1) If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.

Which means the hypothetical element n will compare greater than the array itself (element 0) for all well defined cases of n > 0.

throx
  • 1
    That says you can create such a pointer, it doesn't say how comparison behaves with it. – user9876 Mar 03 '21 at 16:03
  • You're right. I assumed the OP was satisfied that array members were strongly ordered. Updated answer to cover this. – throx Mar 03 '21 at 22:05
  • Your addition still doesn't cover this. P+n points to a "hypothetical array element". Hypothetical meaning (in this case) "does not exist". P+n does NOT really point to an array element. So 4.1 cannot apply, since P+n does not point to an "element of the ... array". – user9876 May 05 '21 at 00:20
  • As noted in Richard Smith's answer below, this is covered in [basic.compound] as valid, and 4.1 explicitly does apply. – throx May 06 '21 at 01:25
5

The relevant rule in C++ is [expr.rel]/4.1:

If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.

The above rule appears to only cover pointers to array elements, and array + SIZE doesn't point to an array element. However, as mentioned in the footnote, a one-past-the-end pointer is treated as if it were an array element here. The relevant language rule is in [basic.compound]/3:

For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.

So C++ guarantees that array + SIZE > array (at least when SIZE > 0), and that &x + 1 > &x for any object x.
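
The single-object case can be sketched as below. The helper name one_past_object_is_greater is hypothetical, and the sketch assumes T does not overload the unary & operator:

```cpp
// Hypothetical helper: true if &x + 1 compares greater than &x.
template <typename T>
bool one_past_object_is_greater(const T &x) {
    const T *p = &x;    // x is treated as an array with one element
    return p + 1 > p;   // one past the end: formed and compared, never dereferenced
}
```

The helper only forms and compares the one-past-the-end pointer, which is what [basic.compound]/3 permits. Dereferencing p + 1 would still be undefined.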

Richard Smith
-8

An array is guaranteed to occupy consecutive memory. Since C++03 or so, a vector is guaranteed to do so as well for &vec[0] ... &vec[vec.size() - 1]. This automatically means that what you're asking about is true.
This is called contiguous storage. For vectors, it is described here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0944r0.html

The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size(). Presumably five more years of studying the interactions of contiguity with caching made it clear to WG21 that contiguity needed to be mandated and non-contiguous vector implementation should be clearly banned.

The latter is from the standards documents. My guess of C++03 was right.