29

Pointers in C++ may in general only be compared for equality. By contrast, less-than comparison is only allowed for two pointers that point to subobjects of the same complete object (e.g. array elements).

So given T * p, * q, it is not portable in general to evaluate p < q: unless both pointers point into the same object, the standard leaves the result unspecified.

The standard library contains functor class templates std::less<T> etc. which wrap the built-in operator <. However, the standard has this to say about pointer types (20.8.5/8):

For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.

How can this be realised? Is it even possible to implement this?

I took a look at GCC 4.7.2 and Clang 3.2, which don't contain any specialization for pointer types at all. They seem to depend on < being valid unconditionally on all their supported platforms.

Kerrek SB
  • Isn't it that this simply works because of the linear space of addresses provided by virtual memory? – jogojapan Nov 14 '12 at 13:50
  • There is no requirement that it be possible to create a standard template library given only what the C specification requires from the language itself. – David Schwartz Nov 14 '12 at 13:50
  • Huh, I had no idea that it was illegal to compare pointers in that way. – Rook Nov 14 '12 at 13:51
  • @jogojapan: Well, I imagine that GCC and Clang only target platforms on which this is true, and thus the naive comparison always works. But the standard is very explicit about not allowing arbitrary pointers to be ordered, so the question is how one can make sense of the much stricter requirements for `std::less`. – Kerrek SB Nov 14 '12 at 13:51
  • I think the question "Is it even possible to implement this?" cannot be answered in its current form: it is too broad. An implementation of this depends on the target platform. @Rook it is illegal to compare non-null pointers that do not point to the same array. – R. Martinho Fernandes Nov 14 '12 at 13:54
  • So are there any real platforms on which that wouldn't work? I can think of theoretical ways for that to fail (multiple disjoint address spaces), but does that *actually* happen? – harold Nov 14 '12 at 13:54
  • @R.MartinhoFernandes: Fair enough. If you want a more concrete question, consider this variation: "Is it possible to implement the standard library on targets where pointers do not form a global, total order?" – Kerrek SB Nov 14 '12 at 13:57
  • @harold, segmented architectures, 16-bit 80x86, for example. One can imagine that the compiler uses only the offset part of a far pointer in <, >, etc., assuming no object crosses a segment boundary, but less, etc. could well use the full 20-bit seg:offset. – chill Nov 14 '12 at 14:12
  • Is this whole idea that on some platforms it may not be possible to compare arbitrary pointers perhaps a legacy from times when the distinction between far pointers (including the segment selector) and near pointers (not including it) was still an important notion? Clearly if you deal with near pointers, you can only compare them if they belong to the same segment (which for example you could be sure of if they belonged to the same array). But since nowadays that distinction (on common platforms) isn't important any more, `std::less` can have a more relaxed definition. – jogojapan Nov 14 '12 at 14:13
  • Okay, I understand. Thanks. I've seen deleted answers hanging around, just didn't make the connection. – Pete Becker Nov 14 '12 at 14:37
  • @Rook: imho comparing pointers is not in general illegal, but in general undefined. – Zane Nov 16 '12 at 09:37
  • @Zane only if they point to locations in different objects. Comparing pointers within the same object, i.e. in an array or obtained via offsetof, is defined – Swift - Friday Pie Apr 14 '20 at 02:33

5 Answers

27

Can pointers be totally ordered? Not in portable, standard C++. That's why the standard requires the implementation to solve the problem, not you. For any given representation of a pointer, it should be possible to define an arbitrary total ordering, but how you do it will depend on the representation of a pointer.

For machines with a flat address space and byte addressing, just treating the pointer as if it were a similarly sized integer or unsigned integer is usually enough; this is how most compilers will handle comparison within an object as well, so on such machines, there's no need for the library to specialize std::less et al. The "unspecified" behavior just happens to do the right thing.
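As a rough illustration of the flat-address-space case, a comparator can order pointers by their integer value. This is only a sketch: the name `flat_pointer_less` is hypothetical, it is not how any particular standard library is written, and it assumes that `std::uintptr_t` exists and that the implementation-defined pointer-to-integer conversion preserves address order.

```cpp
#include <cstdint>

// Hypothetical comparator, not any library's actual implementation.
// Assumption: a flat address space in which converting a pointer to
// std::uintptr_t yields its address as an unsigned integer.
template <typename T>
struct flat_pointer_less {
    bool operator()(const T* a, const T* b) const {
        // Compare the integer representations instead of using the
        // built-in < directly on possibly unrelated pointers.
        return reinterpret_cast<std::uintptr_t>(a) <
               reinterpret_cast<std::uintptr_t>(b);
    }
};
```

Under that assumption, exactly one of `less(&x, &y)` and `less(&y, &x)` holds for two distinct objects `x` and `y`, which is the total-order property the standard asks of `std::less`.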

For word addressed machines (and there is at least one still in production), it may be necessary to convert the pointers to void* before the compiler native comparison will work.

For machines with segmented architectures, more work may be necessary. It's typical on such machines to require an array to be entirely in one segment, and just compare the offset in the segment; this means that if a and b are two arbitrary pointers, you may end up with !(a < b) && !(b < a) but not a == b. In this case, the compiler must provide specializations of std::less<> et al for pointers, which (probably) extract the segment and the offset from the pointer, and do some sort of manipulation of them.
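The segmented case can be sketched with a toy model. Everything here is an assumption made for illustration: `FarPtr`, its field names, and the real-mode style `segment * 16 + offset` linearisation are not taken from any actual compiler.

```cpp
#include <cstdint>

// Hypothetical model of a 16-bit real-mode far pointer; the type and its
// field names are illustrative assumptions, not a real compiler's layout.
struct FarPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// What a cheap built-in < might do: compare offsets only. This is only
// meaningful when both pointers lie in the same segment.
inline bool offset_less(FarPtr a, FarPtr b) {
    return a.offset < b.offset;
}

// Linear address computed in a wider type: segment * 16 + offset.
inline std::uint32_t linear_address(FarPtr p) {
    return static_cast<std::uint32_t>(p.segment) * 16u + p.offset;
}

// The more expensive comparison a std::less specialization could use.
inline bool far_less(FarPtr a, FarPtr b) {
    return linear_address(a) < linear_address(b);
}
```

With `a = {0x1000, 0x0010}` and `b = {0x2000, 0x0010}`, `offset_less` reports neither as smaller although the two are different addresses, while `far_less` orders them. Two different segment:offset pairs that alias the same byte compare as equivalent under `far_less`.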

EDIT:

One other thing worth mentioning, perhaps: the guarantees in the C++ standard only apply to standard C++, or in this case, pointers obtained from standard C++. On most modern systems, it's rather easy to mmap the same file to two different address ranges, and have two pointers p and q which compare unequal, but which point to the same object.

James Kanze
  • I think we have a winner :-) – Kerrek SB Nov 14 '12 at 14:53
  • For the "do some sort of manipulation of them" I'd guess it's sufficient to have some ordering on the segments. If two pointers are in the same segment, compare addresses. If they are not, then p1 < p2 if segment(p1) < segment(p2). – Zane Nov 14 '12 at 14:54
  • I forgot about the power of converting to `void *` -- so here's a tangential question: When I convert a pointer to `uintptr_t`, is that in general not the same as if I first convert to `void *` and *then* to `uintptr_t`? – Kerrek SB Nov 14 '12 at 15:16
  • @Zane Maybe, if the compiler can guarantee that it always uses the same segment::offset for any specific object. At least on the old Intels, it was possible to access a given address with quite a number of different segment:offset combinations, and the most rigorous solution was to compare `segment*16+offset` (calculated using `long`). – James Kanze Nov 14 '12 at 15:43
  • @KerrekSB Good question. I don't know that it's guaranteed, but I can't conceive of a case where you would get something different. On machines where some pointers are smaller than `void*`, I suspect that the way the compiler would do the conversion would be to first convert to `void*`, then take the resulting bits. – James Kanze Nov 14 '12 at 15:45
  • Can you explain what a byte addressed machine is (or link to something; my cursory googling did not help)? – R. Martinho Fernandes Nov 14 '12 at 20:24
  • @James: Yep, I realize there can be segmented memory with overlaps. But in stack/heap management, I would expect that there is only one way to address a specific object, thus an object would also be addressed in the same way. At least I don't see how the compiler would get the information where to find the same object in a different segment. – Zane Nov 14 '12 at 20:45
  • @R.MartinhoFernandes A byte addressed machine is a machine where the normal hardware address designates a byte. In other words, almost all modern machines. As opposed to a word addressed machine, where the address designates a word, and you need special instructions, with additional bits in the address, to access individual bytes. (The most noted word addressed machine was the PDP-10, but if you go back far enough, almost all machines were word addressed.) – James Kanze Nov 15 '12 at 08:26
  • @Zane Exactly. The C++ implementation only has to work for pointers you can get in pure C++; depending on the implementation, it's likely that just treating segment::offset as if they were a pair of appropriately sized unsigned integers would be sufficient. But there have been exceptions; I think some of the 8086 compilers supported a huge model which could result in different segment::offset pairs pointing to the same object in an array. – James Kanze Nov 15 '12 at 08:35
  • Thanks for the `mmap` comment. I'd thought before that would not be possible. – Zane Nov 16 '12 at 09:34
  • @JamesKanze this is old but I think you have a typo, you might mean "For **word** addressed machines (and there is at least one still in production)", instead of "For byte addressed machines..." – Stephen Lin Mar 05 '13 at 19:28
  • @StephenLin Yes. I'll fix it anyway. – James Kanze Mar 05 '13 at 19:35
  • @JamesKanze Hi, if you're still here, that comment about multiple ways to point to the same object should really be rolled into your answer. I may do this if I don't hear back, but of course feel free to roll it back if I do. – Spencer Dec 16 '21 at 15:09
12

Is it possible to implement the standard library on targets where pointers do not form a global, total order?

Yes. Given any finite set you can always define an arbitrary total order over it.

Consider a simple example where you have only five possible distinct pointer values. Let's call these O (for nullptr), γ, ζ, χ, ψ¹.

Let's say that no pair of two distinct pointers from the four non-null pointers can be compared with <. We can simply arbitrarily say that std::less gives us this order: O ζ γ ψ χ, even if < doesn't.

Of course, implementing this arbitrary ordering in an efficient manner is a matter of quality of implementation.
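To make the idea concrete, here is a minimal sketch of such an arbitrary order over five values. The enum `Ptr`, the table `rank_of`, and the function `arbitrary_less` are hypothetical names introduced for illustration; a real implementation would work on actual pointer representations rather than an enum.

```cpp
#include <array>
#include <cstddef>

// Five hypothetical "pointer values"; the enumerator order carries no meaning.
enum class Ptr : std::size_t { O, Gamma, Zeta, Chi, Psi };

// Position of each value in the chosen order O, zeta, gamma, psi, chi.
// The table is indexed by enumerator: O, Gamma, Zeta, Chi, Psi.
constexpr std::array<int, 5> rank_of = {0, 2, 1, 4, 3};

// A strict total order: any two distinct values are ordered one way or the other.
inline bool arbitrary_less(Ptr a, Ptr b) {
    return rank_of[static_cast<std::size_t>(a)] <
           rank_of[static_cast<std::size_t>(b)];
}
```

Because every value gets a distinct rank, `arbitrary_less` is irreflexive, transitive, and orders every pair of distinct values, regardless of whether any built-in comparison exists between them.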


1 I am using Greek letters to remove the subconscious notion of order that would arise from familiarity with the Latin alphabet; my apologies to readers who know the Greek alphabet order.

R. Martinho Fernandes
  • OK, then the question is: can this arbitrary total order be made compatible with the subobject ordering of `<`? Would this actually be a requirement by the standard? – Kerrek SB Nov 14 '12 at 14:08
  • @Kerrek no, I have never seen such requirement (I [joked on the chat](http://chat.stackoverflow.com/transcript/10?m=1741420#1741420) before that the hypothetical Hell++ implementation could have std::less for pointers implemented with p > q and std::greater for pointers just delegate to std::less). – R. Martinho Fernandes Nov 14 '12 at 14:11
  • No, the standard doesn't require that `std::less` agrees with `<`. Yes, insofar as `<` totally orders pointers to subobjects of the same object, `std::less` can be made to agree with it within an object (no guarantee!). No, insofar as `<` can violate total ordering axioms between objects, `std::less` cannot be guaranteed to agree with `<`. – Yakk - Adam Nevraumont Nov 14 '12 at 14:12
  • @Kerrek I think that any partial order can be extended to a total order. Hopefully I am not misremembering my set theory classes. – R. Martinho Fernandes Nov 14 '12 at 14:17
  • @Yakk: To be precise, I mean it as follows: Given two comparable pointers `p` and `q`, is it required that `less()(p, q)` be equal to `p < q`? Also, I don't understand how to reconcile sentences 4 and 8 in the standard, i.e. 4 says that `less` implements `<`, but 8 says that it must also be a total order on pointers... Does 8 constitute an exception to 4? – Kerrek SB Nov 14 '12 at 14:17
  • @R.MartinhoFernandes: Or your partial template specialisations? :-) – Kerrek SB Nov 14 '12 at 14:17
  • @KerrekSB `less<>( p, q )` must be equal to `p < q` when `p < q` is defined. Otherwise, all bets are off. – James Kanze Nov 14 '12 at 14:50
  • @JamesKanze -- I'm just looking at the draft standard, and I'm not certain that is strictly required. It would be asinine to not do that, I admit. Does the full standard have more strict wording? The draft standard either has 8 overriding 4, or 8 and 4 are inconsistent, or 8 tells you what must happen when 4's behavior is undefined: it does *not* specify which of these 3 options is true. (presuming consistent interpretations when there is ambiguity seems reasonable... but presuming reasonableness seems a stretch!) – Yakk - Adam Nevraumont Nov 14 '12 at 16:44
5

On most platforms with a flat address space, they can simply do a numerical comparison between the pointers. On platforms where this isn't possible, the implementer has to come up with some other method of establishing a total order to use in std::less, but they can potentially use a more efficient method for <, since it has a weaker guarantee.

In the case of GCC and Clang, they can implement std::less as < as long as they provide the stronger guarantee for <. Since they are the ones implementing the behavior for <, they can rely on this behavior, but their users can't, since it might change in the future.

Dirk Holsopple
5

The problem is segmented architectures, where a memory address has two parts: a segment and an offset. It's "easy enough" to turn those pieces into some sort of linear form, but that takes extra code, and the decision was to not impose that overhead for operator<. For segmented architectures, operator< can simply compare the offsets. This issue was present for earlier versions of Windows.

Note that "easy enough" is a systems programmer's perspective. Different segment selectors can refer to the same memory block, so producing a canonical ordering requires pawing through details of segment mapping, which is platform-dependent and may well be slow.

Pete Becker
1

I think there is a deeper concept missing from this discussion: pointer provenance.

In principle you cannot compare pointers in general, but you should be able to compare pointers that are derived (by arithmetic operations) from the same one. For example, pointers coming from a black box, like different calls to new, cannot be compared reliably. (Here comparison applies to ordering, but I guess strictly speaking equality is also ill-defined in this context; I am not sure. This would cover the mmap case above.)

So, here is my attempt at a (rather useless but conceptual) answer: comparison of pointers is a total order within the domain of applicability of the ordering operator (i.e. where it is not undefined). On the bright side, yes, go ahead and compare pointers that belong to or come from the same allocation or a single block. After all, it must hold that p2 > p1 if T* p2 = p1 + 1;
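A small sketch of the two situations, with hypothetical helper names (`same_array_order_holds`, `count_distinct`) chosen only for this example: the built-in `<` for pointers derived from the same array, and `std::less` as the ordering when pointers come from unrelated allocations.

```cpp
#include <cstddef>
#include <functional>
#include <set>

// Pointers derived from the same array: the built-in < is well defined.
inline bool same_array_order_holds() {
    int block[8] = {};
    int* p1 = &block[2];
    int* p2 = p1 + 1;  // same provenance as p1
    return p1 < p2;
}

// Pointers from unrelated allocations: order them with std::less, which
// the standard requires to be a total order for pointer types.
inline std::size_t count_distinct(int* a, int* b, int* c) {
    std::set<int*, std::less<int*>> seen{a, b, c};
    return seen.size();
}
```

`std::set<int*>` already uses `std::less<int*>` by default; it is spelled out here only to make the choice of comparator visible.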

This is analogous to what happens with container iterators in C++: if two iterators come from different containers, it doesn't make sense to compare them.


EDIT: a take on this problem by Sean Parent, https://youtu.be/mYrbivnruYw?t=3526 . Paraphrasing (1) You can only compare pointers of the same containers [I think this is too strong, except for std::vector]. (2) Use std::less for pointers so it is only used for "representation" (for example to put in std::set). (3) Some compilers will complain about comparing (void?) pointers. (which I think is fine because void* doesn't have arithmetic).


Some related material: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2263.htm#pointer-provenance-in-c-and-provenance-within-allocated-regions

alfC