constexpr and initialization of a static const void pointer with reinterpret cast, which compiler is right?

Question

Consider the following piece of code:

struct foo {
  static constexpr const void* ptr = reinterpret_cast<const void*>(0x1);
};

auto main() -> int {
  return 0;
}

The above example compiles fine in g++ v4.9 (Live Demo), while it fails to compile in clang v3.4 (Live Demo) and generates the following error:

error: constexpr variable 'ptr' must be initialized by a constant expression

Questions:

Which of the two compilers is right according to the standard?
What's the proper way of declaring an expression of such kind?

@KerrekSB "Perhaps the OP is the inventor of XML?" You got me... — 101010, Jun 25 '14 at 19:56
@ShafikYaghmour It may be used as another special "nullptr". So a pointer may have three state, valid pointer, nullptr, special state value. — Bryan Chen, Jun 26 '14 at 04:26
Another use case would be to initialize pointer constants that point to peripherals on microcontrollers. Those addresses are usually given as integer macros in device specific header files. — Pait, Feb 13 '15 at 19:24
Note that newer versions of gcc also show an error for this (at least gcc 7.3, probably since 7.0 looking at [this question](https://stackoverflow.com/questions/48212565/c-constexpr-does-not-work-with-reinterpret-cast?noredirect=1&lq=1)). — Matthijs Kooijman, Oct 02 '19 at 13:36

score 21 · Accepted Answer · edited Oct 02 '19 at 15:53

TL;DR

clang is correct; this is a known gcc bug. You can either use intptr_t instead and cast when you need to use the value, or, if that is not workable, both gcc and clang support a little-documented work-around that should allow your particular use case.

Details

clang is correct on this one. The draft C++11 standard, section 5.19 Constant expressions, paragraph 2 says:

A conditional-expression is a core constant expression unless it involves one of the following as a potentially evaluated subexpression [...]

and includes the following bullet:

— a reinterpret_cast (5.2.10);

One simple solution would be to use intptr_t:

static constexpr intptr_t ptr = 0x1;

and then cast later on when you need to use it:

reinterpret_cast<void*>(foo::ptr) ;
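
Put together, the intptr_t approach might look like the sketch below. The helper name as_pointer is made up for illustration and is not part of the original code. The integer-to-pointer mapping is implementation-defined, so the run-time check verifies the round trip on the current platform instead of assuming it.

```cpp
#include <cassert>
#include <cstdint>

struct foo {
  // An integer literal is a constant expression, so this is accepted everywhere.
  static constexpr std::intptr_t ptr = 0x1;

  // Hypothetical helper: converts at the point of use. It is deliberately not
  // constexpr, because reinterpret_cast is not allowed in a constant expression.
  static const void* as_pointer() {
    return reinterpret_cast<const void*>(ptr);
  }
};

// The stored integer can be used wherever a constant expression is required.
static_assert(foo::ptr == 0x1, "address value is a compile-time constant");

// Run-time check that converting to a pointer and back yields the same value
// (implementation-defined behaviour, hence checked rather than assumed).
inline void check_foo_pointer() {
  assert(reinterpret_cast<std::intptr_t>(foo::as_pointer()) == foo::ptr);
}
```

The cost is that the pointer itself is never a compile-time constant; only the integer is. In practice optimizers usually fold the cast, so there is normally no run-time penalty.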

It may be tempting to leave it at that, but the story gets more interesting. This is a known and still open gcc bug; see Bug 49171: [C++0x][constexpr] Constant expressions support reinterpret_cast. It is clear from the discussion that the gcc devs have some clear use cases for this:

I believe I found a conforming usage of reinterpret_cast in constant expressions useable in C++03:
//----------------
struct X {  X* operator&(); };

X x[2];

const bool p = (reinterpret_cast<X*>(&reinterpret_cast<char&>(x[1]))
- reinterpret_cast<X*>(&reinterpret_cast<char&>(x[0]))) == sizeof(X);

enum E { e = p }; // e should have a value equal to 1
//----------------
Basically this program demonstrates the technique, the C++11 library function addressof is based on and thus excluding reinterpret_cast unconditionally from constant expressions in the core language would render this useful program invalid and would make it impossible to declare addressof as a constexpr function.

but they were not able to get an exception carved out for these use cases; see closed issue 1384:

Although reinterpret_cast was permitted in address constant expressions in C++03, this restriction has been implemented in some compilers and has not proved to break significant amounts of code. CWG deemed that the complications of dealing with pointers whose types changed (pointer arithmetic and dereference could not be permitted on such pointers) outweighed the possible utility of relaxing the current restriction.

BUT apparently gcc and clang support a little-documented extension that allows constant folding of non-constant expressions using __builtin_constant_p (exp), and so the following expression is accepted by both gcc and clang:

static constexpr const void* ptr = 
  __builtin_constant_p( reinterpret_cast<const void*>(0x1) ) ? 
    reinterpret_cast<const void*>(0x1) : reinterpret_cast<const void*>(0x1)  ;

Finding documentation for this is near impossible, but this llvm commit is informative; the following snippets provide some interesting reading:

support the gcc __builtin_constant_p() ? ... : ... folding hack in C++11

and:

// __builtin_constant_p ? : is magical, and is always a potential constant.

and:

// This macro forces its argument to be constant-folded, even if it's not
// otherwise a constant expression.
#define fold(x) (__builtin_constant_p(x) ? (x) : (x))

We can find a more formal explanation of this feature in the gcc-patches email: C constant expressions, VLAs etc. fixes which says:

Furthermore, the rules for __builtin_constant_p calls as conditional expression condition in the implementation are more relaxed than those in the formal model: the selected half of the conditional expression is fully folded without regard to whether it is formally a constant expression, since __builtin_constant_p tests a fully folded argument itself.

the arbitrary list of exceptions to what cannot be `constexpr` is starting to annoy me... — TemplateRex, May 12 '15 at 10:24
@TemplateRex so I was chatting with someone in the last llvm social and IIUC the compiler would have to track a lot of TBAA data in order to detect UB in a constexpr for reinterpret_cast which would be very expensive. I have to follow-up and see if I got this right. — Shafik Yaghmour, Jan 11 '18 at 17:56
I agree that tracking UB is problematic. OTOH, things like `goto` and structured bindings should be able to do for `constepxr`. — TemplateRex, Jan 11 '18 at 20:26
There was a constexpr goto proposal but it was rejected in favor a more comprehensive approach, [See Botond's trip report in the Rejected section](https://botondballo.wordpress.com/2015/06/05/trip-report-c-standards-meeting-in-lenexa-may-2015/). — Shafik Yaghmour, Jan 11 '18 at 20:34
As Kishore mentioned above, gcc does not accept this trick anymore (6.4 still works, 7.1 breaks, 8.0 changes the error message). Clang 9.0 (current latest version) still accepts it. Bummer. — Matthijs Kooijman, Oct 02 '19 at 13:54

Kerrek SB · Answer 2 · 2019-09-08T17:44:01.327

12

Clang is right. The result of a reinterpret-cast is never a constant expression (cf. C++11 5.19/2).

The purpose of constant expressions is that they can be reasoned about as values, and values have to be valid. What you're writing is not provably a valid pointer (since it's not the address of an object, or related to the address of an object by pointer arithmetic), so you're not allowed to use it as a constant expression. If you just want to store the number 1, store it as a uintptr_t and do the reinterpret cast at the use site.

As an aside, to elaborate a bit on the notion of "valid pointers", consider the following constexpr pointers:

constexpr int const a[10] = { 1 };
constexpr int const * p1 = a + 5;

constexpr int const b[10] = { 2 };
constexpr int const * p2 = b + 10;

// constexpr int const * p3 = b + 11;    // Error, not a constant expression

static_assert(*p1 == 0, "");             // OK

// static_assert(p1[5] == 0, "");        // Error, not a constant expression

static_assert(p2[-2] == 0, "");          // OK

// static_assert(p2[1] == 0, "");        // Error, "p2[1]" would have UB

static_assert(p2 != nullptr, "");        // OK

// static_assert(p2 + 1 != nullptr, ""); // Error, "p2 + 1" would have UB

Both p1 and p2 are constant expressions. But whether the result of pointer arithmetic is a constant expression depends on whether it is not UB! This kind of reasoning would be essentially impossible if you allowed the values of reinterpret_casts to be constant expressions.
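
The same point can be seen in a small self-contained sketch (the names table, first, last and sum are made up for illustration). Every pointer here is derived from a real object, so the compiler can check each step of the arithmetic, and all assertions are evaluated at compile time.

```cpp
#include <cstddef>

constexpr int table[4] = {10, 20, 30, 40};

// Pointers derived from an actual object: the compiler knows what they refer to.
constexpr const int* first = table;
constexpr const int* last = table + 4;  // one past the end is still a valid pointer value

static_assert(last - first == 4, "pointer difference within one array");
static_assert(*(first + 2) == 30, "dereference inside the array");
static_assert(last[-1] == 40, "negative index from the past-the-end pointer");

// static_assert(*last == 0, "");  // rejected: reading past the end would be UB

// C++11-style constexpr function (single return statement) that walks the
// array through pointers; each step is checked during constant evaluation.
constexpr int sum(const int* p, const int* end) {
  return p == end ? 0 : *p + sum(p + 1, end);
}

static_assert(sum(first, last) == 100, "evaluated entirely at compile time");
```

Had first come from a reinterpret_cast of an integer, the compiler would have no object to check these accesses against.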

edited Sep 08 '19 at 17:44

answered Jun 24 '14 at 23:56

Kerrek SB

How would you know that's not a valid pointer? On mine, it points to address 1, which is perfectly fine. (micro-controller) – Deduplicator Jun 25 '14 at 00:04
@Deduplicator: Unfortunately, C++ disagrees. Valid pointers have to be addresses of objects. The conversion between pointers and integers is generally non-semantic and only useful if the integer was itself obtained by conversion from a valid pointer. – Kerrek SB Jun 25 '14 at 00:06
The mapping is implementation-defined and *should be unsurprising*. That does not stop it being useful. Anyway, please add a quote proving your assertion. – Deduplicator Jun 25 '14 at 00:11
@Deduplicator: If you really need a quote, not just a citation, then the referenced clause of the standard says "A *conditional-expression* is a *core constant expression* unless it involves one of the following", where the following list included "a `reinterpret_cast`" – Mike Seymour Jun 25 '14 at 00:19
@HolyBlackCat: Indeed, `0` is a null pointer constant. But that's got nothing to do with reinterpreting arbitrary integer values as pointers. – Mike Seymour Jun 25 '14 at 00:21
Good, so I had the right text. Because this does not stop the given expression from being a constant expression, it just means its not a *core* one. Also, it does not stop it from denoting a valid pointer. – Deduplicator Jun 25 '14 at 00:21
I made that discussion a [question](http://stackoverflow.com/questions/24398426/meaning-of-valid-pointer). – Baum mit Augen Jun 25 '14 at 00:32
@Deduplicator: *constant expressions* are a subset of *core constant expressions*, as described in the following paragraph 5.19/3. – Mike Seymour Jun 25 '14 at 00:33
@MikeSeymour Yep, found I had to read it to the bitter end, so Kerrek is right on the constant expression part though perhaps not on the valid pointer / useful bit. – Deduplicator Jun 25 '14 at 00:38
@Deduplicator: I should draw attention again to compile-time propagation of constants (which may not have been entirely clear from my post). If I have a `constexpr const T * p`, then I should be able to discover whether `*p` (or more generally `p[n]`) is a constant expression (and note that it is not if it would have undefined behaviour, also in 5.19). If you jump through reinterpretation hoops, the question becomes unanswerable. – Kerrek SB Jun 25 '14 at 01:05
The first paragraph in your answer is undoubtedly valid. The second paragraph, not so much. Deduplicator is absolutely correct. It is a valid pointer on systems where `0x1` is the address of a byte in memory, although it is not a constant expression. – Ben Voigt Jun 25 '14 at 01:18
@BenVoigt: I changed the wording to say that it's not *provably* a valid pointer. – Kerrek SB Jun 25 '14 at 08:37
@BenVoigt: And added a bit of illustration :-) – Kerrek SB Jun 25 '14 at 08:43
This makes me wonder... could a case be made for loosening the rules on constant expressions for `constexpr volatile` entities? I can see a case for allowing, e.g., `constexpr volatile T*` to be initialised with a raw address, where `volatile` is used to indicate that the address is provably valid, but proof requires knowledge unavailable to the implementation; this would allow hardware addresses to be made `constexpr` without loosening the rules for non-`constexpr volatile` entities, somewhat similarly to how `volatile` normally works. – Justin Time - Reinstate Monica Sep 08 '19 at 19:25

Matthijs Kooijman · Answer 3 · 2022-10-27T18:49:07.053

I have also been running into this problem when programming for AVR microcontrollers. Avr-libc has header files (included through <avr/io.h>) that make available the register layout for each microcontroller by defining macros such as:

#define TCNT1 (*(volatile uint16_t *)(0x84))

This allows using TCNT1 as if it were a normal variable and any reads and writes are directed to memory address 0x84 automatically. However, it also includes an (implicit) reinterpret_cast, which prevents using the address of this "variable" in a constant expression. And since this macro is defined by avr-libc, changing it to remove the cast is not really an option (and redefining such macros yourself works, but then requires defining them for all the different AVR chips, duplicating the info from avr-libc).

Since the folding hack suggested by Shafik here seems to no longer work in gcc 7 and above, I have been looking for another solution.

Looking at the avr-libc header files more closely, it turns out they have two modes:

Normally, they define variable-like macros as shown above.
When used inside the assembler (or when included with _SFR_ASM_COMPAT defined), they define macros that just contain the address, e.g.: #define TCNT1 (0x84)

At first glance the latter seems useful, since you could then define _SFR_ASM_COMPAT before including <avr/io.h>, store the addresses in intptr_t constants, and use the address directly rather than through a pointer. However, since you can include the avr-libc header only once (i.e., TCNT1 is either a variable-like macro or an address, never both), this trick only works inside a source file that does not include any other files that need the variable-like macros. In practice, this seems unlikely (though maybe you could have constexpr (class?) variables that are declared in a .h file and assigned a value in a .cpp file that includes nothing else?).

In any case, I found another trick by Krister Walfridsson that declares these registers as external variables in a C++ header file and then defines them and places them at a fixed location using an assembler .S file. Then you can simply take the address of these global symbols, which is valid in a constant expression. To make this work, the global symbol must have a different name than the original register macro, to prevent a conflict between the two.

E.g. in your C++ code, you would have:

extern volatile uint16_t TCNT1_SYMBOL;

struct foo {
  static constexpr volatile uint16_t* ptr = &TCNT1_SYMBOL;
};

And then you include a .S file in your project that contains:

#include <avr/io.h>
.global TCNT1_SYMBOL
TCNT1_SYMBOL = TCNT1

While writing this, I realized the above is not limited to the AVR-libc case, but can also be applied to the more generic question asked here. In that case, you could get a C++ file that looks like:

extern char MY_PTR_SYMBOL;
struct foo {
  static constexpr const void* ptr = &MY_PTR_SYMBOL;
};

auto main() -> int {
  return 0;
}

And a .S file that looks like:

.global MY_PTR_SYMBOL
MY_PTR_SYMBOL = 0x1

Here's how this looks: https://godbolt.org/z/vAfaS6 (I could not figure out how to get the compiler explorer to link both the cpp and .S file together, though).

This approach has quite a bit more boilerplate, but does seem to work reliably across gcc and clang versions. Note that it resembles the approach of using linker command-line options or linker scripts to place symbols at a certain memory address, but that approach is highly non-portable and tricky to integrate into a build process, while the approach suggested above is more portable and just a matter of adding a .S file to the build.

As pointed out in the comments, there is a performance downside though: the address is no longer known at compile time. This means the compiler can no longer use the IN, OUT, SBI, CBI, SBIC, SBIS instructions. This increases code size, makes code slower, increases register pressure, and many sequences are no longer atomic, so extra code is needed wherever atomic execution is required (most cases).
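
If the numeric value of the pointer does not actually matter and all that is needed is a distinct non-null "special state" value (the use case mentioned in the comments under the question), no assembler file is required: the address of any object with static storage duration is already a constant expression. Below is a minimal sketch under that assumption; special_state_tag and is_special are hypothetical names, and the code is meant for a single translation unit (in a header, a C++17 inline variable would be needed so that every translation unit sees the same address).

```cpp
namespace detail {
// Hypothetical dummy object; only its address is used, never its value.
constexpr char special_state_tag = 0;
}  // namespace detail

struct foo {
  // Taking the address of a static-storage object is a constant expression,
  // so no reinterpret_cast is involved.
  static constexpr const void* ptr = &detail::special_state_tag;
};

static_assert(foo::ptr != nullptr, "distinct from the null pointer");

// Recognises the special value; any other pointer (including null) is not special.
constexpr bool is_special(const void* p) {
  return p == foo::ptr;
}

static_assert(is_special(foo::ptr), "the sentinel compares equal to itself");
static_assert(!is_special(nullptr), "null is a separate state");
```

This does not help for hardware registers, where the pointer has to hold a specific address.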

That solution with a symbol might work, however the address is no more known at compile time. This means the compiler can no more use `IN`, `OUT`, `SBI`, `CBI`, `SBIC`, `SBIS` instructions. This increases code size, makes code slower, increases register pressure and many sequences are no more atomic, hence will need extra code if atomic execution is needed (most of the cases). — emacs drives me nuts, Oct 24 '22 at 10:33
Well, strictly speaking, such SFRs could be defined using attribute `io` resp. `io_low` so the compiler could generate respective instructions + relocs, and resolution of the symbol / reloc is postponed to link time. However, you'll have to know when to add which attribute to begin with... (I never really got the purpose of these attributes and never used them). — emacs drives me nuts, Oct 27 '22 at 19:13

score 0 · Answer 4 · answered Jun 17 '21 at 09:18

This is not a universal answer, but it works for the special case of a struct of special function registers of an MCU peripheral at a fixed address. A union can be used to convert the integer to a pointer. It is still undefined behavior, but this cast-by-union is widely used in embedded code. And it works perfectly in GCC (tested up to 9.3.1).

struct PeripheralRegs
{
    volatile uint32_t REG_A;
    volatile uint32_t REG_B;
};

template<class Base, uintptr_t Addr>
struct SFR
{
    union
    {
        uintptr_t addr;
        Base* regs;
    };
    constexpr SFR() :
        addr(Addr) {}
    Base* operator->() const
    {
        return regs;
    }
    void wait_for_something() const
    {
        while (!regs->REG_B);
    }
};

constexpr SFR<PeripheralRegs, 0x10000000> peripheral;

uint32_t fn()
{
    peripheral.wait_for_something();
    return peripheral->REG_A;
}
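
For comparison, here is a sketch of a variant that avoids reading an inactive union member. The address stays in the template parameter and the cast happens in an ordinary (non-constexpr) member function. SfrHandle and check_peripheral_address are hypothetical names, and this is only one possible arrangement, not the method used in the code above.

```cpp
#include <cassert>
#include <cstdint>

struct PeripheralRegs {
  volatile std::uint32_t REG_A;
  volatile std::uint32_t REG_B;
};

// Hypothetical handle type: the address lives in the type, not in a union member.
template <class Base, std::uintptr_t Addr>
struct SfrHandle {
  static constexpr std::uintptr_t address = Addr;

  // Not constexpr: the cast is performed when the function is called
  // (and is normally folded into a constant by the optimizer).
  Base* regs() const { return reinterpret_cast<Base*>(Addr); }
  Base* operator->() const { return regs(); }
};

// The handle is an empty literal type, so the object itself can still be constexpr.
constexpr SfrHandle<PeripheralRegs, 0x10000000> peripheral{};

static_assert(peripheral.address == 0x10000000, "address is a compile-time constant");

// Compares pointer values only; nothing is dereferenced, so this is safe to
// run on a host machine where no peripheral exists at that address.
inline void check_peripheral_address() {
  assert(reinterpret_cast<std::uintptr_t>(peripheral.regs()) == peripheral.address);
}
```

On the target, peripheral->REG_A is used exactly as in the union version; the integer-to-pointer conversion is still implementation-defined, but no union type punning is involved.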
