
I'm confused about whether or not this is a strict aliasing violation or invokes undefined behavior. Multiple people have told me it's not a violation and that there is no UB, but from reading the C++ spec it sounds like doing anything that dereferences a type-casted pointer (unless it was just cv-casted or casted to a compatible type or char) is UB.

#include <stddef.h>  // size_t
#include <stdint.h>  // uint16_t, uintptr_t

#define SWAP(x) (((x) << 8) | ((x) >> 8))

char* foo(char* data, size_t len16) {
  int align = reinterpret_cast<uintptr_t>(data) % sizeof(uint16_t);
  if (align == 0) {
    uint16_t* data16 = reinterpret_cast<uint16_t*>(data);
    for (size_t i = 0; i < len16; i++) {
      data16[i] = SWAP(data16[i]);
    }
  } else {
    throw "Unaligned";
  }

  return data;
}

https://godbolt.org/g/DIXtJX

(This is a slightly contrived example; in reality SWAP may be a third-party function that requires a uint16_t.)

Is the story changed because we've checked alignment, are confident about the size of the types and don't care about endianness? The remaining concern would, I guess, be dead code elimination by the optimizer.

If this is illegal, how would you efficiently interpret a char buffer (e.g. from a file) as its intended type (e.g. int16s)? I'm familiar with casting through a union, but I don't really see how that's any different (see "casting through a union (1)"), aside from telling the compiler that it cannot do dead code elimination.

ZachB

2 Answers


Multiple people have told me it's not a violation and that there is no UB

Those people are wrong. This code:

uint16_t* data16 = reinterpret_cast<uint16_t*>(data);

is defined behavior if, and only if, there is an object of type uint16_t at data. That is, if I called your foo like:

uint16_t p = 42;
foo(reinterpret_cast<char*>(&p), 1); // now we're OK (len16 counts uint16_t elements, and there is only one)

or:

alignas(uint16_t) char data[64];
new (data) uint16_t{42};
foo(data, 1); // also OK, though with C++17 you'll have to use std::launder

But otherwise, it's UB. The non-UB way to do this would be:

uint16_t data16;
memcpy(&data16, data, sizeof(data16));

which many compilers will compile to the same machine code as the reinterpret_cast version.
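Applied to the whole buffer from the question, the memcpy approach could look roughly like the sketch below. The names `swap16` and `foo_memcpy` are made up for illustration and are not from the question, and this is only one way to write it. A side effect of copying through a local `uint16_t` is that the alignment check is no longer needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Swap the two bytes of a 16-bit value (hypothetical stand-in for SWAP).
inline std::uint16_t swap16(std::uint16_t x) {
  return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Hypothetical variant of foo(): byte-swaps len16 16-bit values stored in a
// char buffer without ever forming a uint16_t* into that buffer.
char* foo_memcpy(char* data, std::size_t len16) {
  assert(data != nullptr || len16 == 0);
  for (std::size_t i = 0; i < len16; i++) {
    std::uint16_t v;
    std::memcpy(&v, data + i * sizeof(v), sizeof(v));  // load two bytes into a real uint16_t
    v = swap16(v);
    std::memcpy(data + i * sizeof(v), &v, sizeof(v));  // store the swapped bytes back
  }
  return data;
}
```

Whether a given compiler optimizes this loop as well as the cast version is something to check in the generated assembly rather than assume.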


That said, an enormous amount of networking code does what you are doing, and there would be many angry messages posted on forums (if not riots in the streets) if compilers optimized this code in a way that caused it to not do what you want it to.

Barry
  • Thanks for the helpful reply. Re: the non-UB way, it looks like g++ indeed treats that like a `reinterpret_cast` for -Os, -O1 and -O2, but that method seems to prevent loop unrolling with -O3. https://godbolt.org/g/KHwOXi (Not tested if that's an actual deopt though.) – ZachB Sep 22 '16 at 17:54

doing anything that dereferences a type-casted pointer (unless it was just cv-casted or casted to a compatible type or char) is UB.

That is not correct.

Any pointer can be cast to char* and void* and back. It is UB only when the original pointer is of a different type than the pointer type used when dereferencing the pointer. There are exceptions to even that. See Struct alignment and type reinterpretation for an example.
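A minimal sketch of the round trip described here, assuming the storage really is a `uint16_t` to begin with (the function name `round_trip_swap` is made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// The object is a genuine uint16_t; viewing it through char* and then casting
// back to its original type is the allowed case described above.
std::uint16_t round_trip_swap(std::uint16_t value) {
  std::uint16_t storage = value;
  char* bytes = reinterpret_cast<char*>(&storage);                // view as raw bytes
  std::uint16_t* back = reinterpret_cast<std::uint16_t*>(bytes);  // back to the original type
  assert(back == &storage);  // the round trip yields the original pointer
  *back = static_cast<std::uint16_t>((*back << 8) | (*back >> 8));
  return storage;
}
```

The questioner's case differs in that the buffer starts out as plain `char` storage, so there is no original `uint16_t` object to cast back to.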

R Sahu
  • Right, okay, but if the original pointer is a `char*` then it cannot be `reinterpret_cast`ed to e.g. a `uint16_t`? Or can we say the "original pointer" was a `uint16_t` because e.g. another executable wrote `uint16_t`s? – ZachB Sep 22 '16 at 17:01
  • @ZachB, are you talking about data received over a network or read from a file? – R Sahu Sep 22 '16 at 17:03
  • Either, but we know the endianness, if that's what you're getting at. – ZachB Sep 22 '16 at 17:05
  • @ZachB, that's correct. If data is received or read with the same endianness, then you should be good. – R Sahu Sep 22 '16 at 17:06