1

I recently discovered the strict aliasing rule in C/C++ and now I am trying to fix the parts of my code that violate it.

I am given a buffer of uint8_t from which I have to deserialize some data, including structs such as

struct header {
    uint16_t type;
    uint16_t magic;
    uint16_t options;
    uint16_t flags;
    uint64_t size;
};

I know that such a struct is located at some position stride in the buffer (forget about alignment for the moment, to simplify the setup). Before I knew about the strict aliasing rule, I dereferenced the pointer into the buffer after a reinterpret_cast, which is bad. For instance, I had several functions like

bool test_AAAA_1(uint8_t* buffer, uint64_t stride)
{
    return reinterpret_cast<header*>(buffer + stride)->magic == 0xAAAA;
}

Now, I tried to translate these functions using, hopefully wisely, std::memcpy. So I wrote the helper function

template <typename T>
inline T read_buffer(uint8_t* buffer)
{
    T tmp;
    std::memcpy(&tmp, buffer, sizeof(T));
    return tmp;
}

and I used it to write

bool test_AAAA_2(uint8_t* buffer, uint64_t stride)
{
    return read_buffer<header>(buffer + stride).magic == 0xAAAA;
}
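As a quick sanity check of the memcpy approach, here is a minimal self-contained sketch. The `make_buffer` helper and the `const` qualifiers and `static_assert` are additions made for illustration and are not part of the code above; the buffer is filled in the host's byte order, so endianness is deliberately ignored.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct header {
    std::uint16_t type;
    std::uint16_t magic;
    std::uint16_t options;
    std::uint16_t flags;
    std::uint64_t size;
};

// Same idea as read_buffer above, with a const pointer and a compile-time
// check that T can legitimately be filled in by memcpy.
template <typename T>
inline T read_buffer(const std::uint8_t* buffer)
{
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy requires a trivially copyable type");
    T tmp;
    std::memcpy(&tmp, buffer, sizeof(T));
    return tmp;
}

bool test_AAAA_2(const std::uint8_t* buffer, std::uint64_t stride)
{
    return read_buffer<header>(buffer + stride).magic == 0xAAAA;
}

// Hypothetical demo helper: `stride` zero bytes followed by the bytes of a
// header whose magic field is set to `magic` (host byte order).
std::vector<std::uint8_t> make_buffer(std::uint64_t stride, std::uint16_t magic)
{
    header h{};
    h.magic = magic;
    std::vector<std::uint8_t> buffer(stride + sizeof(header), 0);
    std::memcpy(buffer.data() + stride, &h, sizeof(header));
    return buffer;
}
```

Calling `test_AAAA_2(make_buffer(3, 0xAAAA).data(), 3)` exercises an odd, misaligned stride; since the bytes are copied into a properly aligned local object, the read does not depend on the alignment of `buffer + stride`.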

Now, on godbolt you can see the output of the three compilers I need the code to build with (gcc, clang, and MSVC), with optimization enabled. On gcc, test_AAAA_1 and test_AAAA_2 produce the same assembly, and on clang there are only minor differences, but with MSVC the situation seems to be quite different. Speed is a paramount requirement for my code.

Can you suggest some improvement to the code that does not break strict aliasing, so that it reaches the same performance as the other version?

NOTE ADDED

I'm feeling very frustrated doing this, since it is like I am re-engineering something that works but is not standard compliant into something whose behavior is defined but might be unwanted, depending on the compiler's optimizations. It seems I have to choose between hoping the compiler will ignore my non-compliance with the standard and hoping the compiler will optimize what I write the way I have in mind. And hope is perhaps a very fragile paradigm.

EDIT

In order to see where the difference was I also added

template <typename T>
inline T illegal(uint8_t* buffer)
{
    return *reinterpret_cast<T*>(buffer);
}

bool test_AAAA_3(uint8_t* buffer, uint64_t stride)
{
    return illegal<header>(buffer + stride).magic == 0xAAAA;
}

and the assembly of test_AAAA_2 and test_AAAA_3 coincide. This probably means that my choice of helper function was not optimal, but nothing else came to my mind.

EDIT 2

If you change the header's field uint64_t size to uint64_t size[20], you can see that the number of instructions emitted by MSVC increases dramatically. Perhaps it is not able to optimize away the copy at the end of the helper function.

MaPo
  • 1
    Maybe `memcpy` a single member only? Use `offsetof()` to find its offset in the buffer. – HolyBlackCat Aug 04 '22 at 13:01
  • 2
    If you can use C++20 then `memcpy` can be avoided and you can use `std::bit_cast` instead. If not then you need the `memcpy` solution if you want to have standard conforming code. – NathanOliver Aug 04 '22 at 13:05
  • @HolyBlackCat, well perhaps this would be inconvenient, since I would have to write a helper function for every member of every struct. – MaPo Aug 04 '22 at 13:06
  • 1
    I am not quite sure what is wrong with MSVC? I see just one extra `shr` instruction. Are you worried about its performance? – Quimby Aug 04 '22 at 13:08
  • 1
    *"(forget for the moment about alignment, to simplify the setup)."* Note that there is also endianness to handle. – Jarod42 Aug 04 '22 at 13:08
  • @NathanOliver Unfortunately, on this machine I am constrained to C++17 (as far as I know, in C++20 the reinterpret_cast would be somewhat legal). And isn't `std::bit_cast` built upon `std::memcpy`? – MaPo Aug 04 '22 at 13:08
  • @Quimby I think that you are looking at the clang one. – MaPo Aug 04 '22 at 13:09
  • 1
    `std::bit_cast` can be implemented via `memcpy`, but it also allows compiler vendors to use an [intrinsic function](https://stackoverflow.com/questions/2268562/what-are-intrinsics) instead – NathanOliver Aug 04 '22 at 13:10
  • 1
    @MaPo I am looking at the right-most one and I think godbolt is reproducible. Can you tell me what bothers you in the output you see? – Quimby Aug 04 '22 at 13:10
  • 1
    Another option is to simply ignore this UB. [Rationale](https://stackoverflow.com/a/72706785/2752075). – HolyBlackCat Aug 04 '22 at 13:13
  • @Quimby It seems to me to have more instructions, even though I did not benchmark it. – MaPo Aug 04 '22 at 13:13
  • 1
    @MaPo I see just one, well two actually with separated `mov` and `cmp`. Are you worried about the instructions emitted for `read_buffer`? Because that is irrelevant: nothing calls it, the call has been inlined, and MSVC just did not remove this unused function for some reason. I am not familiar with MSVC flags; maybe there is an option for doing that. – Quimby Aug 04 '22 at 13:15
  • @HolyBlackCat, I knew that answer and I agree with it. I think I'm not the only one. But it seems to me so strange that there is no way to have the same behavior legally. – MaPo Aug 04 '22 at 13:16
  • @Quimby For my purposes it may be problematic to be less performant. It also seems morally unacceptable to me to give up performance because of the standard. – MaPo Aug 04 '22 at 13:19
  • 1
    @MaPo I am not disputing any of that, all I am trying to say is that I cannot reproduce "msvs the situation seems to be pretty different" . The code for `test_AAAA_X` is very nearly identical 4 vs 6 instructions. Are you going to check it for every build? Because I would not bet on it staying the same all the time. – Quimby Aug 04 '22 at 13:24
  • @Quimby I admit that my knowledge of assembly is extremely limited. However, 6 is 50% more than 4. As for the "every build" sentence, I do not think I get the point. (I suppose, due to my limited knowledge, that there are no non-deterministic effects...) – MaPo Aug 04 '22 at 13:29
  • 1
    Beware of the compiler flag `/Oi` for MSVC, which is necessary to enable inlining of functions like `memcpy`. If you happen to compile at the wrong optimization level, then the `memcpy` will be implemented as a function call. Also be aware that the compiler may choose to *keep* the `memcpy` (even though inlined) when it can thereby guarantee aligned access afterwards! – Ext3h Aug 04 '22 at 13:29
  • 1
    Beware of deserializing from a buffer without checking that the buffer is long enough for what you're deserializing. That's how you get serious heartbleed. – Mgetz Aug 04 '22 at 13:36
  • I added an EDIT at the end of the section. – MaPo Aug 04 '22 at 13:46
  • @Quimby if you add fields to the struct, the instructions emitted by MSVC do change. See edit 2 – MaPo Aug 04 '22 at 13:50
  • 1
    "clan" -> "**clang**"? – Spencer Aug 04 '22 at 13:56
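HolyBlackCat's suggestion of copying a single member located via `offsetof()` could be sketched roughly as follows. The names `read_at`, `READ_FIELD` and `test_AAAA_4` are hypothetical and chosen only for this illustration; the macro is one possible way around writing one helper per member, and it is limited to members that can be returned by value (so not array members such as `size[20]`). No length check on the buffer is performed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct header {
    std::uint16_t type;
    std::uint16_t magic;
    std::uint16_t options;
    std::uint16_t flags;
    std::uint64_t size;
};

// Hypothetical helper: copies only sizeof(Member) bytes, starting `offset`
// bytes into the buffer, into a properly typed local object.
template <typename Member>
inline Member read_at(const std::uint8_t* buffer, std::size_t offset)
{
    static_assert(std::is_trivially_copyable_v<Member>,
                  "memcpy requires a trivially copyable type");
    Member tmp;
    std::memcpy(&tmp, buffer + offset, sizeof(Member));
    return tmp;
}

// Hypothetical convenience macro: deduces the member's type and its offset
// inside Struct, so no per-member helper has to be written by hand.
#define READ_FIELD(buffer, Struct, field) \
    read_at<decltype(Struct::field)>((buffer), offsetof(Struct, field))

bool test_AAAA_4(const std::uint8_t* buffer, std::uint64_t stride)
{
    return READ_FIELD(buffer + stride, header, magic) == 0xAAAA;
}
```

With this shape only the two bytes of `magic` are copied, so the size of the rest of the struct no longer enters the copy. Whether that changes the generated code on a particular compiler and optimization level would still have to be checked, for example on godbolt.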

1 Answer

-2

The strict aliasing rule does not apply to character types. uint8_t is an alias for unsigned char so feel free to reinterpret_cast a uint8_t* as some other pointer type. You still will need to be aware of byte ordering though.

Also note that gcc and clang both have a -fno-strict-aliasing compiler option that disables strict aliasing optimizations. As far as I know Microsoft does not apply strict aliasing optimizations.

doron
  • 1
    Nope. `unsigned char` is only exempt from SA in the other direction: you can access any type through a char pointer, but not the other way around. – HolyBlackCat Aug 04 '22 at 13:35
  • As far as I understand (which may be wrong) the aliasing property is not symmetrical: one can do `*reinterpret_cast<uint8_t*>(&header_instance)` but not the other way around. Am I wrong? – MaPo Aug 04 '22 at 13:36
  • 1
    I wouldn't add `-fno-strict-aliasing` unless it actually causes problems. I have a feeling that most compilers wouldn't break here. – HolyBlackCat Aug 04 '22 at 13:36
  • @HolyBlackCat I know. But it hurts, a lot! – MaPo Aug 04 '22 at 13:41
  • @HolyBlackCat It's not the compiler which may break on you, but the hardware platform. You might have gotten a little bit too much used to x86 derivatives which all support unaligned memory access for up to 64bit. For 128bit and 256bit types, even x86 derivatives have instructions where the alignment constraints are indirectly enforced by aliasing rules on the language side. – Ext3h Aug 04 '22 at 13:42
  • 2
    @Ext3h: Both clang and gcc interpret the Effective Type rule in such a way that if a region of storage has ever had some particular bit pattern written to it using a certain type, using some other type to store a value that happens to have that same bit pattern may result in nonsensical behavior. Alignment is also an issue, but the broken type-based aliasing logic in clang and gcc can break things even if all hardware alignment requirements are satisfied. – supercat Aug 05 '22 at 22:29
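The asymmetry HolyBlackCat and MaPo describe in the comments above can be illustrated with a small sketch. The function names `first_byte_of` and `magic_from_bytes` are hypothetical; the sketch uses `unsigned char` directly rather than assuming that `uint8_t` is an alias for it.

```cpp
#include <cstdint>
#include <cstring>

struct header {
    std::uint16_t type;
    std::uint16_t magic;
    std::uint16_t options;
    std::uint16_t flags;
    std::uint64_t size;
};

// Permitted direction: an existing header object may be inspected through
// a pointer to unsigned char.
unsigned char first_byte_of(const header& h)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&h);
    return bytes[0];
}

// The other direction, i.e.
//     reinterpret_cast<const header*>(bytes)->magic
// on storage that only holds unsigned char objects, is the access that the
// strict aliasing rule does not cover. Copying the bytes into a real header
// object first keeps the behavior defined.
std::uint16_t magic_from_bytes(const unsigned char* bytes)
{
    header h;
    std::memcpy(&h, bytes, sizeof(header));
    return h.magic;
}
```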