I recently discovered the strict aliasing rule in C/C++ and now I am trying to fix the parts of my code that violate it.
I am given a buffer of uint8_t from which I have to deserialize some data, including some structs like
struct header {
    uint16_t type;
    uint16_t magic;
    uint16_t options;
    uint16_t flags;
    uint64_t size;
};
I know that such a struct is located at some position stride in the buffer (forget about alignment for the moment, to simplify the setup). Before I knew about the strict aliasing rule, I dereferenced the pointer into buffer after a reinterpret_cast, which is bad. For instance, I had several functions like
bool test_AAAA_1(uint8_t* buffer, uint64_t stride)
{
    return reinterpret_cast<header*>(buffer + stride)->magic == 0xAAAA;
}
Now I have tried to rewrite these functions using, hopefully wisely, std::memcpy. So I wrote the helper function
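#include <cstring>  // std::memcpy

template <typename T>
inline T read_buffer(uint8_t* buffer)
{
    T tmp;
    // Copy the raw bytes into a real T object instead of aliasing the buffer.
    std::memcpy(&tmp, buffer, sizeof(T));
    return tmp;
}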
and I used it to write
bool test_AAAA_2(uint8_t* buffer, uint64_t stride)
{
    return read_buffer<header>(buffer + stride).magic == 0xAAAA;
}
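As far as I understand, the memcpy approach only makes sense if T is trivially copyable and the bytes in the buffer have the same layout as the struct in memory. A minimal sketch of compile-time checks for that, assuming the serialized header is exactly 16 bytes with no padding (the number 16 is my assumption about the format, it is not checked anywhere else in my code):

```cpp
#include <cstdint>
#include <type_traits>

struct header {
    std::uint16_t type;
    std::uint16_t magic;
    std::uint16_t options;
    std::uint16_t flags;
    std::uint64_t size;
};

// std::memcpy into a T is only well defined for trivially copyable types.
static_assert(std::is_trivially_copyable<header>::value,
              "header must be trivially copyable to be read with memcpy");

// Assumption: the on-wire header is 4 * 2 + 8 = 16 bytes without padding.
static_assert(sizeof(header) == 16,
              "header layout does not match the assumed 16-byte wire format");
```

If either check fails, the build stops instead of silently reading a wrong layout.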
Now, on godbolt you can see the output of the three compilers that my code needs to compile with: gcc, clang, and MSVS, with optimization enabled. On gcc, test_AAAA_1 and test_AAAA_2 produce the same assembly, and there are only minor differences on clang, but on MSVS the situation seems to be pretty different. Speed is a paramount requirement of my code.
Can you suggest some improvements to the non-strict-aliasing-breaking code so that it reaches the same performance as the other version?
NOTE ADDED
I'm feeling very frustrated doing this, since it is like I am re-engineering something that works but is not standard compliant into something whose behavior is defined but might be unwanted, depending on the compiler's optimizations. It seems that I have to choose between hoping the compiler will ignore my non-compliance with the standard and hoping the compiler will optimize what I write the way I have in mind. And hope is perhaps a very fragile paradigm.
EDIT
In order to see where the difference was I also added
template <typename T>
inline T illegal(uint8_t* buffer)
{
    return *reinterpret_cast<T*>(buffer);
}

bool test_AAAA_3(uint8_t* buffer, uint64_t stride)
{
    return illegal<header>(buffer + stride).magic == 0xAAAA;
}
and the assembly of test_AAAA_2 and test_AAAA_3 coincides.
This probably means that my choice of helper function was not optimal, but nothing else came to my mind.
EDIT 2
Changing the header's field uint64_t size to uint64_t size[20], you can see that the number of instructions emitted by MSVS increases dramatically. Perhaps it is not able to optimize away the copy of the whole struct at the end of the helper function.