While reading the RapidJSON source code, I found an interesting optimization that uses "type punning":

    // By using proper binary layout, retrieval of different integer types do not need conversions.
    union Number {
#if RAPIDJSON_ENDIAN == RAPIDJSON_LITTLEENDIAN
        struct I {
            int i;
            char padding[4];
        }i;
        struct U {
            unsigned u;
            char padding2[4];
        }u;
#else
        struct I {
            char padding[4];
            int i;
        }i;
        struct U {
            char padding2[4];
            unsigned u;
        }u;
#endif
        int64_t i64;
        uint64_t u64;
        double d;
    };  // 8 bytes

It looks like only big-endian (BE) CPUs are affected by this optimization. How does this increase performance? I'd like to test it, but I do not have a BE machine.

Wikipedia says:

While not allowed by C++, such type punning code is allowed as "implementation-defined" by the C11 standard[15] and commonly used[16] in code interacting with hardware.[17]

So is it legal in C++? I believe it works fine in the vast majority of cases, but should I use it in new code?

kyb
  • It is not legal according to the C++ Standard. C11 is for the C language, not C++. – aschepler Nov 15 '20 at 20:18
  • I see. But this library is presented as the fastest FOSS serializer/parser on the whole internet, so it works fine. And it has 9.9K stars on GitHub. – kyb Nov 15 '20 at 20:24
  • If “Not allowed by C++” means that the code is ill-formed, the requirement is only that a conforming compiler must issue a diagnostic. Having done that, the compiler is free to compile the code, with an implementation-specific meaning. So, if it works, it works. If it doesn’t work, don’t blame the C++ standard. – Pete Becker Nov 15 '20 at 20:30

1 Answer

So is it legal in C++?

No, it isn't legal in C++ (the Wikipedia passage you quoted already says "While not allowed by C++ ...").

In C++, a union just reserves enough memory to fit its largest member. That memory is shared by all members.
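
To make the shared storage concrete, here is a minimal sketch. The union below is a reduced, hypothetical stand-in for the `Number` union in the question (not RapidJSON's actual definition), and `members_share_address` is a helper name invented for illustration. Taking the address of a member is fine even when it is not the active one; only reading its value is the problem.

```cpp
#include <cstdint>

// Hypothetical, reduced stand-in for the union in the question.
union Number {
    std::int32_t i;
    std::int64_t i64;
    double d;
};

// The union is at least as large as each of its members.
static_assert(sizeof(Number) >= sizeof(std::int64_t), "must fit int64_t");
static_assert(sizeof(Number) >= sizeof(double), "must fit double");

// Returns true if every member starts at the address of the union object.
// Only addresses are compared; no inactive member is read.
bool members_share_address(const Number& n) {
    const void* base = &n;
    return base == static_cast<const void*>(&n.i)
        && base == static_cast<const void*>(&n.i64)
        && base == static_cast<const void*>(&n.d);
}
```

Calling `members_share_address(Number{})` should return `true`, since all members of a union are allocated at the same address.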

Reading a union member other than the one that was most recently written (the active member) is undefined behavior. You need to decide beforehand which union member to work with; if the union is shared between functions, the active member is often tracked with a type discriminator.
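
As a rough sketch of the discriminator approach, assuming a made-up `TaggedNumber` type (all names here are hypothetical, not taken from RapidJSON): the `kind` field records which member is active, and readers consult it before touching the union. If you really need the raw bytes of one type viewed as another, copying them with `std::memcpy` is well defined, unlike reading an inactive union member.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical tagged value: `kind` records which union member is active.
struct TaggedNumber {
    enum class Kind { Int64, Double } kind;
    union {
        std::int64_t i64;
        double d;
    };
};

TaggedNumber make_int(std::int64_t v) {
    TaggedNumber n;
    n.kind = TaggedNumber::Kind::Int64;
    n.i64 = v;  // i64 becomes the active member
    return n;
}

TaggedNumber make_double(double v) {
    TaggedNumber n;
    n.kind = TaggedNumber::Kind::Double;
    n.d = v;  // d becomes the active member
    return n;
}

// Reads only the active member and converts explicitly when needed.
double as_double(const TaggedNumber& n) {
    return n.kind == TaggedNumber::Kind::Double
        ? n.d
        : static_cast<double>(n.i64);
}

// Well-defined reinterpretation of the bytes: copy them instead of punning.
std::uint64_t bits_of(double v) {
    static_assert(sizeof(std::uint64_t) == sizeof(double), "size mismatch");
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return bits;
}
```

For example, `as_double(make_int(3))` goes through an explicit conversion rather than reinterpreting the bytes. Compilers typically optimize a fixed-size `std::memcpy` like the one in `bits_of` into a plain register move, and C++20 offers `std::bit_cast` for the same purpose.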

πάντα ῥεῖ