3

I Need to switch the order of bytes so that an int16 with contents (byte1, byte2) -> (byte2, byte1). I did this using a union:

union ConversionUnion
{
    uint8_t m_8[4];
    uint16_t m_16[2];
    uint32_t m_32;
};

//use
uint16_t example = 0xFFDE
ConversionUnion converter;
converter.m_16[0] = example;
std::swap(converter.m_8[0], converter.m_8[1]);
example = converter.m_16[0]; //0xDEFF

Now this does work on gcc, but i have been informed that this is undefined behavior (gcc 6.3, C++11).

Questions:

1) Is this really undefined behavior, I ask because i've seen this before in embedded code. Other stackoverflow questions seem to debate this, who's actually correct (for C++11 & C++14).

2) If this is undefined behavior, can byte order swapping be done without a bunch of bit shifting in a portable way. I really hate bit shifting, its horribly ugly.

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
Stephen Eckels
  • 435
  • 6
  • 17
  • 2
    Unfortunately, this topic pops up on Stack Overflow and there is not really a satisfactory answer to it. Strictly, by reading the C++ standard, you would come to the conclusion that it is undefined behavior and therefore should be avoided. By studying existing implementations and code bases, and by noticing that it is not undefined behavior in C, you would come to the conclusion that it is very well-supported and portable construct for byte swapping. Both viewpoints are valid and there is not some kind of smoking gun that would make it easy to point at one viewpoint and say it is "right". – Dietrich Epp Jun 02 '17 at 00:25
  • On a more subjective note, take note that the idea that you should avoid bit shifting "because it is ugly" is leading you on this trek to research all these alternatives on Stack Overflow. If you just shove the bit shifting in a function, you could get on writing code. There are a lot of parts of C++ that are "ugly" and if you try to avoid them all you'll never get any work done. – Dietrich Epp Jun 02 '17 at 00:27
  • I write the ugly code on the job, and then i research on stackoverflow on how to make it more beautiful :). So was this previously defined in C and then C++ changes the behavior? Do you know of ANY way other than bitshifting to do this. This has real implications too, bit shifts are not idiomatic and i have to do variations of this for lots of types in ALOT of places (i do embedded work) – Stephen Eckels Jun 02 '17 at 00:32
  • In general, I would suggest thinking of C and C++ as two different languages which happen to have some common features and syntax, rather than thinking of C as an earlier version of C++. However, you only need to write the bit shifting out maybe three times... surely not for lots and lots of types! You usually only need a 16-bit, 32-bit, and 64-bit version. I don't know what you mean by "not idiomatic"... it sounds like you are putting your foot down and saying that the code has to be beautiful, and whether or not it functions as intended is secondary. – Dietrich Epp Jun 02 '17 at 00:37
  • More information about the subject w.r.t. the C standard is available in [Defect Report #283](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm) – Dietrich Epp Jun 02 '17 at 00:38
  • "Undefined behavior" means that the language definition does not tell you what the code does. It does not mean that bad things must happen; it means that if you use such code you should have some source of knowledge other than the language definition that tells you what the code does. – Pete Becker Jun 02 '17 at 00:45
  • Pete although i agree undefined behavior != explosions i write production embedded code so anything other than a known state is out of the question. – Stephen Eckels Jun 02 '17 at 00:52

3 Answers3

4

Type punning is allowed via char*, so why not just use that rather than a union?

uint16_t example = 0xFFDE;
char *char_alias = reinterpret_cast<char*>(&example);
std::swap(char_alias[0], char_alias[1]);
David Scarlett
  • 3,171
  • 2
  • 12
  • 28
  • I like this technique. Is it safe to switch the char* to uint8_t* to make it more obvious its switching bytes? I'd do this uint8_t* punning = (uint8_t*)&example; – Stephen Eckels Jun 02 '17 at 00:45
  • 1
    Unfortunately, using `uint8_t*` instead may be undefined behaviour. The standard only guarantees that type punning via character types (signed or unsigned) is safe, and there's no guarantee that the compiler will treat `uint8_t` as a character type. E.g. See this GCC bug and resulting discussion that suggests one can't reliably assume that uint8_t will be considered a character type: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110 – David Scarlett Jun 02 '17 at 00:52
  • that is unfortunate. I will use std::byte when it's out then, until then it shall live as a TO-DO comment with a sad face next to it. – Stephen Eckels Jun 02 '17 at 00:57
2

Relying on undefined behavior or obscure semantics of the language (union) is not necessarily more idiomatic or easier to read. I find this loop much easier to parse:

uint32_t example = 0xc001c0de;
unsigned char *p = reinterpret_cast<unsigned char*>(&example);

for (size_t low = 0, high = sizeof(example) - 1;
     high > low;
     ++low, --high)
{
    std::swap(p[low], p[high]);
}
1
  1. People disagree to some extent about this one. I think that

    it's undefined behavior

    but having my opinion is not a valuable addition.

  2. Byte swapping is easy with unsigned types (BTW byte-swapping signed types doesn't make sense). Just extract individual bytes and rearrange. Hide the ugliness in a constexpr function or a macro.

    constexpr uint16_t bswap(uint16_t value);
    {
        uint16_t high_byte = (value >> 8) & 0xff;
        uint16_t low_byte = value & 0xff;
        return (low_byte << 8) | high_byte;
    }
    

BTW if you see something in embedded code, and it works, this isn't an indication that it's safe! Embedded code often sacrifices portability for efficiency, sometimes using undefined behavior where it was the only way to convince a particular compiler to generate efficient code.

anatolyg
  • 26,506
  • 9
  • 60
  • 134