
Let's say I have data, such as below:

union
{
  struct
  {
    char flags : 4;
    uint16_t   : 12;
  };
  char data[2];
};

I understand how to make this code run regardless of a platform's byte endianness. I am asking to be sure that my understanding of how it would be stored under each endianness is correct.

As I understand it: if I were to store a uint16 in the 12-bit field, both endiannesses would drop the 4 highest bits. Big-endian would store the remaining 4 high bits in the same byte as the flags, and the rest in a separate byte. Little-endian would store the 4 lowest bits in the same byte as the flags, and the rest in a separate byte.

Is this right?

timrau
Andrew Williamson
  • the `C++` specs are trying to define and provide a _logically correct_ language, not a language that is correct down to the _bit_, if that makes some sort of sense to you. In other words, `C++` neither cares about nor specifies the representation of information at the bit level; this is similar to the reason why you can't infer a `float` as a template argument: `C++` doesn't define the representation of `float`, so it can't even offer the basic guarantees that are essential for templates. Everything about this is compiler and ABI specific. – user2485710 Mar 05 '14 at 07:00

2 Answers


It depends on the compiler and the target platform's ABI. See, e.g., the rules for GCC bit-fields: the order of allocation of bit-fields within a unit is "determined by ABI". Also, each field is supposed to be declared `int` or `unsigned int`, not `uint16_t`.

If you want to control the data format, you can use shifting and masking to assemble the data into a uint16_t. If your aim is to write the data out in a well-defined format, you can write the uint16_t bytes in the required endianness or just assemble the data into 2 bytes and write them in the required order.
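As a rough sketch of that approach (the function names and the choice to put the flags in the top 4 bits of the word are illustrative assumptions, not anything the standard or the question's layout dictates), you can pack the fields with shifts and masks and then emit the two bytes high byte first:

#include <cstdint>

// Hypothetical wire format: flags in the top 4 bits, value in the low 12,
// transmitted high byte first (big-endian) regardless of the host's endianness.
void pack_message(uint8_t flags, uint16_t value, uint8_t out[2])
{
    const uint16_t word = static_cast<uint16_t>(((flags & 0x0Fu) << 12) |
                                                (value & 0x0FFFu));   // top 4 bits of value are dropped
    out[0] = static_cast<uint8_t>(word >> 8);     // high byte first
    out[1] = static_cast<uint8_t>(word & 0xFFu);  // then low byte
}

void unpack_message(const uint8_t in[2], uint8_t& flags, uint16_t& value)
{
    const uint16_t word = static_cast<uint16_t>((in[0] << 8) | in[1]);
    flags = static_cast<uint8_t>(word >> 12);
    value = static_cast<uint16_t>(word & 0x0FFFu);
}

Because the byte positions come from the shifts rather than from the compiler's bit-field layout, the same source should produce the same two bytes on either kind of host.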

Unless you find language-spec docs that promise what you want, or your compiler's docs make clear promises and you use that same compiler for both big- and little-endian CPUs, don't rely on C/C++ compilers all handling something like this the same way.

Jerry101
  • So it is not only determined by the platform, but by the compiler you use as well? Thinking about it now, I remember `uint16_t` is not defined as a base type; it's declared in a separate header. It would make sense that a bit-field should only support the base types. My original hope was that I could include an 'endianness' field in each message, and if the host had the same endianness as the messages' sender, then I could skip the bit shifting and load the raw data into a union like the one above. I see this will not work. – Andrew Williamson Mar 05 '14 at 18:43
  • The padding between fields can also vary by compiler and target platform. The usual approach is to always send data in "network byte order" (which is big endian). See XDR for an example standard. There are libraries to help do this. – Jerry101 Mar 05 '14 at 19:37
  • Thank you for all your help. I understand how to make this endian-agnostic; I'm just looking for __tiny__ optimizations - out of personal interest. There is no way I will write code dependent on something as finicky as this. Apparently padding is almost always non-existent for byte-sized data. Upvoting your answer because you have been incredibly helpful. – Andrew Williamson Mar 05 '14 at 20:17
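A minimal sketch of the byte-order half of the idea discussed in these comments (tag each message with the sender's endianness and swap only on mismatch); the 3-byte message format, tag values, and function names are invented purely for illustration:

#include <cstdint>
#include <cstring>

// Hypothetical 3-byte message: one endianness-tag byte, then 2 payload bytes.
enum : uint8_t { kLittleEndian = 0, kBigEndian = 1 };   // made-up tag values

uint8_t host_endianness()
{
    const uint16_t probe = 1;
    uint8_t first_byte;
    std::memcpy(&first_byte, &probe, 1);   // inspect the lowest-addressed byte of a known value
    return first_byte == 1 ? kLittleEndian : kBigEndian;
}

uint16_t read_payload(const uint8_t msg[3])
{
    uint16_t word;
    std::memcpy(&word, msg + 1, 2);        // load the raw payload bytes as-is
    if (msg[0] != host_endianness())       // sender used the other byte order
        word = static_cast<uint16_t>((word << 8) | (word >> 8));   // swap the two bytes
    return word;
}

This only normalizes byte order; as the answer explains, the bit-field layout itself still isn't portable, so extracting the 4- and 12-bit fields is best done with shifts and masks.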

> Little-endian would store the 4 lowest bits in the same byte as the flags

All the compilers I'm familiar with only combine adjacent bit-fields if their base storage unit is the same type (one of those lovely implementation-defined details, though). So in your example, mixing `char` with `uint16_t` would keep them from being combined, meaning that struct would use 3 bytes (for either endianness). Using the same base type for both fields gets you what you want (but `static_assert(sizeof(...) == 2)` just in case):

union
{
  struct
  {
    uint16_t flags : 4;
    uint16_t value : 12;
  };
  uint8_t data[2];
};

The bit layout for each would be:

| Absolute bit | Byte | Bit in byte | LE order | BE order #1 | BE order #2 |
|--------------|------|-------------|----------|-------------|-------------|
| 0  | 0 | 0 | F0 | V8 | V4 |
| 1  | 0 | 1 | F1 | V9 | V5 |
| 2  | 0 | 2 | F2 | VA | V6 |
| 3  | 0 | 3 | F3 | VB | V7 |
| 4  | 0 | 4 | V0 | F0 | V8 |
| 5  | 0 | 5 | V1 | F1 | V9 |
| 6  | 0 | 6 | V2 | F2 | VA |
| 7  | 0 | 7 | V3 | F3 | VB |
| 8  | 1 | 0 | V4 | V0 | F0 |
| 9  | 1 | 1 | V5 | V1 | F1 |
| 10 | 1 | 2 | V6 | V2 | F2 |
| 11 | 1 | 3 | V7 | V3 | F3 |
| 12 | 1 | 4 | V8 | V4 | V0 |
| 13 | 1 | 5 | V9 | V5 | V1 |
| 14 | 1 | 6 | VA | V6 | V2 |
| 15 | 1 | 7 | VB | V7 | V3 |

Note that for big-endian machines, I've encountered two possible orderings of bit-fields:

  1. allocate the bitfields in decreasing order from the back of the word to the front. You can find this ordering in gcc on Linux with big endian machines.
  2. allocate the bitfields in increasing order like LE, but swap whole bytes within the word. You can find this in the C51 8051 microcontroller compiler.
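As a quick way to see which of these orderings a given compiler and target actually use, a sketch along these lines applies the `static_assert` mentioned above, sets only the flag bits, and prints the two raw bytes (the `Message` name, the `bits` member name, and the probe values are illustrative additions, not part of the original layout):

#include <cstdint>
#include <cstdio>

union Message                    // name is illustrative only
{
    struct
    {
        uint16_t flags : 4;
        uint16_t value : 12;
    } bits;                      // named member; the anonymous struct above is a common extension
    uint8_t data[2];
};

// The size check suggested above: both bit-fields really share one 16-bit unit.
static_assert(sizeof(Message) == 2, "bit-fields were not packed into 2 bytes");

int main()
{
    Message m{};                 // zero the whole union
    m.bits.flags = 0xF;          // set all four flag bits, leave value at 0
    // Reading data[] after writing bits is union type-punning: defined in C,
    // a widely supported extension in C++.
    // Expected bytes:  LE order -> 0F 00,  BE order #1 -> F0 00,  BE order #2 -> 00 0F
    std::printf("%02X %02X\n", static_cast<unsigned>(m.data[0]),
                               static_cast<unsigned>(m.data[1]));
    return 0;
}

On a little-endian GCC target this would be expected to print `0F 00`; the other two outputs correspond to the big-endian orderings described above.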
Dwayne Robinson
  • Thanks for the insight. It's been a while, but I think I was expecting `[F0, F1, F2, F3, V8, V9, VA, VB], [V0, V1, V2, V3, V4, V5, V6, V7]`. The two possibilities you've listed make more sense, because the whole value can be loaded to a register then shifted or masked. – Andrew Williamson Nov 03 '22 at 06:30