75

Endianness, from what I understand, is when the bytes that compose a multibyte word differ in their order, at least in the most typical case. So a 16-bit integer may be stored as either 0xHHLL or 0xLLHH.

Assuming I don't have that wrong, what I would like to know is when endianness becomes a major factor when sending information between two computers whose endianness may or may not be different.

  • If I transmit a short integer of 1, in the form of a char array and with no correction, is it received and interpreted as 256?

  • If I decompose and recompose the short integer using the following code, will endianness no longer be a factor?

     // Sender:
     for (n = 0; n < sizeof(uint16_t) * 8; ++n) {
         stl_bitset[n] = (value >> n) & 1;
     }

     // Receiver:
     for (n = 0; n < sizeof(uint16_t) * 8; ++n) {
         value |= uint16_t(stl_bitset[n] & 1) << n;
     }
    
  • Is there a standard way of compensating for endianness?

Jan Schultke
  • 17,446
  • 6
  • 47
  • 96
Anne Quinn
  • 12,609
  • 8
  • 54
  • 101
  • 10
    +1 Very interesting question! BTW, it would be interpreted as 0x0100 (0d256) because bytes are swapped, not bits :) – BlackBear Aug 24 '11 at 17:50
  • Oh, you're right! I can't believe I got that wrong, busted out the calculator to get that number and everything. Corrected it, thanks! – Anne Quinn Aug 24 '11 at 17:53
  • 2
    I wonder why nobody ever asks about the *bit* endianness - is 1 represented as `00000001` or as `10000000` ;-) – Kerrek SB Aug 24 '11 at 18:02
  • @Kerrek: maybe because it rarely matters these days. – R. Martinho Fernandes Aug 24 '11 at 18:06
  • 2
    Don't forget that "sending information between two computers" not only includes networks but also files written on one computer and somehow transferred to another one. So each and every binary file format must have an exactly specified endianess. – mmmmmmmm Aug 24 '11 at 18:17
  • FYI: You can format code in a list, it just takes a crapton of spaces. – James McNellis Aug 24 '11 at 18:20
  • @Stevens - I didn't know binary files also subscribe to endianness. I mean... thinking about them, they would have to, but the thought never crossed my mind, thanks! – Anne Quinn Aug 24 '11 at 19:52
  • 9
    I have to say that while I know what you're getting at writing `0xHHLL` and the like I don't think it is a good way to represent the concept because `0x...` is a construct at the language syntax level and endianness is at the memory organization level. That is `0xFF12` is `0xFF12` on machines of *every* endianness. The usual construct is to use hex-editor type output or draw memory as a array of boxes with values in them. – dmckee --- ex-moderator kitten Aug 24 '11 at 20:45
  • @KerrekSB Bit-endianness does indeed matter at the hardware level for serial communications, i.e. the order the bits in an octet are put on the wire. Apps programmers are insulated from it. – Russell Borogove Sep 27 '11 at 17:48
  • 1
    htons, htonl, ntohs, ntohl ... Endianness refers to the difference between how different architectures store integer types. It becomes a major factor when dealing with sockets. Say you want to serialize a struct that contains a few shorts and a few longs. You'd need to use the appropriate function to ensure the data sent over the wire got sent in the proper order (a network neutral ordering) to the destination. Also, the client of such data would have to convert from network to host order. ntohl (net to host long), etc. Pretty self explanatory. – johnathan Dec 04 '11 at 19:25
  • A char array, for instance, doesn't suffer from the same endianness problem that integer types do. So a char array sent over the wire "abcdefghijklmnopqrstuvwxyz" would not have to be converted to a network neutral ordering. – johnathan Dec 04 '11 at 19:26

8 Answers

55

Very abstractly speaking, endianness is a property of the reinterpretation of a variable as a char-array.

Practically, this matters precisely when you read() from and write() to an external byte stream (like a file or a socket). Or, speaking abstractly again, endianness matters when you serialize data (essentially because serialized data has no type system and just consists of dumb bytes); and endianness does not matter within your programming language, because the language only operates on values, not on representations. Going from one to the other is where you need to dig into the details.

To wit - writing:

uint32_t n = get_number();

unsigned char bytesLE[4] = { (unsigned char)n, (unsigned char)(n >> 8), (unsigned char)(n >> 16), (unsigned char)(n >> 24) };  // little-endian order
unsigned char bytesBE[4] = { (unsigned char)(n >> 24), (unsigned char)(n >> 16), (unsigned char)(n >> 8), (unsigned char)n };  // big-endian order

write(bytes..., 4);

Here we could just have said, reinterpret_cast<unsigned char *>(&n), and the result would have depended on the endianness of the system.

And reading:

unsigned char buf[4] = read_data();

uint32_t n_LE = buf[0] | (uint32_t(buf[1]) << 8) | (uint32_t(buf[2]) << 16) | (uint32_t(buf[3]) << 24); // little-endian
uint32_t n_BE = buf[3] | (uint32_t(buf[2]) << 8) | (uint32_t(buf[1]) << 16) | (uint32_t(buf[0]) << 24); // big-endian

Again, here we could have said, uint32_t n = *reinterpret_cast<uint32_t*>(buf), and the result would have depended on the machine endianness.


As you can see, with integral types you never have to know the endianness of your own system, only of the data stream, if you use algebraic input and output operations. With other data types such as double, the issue is more complicated.
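One way to handle double is a sketch along the same algebraic lines, assuming the platform uses 64-bit IEEE 754 doubles (common, but not guaranteed by the standard): copy the bits into a uint64_t and serialize that byte by byte, exactly as with the integer case above. The function names here are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit doubles");

// Serialize a double as 8 little-endian bytes via its bit pattern.
void write_double_le(double d, unsigned char out[8]) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // grab the object representation
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (8 * i));
}

// Reassemble the double from the same 8 little-endian bytes.
double read_double_le(const unsigned char in[8]) {
    uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<uint64_t>(in[i]) << (8 * i);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

The memcpy round-trip avoids any pointer-cast aliasing issues; only the byte-order decision is visible in the loops.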

Tony The Lion
  • 61,704
  • 67
  • 242
  • 415
Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • I had always wondered just how far endianness reached, in terms of scope, if it affected the program it was in (and thus, needing to worry about bitwise operations, and directions of bitshifts etc.), or if it was a networking only issue. But when you put it into the context of it only affecting data streams, it makes more sense to me. – Anne Quinn Aug 24 '11 at 19:40
  • @Clairvoire: Endianness does matter within a programming language if you are running the same code on different platforms with different endianness. – phkahler Aug 24 '11 at 21:00
  • 4
    @phkahler: That's a sweeping statement that I would not condone at that generality. There are tons of useful programs you can write that run on different platforms and never need to know anything about the binary representation of their types. – Kerrek SB Aug 24 '11 at 21:01
  • 1
    @Kerrek: I should have said "can matter" or else put a condition "when saving or transmitting binary data between platforms of different endianness". Yes, I was overly general. – phkahler Aug 24 '11 at 22:53
37

For the record, if you're transferring data between devices you should pretty much always use network byte ordering with ntohl, htonl, ntohs, htons. They'll convert to the network byte order standard for endianness regardless of what your system and the destination system use. Of course, both systems should be programmed like this - but they usually are in networking scenarios.
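A minimal sketch of what that looks like in practice, assuming POSIX's arpa/inet.h (the Message struct and its field layout are made up for illustration):

```cpp
#include <cstdint>
#include <cstring>
#include <arpa/inet.h>   // htons/htonl/ntohs/ntohl on POSIX; winsock2.h on Windows

struct Message { uint16_t id; uint32_t length; };

// Convert to network (big-endian) order before putting bytes on the wire.
void pack(const Message& m, unsigned char out[6]) {
    uint16_t id  = htons(m.id);       // host -> network short
    uint32_t len = htonl(m.length);   // host -> network long
    std::memcpy(out,     &id,  2);    // already in wire order in memory
    std::memcpy(out + 2, &len, 4);
}

// Convert back to host order after receiving.
Message unpack(const unsigned char in[6]) {
    uint16_t id;
    uint32_t len;
    std::memcpy(&id,  in,     2);
    std::memcpy(&len, in + 2, 4);
    return { ntohs(id), ntohl(len) };
}
```

Because the conversion happens on both ends, the code behaves identically whether either host is big- or little-endian.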

roottraveller
  • 7,942
  • 7
  • 60
  • 65
John Humphreys
  • 37,047
  • 37
  • 155
  • 255
  • I haven't ever heard of those until your answer, so I'll be sure to look them up, thanks! – Anne Quinn Aug 24 '11 at 19:49
  • 2
    No problem - They're one of those things you wouldn't know to look for until you had to use them. A nice trick is to make a template that converts with htons/htonl based on the length of the input template parameter - it's a pretty effective way to convert simple types to network ordering in one function :) – John Humphreys Aug 24 '11 at 20:36
  • 1
    What the hell is w00te talking about? Beej's Guide to Network Programming will tell you: http://beej.us/guide/bgnet/ – h0b0 Aug 25 '11 at 06:17
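The size-based template trick mentioned in the comments above might look roughly like this - a sketch of the idea, not anyone's actual code, and it assumes POSIX arpa/inet.h:

```cpp
#include <cstdint>
#include <arpa/inet.h>

// Pick htons or htonl based on the size of the integer type.
// Only 16- and 32-bit integers are handled in this sketch.
template <typename T>
T to_network(T value) {
    static_assert(sizeof(T) == 2 || sizeof(T) == 4,
                  "only 16- and 32-bit integers handled here");
    if (sizeof(T) == 2)
        return static_cast<T>(htons(static_cast<uint16_t>(value)));
    else
        return static_cast<T>(htonl(static_cast<uint32_t>(value)));
}
```

One call site then works for shorts and longs alike, which is the convenience the comment describes.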
7
  1. No, though you do have the right general idea. What you're missing is the fact that even though it's normally a serial connection, a network connection (at least most network connections) still guarantees correct endianness at the octet (byte) level -- i.e., if you send a byte with a value of 0x12 on a little endian machine, it'll still be received as 0x12 on a big endian machine.

    With a short, it'll probably help to look at the number in hexadecimal. It starts out as 0x0001. You break it into two bytes: 0x00 0x01. Upon receipt, that'll be read as 0x0100, which turns out to be 256.

  2. Since the network deals with endianness at the octet level, you normally only have to compensate for the order of bytes, not bits within bytes.

  3. Probably the simplest method is to use htons/htonl when sending, and ntohs/ntohl when receiving. When/if that's not sufficient, there are many alternatives such as XDR, ASN.1, CORBA IIOP, Google protocol buffers, etc.
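The 0x0001-to-256 mix-up from point 1 can be reproduced in a few lines without any networking at all (a sketch; naive_transfer is a made-up name):

```cpp
#include <cstdint>

// Sender decomposes a 16-bit value low-byte-first (as a little-endian
// host would when dumping memory); a receiver that assumes the first
// byte is the most significant one reassembles it backwards.
uint16_t naive_transfer(uint16_t sent) {
    unsigned char wire[2] = {
        static_cast<unsigned char>(sent & 0xFF),        // low byte first
        static_cast<unsigned char>((sent >> 8) & 0xFF)  // high byte second
    };
    // Big-endian-minded receiver: first byte = most significant byte.
    return static_cast<uint16_t>((wire[0] << 8) | wire[1]);
}
```

Feeding it 1 yields 256, exactly the swap the answer describes.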

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • Oh whoops! I wasn't thinking of that when I wrote the example code. I meant to show storing the bits into the bitset, as a form of merely making how they're stored and retrieved the same, since I think bit-shifts ignore endianness AFAIK. I should have made it do so on a byte level to make that more clear (and efficient) though. I'll try google's buffers though, they seem pretty interesting! – Anne Quinn Aug 24 '11 at 19:47
6

The "standard way" of compensating is that the concept of "network byte order" has been defined, almost always (AFAIK) as big endian.

Senders and receivers both know the wire protocol, and if necessary will convert before transmitting and after receiving, to give applications the right data. But this translation happens inside your networking layer, not in your applications.

Ray Toal
  • 86,166
  • 18
  • 182
  • 232
6

Both endiannesses have an advantage that I know of:

  1. Big-endian is conceptually easier to understand because it's similar to our positional numeral system: most significant to least significant.
  2. Little-endian is convenient when reusing a memory reference for multiple memory sizes. Simply put, if you have an unsigned int* to a little-endian value and you know the value stored there is < 256, you can cast your pointer to unsigned char* and read the same value.
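Point 2 doubles as the classic endianness probe (a sketch; the cast to unsigned char* is well-defined aliasing in C++, and the helper names are made up):

```cpp
#include <cstdint>

// True on a little-endian host: the first byte of the
// representation of 1 is the least-significant byte.
bool is_little_endian() {
    uint32_t probe = 1;
    return *reinterpret_cast<unsigned char*>(&probe) == 1;
}

// Point 2 in action: on a little-endian machine, if the value is
// < 256, the narrower char view still reads the whole value.
unsigned char low_byte_view(uint32_t* p) {
    return *reinterpret_cast<unsigned char*>(p);
}
```

On a big-endian machine the same cast would read the most-significant byte instead, which is why the trick only works one way.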
amoss
  • 1,571
  • 1
  • 15
  • 27
5

Endianness is ALWAYS an issue. Some will say that if you know that every host connected to the network runs the same OS, etc, then you will not have problems. This is true until it isn't. You always need to publish a spec that details the EXACT format of on-wire data. It can be any format you want, but every endpoint needs to understand the format and be able to interpret it correctly.

In general, protocols use big-endian for numerical values, but this has limitations if everyone isn't IEEE 754 compatible, etc. If you can take the overhead, then use an XDR (or your favorite solution) and be safe.

No One in Particular
  • 2,846
  • 4
  • 27
  • 32
  • 9
    I object that endianness is *always* an issue. Rather, it is always an issue for *serialized data formats*. The endianness of a particular machine may in good cases be entirely irrelevant. – Kerrek SB Aug 24 '11 at 17:58
4

Here are some guidelines for C/C++ endian-neutral code. Obviously these are written as "rules to avoid"... so if code has these "features" it could be prone to endian-related bugs !! (this is from my article on endianness published in Dr. Dobb's)

  1. Avoid using unions which combine different multi-byte datatypes. (the layout of the unions may have different endian-related orders)

  2. Avoid accessing byte arrays outside of the byte datatype. (the order of the byte array has an endian-related order)

  3. Avoid using bit-fields and byte-masks (since the layout of the storage is dependent upon endianness, the masking of the bytes and selection of the bit fields is endian sensitive)

  4. Avoid casting pointers from multi-byte type to other byte types.
    (when a pointer is cast from one type to another, the endianness of the source (ie. The original target) is lost and subsequent processing may be incorrect)
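As an illustration of rule 1, here is the kind of union the guideline warns against (a sketch; note that reading the inactive member is technically undefined behavior in C++, though C and most compilers permit it - which is part of why the rule exists):

```cpp
#include <cstdint>

// Which byte b[0] aliases depends entirely on the host's endianness.
union Overlay {
    uint32_t word;
    unsigned char b[4];
};

// For 0x11223344 this returns 0x44 on a little-endian host and
// 0x11 on a big-endian host - code assuming one silently breaks
// on the other.
unsigned char first_byte(uint32_t v) {
    Overlay o;
    o.word = v;
    return o.b[0];
}
```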

el rack
  • 41
  • 3
  • Article cited: [Detecting Endian Issues with Static Analysis Tools](http://www.drdobbs.com/windows/detecting-endian-issues-with-static-anal/226000073) – Lucas Sep 30 '12 at 14:55
  • Doing unsigned bit shifts or logical operators in C will not produce different results on different endianness. e.g. a right shift by 1 is always a division by 2 of the value of the variable, regardless of its representation or storage in memory. – bparker Oct 02 '21 at 01:38
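The comment above can be illustrated with a couple of identities that hold on every host, whatever its byte order (a sketch; helper names are made up):

```cpp
#include <cstdint>

// Shifts and logical operators act on the numeric *value*, not on the
// in-memory byte order, so these results are endian-independent.
uint32_t half(uint32_t x)    { return x >> 1; }   // always value / 2
uint32_t up_byte(uint32_t x) { return x << 8; }   // always value * 256 (mod 2^32)
```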
3

You shouldn't have to worry, unless you're at the border of the system. Normally, if you're talking in terms of the stl, you already passed that border.

It's the task of the serialization protocol to indicate/determine how a series of bytes can be transformed into the type you're sending, be it a built-in type or a custom type.

If you're talking built-in types only, the machine abstraction provided by your environment's tools may suffice.

xtofl
  • 40,723
  • 12
  • 105
  • 192