Testing for Endianness: Why does the following code work?

Question

While I do understand endianness, I am slightly unclear on how the code works below. I guess this question is less about endianness and more about how the char * pointer and int work i.e. type conversion. Also, would it have made any difference if the variable word was not a short but just an int? Thanks!

#define BIG_ENDIAN 0
#define LITTLE_ENDIAN 1

int byteOrder() {
    short int word = 0x0001;
    char * byte = (char *) &word;
    return (byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}

Testing for endianness at run-time is pointless - you already know the endianness at compile-time. — Paul R, Jul 03 '11 at 18:11
The ternary operator is pointless, just `return byte[0];` is sufficient. — Ben Voigt, Jul 03 '11 at 18:13
@Ben: The above is cleaner; it's more explicit, and gracefully tracks changes in the values of the `#define`s. — Oliver Charlesworth, Jul 03 '11 at 18:14
@jdv: of course you can - you can either use compiler pre-defined macros (e.g. `_LITTLE_ENDIAN_`) or since you, the programmer, already know the target endianness you can define your own macros. There is no "portability" issue. — Paul R, Jul 03 '11 at 18:24
@Paul Don't many build systems actually compile a mini-program like this simply to set those macros? I wouldn't call the program pointless. See, for example, [a method of doing this in autoconf](http://www.google.com/codesearch#RGCD84x9Jg0/trunk/lib/libiconv/m4/endian.m4&q=autoconf%20endian&ct=rc&cd=5&sq=). — mbauman, Jul 03 '11 at 18:39
@Paul thanks paul. Might I ask how one might determine endianness at compile time? — OckhamsRazor, Jul 03 '11 at 18:49
@PaulR: Practically yes, theoretically... http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cjacabbf.html — kennytm, Jul 03 '11 at 19:20
Rather than using pointers to type-pun, I would use a union. Naively, I think it's more likely that the compiler will optimize it out at compile-time. — R.. GitHub STOP HELPING ICE, Jul 03 '11 at 19:21
@Ockham: compilers such as gcc pre-define macros e.g. `__LITTLE_ENDIAN__` which you can use with #ifdef for code that needs to be endianness-aware. If you are using a compiler which does not pre-define such macros then you can just define your own equivalent macro in your makefile or build system. — Paul R, Jul 03 '11 at 20:09
@Paul And how do you define such a macro at compile time for code that'll have to run on both endianess architectures? Sure you could add a compile switch, but since for many programs that have to deal with such low level code you'll need several checks anyhow, you can just throw that one check in there as well and be done. Certainly nicer for the users than two different makefiles. — Voo, Jul 03 '11 at 23:40
@Voo: since you typically need to take care of a number of different architecture-dependent factors in any reasonably complex build system it doesn't seem like a huge problem to have to define `__LITTLE_ENDIAN__` or `__BIG_ENDIAN__` along with any other platform-specific compiler flags and other options. Your alternative of checking at run-time means that you have to compile both big and little endian versions of code into every executable, which seems pointless, wasteful and unnecessary to me - it also has implications for any performance-critical code which is dependent on endianness. — Paul R, Jul 04 '11 at 07:02

mbauman · Accepted Answer · 2011-07-03T18:21:27.950

18

A short int is made up of two bytes, in this case 0x00 and 0x01. On a little endian system, the small byte comes first, so in memory it appears as 0x01 followed by 0x00. Big endian systems are, naturally, reversed. This is what the pointers look like for short integers on a little endian system:

----------------------- ----------------------- 
|   0x01   |   0x00   | |          |          | 
----------------------- ----------------------- 
   &word                  &word+1

Char pointers, on the other hand, are always incremented sequentially. Thus, by taking the address of the first byte of the integer and casting it to a char * pointer, you may increment through each byte of the integer in memory-order. Here's the corresponding diagram:

------------ ------------ ------------ ------------ 
|   0x01   | |   0x00   | |          | |          | 
------------ ------------ ------------ ------------ 
   &byte       &byte+1      &byte+2      &byte+3

edited Jul 03 '11 at 18:21

answered Jul 03 '11 at 18:08

mbauman

30,958
4
88
123

5

This is an exception to the type-punning rule. It is valid to access any data-type via a `char *` pointer. – Oliver Charlesworth Jul 03 '11 at 18:21
Ah! Of course. Thanks for the correction @Oli. I removed the section. – mbauman Jul 03 '11 at 18:22
@Oli Do you have a reference? – Maxpm Jul 04 '11 at 04:54
@Maxpm: Start with this: http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html; I'll come up with the precise C standard quote in a minute... – Oliver Charlesworth Jul 04 '11 at 09:16
1

@Maxpm: C99, 6.5/7: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... a character type." – Oliver Charlesworth Jul 04 '11 at 09:18

Oliver Charlesworth · Answer 2 · 2011-07-03T18:13:51.397

7

(char *)&word points to the first (lowest address) char (byte) of word. If your system is little-endian, this will correspond to 0x01; if it is big-endian, this will correspond to 0x00.

And yes, this test should work whether word is short, int or long (so long as they're bigger in size than a char).

edited Jul 03 '11 at 18:13

answered Jul 03 '11 at 18:08

Oliver Charlesworth

267,707
33
569
680

score 4 · Answer 3 · answered Jul 03 '11 at 18:07

That is a cute little program. You have a word being set to a hex literal 1. If you have little endian, the least significant byte(0x01 in this case) would be at byte[0] when you cast the pointer to a char pointer. and so if 0x01 is at offset 0, then you know it was little endian, otherwise if 0x00 is at offset 0, you know the least significatn byte was stored in the higher memory location(offset 1).

Note: pointers always point to the lowest memory address of the word/data structure etc...

score 2 · Answer 4 · answered Jul 03 '11 at 18:35

It tells you the endianness of a short. At least on some machines, where short is exactly two bytes. It doesn't necessarily tell you the endianness of an int or a long, and of course, when the integral type is larger than two bytes, the choice isn't binary.

The real question is why you would want to know. It's almost always simpler and more robust to write the code so that it doesn't matter. (There are exceptions, but they almost always involve very low level code which will only work on one specific hardware anyway. And if you know the hardware well enough to be writing that sort of code, you know the endianness.)

score 1 · Answer 5 · answered May 29 '13 at 14:36

The trick I use to remember the byte order when thinking about big-endian vs little-endian is "the names should be the other way around":

When you're writing a number by hand, the natural way to do it is to write left-to-right, starting with the most significant digits and ending with the least significant digits. In your example, you'd first write the most significant byte (i.e. 0) then the least significant byte (i.e. 1). This is how big-endian works. When it writes data to memory (with increasing byte address) it ends with the least-significant bytes - the 'little' bytes. So, big-endian actually ends with little bytes.
Same for little-endian: it actually ends with the most-significant byte, i.e. the 'big' bytes.

Your source code checks if the 1st byte (i.e. byte[0]) is the most-significant byte (0), in which case it's a 'big-startian', or little endian byte ordering.

Testing for Endianness: Why does the following code work?

5 Answers5

Linked

Related