3

Ok, I feel stupid asking this, but why does the code below output different lines?

To print the first line, I take the address of the first byte of an array, interpret it as a pointer to uint16_t, dereference it and print the value's bits one by one.

For the second line, I take a pointer to the first byte, interpret it as a pointer to uint8_t, dereference it and print the value's bits one by one. Then I do the same with the second byte.

As I don't modify the memory allocated for the array, only interpret it in different ways, I expected the output to be the same, but the order of the bytes is different.

I'm probably missing something, but my only guess is that the indirection operator does something I don't expect.

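#include <stdint.h>
#include <stdio.h>

int main() {
  uint8_t u[2];
  u[0] = 170;  // 0xAA = 10101010
  u[1] = 85;   // 0x55 = 01010101

  // First line: read both bytes as one uint16_t and print its bits, most significant first.
  for (int i = 15; i >= 0; --i) {
    printf("%u", (((*((uint16_t*)u)) >> i) & 0x0001));
  }
  printf("\n");
  // Second line: print the bits of each byte in turn, in address order.
  for (int i = 7; i >= 0; --i) {
    printf("%u", (((*((uint8_t*)u)) >> i) & 0x01));
  }
  for (int i = 7; i >= 0; --i) {
    printf("%u", (((*((uint8_t*)(u + 1))) >> i) & 0x01));
  }
}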

Output

0101010110101010 
1010101001010101

Update #1: Please ignore the allocation. Yes, the example code doesn't work on every OS, but it is just a simplified example.

Update #2: I knew about endianness, but what I missed is the difference between logical and physical bit representation. In the example above, even though the physical representation is unchanged, I print the logical representation, which is affected by endianness. Big thanks to @john-kugelman for explaining that.

Shamdor
  • There is no "indirection operator" in C. You are possibly invoking undefined behaviour, as the array need not be properly aligned. And you should use the proper format _modifiers_. – too honest for this site Aug 28 '15 at 23:37
  • @Olaf, "The indirection operator(*) accesses a value indirectly through a pointer" https://msdn.microsoft.com/en-us/library/caaw7h5s.aspx – Richard Chambers Aug 28 '15 at 23:39
  • @RichardChambers: While I certainly do not regard something like an MS reference as authoritative, I just looked into the C11 standard. You are right, they use that name, but never explicitly mention that it is the `*` operator. – too honest for this site Aug 28 '15 at 23:44

3 Answers

6

On Intel-based platforms, numbers are stored in little endian order. The least significant byte is first, the most significant last. This is the opposite of how we conventionally read numbers. If we wrote numbers in little endian instead of big endian order, one thousand twenty three would be written 3201 instead of 1023.

When you interpret the bytes in the byte array as a 16-bit integer, the first byte (170) is interpreted as the least significant byte and the second byte (85) is the most significant. But when you print the bytes yourself, you print them in the opposite order. That's where the mismatch is coming from.
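To check which byte the host treats as least significant, one option is a small probe like the sketch below. The helper names `host_is_little_endian` and `load_native_u16` are made up for this illustration, and `memcpy` is used here in place of the pointer cast from the question so the read does not depend on alignment.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the least significant byte of a
// multi-byte integer sits at the lowest address (little endian).
bool host_is_little_endian() {
    const std::uint16_t probe = 1;         // logical value 0x0001
    std::uint8_t first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);   // copy the byte at the lowest address
    return first_byte == 1;
}

// Hypothetical helper: reinterpret two bytes as a uint16_t in the host's
// native byte order, copying instead of casting the pointer.
std::uint16_t load_native_u16(const std::uint8_t* bytes) {
    std::uint16_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

With the bytes from the question (170, 85), `load_native_u16` should give 0x55AA on a little endian host and 0xAA55 on a big endian one.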

Endianness is a platform-specific property. The more "natural" big endian order is used by architectures such as SPARC, classic PowerPC and the Motorola 68000 family, but little endian Intel-based architectures are among the most common, and ARM, although it supports both orders, is normally run little endian. As it happens, most network protocols are big endian, also known as "network byte order". When little endian machines talk on the Internet they swap the bytes of multi-byte header fields when sending and receiving data.
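When the byte order of the data is fixed by a format or protocol, a common approach is to assemble the value with shifts rather than reinterpret memory, so the result does not depend on the host. The following is a minimal sketch of that idea; `load_be16`, `load_le16` and `store_be16` are illustrative names, not functions from any library.

```cpp
#include <cstdint>

// Hypothetical helper: read two bytes as a big endian (network order) value.
std::uint16_t load_be16(const std::uint8_t* bytes) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}

// Hypothetical helper: read two bytes as a little endian value.
std::uint16_t load_le16(const std::uint8_t* bytes) {
    return static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
}

// Hypothetical helper: write a value as two bytes in big endian order.
void store_be16(std::uint16_t value, std::uint8_t* out) {
    out[0] = static_cast<std::uint8_t>(value >> 8);    // most significant byte first
    out[1] = static_cast<std::uint8_t>(value & 0xFF);  // least significant byte last
}
```

For the bytes 0xAA, 0x55 this gives 0xAA55 from `load_be16` and 0x55AA from `load_le16` on any host, because shifts operate on the logical value.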

I expected this mismatch to happen if I print that uint16_t itself. What I don't understand is why it happens when I try to get its bits.

Reading its bits with bit masking and shifting operations doesn't read the physical bits in memory from left to right; it reads the logical bits from most to least significant. On a little endian architecture, most-to-least significant equates to right-to-left byte order in memory, i.e. from the higher address to the lower one.

Also note that endianness means the bytes are swapped, not the bits. Bits aren't swapped in little endian architectures, bytes are. Bits can't be swapped because they're not individually addressable. You can only get at them with shifts and masks.
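The difference between the two views can be sketched with two small helpers (names invented for this example): one walks the logical bits of a value with shifts and masks, the other walks the bytes in address order and prints each byte's bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: logical bits of a 16-bit value, most significant first.
// Shifts and masks act on the value, so memory layout plays no part here.
std::string logical_bits(std::uint16_t value) {
    std::string out;
    for (int i = 15; i >= 0; --i) {
        out += ((value >> i) & 1u) ? '1' : '0';
    }
    return out;
}

// Hypothetical helper: bits of each byte, visiting bytes from the lowest
// address upwards. Within a byte the bits are still read with shifts.
std::string bits_in_memory_order(const std::uint8_t* bytes, std::size_t count) {
    std::string out;
    for (std::size_t b = 0; b < count; ++b) {
        for (int i = 7; i >= 0; --i) {
            out += ((bytes[b] >> i) & 1u) ? '1' : '0';
        }
    }
    return out;
}
```

`logical_bits(0x55AA)` yields 0101010110101010 and `bits_in_memory_order` on the bytes 170, 85 yields 1010101001010101, which matches the two lines of output in the question on a little endian machine.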

John Kugelman
1

Leaving aside the possible alignment error and the missing format string length modifiers, you are subject to an endianness problem. This term describes how data types longer than the smallest addressable unit (i.e. a byte) are stored in memory.

Your system appears to use little endian for 16 bit integers: the lower byte is stored at the lower address.

Note there is no reason to cast in the latter two for-loops, as you are using the same type as the array elements. Never cast without good reason, and always try to write code in such a way that you do not have to cast. Casting prevents the compiler from helping you detect type mismatches, so only cast if you are absolutely sure you know better than the compiler what you are doing.
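As one possible illustration of the two side remarks, the sketch below indexes the array directly instead of casting and uses the `PRIx8` and `PRIx16` macros from `<cinttypes>` as format specifiers. The function names `hex_u8` and `hex_u16` are invented for this example.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format one byte as two hex digits.
std::string hex_u8(std::uint8_t value) {
    char buffer[3];  // two digits plus the terminating NUL
    std::snprintf(buffer, sizeof buffer, "%02" PRIx8, value);
    return std::string(buffer);
}

// Hypothetical helper: format a 16-bit value as four hex digits.
std::string hex_u16(std::uint16_t value) {
    char buffer[5];  // four digits plus the terminating NUL
    std::snprintf(buffer, sizeof buffer, "%04" PRIx16, value);
    return std::string(buffer);
}
```

With `uint8_t u[2] = {170, 85};`, the second byte is simply `u[1]`, so `hex_u8(u[1])` gives "55" with no cast involved.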

too honest for this site
1

Two bytes in memory:

0xAA

0x55

When they are interpreted as a 16 bit word, there are two possible values.

Based on processor's byte order:

Little Endian (least significant byte first): 0x55AA // Intel x86/x64

Big Endian (most significant byte first): 0xAA55 // e.g. SPARC, classic PowerPC; ARM supports both but usually runs little endian

Garland