What are the problems connected to data alignment on different architectures?

Question

Following up on this comment to one of my previous questions: I had been convinced that defining a struct whose fields have appropriate types of known, well-defined size, and feeding an instance of this struct to read, was enough to read data from a stream in a safe way.

What piece of the puzzle am I missing? My struct mirrors the layout of the header that I'm trying to read from the file. What problems could arise, and what are the weak points of this simple design choice?

If you have a 32-bit integer with the value `0x00ABCDEF` on one system and send it to another system without taking the endianness of both into account, the integer received could be `0xEFCDAB00`. It will not be if both systems use the same byte order, but that is not always the case. – A Person, Jan 12 '14 at 21:48
The amount of padding added to structs to align their fields to word boundaries is system dependent. You could pack structs, but that might cause problems as well on systems that require alignment. – A Person, Jan 12 '14 at 21:54

score 2 · Answer 1 · answered Jan 12 '14 at 21:45

First and foremost, consider byte ordering (aka endianness). What happens if the file's data was written in a little-endian environment and you are now reading it on a big-endian architecture? Every multi-byte value will be read with its bytes in reverse order.
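To make the byte-ordering issue concrete, here is a minimal C sketch. The helper names `load_le32`, `load_be32` and `endianness_demo` are made up for this illustration; they decode the same four bytes under each convention, independently of the byte order of the machine running the code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret 4 bytes as a little-endian 32-bit value
   (least significant byte first), whatever the host byte order is. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: interpret the same 4 bytes as big-endian
   (most significant byte first). */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}

/* Usage example: identical bytes, two different integers. */
void endianness_demo(void)
{
    const unsigned char bytes[4] = { 0x78, 0x56, 0x34, 0x12 };
    assert(load_le32(bytes) == 0x12345678u);
    assert(load_be32(bytes) == 0x78563412u);
}
```

A reader that assumes the wrong convention gets `0x78563412` where the writer meant `0x12345678`, which is why the file format has to state its byte order.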

Apart from that, always remember that holes or padding may appear between any two consecutive members of a structure, or after its last member, when that is needed to align the members properly in memory. The bad thing is that this is platform dependent, so you can't portably write code like the one you described. Your struct may have different padding on different machines, so reading the same file will produce different results on different machines: the read will (wrongly) fill the padding bytes with the file's data. It is not portable code, and it is certainly not elegant to read, since I imagine it involves a lot of ugly casts.

So, as the comment you mention says, you can't really expect to read a file into a structure and have it work, and if you did that and never had any problems, it was sheer luck. You really can't rely on this: different architectures have different alignment requirements, and struct layout is highly platform dependent.

Filipe Gonçalves

OK, so the only solution is to use a type of known size, let's say a `char` that is 8 bits (maybe with a `typedef` so that I can easily change it on other platforms), and if I have to read a header of 512 bytes, I use an array of 512 `char`, `char a[512]`, and then I iterate through this array according to the endianness of the platform? Isn't this slow? I also have to copy the right bits into each field. – user2485710 Jan 12 '14 at 21:51
@user2485710 There is no brilliant general-purpose solution; it depends on the context and on what you want to achieve. You can read the header into a char array, sure, but what do you want to do with it? Split it into blocks? Maybe you could instead declare an array of structures and read block by block into its structure, so you don't have to read into a char array and copy to structures? It depends on what you want to do... – Filipe Gonçalves Jan 12 '14 at 21:57
"It depends on what you want to do", meaning what? Is there a different approach for networking and for files? What is the property that makes the difference? – user2485710 Jan 12 '14 at 22:01
Well, you can adopt some ideas from the networking world - for example, data transfers normally take place with big-endian byte ordering; this is universally accepted. The point is: you must develop code that agrees to a common data pattern and match it with your file. If you choose to read data in big-endian format, you must be sure to generate files in big-endian format. And so forth... In other words, you must worry about this, you can't rely on the platform to be immutable over time - develop code that deals with it, that's basically it. – Filipe Gonçalves Jan 12 '14 at 22:07

Matteo Italia · Answer 2 · 2014-01-12T22:06:38.497

There are several problems that can arise:

the definition of fundamental types on different architectures may be different. Suppose you have a struct like this:
```
struct MyStruct
{
    char c[9];
    int a;
    long b;
};
```
Compiled with almost any 32-bit or 64-bit compiler on Windows, this needs 9 bytes for c, 4 for a and 4 again for b. OTOH, on 64-bit Linux long is normally 8 bytes, so the struct as understood by gcc on 64-bit Linux is remarkably different;
changes in the definition of the struct, architectural considerations and the compiler's mood may affect padding; in MyStruct above a 32-bit compiler will typically introduce 3 bytes of padding after c to align a to a 4-byte boundary, and a 64-bit compiler may want to add extra padding to align fields to 8-byte boundaries;
depending on the architecture, the internal representation of integers may have a different endianness, so even if integer sizes and padding match, the bytes of the integers read from the file may have to be swapped to be meaningful.
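The first two points can be checked directly on any given compiler. The sketch below reuses `MyStruct` from above; the function names `print_layout`, `padding_bytes` and `check_layout_invariants` are invented for this example. It reports where the compiler actually placed each field and how many bytes belong to no field at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct MyStruct
{
    char c[9];
    int a;
    long b;
};

/* Print the layout chosen by this particular compiler and target. */
void print_layout(void)
{
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("offset of c    = %zu\n", offsetof(struct MyStruct, c));
    printf("offset of a    = %zu\n", offsetof(struct MyStruct, a));
    printf("offset of b    = %zu\n", offsetof(struct MyStruct, b));
    printf("sizeof(struct) = %zu\n", sizeof(struct MyStruct));
}

/* Number of bytes in the struct that are padding, not field data. */
size_t padding_bytes(void)
{
    return sizeof(struct MyStruct)
         - (sizeof(char[9]) + sizeof(int) + sizeof(long));
}

/* Properties that hold everywhere: fields keep their declaration order
   and never overlap. The exact offsets are NOT guaranteed. */
void check_layout_invariants(void)
{
    assert(offsetof(struct MyStruct, c) == 0);
    assert(offsetof(struct MyStruct, a) >= sizeof(char[9]));
    assert(offsetof(struct MyStruct, b)
           >= offsetof(struct MyStruct, a) + sizeof(int));
}
```

With gcc on x86-64 Linux one would typically see offsets 0, 12 and 16 and a total size of 24, while a compiler with a 4-byte `long` typically reports a size of 20. The same source therefore describes two different on-disk layouts.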

All these problems are solved by specifying exactly these areas of ambiguity: for an on-disk format you should use:

fixed-width types (int32_t for a signed 32-bit integer, uint64_t for an unsigned 64-bit integer, ...);
well-determined padding, if any; almost every compiler provides a #pragma or some other means to control alignment and padding precisely;
fixed endianness; pick one byte order (big-endian if you like the "network order" choice of TCP/IP & co., little-endian if you are more practical and want to postpone the problem until you actually need to interoperate with a big-endian device) and set up your code to swap the bytes whenever it is compiled for a machine whose endianness differs from the one chosen for the on-disk format.

Notice that, since endianness and padding may be clumsy to work around when dumping structures, you may be better off serializing the single fields (applying the necessary endian transformations) without padding instead of dumping whole structs.
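A minimal sketch of that field-by-field approach follows. Everything in it is an assumption made for illustration: the `Header` struct, its field names, the 14-byte on-disk size, and the choice of little-endian as the on-disk byte order (big-endian would work just as well, as long as writer and reader agree).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory header; its padding never reaches the file. */
struct Header
{
    uint32_t magic;
    uint16_t version;
    uint64_t payload_size;
};

enum { HEADER_DISK_SIZE = 4 + 2 + 8 }; /* on-disk size, no padding */

/* Write the low n bytes of v (n <= 8), least significant byte first. */
void put_le(unsigned char *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Read n bytes (n <= 8) stored least significant byte first. */
uint64_t get_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Serialize each field at a fixed offset; out holds HEADER_DISK_SIZE bytes. */
void header_encode(const struct Header *h, unsigned char *out)
{
    put_le(out + 0, h->magic, 4);
    put_le(out + 4, h->version, 2);
    put_le(out + 6, h->payload_size, 8);
}

/* Inverse of header_encode; in holds HEADER_DISK_SIZE bytes. */
void header_decode(struct Header *h, const unsigned char *in)
{
    h->magic        = (uint32_t)get_le(in + 0, 4);
    h->version      = (uint16_t)get_le(in + 4, 2);
    h->payload_size = get_le(in + 6, 8);
}

/* Usage example: a round trip through the byte buffer. */
void header_roundtrip_demo(void)
{
    struct Header h = { 0x12345678u, 0x0102u, 0x0A0B0C0D0E0F1011ull };
    struct Header r;
    unsigned char buf[HEADER_DISK_SIZE];

    header_encode(&h, buf);
    header_decode(&r, buf);
    assert(r.magic == h.magic);
    assert(r.version == h.version);
    assert(r.payload_size == h.payload_size);
}
```

The byte buffer, not the struct, is what would be handed to `fwrite` and `fread`. Because the shifts operate on values rather than on memory, the code needs no compile-time endianness check and produces the same bytes on every host.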

For a nice C++-ish way of solving the "binary serialization problem", I suggest you have a look at Qt's QDataStream class and related classes. Qt provides operator<< and operator>> overloads on QDataStream to serialize primitive types (with a strong suggestion to use its fixed-width types), with no padding and in big-endian format by default; you can then provide operator<< and operator>> for your own classes (perhaps including some kind of versioning), so that each class deals only with its own fields.

Depending on the desired level of portability, the use of pack pragmas/attributes and the assumption of a particular integer representation (e.g. two's complement) should be added to the list. The same goes for floating-point types. – PlasmaHH, Jan 12 '14 at 21:58
You started your post exactly where my last question ended. Assume I know the size of the basic types used to define the struct, for example a `struct` made only of `char` and `int32_t` fields, and I know that a `char` is 8 bits. The last comment there points out that knowing the size of my types is still not enough, so I'm wondering: is endianness the only problem left? – user2485710, Jan 12 '14 at 22:00
@PlasmaHH: I would assume that 2's complement is a given; as for FP types, although I never saw any "normal" platform using anything but IEEE 754, it's still worth keeping in mind when dealing with bizarre platforms. — Matteo Italia, Jan 12 '14 at 22:02
@user2485710: as I said, besides endianness the problem left is padding, which is compiler and architecture dependent. — Matteo Italia, Jan 12 '14 at 22:03
@MatteoItalia: as I said, it depends on the level of portability desired. C++ allows representations other than two's complement, and if you want to see binary floats other than IEEE, just have a look at VAX. – PlasmaHH, Jan 13 '14 at 09:09

barak manos · Answer 3 · 2014-02-02T06:14:06.903

There are two basic rules that you need to follow:

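Every instance of your structure must be located at a memory address which is divisible by the size of the largest primitive field in the structure (more precisely, by the strictest alignment requirement among its fields, which on most platforms equals that size).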
Each field in your structure must be located at an offset (within the structure) which is divisible by the size of that field itself.

For example, every instance of the following structure must reside at a memory address which is divisible by sizeof(uint32_t):

```
#include <stdint.h>

struct Example
{
    uint16_t a; // offset 0 (OK, because 0 is divisible by sizeof(uint16_t))
    uint8_t  b; // offset 2 (OK, because 2 is divisible by sizeof(uint8_t))
    uint8_t  c; // offset 3 (OK, because 3 is divisible by sizeof(uint8_t))
    uint32_t d; // offset 4 (OK, because 4 is divisible by sizeof(uint32_t))
};
```

Exceptions:

Rule #1 may be violated if the CPU architecture supports unaligned load and store operations. Nevertheless, such operations are usually less efficient (the hardware may need extra memory accesses, or the compiler may have to split the access into several smaller ones). Ideally, one should strive to follow rule #1 even if the CPU does support unaligned operations, and let the compiler know that the data is well aligned (using a dedicated #pragma or attribute), in order to allow the compiler to use aligned operations where possible.
Rule #2 may be violated if the compiler automatically generates the required padding. This, of course, changes the size of each instance of the structure. It is advisable to always use explicit padding (instead of relying on the current compiler, which may be replaced at some later point in time).
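As an illustration of explicit padding, here is a sketch using a made-up `Record` struct whose filler bytes are declared as ordinary fields, with compile-time checks that pin the layout down. It assumes a C11 compiler (for `static_assert`) and a platform that provides the exact-width types from `<stdint.h>`.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Hypothetical record: every padding byte is spelled out, so the compiler
   has no reason to insert any of its own. */
struct Record
{
    uint8_t  tag;      /* offset 0 */
    uint8_t  pad0[3];  /* offsets 1..3: explicit padding before value */
    uint32_t value;    /* offset 4, divisible by 4 */
    uint16_t count;    /* offset 8, divisible by 2 */
    uint8_t  pad1[2];  /* offsets 10..11: explicit tail padding */
};

/* Refuse to compile if some compiler lays the struct out differently. */
static_assert(offsetof(struct Record, value) == 4, "value must be at offset 4");
static_assert(offsetof(struct Record, count) == 8, "count must be at offset 8");
static_assert(sizeof(struct Record) == 12, "Record must be exactly 12 bytes");
```

If the compiler is replaced later and the new one disagrees about the layout, the build fails instead of silently changing the size of every instance.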

Supplemental:

These two rules are in essence the reflection of a single rule - every variable must be allocated at a memory address that is divisible by its size (1, 2, 4 or 8).

In most computer programs, the alignment problem emerges only when using structures.

But this is only because structure instances can more easily "fall into unaligned locations in memory", without generating any compilation warnings.

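If we "try hard enough", then we can reproduce the same problem with simple variables. For example, in the code below, assuming that arr itself starts at a 4-byte boundary and that int is 4 bytes wide, 3 out of 4 reads go through a misaligned pointer and will cause an unaligned memory access violation on a CPU that does not support such accesses:

```
char arr[16] = {0};
int p0 = *(int*)(arr+0); // aligned
int p1 = *(int*)(arr+1); // misaligned
int p2 = *(int*)(arr+2); // misaligned
int p3 = *(int*)(arr+3); // misaligned
```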

score 1 · Answer 4 · answered Jan 12 '14 at 22:12

Padding is one thing that you need to consider. The other problem is that, depending on the architecture, dereferencing a misaligned pointer can either work fine or crash your program.

For example, assume you have a char[12] array and want to store a 4-byte int and 8-byte double in it. It's tempting to do something like:

```
*((int*)&array[0]) = myInt;
*((double*)&array[4]) = myDouble;
```

And on your standard PC (x86 / x64), this code will work fine (albeit you might notice it's a bit slow). And then you port it to CUDA, for example, and it crashes. That's because (AFAIR) CUDA can't access memory that isn't properly aligned.

That's why structs have to be padded, so that every field's address is properly aligned. It does mean, though, that if you try to interpret such a struct as a contiguous region of bytes, you will run into the padding bytes.
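The misaligned stores shown above can be avoided by copying bytes with `memcpy` instead of casting the buffer pointer. The sketch below uses invented names (`pack_values`, `unpack_values`, `pack_demo`) and makes no assumption about the sizes of `int` and `double` beyond what `sizeof` reports.

```c
#include <assert.h>
#include <string.h>

/* Store an int followed by a double in a byte buffer without ever
   forming a misaligned int* or double*. */
void pack_values(unsigned char *buf, int i, double d)
{
    memcpy(buf, &i, sizeof i);
    memcpy(buf + sizeof i, &d, sizeof d);
}

/* Copy the two values back out into properly aligned variables. */
void unpack_values(const unsigned char *buf, int *i, double *d)
{
    memcpy(i, buf, sizeof *i);
    memcpy(d, buf + sizeof *i, sizeof *d);
}

/* Usage example: a round trip through an unpadded buffer. */
void pack_demo(void)
{
    unsigned char buf[sizeof(int) + sizeof(double)];
    int i = 0;
    double d = 0.0;

    pack_values(buf, 42, 2.5);
    unpack_values(buf, &i, &d);
    assert(i == 42);
    assert(d == 2.5);
}
```

This only removes the alignment problem. The bytes are still in the host's byte order and representation, so a buffer written this way is safe to read back in the same program but is not a portable file format by itself.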
