What alignment issues limit the use of a block of memory created by malloc?

Question

I am writing a library for various mathematical computations in C. Several of these need some "scratch" space -- memory that is used for intermediate calculations. The space required depends on the size of the inputs, so it cannot be statically allocated. The library will typically be used to perform many iterations of the same type of calculation with the same size inputs, so I'd prefer not to malloc and free inside the library for each call; it would be much more efficient to allocate a large enough block once, re-use it for all the calculations, then free it.

My intended strategy is to request a void pointer to a single block of memory, perhaps with an accompanying allocation function. Say, something like this:

void *allocateScratch(size_t rows, size_t columns);
void doCalculation(size_t rows, size_t columns, double *data, void *scratch);

The idea is that if the user intends to do several calculations of the same size, he may use the allocate function to grab a block that is large enough, then use that same block of memory to perform the calculation for each of the inputs. The allocate function is not strictly necessary, but it simplifies the interface and makes it easier to change the storage requirements in the future, without each user of the library needing to know exactly how much space is required.

In many cases, the block of memory I need is just a large array of type double, no problems there. But in some cases I need mixed data types -- say a block of doubles AND a block of integers. My code needs to be portable and should conform to the ANSI standard. I know that it is OK to cast a void pointer to any other pointer type, but I'm concerned about alignment issues if I try to use the same block for two types.

So, specific example. Say I need a block of 3 doubles and 5 ints. Can I implement my functions like this:

void *allocateScratch(...) {
    return malloc(3 * sizeof(double) + 5 * sizeof(int));
}

void doCalculation(..., void *scratch) {
    double *dblArray = scratch;
    int *intArray = (int *)((unsigned char *)scratch + 3 * sizeof(double));
}

Is this legal? The alignment probably works out in this example, but if I switch the order and put the int block first and the double block second, the alignment of the doubles will shift (assuming 64-bit doubles and 32-bit ints). Is there a better way to do this, or a more standard approach I should consider?

My biggest goals are as follows:

I'd like to use a single block if possible so the user doesn't have to deal with multiple blocks or a changing number of blocks required.
I'd like the block to be a valid block obtained by malloc so the user can call free when finished. This means I'd rather not create a small struct that has pointers to each block and then allocate each block separately, since that would require a special destroy function. I'm willing to do that if it's the "only" way.
The algorithms and memory requirements may change, so I'm trying to use the allocate function so that future versions can get different amounts of memory for potentially different types of data without breaking backward compatibility.

Maybe this issue is addressed in the C standard, but I haven't been able to find it.

The first example is OK; in the second, you will have to pad to the next address divisible by sizeof(double). — this, Jan 15 '14 at 06:51
Yeah, that's sort of what I expected. Is that guaranteed to be acceptable though? It seems that trying to ensure correct alignment manually may not be worth the effort. — Jeremy West, Jan 15 '14 at 06:53
Yes, it is correct. That way you waste memory only after the last member of each block. If you use a union, you waste memory for every member. Just make sure you know where the ints end and the doubles begin. — this, Jan 15 '14 at 06:57
(Note: you can't perform arithmetic on `void*`, so in your `scratch + N*sizeof(double)` you should cast scratch to `char*`, or to `double*` and then add only N, and finally cast the result to `int*`.) — ShinTakezou, Jan 15 '14 at 07:23
@ShinTakezou Oh, yes, good catch! Obviously, I didn't compile my example code :). — Jeremy West, Jan 15 '14 at 07:24
The question mentions "iterations of the same type"; generally, malloc would be applied to each discrete item. With a proper data structure, it seems these should not need memory specifically allocated with malloc for the whole range of items. I'd suggest revisiting the problem, and maybe looking up strategies specific to data structures such as linked lists. — Jason K., Oct 09 '16 at 23:49
@Jason I probably didn't make the setting clear enough. The library does matrix computations, such as computing factorizations or solving a linear system. Many of the standard algorithms require extra temporary space. The applications that use the library tend to perform many similar calculations on matrices of the same size. The scratch space needed is the same for matrices of the same size, so the idea is to reuse the space across multiple calls. — Jeremy West, Oct 18 '16 at 19:50

score 4 · Answer 1 · edited May 23 '17 at 11:48

The memory of a single malloc can be partitioned for use in multiple arrays as shown below.

Suppose we want arrays of types A, B, and C with NA, NB, and NC elements. We do this:

size_t Offset = 0;

ptrdiff_t OffsetA = Offset;           // Put array at current offset.
Offset += NA * sizeof(A);             // Move offset to end of array.

Offset = RoundUp(Offset, sizeof(B));  // Align sufficiently for type.
ptrdiff_t OffsetB = Offset;           // Put array at current offset.
Offset += NB * sizeof(B);             // Move offset to end of array.

Offset = RoundUp(Offset, sizeof(C));  // Align sufficiently for type.
ptrdiff_t OffsetC = Offset;           // Put array at current offset.
Offset += NC * sizeof(C);             // Move offset to end of array.

unsigned char *Memory = malloc(Offset);  // Allocate memory.

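// Set pointers for arrays (the casts are required: unsigned char * does not
// convert implicitly to other object pointer types).
A *pA = (A *) (Memory + OffsetA);
B *pB = (B *) (Memory + OffsetB);
C *pC = (C *) (Memory + OffsetC);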

where RoundUp is:

// Return Offset rounded up to a multiple of Size.
size_t RoundUp(size_t Offset, size_t Size)
{
    size_t x = Offset + Size - 1;
    return x - x % Size;
}

This uses the fact, as noted by R.., that the size of a type must be a multiple of the alignment requirement for that type. In C 2011, sizeof in the RoundUp calls can be changed to _Alignof, and this may save a small amount of space when the alignment requirement of a type is less than its size.
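
As a minimal sketch of the C 2011 variant just mentioned, assume the layout from the question with the order switched: a block of ints first, then a block of doubles. The names `round_up`, `scratch_layout`, and `compute_layout` are hypothetical and not part of any library, and overflow of the size computations is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return offset rounded up to a multiple of align. */
size_t round_up(size_t offset, size_t align)
{
    size_t x;
    assert(align != 0);
    x = offset + align - 1;
    return x - x % align;
}

/* Hypothetical record of where each array lives inside the block. */
struct scratch_layout {
    size_t int_offset;   /* byte offset of the int array */
    size_t dbl_offset;   /* byte offset of the double array */
    size_t total;        /* number of bytes to request from malloc */
};

/* Lay out n_int ints followed by n_dbl doubles, aligning with _Alignof
   instead of sizeof. */
struct scratch_layout compute_layout(size_t n_int, size_t n_dbl)
{
    struct scratch_layout l;
    size_t offset = 0;

    l.int_offset = offset;                        /* block start is suitably aligned */
    offset += n_int * sizeof(int);

    offset = round_up(offset, _Alignof(double));  /* pad up to double alignment */
    l.dbl_offset = offset;
    offset += n_dbl * sizeof(double);

    l.total = offset;
    return l;
}
```

A caller would then request `l.total` bytes from malloc and form the array pointers as `(int *)(mem + l.int_offset)` and `(double *)(mem + l.dbl_offset)`, where `mem` is the returned block viewed as `unsigned char *`.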

Basile Starynkevitch · Answer 2 · 2014-01-15T12:06:54.963

The latest C11 standard has the max_align_t type (and _Alignas specifier and _Alignof operator and <stdalign.h> header).

The GCC compiler has a __BIGGEST_ALIGNMENT__ macro (giving the largest alignment). It also provides some extensions related to alignment.

Often, using 2*sizeof(void*) (as the biggest relevant alignment) is in practice quite safe (at least on most of the systems I have heard about these days; but one could imagine weird processors and systems where it is not the case, perhaps some DSPs). To be sure, study the details of the ABI and calling conventions of your particular implementation, e.g. x86-64 ABI and x86 calling conventions...

And the system malloc is guaranteed to return a sufficiently aligned pointer (for all purposes).
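
A small sketch of how that guarantee could be checked on a given platform, assuming a C11 implementation that provides `uintptr_t` and on which converting a pointer to an integer preserves the address bits (common, but not required by the standard). The helper name `is_max_aligned` is hypothetical. Strictly, C11 promises alignment for any type with a fundamental alignment requirement, which is what `max_align_t` represents.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return 1 if p is aligned to alignof(max_align_t), 0 otherwise.
   Assumption: the integer value of the pointer reflects its address. */
int is_max_aligned(const void *p)
{
    assert(p != NULL);
    return (uintptr_t)p % alignof(max_align_t) == 0;
}
```

Calling `is_max_aligned(malloc(64))` is expected to return 1 on a conforming implementation, which is why the start of the scratch block never needs extra padding; only the interior offsets do.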

On some systems, targets, and processors, giving a larger alignment might give a performance benefit (notably when asking the compiler to optimize). You may have to (or want to) tell the compiler about that, e.g. on GCC using variable attributes...

Don't forget that according to Fulton

there is no such thing as portable software, only software that has been ported.

but intptr_t and max_align_t are here to help you....

So, to clarify, you're suggesting that as long as I split my scratch space into blocks that are multiples of `sizeof(max_align_t)`, I'm fine? — Jeremy West, Jan 15 '14 at 06:35
Read the question carefully... He wants one block of memory, some of which will hold (e.g.) a sequence of `double`, and some of which will hold (e.g.) a sequence of `int`. None of your observations help in this case. — Nemo, Jan 15 '14 at 06:35
+1 for revealing that c11 has max_align_t. It is defined with a hack. — this, Jan 15 '14 at 07:02

score 2 · Accepted Answer · answered Jan 15 '14 at 06:38

If the user is calling your library's allocation function, then they should call your library's freeing function. This is very typical (and good) interface design.

So I would say just go with the struct of pointers to different pools for your different types. That's clean, simple, and portable, and anybody who reads your code will see exactly what you are up to.
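
A minimal sketch of that design, with paired create and destroy functions. The names `scratch`, `scratch_create`, and `scratch_destroy`, and the particular sizes chosen for each pool, are assumptions for illustration; the real storage requirements depend on the algorithm.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical scratch object: one separately allocated pool per type. */
struct scratch {
    double *dbl;   /* rows * columns doubles (assumed size) */
    int    *idx;   /* rows ints (assumed size) */
};

/* Allocate all pools; returns NULL if any allocation fails. */
struct scratch *scratch_create(size_t rows, size_t columns)
{
    struct scratch *s;

    assert(rows > 0 && columns > 0);
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;

    s->dbl = malloc(rows * columns * sizeof *s->dbl);
    s->idx = malloc(rows * sizeof *s->idx);
    if (s->dbl == NULL || s->idx == NULL) {
        free(s->dbl);   /* free(NULL) is a no-op */
        free(s->idx);
        free(s);
        return NULL;
    }
    return s;
}

/* Release everything scratch_create allocated; accepts NULL. */
void scratch_destroy(struct scratch *s)
{
    if (s != NULL) {
        free(s->dbl);
        free(s->idx);
        free(s);
    }
}
```

Each pool comes straight from malloc, so alignment is never an issue, and the struct can gain or lose members in later versions without changing the caller's code.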

If you do not mind wasting memory and insist on a single block, you could create a union with all of your types and then allocate an array of those...
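
The union approach might look like the sketch below, assuming only `double` and `int` are needed. The names `scratch_slot` and `allocate_slots` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Every slot is as large, and as strictly aligned, as its largest member. */
union scratch_slot {
    double d;
    int    i;
};

/* One malloc'd block of n_dbl + n_int slots; release it with free(). */
union scratch_slot *allocate_slots(size_t n_dbl, size_t n_int)
{
    assert(n_dbl + n_int > 0);
    return malloc((n_dbl + n_int) * sizeof(union scratch_slot));
}
```

Slots `0 .. n_dbl-1` would be accessed through `.d` and slots `n_dbl .. n_dbl+n_int-1` through `.i`. The cost is that every int occupies a full slot, and the values cannot be handed to code that expects a plain `double *` or `int *` array without relying on layout assumptions.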

Trying to find appropriately aligned memory in a massive block is just a mess. I am not even sure you can do it portably. What's the plan? Cast pointers to intptr_t, do some rounding, then cast back to a pointer?

Nemo


I considered the union option, but it does seem a bit wasteful. In the end, I'll probably go with the "struct of pointers" solution because it is definitely correct (and clear). I'm probably working too hard to avoid multiple allocations. – Jeremy West Jan 15 '14 at 06:43
Likely. Moreover, in ANSI C there's no `intptr_t`, and to align the second block "by hand" you would have needed to cast the pointer to a suitable integer type, producing not-so-portable code. – ShinTakezou Jan 15 '14 at 07:21
In this case, I'd expect the struct to contain only items which are often if not always used. – Jason K. Oct 09 '16 at 23:44

score 1 · Answer 4 · answered Jan 15 '14 at 06:38

Note that the required alignment for any type must evenly divide the size of the type; this is a consequence of the representation of array types. Thus, in the absence of C11 features to determine the required alignment for a type, you can just estimate conservatively and use the type's size. In other words, if you want to carve up part of an allocation from malloc for use storing doubles, make sure it starts at an offset that's a multiple of sizeof(double).
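
Applied to the question's interface, the conservative sizeof rule could be sketched as follows in ANSI C, with the int block first and the double block second. The function names are hypothetical, the element counts are passed explicitly for illustration, and overflow of the size computation is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Byte offset of the double array when n_int ints come first: the end of
   the int array rounded up to a multiple of sizeof(double). */
size_t double_offset(size_t n_int)
{
    size_t end = n_int * sizeof(int) + sizeof(double) - 1;
    return end - end % sizeof(double);
}

/* One block holding n_int ints, padding, then n_dbl doubles.
   The caller releases it with plain free(). */
void *allocate_scratch(size_t n_int, size_t n_dbl)
{
    return malloc(double_offset(n_int) + n_dbl * sizeof(double));
}

/* The int array starts at the beginning of the block. */
int *scratch_ints(void *scratch)
{
    assert(scratch != NULL);
    return (int *)scratch;
}

/* The double array starts at the padded offset. */
double *scratch_doubles(void *scratch, size_t n_int)
{
    assert(scratch != NULL);
    return (double *)((unsigned char *)scratch + double_offset(n_int));
}
```

With 32-bit ints and 64-bit doubles, five ints end at byte 20, so the doubles would start at byte 24 and four bytes are lost to padding. No pointer-to-integer conversion is needed because all arithmetic is done on offsets from the start of the block.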

R.. GitHub STOP HELPING ICE


Is that guaranteed to be safe? Or just likely? – Jeremy West Jan 15 '14 at 06:46
@BasileStarynkevitch: How do you figure it is just very likely? The size of an object **must** be a multiple of its alignment requirement (due to C's requirements for arrays). Therefore, if the offset from the start of the allocation (which is guaranteed to be suitably aligned for any object with a fundamental alignment requirement) is a multiple of the size of an object, the address is suitably aligned for the object. – Eric Postpischil Jan 15 '14 at 11:22
It is related to the [ABI](http://en.wikipedia.org/wiki/Application_binary_interface) specification of your particular implementation. – Basile Starynkevitch Jan 15 '14 at 11:42
@BasileStarynkevitch: I do not see how that answers the question. Please show an example where aligning an object to a multiple of its size would not produce the alignment required for the object. – Eric Postpischil Jan 15 '14 at 14:59
Suppose my claim is false. Then if you just used the whole malloc'd block as an array of the type in question, the n'th element of the array, where n is the multiple of the type size mentioned in my claim, would be misaligned. That cannot happen, so my claim is true. – R.. GitHub STOP HELPING ICE Jan 15 '14 at 19:19
@EricPostpischil: There is a case where it would make sense for an object's reported size to *not* be a multiple of its alignment: structures with flexible array members. Unfortunately, because C99 didn't add flexible array members until other implementations had done so, and implementations did so in different ways, the Standard is annoyingly vague about what "sizeof" really means on a structure containing a FAM. – supercat Nov 26 '15 at 23:29
@R..: Would anything forbid an implementation from requiring that arrays with an even size be aligned more coarsely than single objects or arrays of odd size? Some platforms have a "char" which is smaller than the basic addressing unit but have instructions to access the upper or lower part of a word. Imposing word alignment on even-sized arrays could greatly improve the efficiency of code that accesses them. – supercat Jul 15 '16 at 22:33
@supercat: An implementation could choose to align all arrays beyond the nominal alignment of the element type, but it could only assume this alignment in contexts where the pointer used to access the array members can be traced back to the array object declaration. This is because, given `char array[N];`, `array+1` is a valid pointer to the first element of an array of `N-1` elements, which would **necessarily** not be aligned the same as the implementation's over-aligned arrays. For this reason what you're asking about is likely a useless "optimization". – R.. GitHub STOP HELPING ICE Jul 16 '16 at 02:22
@R..: Does the Standard allow an arbitrary pointer of type `int*` to be cast to a pointer-to-int-array type if it is not a pointer to the start of such an array or the start of an allocated region? Is it legitimate for code to use two pointers to overlapping array objects in ways that alias? – supercat Jul 18 '16 at 03:06
@R..: From an optimization standpoint, it could be very useful for a compiler to know that when a function receives pointers to two arrays, the pointers will either match or be totally disjoint; since code that wants to merely pass a pointer to something that can be treated as a sequence of (e.g.) `int` values but isn't necessarily an entire five-element array would typically pass an `int*` rather than an `int(*)[5]`, I don't know what kinds of pointer casts the Standard requires implementations to support. – supercat Jul 18 '16 at 15:06
@supercat: I find your "very useful" claim dubious simply because almost nobody uses pointer-to-array types; the amount of code that could be optimized by this is tiny compared to the potential conformance issues and potential wrong-optimization-bug issues that could arise. – R.. GitHub STOP HELPING ICE Jul 18 '16 at 15:47
@R..: The optimization would only be helpful with code written explicitly to take advantage of it, but there are a variety of platforms and kinds of application where it could offer a significant performance boost if the Standard would allow it, and I don't know that a whole lot of code uses pointer-to-array types in ways that would be incompatible with the indicated optimization that wouldn't be equally flummoxed by aspects that would be allowable, and which the Standard seems to recognize as possibilities. – supercat Jul 18 '16 at 20:04
@supercat: An easier, portable way to write code to take advantage of alignment assumptions is `if ((uintptr_t)p % align) 1/0;` – R.. GitHub STOP HELPING ICE Jul 18 '16 at 22:02
@R..: If applying 128-bit alignment to arrays of type `float[64]` would entitle a compiler to expect 128-bit alignment from pointers of type `float(*)[64]`, then declaring a pointer of that type would be a portable way to let a compiler that could exploit such alignment make use of it, while remaining compatible with compilers that don't align arrays in that fashion. Although for many purposes it would be more helpful to have the Standard extend the common-initial-sequence rule so that any structure of the form e.g. `struct foo32 {int size; int dat[32];}` which was present in a... – supercat Jul 19 '16 at 13:46
...visible union declaration with a `struct fooAny {int size; int dat[];}` could be guaranteed to be layout-compatible and alias-compatible with the latter, the Standard explicitly refrains from requiring that both `dat[]` types have the same offset and I can't see much point to it doing so *other* than to vary the alignment. Further, given `for (int i=0; i<64; i++) arrayPtr1[i] = arrayPtr2[i]*0.75f;`, knowing that the pointers will either be equal or disjoint would allow a compiler to vectorize the computation; `restrict` would allow vectorizing but would require special-case code... – supercat Jul 19 '16 at 13:51
...to handle the matching-pointer scenario. Using a `float*` would require a compiler to generate code that would make reads from later elements of `arrayPtr2[i]` see any preceding writes to those elements made via `arrayPtr1[i]`. If two pointers of type `float(*)[64]` could be guaranteed not to *partially* overlap, that would let a compiler process the elements in arbitrary sequence. – supercat Jul 19 '16 at 13:56
@supercat: I've been asking around and the general attitude seems to be that there's no such guarantee but it's not clear. In any case you'd need a new ABI to do what you're asking for, so it's not really viable for existing platforms. – R.. GitHub STOP HELPING ICE Jul 19 '16 at 14:46
@R..: The question is whether anything in the Standard would forbid compilers from applying alignment rules as described. Most existing platforms have ABIs that describe structure layouts which don't apply such alignments, but nothing in the Standard requires that compilers for a CPU abide by normal ABI conventions for that CPU. – supercat Jul 19 '16 at 16:28
@supercat: Of course not, but a compiler that doesn't is largely a toy. – R.. GitHub STOP HELPING ICE Jul 19 '16 at 17:23
@R..: Not necessarily. If such a compiler could run programs for it twice as fast as would be possible for a compiler that had to run code not targeted toward it, it could be very useful for when speed was important and where programs would be run enough to justify the effort of tweaking them to fit the implementations' requirements. Such an implementation should not be considered "normal", but nor IMHO should an implementation which, while running on 32-bit two's-complement silent-wraparound hardware, can negate laws of time and causality when multiplying two values of type uint16_t. – supercat Jul 20 '16 at 14:09
@supercat: But it can't run code twice as fast. Modern compilers, when they could optimize for the aligned case but can't guarantee alignment, just make a branch to check alignment and emit two or more versions of the code. Nobody is going to change ABIs every time there's an opportunity to optimize out one of these (essentially 100% predictable) branches. – R.. GitHub STOP HELPING ICE Jul 20 '16 at 16:21
