Is there an alternative to the following manual fix-up:

// excerpt adapted from SIMDTest in   
// http://www.mccauslandcenter.sc.edu/mricro/obsolete/graphics/simdtest.zip
//
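type
  Doublep = ^Double; // assumed declaration: Doublep is a pointer to Double
var
  lArraySz: LongInt;
  lAdblRAp: Pointer; // raw block returned by GetMem
  lAdblRA: Doublep;  // 16-byte aligned pointer into that block
begin
  // ...
  GetMem(lAdblRAp, (lArraySz * SizeOf(Double)) + 32);
  // round down to a multiple of 16, then step to the next boundary
  lAdblRA := Doublep((Integer(lAdblRAp) and $FFFFFFF0) + 16);
  // ...
end;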

Note that this piece of code is embedded in either a procedure or a function.

menjaraz

2 Answers

The standard way is to use a memory manager that aligns blocks on 16-byte boundaries. FastMM will do this, but you need the full version to be able to configure this option.

Note also that the code in your question is not 64-bit ready, since it casts a pointer to a 4-byte integer.
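
As a rough sketch of what a pointer-size-safe variant of the question's fix-up might look like: the cast below goes through NativeUInt, which is as wide as a pointer on both 32-bit and 64-bit targets in recent Delphi versions (an assumption about your compiler; Free Pascal calls the equivalent type PtrUInt). The program name, the variable names and the element count are hypothetical.

```pascal
program AlignCastSketch;
{$APPTYPE CONSOLE}

const
  ElementCount = 1000; // hypothetical array size

var
  RawBlock: Pointer; // pointer returned by GetMem; needed later for FreeMem
  Aligned: Pointer;  // 16-byte aligned address inside RawBlock

begin
  // Same padding as in the question: 32 spare bytes.
  GetMem(RawBlock, ElementCount * SizeOf(Double) + 32);
  try
    // Round down to a multiple of 16, then step to the next boundary.
    // NativeUInt matches the pointer width, so no address bits are lost.
    Aligned := Pointer((NativeUInt(RawBlock) and not NativeUInt(15)) + 16);
    Writeln('Aligned to 16 bytes: ', (NativeUInt(Aligned) and 15) = 0);
  finally
    FreeMem(RawBlock); // free the original pointer, never the aligned one
  end;
end.
```

Keeping the raw pointer in its own variable matters here, because FreeMem must receive exactly the address that GetMem returned.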

David Heffernan

If you are using a recent version of Delphi (I have tested with XE and XE2), the simplest way is to call SetMinimumBlockAlignment(mba16Byte) at the very start of your program.

Then call the regular GetMem, New, or any other memory allocation function, and the returned address will be aligned to a 16-byte boundary.
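
A minimal sketch of that approach, assuming the default (built-in) memory manager is in use; the program name and the block size are made up for illustration:

```pascal
program MinAlignSketch;
{$APPTYPE CONSOLE}

var
  Block: Pointer;

begin
  // Ask the built-in memory manager for 16-byte alignment before making
  // any allocation that has to be aligned.
  SetMinimumBlockAlignment(mba16Byte);

  GetMem(Block, 1000 * SizeOf(Double)); // 1000 is an arbitrary example size
  try
    // The low four address bits are zero when the block is 16-byte aligned.
    Writeln('Aligned to 16 bytes: ', (NativeUInt(Block) and 15) = 0);
  finally
    FreeMem(Block);
  end;
end.
```

No padding or second pointer is needed in this variant, since the address returned by GetMem is used directly.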

Edit:

If you prefer the manual fix-up, a more efficient version that wastes less memory is as follows:

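var
  lArraySz: LongInt;
  lAdblRAp, lAdblRA: Pointer;

begin
  // ...
  GetMem(lAdblRAp, (lArraySz * SizeOf(Double)) + 16);
  // round up to the next multiple of 16; NativeUInt is pointer-sized,
  // so the cast also works on 64-bit targets
  lAdblRA := Pointer((NativeUInt(lAdblRAp) + 15) and not NativeUInt(15));
  // ...
end;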

This uses 16 bytes less per allocation.

Vahid Nasehi
  • +1. Quote from Embarcadero RAD Studio help note on *SetMinimumBlockAlignment*: `Memory allocated through the Memory Manager is guaranteed to be aligned to at least 8-byte boundaries. 16-byte alignment is useful when memory blocks will be manipulated using SSE instructions, but may increase the memory usage overhead`. – menjaraz Feb 23 '12 at 10:56
  • @menjaraz: If you no longer need 16-byte alignment in your code, you can switch back to 8-byte alignment with `SetMinimumBlockAlignment(mba8Byte)`. This affects only newly allocated memory. – Vahid Nasehi Feb 23 '12 at 11:02
  • @menjaraz: I should also mention that the manual fix-up may waste some memory too. But if you set the block alignment back to 8 bytes, the memory manager will reuse any memory wasted by earlier 16-byte-aligned allocations. – Vahid Nasehi Feb 23 '12 at 11:08
  • You are right. Aiming for 16-byte alignment may incur some memory usage penalty either way. Nevertheless, doing it manually doesn't require reverting anything, because the Memory Manager's minimum block alignment remains untouched. – menjaraz Feb 23 '12 at 11:23
  • Yes, but remember that setting the alignment back to 8 bytes lets the memory manager reuse previously wasted memory, which does not happen with the manual fix-up. – Vahid Nasehi Feb 23 '12 at 11:34
  • Memory reuse is the job of the Memory Manager. The memory acquired through `GetMem(lAdblRAp,(lArraySz * SizeOf(Double)) + 16);` should be released by a corresponding `FreeMem(lAdblRAp);` (you are supposed to do this so that the Memory Manager can do its job and reuse the memory; otherwise a memory leak occurs). – menjaraz Feb 23 '12 at 12:00
  • I do not mean memory reuse in the manual fix-up method. The memory manager is not aware of your fix-up, so it cannot use the wasted part of the block. If you look at the Delphi memory manager source code, memory reuse only occurs when switching from 16-byte to 8-byte block alignment. – Vahid Nasehi Feb 23 '12 at 12:21
  • The wasted part is at most 32 bytes out of `(lArraySz * SizeOf(Double)) + 32` bytes. The pointer variable lAdblRA is also a waste of memory; it is unnecessary in your solution and in David's. Still, memory usage is managed within the function/procedure, and everything goes out of scope on completion. – menjaraz Feb 23 '12 at 12:46
  • Then in your specific case the memory usage overhead is no problem at all. You just need 16-byte-aligned memory, and you have it. – Vahid Nasehi Feb 23 '12 at 12:55
  • I forgot to attribute it: the code is not mine, and I am looking for other, more elegant solutions (I'll make an edit accordingly). The moral: I prefer taking advantage of MM functionality (built-in or FastMM). I have learned a lot from you. Thank you for your contribution. – menjaraz Feb 23 '12 at 13:32
  • I agree with you. I hope that in the future there will be a more advanced MM that handles all of these issues well. – Vahid Nasehi Feb 23 '12 at 21:43
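
The switch-and-restore pattern discussed in the comments could be sketched roughly as follows. This assumes the built-in memory manager and uses GetMinimumBlockAlignment from the System unit to remember the previous setting; the program and variable names and the block size are hypothetical.

```pascal
program ScopedAlignSketch;
{$APPTYPE CONSOLE}

var
  Previous: TMinimumBlockAlignment; // setting in effect before the switch
  SseBuffer: Pointer;               // block that must be 16-byte aligned

begin
  Previous := GetMinimumBlockAlignment;
  SetMinimumBlockAlignment(mba16Byte);
  try
    // Only allocations made while mba16Byte is active are affected.
    GetMem(SseBuffer, 1000 * SizeOf(Double)); // arbitrary example size
  finally
    // Restore the earlier alignment for all later allocations.
    SetMinimumBlockAlignment(Previous);
  end;

  try
    Writeln('Aligned to 16 bytes: ', (NativeUInt(SseBuffer) and 15) = 0);
  finally
    FreeMem(SseBuffer);
  end;
end.
```

Restoring the saved value rather than hard-coding mba8Byte keeps the sketch neutral about whatever alignment the rest of the program had chosen.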