
I need to optimize my matrix multiplication by using SIMD/Intel SSE. The example code given looks like:

*x = (float*)memalign(16, size * sizeof(float));

However, I am using C++ and [found that][1] I should use new instead of malloc (that was before doing SIMD). Now that I'm optimizing further with SIMD/SSE, I need aligned memory. So the question is: do I need memalign/_aligned_malloc, or is an array declared like

static float m1[SIZE][SIZE];

already aligned? (SIZE is an int)

Jiew Meng

1 Answer


Typically, such an array would not be 16-byte aligned, although there is nothing in the C++ specification that would prevent your compiler from placing it on a 16-byte boundary. Depending on which compiler you're using, there is usually a compiler-specific way to request that the array be aligned on a 16-byte boundary. For example, for gcc, you would use:

static float m1[SIZE][SIZE] __attribute__((aligned(16)));
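
One way to confirm that the request took effect is to check the address at run time. The sketch below assumes GCC or Clang (for the attribute) and a hypothetical SIZE of 8; `is_aligned` and `matrices_are_aligned` are illustrative helpers, not library functions. It also shows `alignas`, which is the standard C++11 way to ask for the same thing if your compiler supports it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t SIZE = 8;  // hypothetical value; a multiple of 4 keeps every row aligned

// GCC/Clang-specific attribute, as in the line above.
static float m1[SIZE][SIZE] __attribute__((aligned(16)));

// Standard C++11 alternative requesting the same 16-byte alignment.
alignas(16) static float m2[SIZE][SIZE];

// Illustrative helper: true if p lies on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Checks the start of both arrays; safe to call once at program start-up.
inline bool matrices_are_aligned() {
    return is_aligned(m1, 16) && is_aligned(m2, 16);
}
```

Calling `assert(matrices_are_aligned());` early in `main` would catch a build where the alignment request was silently ignored.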

Alternatively, you could use posix_memalign(), memalign(), or other aligned-allocation APIs available on your platform to get a block of memory with the desired alignment. As a worst case, you could even allocate memory using standard malloc() or operator new and then handle the alignment adjustment yourself.
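
As a rough sketch of both options, assuming a POSIX platform where `posix_memalign()` is available: `alloc_floats_posix`, `alloc_floats_manual` and `ManualBlock` are hypothetical names chosen for illustration. The second function shows the "handle the alignment yourself" fallback by over-allocating with `malloc()` and rounding the address up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // std::malloc, std::free; posix_memalign on POSIX systems

// Allocate `count` floats on a 16-byte boundary; returns nullptr on failure.
// Release the result with std::free().
inline float* alloc_floats_posix(std::size_t count) {
    void* p = nullptr;
    if (posix_memalign(&p, 16, count * sizeof(float)) != 0) return nullptr;
    return static_cast<float*>(p);
}

// Fallback: keep the raw pointer for freeing and a rounded-up pointer for use.
struct ManualBlock {
    void*  raw;      // pass this one to std::free()
    float* aligned;  // use this one for SSE loads and stores
};

inline ManualBlock alloc_floats_manual(std::size_t count) {
    const std::size_t alignment = 16;
    // Over-allocate by alignment - 1 bytes so an aligned address always fits.
    void* raw = std::malloc(count * sizeof(float) + alignment - 1);
    if (raw == nullptr) return {nullptr, nullptr};
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(alignment - 1);
    const std::uintptr_t up = (addr + alignment - 1) & mask;  // round up to next multiple of 16
    return {raw, reinterpret_cast<float*>(up)};
}
```

With the manual version, remember that only `raw` may be passed to `std::free()`; freeing `aligned` is undefined behaviour whenever the two differ.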

Jason R
  • I'm using `g++`; it's the same, I suppose? I mean, `gcc` is for C and `g++` is for C++? Since I am using C++, do I do that for `g++`? I'll try that. – Jiew Meng Oct 03 '12 at 13:50
  • The `gcc` front end actually supports both C and C++. It will switch modes based upon the extension of the source file (i.e. it will expect a `.c` file to be C and `.cc` or `.cpp` files to be C++). If you want to be C++-explicit, then you can invoke it as `g++` instead. – Jason R Oct 03 '12 at 13:56
  • Note also that SIZE will need to be a multiple of 4 if you want each array row to be aligned (I assume that both you and the OP are aware of this - I just add it for any future readers of this question). – Paul R Oct 03 '12 at 14:51
  • Hmm, now I need to allocate memory on the heap instead ... I think I will need to convert to `float* m1 = new float[SIZE*SIZE]`. How does the `__attribute__((aligned(16)))` part fit in now? – Jiew Meng Oct 04 '12 at 11:27
  • If you're allocating from the heap, you would use one of the `memalign()`-like functions, like the ones you noted in your original question. You wouldn't want to use `operator new` in this case, unless you've overridden it to allocate the underlying memory using one of the aligned facilities. – Jason R Oct 04 '12 at 12:34
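
To make the heap case from the comments concrete, here is a minimal sketch that replaces `new float[SIZE*SIZE]` with an aligned allocation owned by a `std::unique_ptr`. It assumes a POSIX platform with `posix_memalign()`; `AlignedMatrix`, `FreeDeleter`, `make_matrix` and `row` are hypothetical names for illustration. It also reflects Paul R's point: only the start of the block is aligned, so rows land on 16-byte boundaries only when the row length is a multiple of 4 floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Memory from posix_memalign must be released with free(), not delete[].
struct FreeDeleter {
    void operator()(float* p) const { std::free(p); }
};
using AlignedMatrix = std::unique_ptr<float[], FreeDeleter>;

// Allocate a size x size row-major matrix whose first element is 16-byte aligned.
// Returns an empty pointer on failure.
inline AlignedMatrix make_matrix(std::size_t size) {
    void* p = nullptr;
    if (posix_memalign(&p, 16, size * size * sizeof(float)) != 0) p = nullptr;
    return AlignedMatrix(static_cast<float*>(p));
}

// Pointer to the start of row r. It is 16-byte aligned for every r only
// when size is a multiple of 4 (4 floats = 16 bytes).
inline float* row(const AlignedMatrix& m, std::size_t size, std::size_t r) {
    return m.get() + r * size;
}
```

Usage would look like `AlignedMatrix m1 = make_matrix(SIZE);` followed by `row(m1, SIZE, i)[j]` for element access; the memory is freed automatically when `m1` goes out of scope.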