I'm trying to cast an unsigned short array to __m128i:

const unsigned short x[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
const unsigned short y[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

__m128i n = *(__m128i*) &y[0];
__m128i m = *(__m128i*) &x[0];

The first cast works fine, but the second one does not. I get:

Unhandled exception at 0x013839ee in sse2_test.exe: 0xC0000005: Access violation reading location 0xffffffff.

What's wrong? Can somebody help me?

Mysticial
stack_user

2 Answers


Watch your data alignment.

When you dereference a __m128i* or any other SSE type, the pointer is required to be aligned to 16 bytes. However, x and y are not guaranteed to be aligned to 16 bytes.
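To make the requirement concrete, here is a minimal sketch of a debug-time alignment check placed in front of an aligned load. The helper names is_aligned_16 and load_aligned_checked are hypothetical and not part of any library; only _mm_load_si128 comes from the SSE2 header.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Hypothetical helper: aligned load of eight unsigned shorts.
 * The assert catches a misaligned pointer in debug builds instead of
 * letting the load fault at run time. */
static __m128i load_aligned_checked(const unsigned short *p)
{
    assert(is_aligned_16(p));
    return _mm_load_si128((const __m128i *)p);
}
```

Calling is_aligned_16(&x[0]) on a plain array may return either result from build to build, which would be consistent with one cast in the question working while the other crashes.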

How you enforce alignment depends on the compiler.

Visual C++

__declspec(align(16)) const unsigned short x[] = ...

GCC

const unsigned short x[] __attribute__((aligned(16))) = ...
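If a C11 compiler is available, the standard _Alignas specifier should give the same effect without a vendor extension. The sketch below assumes C11 support and uses a hypothetical function name, load_first_eight. It also trims the array to eight elements, since eight 16-bit values fill exactly one 128-bit register.

```c
#include <assert.h>     /* static_assert in C11 */
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns the eight values 0..7 as one SSE register. */
static __m128i load_first_eight(void)
{
    /* _Alignas(16) requests 16-byte alignment in standard C11. */
    static const _Alignas(16) unsigned short x[8] = {0, 1, 2, 3, 4, 5, 6, 7};

    /* Compile-time check that the array is exactly one register wide. */
    static_assert(sizeof x == sizeof(__m128i), "x must fill one __m128i");

    /* The aligned load is safe here because x is 16-byte aligned. */
    return _mm_load_si128((const __m128i *)x);
}
```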

Alternatively, you can use unaligned loads (albeit at a possible performance penalty):

__m128i n = _mm_loadu_si128((const __m128i*) &y[0]);
__m128i m = _mm_loadu_si128((const __m128i*) &x[0]);
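As a fuller illustration, the sketch below adds two unsigned short arrays lane by lane using only unaligned loads and stores. The function name add_u16 is hypothetical. Note that a single __m128i holds eight 16-bit values, so the 16-element arrays from the question take two loads each.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
 * n must be a multiple of 8; none of the pointers needs 16-byte alignment. */
static void add_u16(const unsigned short *a, const unsigned short *b,
                    unsigned short *out, size_t n)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        /* Each unaligned load reads eight 16-bit lanes (16 bytes). */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Lane-wise 16-bit addition with wraparound, then an unaligned store. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi16(va, vb));
    }
}
```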
finnan
Mysticial

You shouldn't blindly cast one pointer type to another; as Mysticial says, you should expect alignment problems if you do. C11 has _Alignas, and other compilers have extensions to C99 or C89 that do the same thing.

The official and, I find, clearest way to do this in C99 is to create a union:

union combine {
  unsigned short x[sizeof(__m128i)/sizeof(unsigned short)];
  __m128i y;
};

union combine X = { .x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} };

Such a union is guaranteed to be correctly aligned for all its members. Now you can easily use X.y, and you don't even have to go through pointers.
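A compilable sketch of the union approach might look like this. The helper names make_ramp and doubled are hypothetical, and the initializer is trimmed to eight values on the assumption that the array should hold exactly the number of unsigned shorts that fit into one __m128i.

```c
#include <assert.h>     /* static_assert in C11 */
#include <emmintrin.h>  /* SSE2 intrinsics */

union combine {
    unsigned short x[sizeof(__m128i) / sizeof(unsigned short)];  /* 8 lanes */
    __m128i y;
};

/* Both members cover the same 16 bytes. */
static_assert(sizeof(union combine) == sizeof(__m128i), "size mismatch");

/* Hypothetical helper: builds the values 0..7 through the array member. */
static union combine make_ramp(void)
{
    union combine X = { .x = {0, 1, 2, 3, 4, 5, 6, 7} };
    return X;
}

/* Hypothetical helper: doubles every lane through the vector member. */
static union combine doubled(union combine in)
{
    union combine out;
    out.y = _mm_add_epi16(in.y, in.y);  /* no pointer cast required */
    return out;
}
```

Reading out.x after writing out.y relies on union type punning, which C99 and C11 permit.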

Jens Gustedt
  • This is the canonical way recommended by the gcc folks. Unfortunately it tends to produce inferior code (too much register spilling). BTW, `unsigned short x[sizeof(__m128i)]` doesn't make any sense. Use either `sizeof(__m128i)/sizeof(short)` to get the number of shorts fitting into a `__m128i`, or simply `16` to match the number of elements given. – Gunther Piez Jul 20 '12 at 07:52
  • @drhirsch, thanks for spotting the error, corrected. Though I have some serious doubts that this should be `16`. This is `16` bytes isn't it and not `16` `short`, no? And so the initializer (also in the question) would be just wrong, wouldn't it? – Jens Gustedt Jul 20 '12 at 08:12
  • Yes, that's what I meant. See my comment at the question. – Gunther Piez Jul 20 '12 at 09:13