I've cobbled together a NEON equivalent of the SSSE3 intrinsic _mm_shuffle_epi8.

The code I currently have for this purpose is:

        static __forceinline __n128 shuffle8(
            const __n128& a,
            __n128 b) throw()
        {
            __n64x2 in =
            {
                a.DUMMYNEONSTRUCT.low64,
                a.DUMMYNEONSTRUCT.high64
            };
            b.DUMMYNEONSTRUCT.low64 = vtbl2_u8(in, b.DUMMYNEONSTRUCT.low64);
            b.DUMMYNEONSTRUCT.high64 = vtbl2_u8(in, b.DUMMYNEONSTRUCT.high64);
            return b;
        }

I'm not necessarily set on this being the final form, but that's not the question yet. In testing, I've found that this code works exactly as intended when built and run in debug mode, but NOT when built and run in release mode. For example:

        #define simd_shuffle8(a, b) shuffle8(a, b)

        ...

        simd test = keyschedule[1];
        test = simd_shuffle8(test, test);

keyschedule[1] has an initial value of

{0x858efc16, 0x8801f2e2, 0x1f0fb923, 0x11ecb78e}

In debug mode, test ends with a value of

{0x00000000, 0x00fc0000, 0x00110000, 0x00000000}

which is as it should be. In release mode, test ends with a value of

{0x16161616, 0x16001616, 0x16161616, 0x16001616}

which is not as it should be. What is likely causing this, and how might I fix it?

MNagy

1 Answer

After some more testing, I found that assigning the low64 and high64 fields of DUMMYNEONSTRUCT separately seems to be what causes the problem. Interestingly, the output changes depending on the order in which I assign them. As far as I can tell, this looks like a bug in MSVS. For anyone interested, I got around it by returning a vcombine of the two vtbl2_u8 results instead of writing the fields one at a time. That produced the correct output.
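
For reference, here is a minimal sketch of what the vcombine version can look like. It is not my exact code: it uses the standard arm_neon.h names (uint8x16_t, vget_low_u8, vget_high_u8, vcombine_u8) rather than the MSVC-specific __n128 and DUMMYNEONSTRUCT fields, and the bytes16 alias, the from_words helper, the scalar fallback, and check_question_example are assumptions added purely so the example can be checked on a machine without NEON. The scalar model also makes the lookup rule explicit: vtbl2_u8 returns 0 for any index of 16 or more, which is not quite the same rule as _mm_shuffle_epi8 (that one zeroes a lane only when bit 7 of the index is set, and otherwise uses the low four bits).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON) || defined(_M_ARM) || defined(_M_ARM64)
#include <arm_neon.h>
#define SHUFFLE8_HAVE_NEON 1
#endif

// Hypothetical helper type: 16 bytes in memory order.
using bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical helper: pack four 32-bit words into 16 little-endian bytes.
inline bytes16 from_words(std::uint32_t w0, std::uint32_t w1,
                          std::uint32_t w2, std::uint32_t w3) {
    const std::uint32_t w[4] = {w0, w1, w2, w3};
    bytes16 out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = static_cast<std::uint8_t>(w[i / 4] >> (8 * (i % 4)));
    return out;
}

// Scalar model of the two vtbl2_u8 lookups: an index >= 16 yields 0.
inline bytes16 shuffle8_scalar(const bytes16& table, const bytes16& idx) {
    bytes16 out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = idx[i] < 16 ? table[idx[i]] : 0;
    return out;
}

#ifdef SHUFFLE8_HAVE_NEON
// Build the result with vcombine_u8 instead of writing the two
// 64-bit halves of the return value one field at a time.
inline uint8x16_t shuffle8_neon(uint8x16_t a, uint8x16_t b) {
    uint8x8x2_t table;
    table.val[0] = vget_low_u8(a);
    table.val[1] = vget_high_u8(a);
    const uint8x8_t lo = vtbl2_u8(table, vget_low_u8(b));
    const uint8x8_t hi = vtbl2_u8(table, vget_high_u8(b));
    return vcombine_u8(lo, hi);
}
#endif

// Uses NEON when available, otherwise falls back to the scalar model.
inline bytes16 shuffle8(const bytes16& a, const bytes16& b) {
#ifdef SHUFFLE8_HAVE_NEON
    bytes16 out{};
    vst1q_u8(out.data(), shuffle8_neon(vld1q_u8(a.data()), vld1q_u8(b.data())));
    return out;
#else
    return shuffle8_scalar(a, b);
#endif
}

// Re-runs the example from the question against the debug-mode result.
inline void check_question_example() {
    const bytes16 key =
        from_words(0x858efc16u, 0x8801f2e2u, 0x1f0fb923u, 0x11ecb78eu);
    const bytes16 expected =
        from_words(0x00000000u, 0x00fc0000u, 0x00110000u, 0x00000000u);
    assert(shuffle8(key, key) == expected);
}
```

Calling check_question_example() in both debug and release builds is a quick way to confirm that the optimizer no longer changes the result.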

MNagy