9

Is there an SSE2 instruction that loads a 128-bit integer vector register from an int buffer in reverse order?

Paul R
Andy

2 Answers

11

It's quite easy to reverse 32-bit int elements after a normal load:

__m128i v = _mm_load_si128((const __m128i *)buff);   // MOVDQA  - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   // PSHUFD  - mask = 00 01 10 11 = 0x1b
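As a minimal sketch, the load-and-reverse above could be wrapped in a helper like the one below. The name `load_reversed_epi32` is hypothetical, and the alignment assert is an added assumption about how the buffer is allocated; if the buffer may be unaligned, `_mm_loadu_si128` (MOVDQU) can be used for the load instead.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: load four 32-bit ints from a 16-byte-aligned
   buffer and return them with the element order reversed. */
static __m128i load_reversed_epi32(const int32_t *buff)
{
    assert(((uintptr_t)buff & 15) == 0);                   /* MOVDQA needs 16-byte alignment */
    __m128i v = _mm_load_si128((const __m128i *)buff);     /* MOVDQA */
    /* _MM_SHUFFLE(0, 1, 2, 3) packs four 2-bit source indices into 0x1b:
       result element i is taken from source element 3 - i. */
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));  /* PSHUFD */
}
```

Each 2-bit field of the immediate names the source element for one result element, with the rightmost macro argument controlling element 0.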

You can do the same thing for 16-bit short elements, but it takes more instructions:

__m128i v = _mm_load_si128((const __m128i *)buff);   // MOVDQA  - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   // PSHUFD  - mask = 00 01 10 11 = 0x1b
v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFLW - mask = 10 11 00 01 = 0xb1
v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFHW - mask = 10 11 00 01 = 0xb1
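To make the three-step shuffle easier to follow, here is a sketch of the same sequence with the lane contents traced in comments. The helper name `load_reversed_epi16` and the labels s0..s7 (the shorts in memory order) are illustrative assumptions, not part of the original answer.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: load eight 16-bit shorts from a 16-byte-aligned
   buffer and return them with the element order reversed. */
static __m128i load_reversed_epi16(const int16_t *buff)
{
    assert(((uintptr_t)buff & 15) == 0);                  /* MOVDQA needs 16-byte alignment */
    __m128i v = _mm_load_si128((const __m128i *)buff);    /* s0 s1 s2 s3 s4 s5 s6 s7 */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));    /* s6 s7 s4 s5 s2 s3 s0 s1 */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  /* s7 s6 s5 s4 s2 s3 s0 s1 */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  /* s7 s6 s5 s4 s3 s2 s1 s0 */
    return v;
}
```

PSHUFD reverses the four 32-bit pairs, which leaves the two shorts inside each pair in their original order; PSHUFLW and PSHUFHW then swap the shorts within each pair in the low and high halves respectively.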

Note that you can do this with fewer instructions using _mm_shuffle_epi8 (PSHUFB), if SSSE3 is available:

const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1);
                                     // initialise vector mask for use with PSHUFB
                                     // NB: do this once, outside any processing loop
...
__m128i v = _mm_load_si128((const __m128i *)buff);    // MOVDQA
v = _mm_shuffle_epi8(v, vm);         // PSHUFB
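For readers unsure how the PSHUFB mask is interpreted, the following is a plain scalar model of the 128-bit instruction's behaviour, written only for illustration. The function name `pshufb_model` is hypothetical; real code would call `_mm_shuffle_epi8` as shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of 128-bit PSHUFB (_mm_shuffle_epi8), for illustration only.
   Result byte i is src[mask[i] & 0x0f], or 0 if bit 7 of mask[i] is set. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    assert(dst != src && dst != mask);  /* this model does not support in-place use */
    for (int i = 0; i < 16; i++) {
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
    }
}
```

With the mask 14, 15, 12, 13, ..., 0, 1, result bytes 0 and 1 come from source bytes 14 and 15, so the last short moves to the front with its two bytes kept in order, and so on down the vector.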
Paul R
  • Thanks Paul. Your logic works fine, but I couldn't understand the usage of the second parameter "0x1B". Is it some sort of mask? Also, is it possible to do the same operation on shorts? – Andy May 16 '13 at 10:40
  • I've added a second example for loading and reversing shorts. The mask is covered in the Intel docs but I've added comments to show how it is constructed. – Paul R May 16 '13 at 11:47
  • P.S. I highly recommend downloading the [Intel Intrinsics Guide](http://software.intel.com/en-us/articles/intel-intrinsics-guide) - a very useful tool for WIN/Mac OS X/Linux which documents all the SSE/AVX instructions and intrinsics in a very accessible way. – Paul R May 16 '13 at 11:53
  • I would use PSHUFB for reversing a vector of shorts, unless SSSE3 isn't available. – Stephen Canon May 21 '13 at 13:52
  • Sure, but the OP specifically asked for SSE2 solutions. I'll add a note to the answer though. – Paul R May 21 '13 at 15:33
  • SSE3, SSSE3, SSE4.1 and SSE4.2 are all supported. As far as _mm_shuffle_epi8 (PSHUFB) is concerned, I can't quite figure out how the mask is used. Can someone please explain? – Andy May 27 '13 at 07:57
  • OK - I've added a `PSHUFB` example above for reversing the order of 16-bit ints in a vector. – Paul R May 27 '13 at 08:28
  • Thanks. _mm_shuffle_epi8 now seems to make sense to me. I am a novice in Intel intrinsic programming (although I have worked with NEON intrinsics), and initially it seemed to me that SSE has no straightforward instructions for certain operations. But now it looks like most operations are possible with the provided instruction sets combined with the correct logic :-) – Andy May 27 '13 at 09:13
  • Yes, that's true - there are quite a few "tricks of the trade" that you need to learn to get the best out of SIMD in general, and SSE in particular. – Paul R May 27 '13 at 15:18
  • @Paul: Are there any tutorials or papers which can help me learn a few "tricks of the trade" as well? Please suggest some. – Andy May 28 '13 at 07:33
  • Unfortunately there is not much out there - the best thing you can do is read and understand any existing code that you can find, e.g. in open source codebases, and of course write your own optimised SIMD routines. – Paul R May 28 '13 at 08:17
-2

EDIT: The following is for single-precision floating-point values; I'm leaving it here just in case.

The closest (and handiest) match is the _mm_loadr_ps intrinsic. Be aware that the address must be 16-byte aligned.

Note, though, that this intrinsic translates to more than one instruction (MOVAPS plus a shuffle).
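As a rough sketch of what that means, the helper below produces the same result as `_mm_loadr_ps` using an explicit aligned load followed by a shuffle. The name `load_reversed_ps` is hypothetical, and the exact instruction sequence a given compiler emits for `_mm_loadr_ps` may differ from this.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: behaves like _mm_loadr_ps, spelled out as an
   aligned load plus a shuffle that reverses the four floats. */
static __m128 load_reversed_ps(const float *p)
{
    assert(((uintptr_t)p & 15) == 0);                      /* MOVAPS needs 16-byte alignment */
    __m128 v = _mm_load_ps(p);                             /* MOVAPS */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));  /* SHUFPS, imm = 0x1b */
}
```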

Trax
  • Thanks for the reply, but this instruction loads four single-precision floating-point values in reverse order. I am looking for the same operation for integers, but I guess there is no support for that. – Andy May 16 '13 at 10:13
  • Yes, I didn't notice you were talking about integer values (should have re-read your title). Paul R's answer is what you need. – Trax May 16 '13 at 10:50
  • Yes. Just curious, can the same operation be done with short values? – Andy May 16 '13 at 11:12