I am converting vectorized code from SSE2 intrinsics to AVX2 intrinsics, and would like to know how to check whether an __m256i vector of 16-bit elements contains any element greater than zero. Below is the code used in the SSE2 version:

int check2(__m128i vector1, __m128i vector2)
{
  __m128i vcmp =  _mm_cmplt_epi16(vector2, vector1);
  int cmp = _mm_movemask_epi8(vcmp);
  return ((cmp>0) ? 1 : 0) ;
}

I thought that the following code would work, but it didn't.

int check2(__m256i vector1, __m256i vector2)
{
  __m256i vcmp = _mm256_cmpgt_epi16(vector1, vector2);
  int cmp = _mm256_movemask_epi8(vcmp);
  return ((cmp>0) ? 1 : 0) ;
}

I would be thankful if somebody could advise.

MROF
  • Can you explain how it "doesn't work"? – Mysticial Feb 23 '15 at 23:22
  • It does not return the same answer as the SSE2 code. I suspect that the problem is related to the _mm256_movemask_epi8 function. Maybe it should be replaced with another function, shouldn't it? – MROF Feb 23 '15 at 23:29
  • Note that `gt` is not the complement of `lt`. Why did you change the order of the parameters? – user3386109 Feb 23 '15 at 23:39
  • There is no built-in LT intrinsic for AVX2 (_mm256_cmplt_epi16). However, using GT with the parameters exchanged should return the same result. – MROF Feb 23 '15 at 23:49
  • I'm sensing an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) here. If you're comparing against zero, why does the function take two vectors? – Mysticial Feb 23 '15 at 23:59
  • @MROF Technically, you need GTE (greater than or equal) to give the same results with the parameters switched. – user3386109 Feb 24 '15 at 00:15

1 Answer

I think you just have a trivial bug - your function should be:

int check2(__m256i vector1, __m256i vector2)
{
    __m256i vcmp = _mm256_cmpgt_epi16(vector1, vector2);
    int cmp = _mm256_movemask_epi8(vcmp);
    return cmp != 0;
}

The problem is that _mm256_movemask_epi8 returns 32 flag bits packed into a signed int, and you were testing this for > 0. If the most significant bit is 1 (i.e. the top byte of the comparison result is set), the value is negative and the test fails. You did not see this problem with the SSE version because _mm_movemask_epi8 only sets the low 16 bits, so its result is never negative.
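To see the sign issue without needing AVX2 hardware, here is a minimal scalar sketch. It does not use the real intrinsic: movemask32_model is a hypothetical stand-in that, as I understand the instruction, mimics it by packing the top bit of each of 32 bytes into an int, and any_set_buggy / any_set_fixed are illustrative names for the two tests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for _mm256_movemask_epi8 (NOT the real
   intrinsic): bit i of the result is the most significant bit of byte i. */
static int movemask32_model(const uint8_t bytes[32])
{
    uint32_t mask = 0;
    for (int i = 0; i < 32; ++i)
        mask |= (uint32_t)(bytes[i] >> 7) << i;
    /* Assumes the usual two's-complement wrap on conversion, so a mask
       with bit 31 set becomes a negative int. */
    return (int)mask;
}

/* The test from the question: misses masks whose bit 31 is set. */
static int any_set_buggy(int mask) { return (mask > 0) ? 1 : 0; }

/* The corrected test: true whenever any flag bit is set. */
static int any_set_fixed(int mask) { return mask != 0; }
```

With only the last byte set to 0xFF, the model returns a negative value, so any_set_buggy reports 0 while any_set_fixed reports 1. With only the first byte set, both report 1, which is why the bug only shows up when the highest element compares true.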

Paul R