Check for zeros horizontally across __m128i vector?

Question

I have several __m128i vectors containing 32-bit unsigned integers and I would like to check whether any of the 4 integers is a zero.

I understand how I can "aggregate" the multiple __m128i vectors but eventually I will still end up with a single __m128i vector, which I will then need to check horizontally.

How do I perform the final horizontal check for zero across the last vector?

EDIT: I am using Intel intrinsics, not inline assembly.

Have you tried using the test intrinsics? Compare + test might do what you want. — Mysticial, Apr 21 '14 at 21:37

Stephen Canon · Answer 1 · 2014-04-21T21:53:32.340


Don’t do it. Avoid horizontal operations whenever possible; they are death to the performance of vector code.

Instead, compare the vector to a vector of zeros, then use PMOVMSKB to get a mask in a general-purpose register (GPR), which the intrinsic returns as an ordinary int. If that mask is non-zero, at least one of the lanes of your vector was zero:

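#include <emmintrin.h>  // SSE2 intrinsics

__m128i yourVector;  // four 32-bit lanes, filled in elsewhere
__m128i zeroVector = _mm_setzero_si128();

// cmpeq sets all bits of each lane that equals zero;
// movemask gathers the top bit of every byte into an int.
if (_mm_movemask_epi8(_mm_cmpeq_epi32(yourVector, zeroVector))) {
    // at least one lane of your vector is zero.
}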
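
For readers wondering what ends up in the mask: _mm_cmpeq_epi32 sets every bit of a lane that equals zero, and _mm_movemask_epi8 copies the top bit of each of the 16 bytes into an int, so each 32-bit lane contributes four adjacent mask bits. The following is a minimal sketch rather than part of the original answer. The function names any_lane_zero and any_zero_in_vectors are hypothetical, and the second one assumes that the "aggregation" mentioned in the question can be done by ORing the per-vector compare results.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Returns 1 if at least one of the four 32-bit lanes of v is zero. */
static int any_lane_zero(__m128i v)
{
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) != 0;
}

/* Hypothetical helper: checks n vectors, moving to a GPR only once. */
static int any_zero_in_vectors(const __m128i *vecs, size_t n)
{
    __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; ++i) {
        /* a lane of acc becomes all ones once any vector has a zero there */
        acc = _mm_or_si128(acc, _mm_cmpeq_epi32(vecs[i], zero));
    }
    return _mm_movemask_epi8(acc) != 0;
}
```

Keeping the accumulation in vector registers means the only transfer to a general-purpose register happens after the loop.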

You can also use PTEST if you want to assume SSE4.1.
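
A hedged sketch of the PTEST variant, not taken from the original answer: PTEST applied directly to the vector only reports whether all 128 bits are zero, so the sketch applies it to the compare result instead. The function name any_lane_zero_sse41 is hypothetical, and the target attribute is a GCC/Clang-specific assumption that lets the file build without -msse4.1; with other compilers, enable SSE4.1 through the build flags.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 maps to PTEST */

#if defined(__GNUC__)
__attribute__((target("sse4.1")))
#endif
static int any_lane_zero_sse41(__m128i v)
{
    /* lanes equal to zero become all ones, other lanes become zero */
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    /* _mm_testz_si128(eq, eq) returns 1 when (eq & eq) is all zero,
       i.e. when no lane of v was zero, so the result is inverted */
    return !_mm_testz_si128(eq, eq);
}
```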

Taking the question at face value, if you really did need to do a horizontal AND for some reason, it would be movhlps + andps + shufps + andps. But don’t do that.
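
For illustration only, the four-instruction sequence named above could be written with intrinsics roughly as follows. This is a sketch under assumptions: the function name horizontal_and is hypothetical, and the casts between integer and float vector types are there only to satisfy the intrinsic signatures. Note that an AND across lanes can be zero even when no lane is zero (1 & 2 == 0), so this is the generic reduction, not a substitute for the compare-based zero check.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */
#include <stdint.h>

/* Hypothetical helper: bitwise AND of all four 32-bit lanes of v. */
static uint32_t horizontal_and(__m128i v)
{
    __m128 x  = _mm_castsi128_ps(v);
    __m128 hi = _mm_movehl_ps(x, x);        /* movhlps: lanes 2,3 -> 0,1 */
    __m128 a  = _mm_and_ps(x, hi);          /* andps: lane0 = v0&v2, lane1 = v1&v3 */
    __m128 s  = _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 1, 1)); /* shufps: copy lane 1 everywhere */
    __m128 r  = _mm_and_ps(a, s);           /* andps: lane0 = v0&v1&v2&v3 */
    return (uint32_t)_mm_cvtsi128_si32(_mm_castps_si128(r));
}
```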

edited Apr 21 '14 at 21:53

answered Apr 21 '14 at 21:37

Stephen Canon


(PTEST isn't useful here since it will tell you whether *all* the lanes are zero, rather than whether *any* lane is zero.) – Raymond Chen Apr 21 '14 at 21:41

@RaymondChen But that can be easily inverted. – Mysticial Apr 21 '14 at 21:41

@RaymondChen: as Mysticial notes, any lane zero is the same as !(all lanes non-zero), which PTEST can do. – Stephen Canon Apr 21 '14 at 21:42
Two questions. One: could you elaborate on your solution? I am using intrinsics and have no idea what GPR is. Two: could this approach of avoiding horizontal operations be applied to summation? I am summing across an array and I use multiple __m128i vectors. Each vector contains 4 "mini-sums" but eventually I need one sum value. I cannot see how I could end up with one sum value unless I do a horizontal summation at the end. – user997112 Apr 21 '14 at 21:46

@user997112 No. Summing will require actually adding them up. There will not be horizontal reduction instructions for addition until AVX512. – Mysticial Apr 21 '14 at 21:48
@user997112: What Mysticial said. If you need to add, you need to add. Checking for zero is a much simpler operation. (But: do you really need to do horizontal summation? Is there some way you could modify your data layout to avoid doing it?) – Stephen Canon Apr 21 '14 at 21:50

@StephenCanon It sounds like this is a reduction of a larger vector. At the end you'll still have to reduce over a single vector. But in that case, it's probably not performance critical because it's O(1) of an O(N) operation. – Mysticial Apr 21 '14 at 21:51
@Mysticial: agreed; I just like to make sure. People tend to be horizontal-operation happy when they’re starting out writing vector code. – Stephen Canon Apr 21 '14 at 21:54
@StephenCanon Horizontal-happy is still better than set-happy: http://stackoverflow.com/a/23186488/922184 :D – Mysticial Apr 21 '14 at 21:56
@Mysticial: One would like to think that compilers would manage to optimize that particular horror away. One would like to think. – Stephen Canon Apr 21 '14 at 22:07

@StephenCanon if I write horizontal-vector code, it's always after the loop, once all the parallel processing has been done and I need to aggregate the results. – user997112 Apr 22 '14 at 01:31
@user997112: that's a perfectly appropriate usage of a horizontal operation. – Stephen Canon Apr 24 '14 at 01:54
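
The pattern discussed in these comments, vertical adds inside the loop and a single horizontal reduction afterwards, might look like the sketch below. It is an illustration rather than anything posted in the thread: the function name sum_u32 is hypothetical, the input is assumed to be a plain array of 32-bit unsigned integers, and overflow simply wraps modulo 2^32.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n 32-bit unsigned integers (wrapping on overflow). */
static uint32_t sum_u32(const uint32_t *data, size_t n)
{
    __m128i acc = _mm_setzero_si128();   /* four running "mini-sums" */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* unaligned load, so data needs no special alignment */
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(data + i)));
    }

    /* Horizontal step, executed once after the loop. */
    __m128i swapped = _mm_shuffle_epi32(acc, _MM_SHUFFLE(1, 0, 3, 2)); /* swap 64-bit halves */
    __m128i pairs   = _mm_add_epi32(acc, swapped);     /* lane0 = a0+a2, lane1 = a1+a3 */
    __m128i all     = _mm_add_epi32(pairs,
                          _mm_shuffle_epi32(pairs, _MM_SHUFFLE(2, 3, 0, 1))); /* lane0 = a0+a1+a2+a3 */
    uint32_t total = (uint32_t)_mm_cvtsi128_si32(all);

    for (; i < n; ++i) {                 /* scalar tail for leftover elements */
        total += data[i];
    }
    return total;
}
```

Because the two shuffles and two adds run once per array rather than once per element, their cost is the O(1) part of the O(N) operation mentioned above.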
