I have several __m128i vectors, each containing four 32-bit unsigned integers, and I would like to check whether any of those integers is zero.
I understand how I can "aggregate" the multiple __m128i vectors, but eventually I will still end up with a single __m128i vector, which I will then need to check horizontally.
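To make the "aggregate" step concrete, here is a rough sketch of one way it could be done with SSE2 only: compare each vector against zero and OR the per-lane masks together. The helper name aggregate_zero_mask and its array-plus-count signature are made up for illustration and are not from any library. The result is still a single __m128i whose lanes are all-ones wherever some input had a zero in that lane.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions for this sketch).
 * Returns a vector whose 32-bit lane k is 0xFFFFFFFF if lane k of any
 * input vector is zero, and 0 otherwise. */
static __m128i aggregate_zero_mask(const __m128i *vecs, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();

    for (size_t i = 0; i < n; ++i) {
        /* Per-lane equality with zero gives all-ones or all-zeros per lane. */
        __m128i is_zero = _mm_cmpeq_epi32(vecs[i], zero);
        /* Accumulate: a lane stays set once any input had a zero there. */
        acc = _mm_or_si128(acc, is_zero);
    }
    return acc;
}
```

This leaves the four lanes of the accumulated mask still to be reduced to a single yes/no answer.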
How do I perform the final horizontal check for zero across the last vector?
EDIT: I am using Intel intrinsics, not inline assembly.