9

I am using SSE intrinsics to determine if a rectangle (defined by four int32 values) has changed:

__m128i oldRect; // contains old left, top, right, bottom packed to 128 bits
__m128i newRect; // contains new left, top, right, bottom packed to 128 bits

__m128i xor = _mm_xor_si128(oldRect, newRect);

At this point, the resulting xor value will be all zeros if the rectangle hasn't changed. What is then the most efficient way of determining that?

Currently I am doing this:

if (xor.m128i_u64[0] | xor.m128i_u64[1])
{
    // rectangle changed
}

But I assume there's a smarter way (possibly using some SSE instruction that I haven't found yet).

I am targeting SSE4.1 on x64 and I am coding C++ in Visual Studio 2013.

Edit: The question is not quite the same as "Is an __m128i variable zero?", as that one specifies "on SSE-2-and-earlier processors" (although Antonio did add an answer "for completeness" that addresses SSE4.1 some time after this question was posted and answered).

d7samurai
  • 1
    Why are you referring to a 128-bit integer value as `NULL`, which is a null *pointer* constant? – Keith Thompson Jan 12 '15 at 16:39
  • @KeithThompson `NULL` is a macro that expands to 0. C++ has `nullptr` as the null pointer constant. – d7samurai Jan 12 '15 at 16:42
  • 1
    `NULL` expands to an implementation-defined C++ null pointer constant. It *could* expand to `nullptr`. Even if it happens to expand to `0`, it shouldn't be used as an integer. – Keith Thompson Jan 12 '15 at 16:51
  • @KeithThompson To quote Bjarne Stroustrup: "In C++, the definition of NULL is 0, so there is only an aesthetic difference. A problem with NULL is that people sometimes mistakenly believe that it is different from 0 and/or not an integer. In pre-standard code, NULL was/is sometimes defined to something unsuitable and therefore had/has to be avoided. That's less common these days." – d7samurai Jan 12 '15 at 16:56
  • 1
    Stroustrup is talking about using `NULL` or `0` as a null pointer constant. He doesn't advocate using `NULL` as an integer constant. Why would you write `NULL` rather than `0` anyway? In addition to style issues, `NULL` may legally be defined as `nullptr`, which is not an integer expression. – Keith Thompson Jan 12 '15 at 17:02
  • @KeithThompson No, he isn't: "..people sometimes mistakenly believe that [NULL] is different from 0 and/or not an integer." NULL is the SAME as writing 0. – d7samurai Jan 12 '15 at 17:04
  • @KeithThompson And the reason I chose `NULL` in the question title was because it is more discernible than a `0` there, it being the whole purpose of the question. I also wanted to hint at the fact that checking if an `__m128i` is `all zeros` is not as simple as just comparing it to a regular integer value. – d7samurai Jan 12 '15 at 17:06
  • 1
    `NULL` is for pointers -- and a null pointer is not necessarily all-bits-zero. I understand that you want something distinct from the `int` constant `0`, but `NULL` is confusing and potentially incorrect. How about the English word "zero"? – Keith Thompson Jan 12 '15 at 17:10
  • @KeithThompson What part of this are you not getting? NULL is SYNONYMOUS with 0, the integer. http://www.stroustrup.com/bs_faq2.html#null – d7samurai Jan 12 '15 at 17:12
  • 1
    Stroustrup refers to C++11 in the future tense. It's now the standard, and it says that `NULL` may be defined as `nullptr`, which is not an integer expression. Stroustrup's FAQ doesn't define the language; the standard does. In any case, `NULL` is and always has been intended to be used as a *pointer* value. Is `__m128i` a pointer type? – Keith Thompson Jan 12 '15 at 17:17
  • @KeithThompson From MSDN: "Avoid using NULL or zero (0) as a null pointer constant". http://msdn.microsoft.com/en-us/library/jj651642.aspx – d7samurai Jan 12 '15 at 17:24
  • @KeithThompson From the Standard: "Should I use NULL or 0 or nullptr? You should use `nullptr` as the null pointer value. The others still work for backward compatibility with older code." https://isocpp.org/wiki/faq/freestore-mgmt#null-or-zero – d7samurai Jan 12 '15 at 17:49
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/68668/discussion-between-keith-thompson-and-d7samurai). – Keith Thompson Jan 12 '15 at 18:09
  • Possible duplicate of [Is an \_\_m128i variable zero?](https://stackoverflow.com/questions/7989897/is-an-m128i-variable-zero) – Antonio Apr 17 '19 at 00:59

2 Answers

14

You can use the PTEST instruction via the _mm_testz_si128 intrinsic (SSE4.1), like this:

#include <smmintrin.h> // SSE4.1 header

if (!_mm_testz_si128(xor, xor))
{
    // rectangle has changed
}

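Note that _mm_testz_si128 returns 1 if the bitwise AND of the two arguments is zero. Passing xor as both arguments therefore tests whether xor itself is all zeros, so the negated result is true exactly when the rectangle has changed.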
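For reference, here is a minimal self-contained sketch of the same check wrapped in a function. The name rectChanged, the RECT_TARGET_SSE41 macro and the choice to load each rectangle from four int32 values in memory are assumptions made for illustration; they are not part of the question's code. The variable is called diff rather than xor because xor is an alternative operator token in standard C++. The target attribute only matters for GCC and Clang builds without -msse4.1; MSVC needs no flag.

```cpp
#include <cstdint>
#include <smmintrin.h> // SSE4.1: _mm_testz_si128

// GCC/Clang need SSE4.1 enabled for the function that uses PTEST;
// MSVC accepts the intrinsic without any extra switch.
#if defined(__GNUC__) || defined(__clang__)
#define RECT_TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define RECT_TARGET_SSE41
#endif

// Hypothetical helper: returns true if any of the four fields
// (left, top, right, bottom) differs between the two rectangles.
RECT_TARGET_SSE41
inline bool rectChanged(const std::int32_t* oldRect, const std::int32_t* newRect)
{
    // Unaligned loads, so the caller's arrays need no special alignment.
    const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(oldRect));
    const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(newRect));

    // diff is all zeros exactly when the rectangles are identical.
    const __m128i diff = _mm_xor_si128(a, b);

    // _mm_testz_si128(diff, diff) returns 1 when (diff & diff) == 0,
    // i.e. when diff is zero, so a result of 0 means "changed".
    return _mm_testz_si128(diff, diff) == 0;
}
```

Calling rectChanged with two arrays holding the same four values should return false, and changing any single field should make it return true.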

Paul R
6

Ironically, the ptest instruction from SSE4.1 may be slower than pmovmskb from SSE2 in some cases. I suggest simply using:

__m128i cmp = _mm_cmpeq_epi32(oldRect, newRect);
if (_mm_movemask_epi8(cmp) != 0xFFFF)
  //registers are different

Note that if you really need that xor value, you'll have to compute it separately.

For Intel processors like Ivy Bridge, the version by Paul R with xor and _mm_testz_si128 translates into 4 uops, while the suggested version, which does not compute the xor, translates into 3 uops (see also this thread). This may give my version better throughput.
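As a rough sketch, the compare-and-movemask variant can be packaged as follows. The function name rectChangedSse2 and the loads from plain int32 arrays are assumptions added for illustration. Only SSE2 intrinsics are used, so no SSE4.1 support is required.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2: _mm_cmpeq_epi32, _mm_movemask_epi8

// Hypothetical helper: returns true if any of the four 32-bit fields differs.
inline bool rectChangedSse2(const std::int32_t* oldRect, const std::int32_t* newRect)
{
    // Unaligned loads, so the caller's arrays need no special alignment.
    const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(oldRect));
    const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(newRect));

    // Each 32-bit lane becomes all ones where the fields are equal,
    // and all zeros where they differ.
    const __m128i eq = _mm_cmpeq_epi32(a, b);

    // _mm_movemask_epi8 gathers the top bit of each of the 16 bytes;
    // the mask is 0xFFFF only when all four lanes compared equal.
    return _mm_movemask_epi8(eq) != 0xFFFF;
}
```

Which variant is faster depends on the microarchitecture and the surrounding code, so it is worth benchmarking both in the real loop before committing to one.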

stgatilov
  • Is this extra latency still a problem on Haswell ? I guess I should write some benchmark code and check it... – Paul R Oct 20 '15 at 07:04
  • @PaulR: I don't get which latency you are talking about. I usually look only at throughput, because latency is much harder to analyze and measure =( And here "slower" is also in sense of throughput, not latency. In some cases throughput of vectorized code is limited by frontend, which can process only a limited number of uops per cycle. So in these cases: more uops = slower throughput. – stgatilov Oct 20 '15 at 09:39
  • 1
    @stgatilov: I was referring to the thread you linked to where it says, e.g. "PTEST is decoded into 2 uops and has 3 cycle latency when receives sources from FP instruction", but I guess my question also applies to µops - is the situation the same on Haswell/Skylake ? – Paul R Oct 20 '15 at 10:39
  • The intel thread seems to now be at: https://community.intel.com/t5/Intel-ISA-Extensions/PTEST-improvement/m-p/877818 – William Cushing Jul 12 '23 at 23:26