
I'm working on a Core i7 on Linux, using g++ 4.6.3.

I tried the following code:

#include <iostream>
#include <immintrin.h>

int main() {
    __m256d a = _mm256_set_pd(1,2,3,4);
    __m256d z = _mm256_setzero_pd();
    std::cout << _mm256_testz_pd(a,a) << std::endl;
    std::cout << _mm256_testz_pd(z,z) << std::endl;
    std::cout << _mm256_testz_pd(a,z) << std::endl;
}

It printed three 1s. I was expecting at least one of them to be 0.

When I cast the operands with _mm256_castpd_si256 and use _mm256_testz_si256 instead, it prints 0 for the first line.

Why?

Paul R
Ming

1 Answer


Whereas _mm256_testz_si256 (VPTEST) operates on all bits in the source vectors, _mm256_testz_pd (VTESTPD) only operates on the sign bit of each double-precision element. In your test every element is non-negative (1, 2, 3, 4 and 0), so all the sign bits in both vectors are zero and the result of 1 is correct for all three calls.

Paul R
  • Wait a minute. If I'm reading your answer correctly, [this](http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_testz_pd.htm) says otherwise. It looks like it's bitwise ANDing the two operands. Then for each element, it returns true if the result is zero. – Mysticial May 21 '13 at 21:54
  • I suspect that page needs correcting - the latest version of the Intel Intrinsics Guide (2.8.1) says: *Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise AND NOT of a and b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.* – Paul R May 21 '13 at 21:56
  • Wow, even [this version](http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-mac/GUID-8EFAEC85-AC33-412C-BD09-68A740AD5764.htm) of the docs say the same thing. – Mysticial May 21 '13 at 21:59
  • I expect it's a copy-paste error (copying from `_mm256_testz_si256`) that's been propagated throughout the documentation - at least it's fixed in the Intrinsics Guide, which is what I tend to use for reference. – Paul R May 21 '13 at 22:02
  • No problem - it seems that some versions of the documentation are incorrect so you may not even have misread it. – Paul R May 22 '13 at 21:45