
I am trying to understand how the following code snippet works. This program uses SIMD vector instructions (Intel SSE) to calculate the absolute value of 4 floats (so, basically, a vectorized "fabs()" function).

Here is the snippet:

#include <iostream>
#include "xmmintrin.h"

template <typename T>
struct alignas(16) sse_t
{
    T data[16/sizeof(T)];
};

int main()
{
    sse_t<float> x;
    x.data[0] = -4.;
    x.data[1] = -20.;
    x.data[2] = 15.;
    x.data[3] = -143.;
    __m128 a = _mm_set_ps1(-0.0); // ???
    __m128 xv = _mm_load_ps(x.data);
    xv = _mm_andnot_ps(a,xv); // <-- Computes absolute value
    sse_t<float> result;
    _mm_store_ps(result.data, xv);
    std::cout << "x[0]: " << result.data[0] << std::endl;
    std::cout << "x[1]: " << result.data[1] << std::endl;
    std::cout << "x[2]: " << result.data[2] << std::endl;
    std::cout << "x[3]: " << result.data[3] << std::endl;
}

Now, I know it works, since I ran the program myself to test it. When compiled with g++ 4.8.2, the result is:

x[0]: 4
x[1]: 20
x[2]: 15
x[3]: 143

Three (related) questions puzzle me:

First, how is it even possible to apply a bitwise operation to a float? If I try this in vanilla C++, the compiler tells me that bitwise operators only work on integral types (which makes sense).

But, second, and more importantly: How does it even work? How does taking a NOT and an AND even help you here? Trying this in Python with an integral type just gives you the expected result: any integral number AND -1 (which is NOT 0), simply gives you that number back, but doesn't change the sign. So how does it work here?

Third, I noticed that if I change the value of the float used for the NAND operation (marked with three ???), from -0.0 to 0.0, the program doesn't give me the absolute value anymore. But how can a -0.0 even exist and how does it help?

Helpful references:

Intel intrinsics guide

Mark Anderson
  • This is SSE, values don't have types - the operations you do determine how the bit pattern is interpreted. This is just ANDing out the sign bit. – harold May 24 '14 at 16:44
  • I'm not sure if my tag edit was the best possible. The point is that the question only makes sense with IEEE 754 floating point representation. Which is implied by use of Visual C++. But even the use of a particular programming *language* is mostly irrelevant. Maybe someone with a better feel for tags can improve. – Cheers and hth. - Alf May 24 '14 at 16:49
  • @Alf I did not use Visual C++, otherwise, why would I use the gcc compiler? I didn't even use windows for this, I used Linux. So, it definitely works there, too. – Mark Anderson May 24 '14 at 17:36

1 Answer


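-0.0 is represented as 1000...000 [1]: the sign bit is set and every other bit is zero. Therefore _mm_andnot_ps(-0.0, x) [2], which computes (NOT a) AND b, is equivalent to 0111...111 & x. This forces the MSB (which is the sign bit) to 0 and leaves all other bits of x unchanged.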


1. In IEEE-754, at least.

2. The _mm_andnot_ps intrinsic does not mean "NAND"; see e.g. http://msdn.microsoft.com/en-us/library/68h7wd02(v=vs.90).aspx.

Oliver Charlesworth