1

I'm trying my hand at assembly in order to use vector operations, which I've never really used before, and I'm admittedly having a bit of trouble grasping some of the syntax.

The relevant code is below.

uint16_t asdf[4];
asdf[0] = 1;
asdf[1] = 2;
asdf[2] = 3;
asdf[3] = 4;
uint16_t other = 3;

__asm__("movq %0, %%mm0"
        :
        : "m" (asdf));
__asm__("pcmpeqw %0, %%mm0"
        :
        : "r" (other));
__asm__("movq %%mm0, %0" : "=m" (asdf));

printf("%u %u %u %u\n", asdf[0], asdf[1], asdf[2], asdf[3]);

In this simple example, I'm trying to do a 16-bit compare of "3" to each element in the array. I would hope that the output would be "0 0 65535 0". But it won't even assemble.

The first assembly instruction gives me the following error:

error: memory input 0 is not directly addressable

The second instruction gives me a different error:

Error: suffix or operands invalid for `pcmpeqw'

Any help would be appreciated.

Peter Cordes
user1274193

4 Answers

4

You can't use registers directly in gcc asm statements and expect them to match up with anything in other asm statements -- the optimizer moves things around. Instead, you need to declare variables of the appropriate type and use constraints to force those variables into the right kind of register for the instruction(s) you are using.

The relevant constraints for MMX/SSE are x for xmm registers and y for mmx registers. For your example, you can do:

#include <stdint.h>
#include <stdio.h>

typedef union xmmreg {
    uint8_t   b[16];
    uint16_t  w[8];
    uint32_t  d[4];
    uint64_t  q[2];
} xmmreg;

int main() {
    xmmreg v1, v2;
    v1.w[0] = 1;
    v1.w[1] = 2;
    v1.w[2] = 3;
    v1.w[3] = 4;
    v2.w[0] = v2.w[1] = v2.w[2] = v2.w[3] = 3;
    asm("pcmpeqw %1,%0" : "+x"(v1) : "x"(v2));
    printf("%u %u %u %u\n", v1.w[0], v1.w[1], v1.w[2], v1.w[3]);
}

Note that you need to explicitly replicate the 3 across all the relevant elements of the second vector.
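
A variation on the same idea, as a rough sketch rather than a drop-in replacement. It assumes an x86 target with SSE2 and GCC or Clang, and it swaps the union for the `__m128i` type from `<emmintrin.h>` as the operand type for the `x` constraint. The function name `asm_cmpeq_words` is made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* __m128i (SSE2) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: each 16-bit element of v becomes 0xFFFF if it
   equals needle, otherwise 0. */
void asm_cmpeq_words(uint16_t v[8], uint16_t needle)
{
    uint16_t rep[8];
    __m128i data, pattern;
    int i;

    assert(v != NULL);

    /* Replicate the value across all eight lanes by hand. */
    for (i = 0; i < 8; i++)
        rep[i] = needle;

    memcpy(&data, v, sizeof data);
    memcpy(&pattern, rep, sizeof pattern);

    /* "+x": read-write operand in an xmm register chosen by the compiler.
       "x":  read-only operand in another xmm register. */
    __asm__("pcmpeqw %1, %0" : "+x"(data) : "x"(pattern));

    memcpy(v, &data, sizeof data);
}
```

No register is named in the asm string, so nothing depends on a value surviving in a particular register from one asm statement to the next.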

Chris Dodd
  • Also worth mentioning that this would be much better done with intrinsics like `__m128i pattern = _mm_set1_epi16(3);` and `_mm_loadu_si128( (const __m128i*)asdf );`. But yes, this addresses the fatal flaws in using inline asm. https://stackoverflow.com/tags/inline-assembly/info – Peter Cordes Oct 06 '21 at 20:56
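
For comparison, a minimal sketch of the intrinsics route, assuming SSE2 and the four-element array from the question. Because that array is only 8 bytes, this uses `_mm_loadl_epi64` and `_mm_storel_epi64` (64-bit load and store) in place of `_mm_loadu_si128`, which would read 16 bytes. `cmpeq4_words` is a hypothetical name.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: each of the four 16-bit elements of v becomes
   0xFFFF if it equals needle, otherwise 0. */
void cmpeq4_words(uint16_t v[4], uint16_t needle)
{
    __m128i data, pattern, mask;

    assert(v != NULL);

    /* Load the 8 bytes of v into the low half of an xmm register. */
    data = _mm_loadl_epi64((const __m128i *)v);

    /* Broadcast needle to every 16-bit lane. */
    pattern = _mm_set1_epi16((short)needle);

    /* Lane-wise 16-bit equality compare (this is pcmpeqw). */
    mask = _mm_cmpeq_epi16(data, pattern);

    /* Store only the low 8 bytes back, so nothing past v[3] is written. */
    _mm_storel_epi64((__m128i *)v, mask);
}
```

Calling it on {1, 2, 3, 4} with a needle of 3 leaves {0, 0, 65535, 0} in the array, which is the output the question was hoping for.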
3

From the Intel reference manual:

PCMPEQW mm, mm/m64        Compare packed words in mm/m64 and mm for equality.
PCMPEQW xmm1, xmm2/m128   Compare packed words in xmm2/m128 and xmm1 for equality.

Your pcmpeqw uses an "r" (general-purpose register) operand, which is wrong. The source operand must be an MMX register ("mm") or a 64-bit memory location ("m64").

valter

0

The code above failed when expanding the asm(), it never tried to even assemble anything. In this case, you are trying to use the zeroth argument (%0), but you didn't give any.

Check out the GCC Inline assembler HOWTO, or read the relevant chapter of your local GCC documentation.

vonbrand
  • All the asm statements in the buggy code in the question contain one operand, either an input or an output. That's not what's wrong with it, it's that the constraints are wrong so it expands to something like `pcmpeqw %eax, %mm0`. (And other bugs like assuming mm0 continuity between asm statements.) – Peter Cordes Oct 06 '21 at 20:53
0

He's right, the optimizer is changing register contents. Switching to intrinsics and using volatile to keep things a little more in place might help.

court