Errors using inline assembly in C

Question

I'm trying my hand at assembly in order to use vector operations, which I've never really used before, and I'm admittedly having a bit of trouble grasping some of the syntax.

The relevant code is below.

uint16_t asdf[4];
asdf[0] = 1;
asdf[1] = 2;
asdf[2] = 3;
asdf[3] = 4;
uint16_t other = 3;

__asm__("movq %0, %%mm0"
        :
        : "m" (asdf));
__asm__("pcmpeqw %0, %%mm0"
        :
        : "r" (other));
__asm__("movq %%mm0, %0" : "=m" (asdf));

printf("%u %u %u %u\n", asdf[0], asdf[1], asdf[2], asdf[3]);

In this simple example, I'm trying to do a 16-bit compare of "3" to each element in the array. I would hope that the output would be "0 0 65535 0". But it won't even assemble.

The first assembly instruction gives me the following error:

error: memory input 0 is not directly addressable

The second instruction gives me a different error:

Error: suffix or operands invalid for `pcmpeqw'

Any help would be appreciated.

Fixed, although I don't think the distinction matters in this example. — user1274193, Feb 24 '14 at 18:36
It's available under a bunch of names, such as `_m_pcmpeqw`, `_mm_cmpeq_pi16` or `__builtin_ia32_pcmpeqw`. Also, when using vector extensions, you can simply use the `==` operator. See the gcc documentation. — Jester, Feb 24 '14 at 19:13
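As a rough illustration of the `==` suggestion in the comment above, here is a minimal sketch using the GCC/Clang vector extensions. The type names `u16x4` and `i16x4` and the helper `match_words_vecext` are made up for this example and are not from the question. It assumes a compiler that supports `__attribute__((vector_size))`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type names: four 16-bit lanes packed into 8 bytes. */
typedef uint16_t u16x4 __attribute__((vector_size(8)));
typedef int16_t  i16x4 __attribute__((vector_size(8)));

/* Hypothetical helper: out[i] becomes 0xFFFF where in[i] == needle, else 0. */
static void match_words_vecext(const uint16_t in[4], uint16_t needle,
                               uint16_t out[4])
{
    u16x4 a;
    u16x4 b = { needle, needle, needle, needle };  /* replicate the scalar */

    memcpy(&a, in, sizeof a);

    /* Lane-wise compare: the result is a signed vector of -1 (all bits set)
       or 0 per lane, which is the same bit pattern pcmpeqw would produce. */
    i16x4 mask = (a == b);

    memcpy(out, &mask, sizeof mask);
}
```

Calling `match_words_vecext` on `{1, 2, 3, 4}` with a needle of 3 should leave `{0, 0, 65535, 0}` in the output array, which is the result the question was hoping for. The compiler picks the instructions itself, so there are no register constraints to get wrong.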

Chris Dodd · Answer 1 · 2021-10-06T20:33:31.573

You can't use registers directly in gcc asm statements and expect them to match up with anything in other asm statements -- the optimizer moves things around. Instead, you need to declare variables of the appropriate type and use constraints to force those variables into the right kind of register for the instruction(s) you are using.

The relevant constraints for MMX/SSE are x for xmm registers and y for mmx registers. For your example, you can do:

#include <stdint.h>
#include <stdio.h>

typedef union xmmreg {
    uint8_t   b[16];
    uint16_t  w[8];
    uint32_t  d[4];
    uint64_t  q[2];
} xmmreg;

int main() {
    xmmreg v1, v2;
    v1.w[0] = 1;
    v1.w[1] = 2;
    v1.w[2] = 3;
    v1.w[3] = 4;
    v2.w[0] = v2.w[1] = v2.w[2] = v2.w[3] = 3;
    asm("pcmpeqw %1,%0" : "+x"(v1) : "x"(v2));
    printf("%u %u %u %u\n", v1.w[0], v1.w[1], v1.w[2], v1.w[3]);
}

Note that you need to explicitly replicate the 3 across all the relevant elements of the second vector.
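The same idea can be sketched with a vector type instead of a union, which makes the replication step explicit. This is only an illustration under assumptions: the typedef `u16x8` and the helper `match_words_asm` are invented names, and it assumes GCC or Clang targeting x86 with SSE2 available.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type name: eight 16-bit lanes in one 128-bit value. */
typedef uint16_t u16x8 __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] becomes 0xFFFF where in[i] == needle, else 0. */
static void match_words_asm(const uint16_t in[8], uint16_t needle,
                            uint16_t out[8])
{
    uint16_t pattern[8];
    u16x8 a, b;

    /* pcmpeqw compares lane against lane, so the scalar has to be
       copied into every lane of the second operand first. */
    for (int i = 0; i < 8; i++)
        pattern[i] = needle;

    memcpy(&a, in, sizeof a);
    memcpy(&b, pattern, sizeof b);

    /* One self-contained statement: "+x" makes a both input and output in
       an xmm register, "x" puts b in another xmm register. The compiler
       chooses which registers, so nothing is assumed across statements. */
    __asm__("pcmpeqw %1, %0" : "+x"(a) : "x"(b));

    memcpy(out, &a, sizeof a);
}
```

With an input of `{1, 2, 3, 4, 0, 0, 0, 0}` and a needle of 3, only lane 2 should come back as 65535. All operands pass through constraints, so the optimizer remains free to move values between registers before and after the statement.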

Also worth mentioning that this would be much better done with intrinsics like `__m128i pattern = _mm_set1_epi16(3);` and `_mm_loadu_si128( (const __m128i*)asdf );`. But yes, this addresses the fatal flaws in using inline asm. https://stackoverflow.com/tags/inline-assembly/info – Peter Cordes, Oct 06 '21 at 20:56
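For comparison, a minimal sketch of the intrinsics route mentioned in the comment above might look like the following. The helper name `match_words_sse2` is hypothetical. It assumes an x86 target with SSE2 and widens the array to eight elements so that a full 128-bit load stays in bounds.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: out[i] becomes 0xFFFF where in[i] == needle, else 0. */
static void match_words_sse2(const uint16_t in[8], uint16_t needle,
                             uint16_t out[8])
{
    /* Unaligned 128-bit load of all eight words. */
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Broadcast the scalar to every 16-bit lane. */
    __m128i pattern = _mm_set1_epi16((short)needle);

    /* Lane-wise equality; this normally compiles to pcmpeqw. */
    __m128i mask = _mm_cmpeq_epi16(v, pattern);

    _mm_storeu_si128((__m128i *)out, mask);
}
```

The compiler handles register allocation and instruction scheduling here, so none of the constraint pitfalls from the question apply.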

γηράσκω δ' αεί πολλά διδασκόμε · Answer 2 · 2014-02-24T19:09:13.290

3

From the Intel reference manual:

PCMPEQW mm, mm/m64        Compare packed words in mm/m64 and mm for equality.
PCMPEQW xmm1, xmm2/m128   Compare packed words in xmm2/m128 and xmm1 for equality.

Your pcmpeqw uses an "r" (general-purpose register) operand, which is wrong. The source operand must be an mm register or an m64 memory operand.

valter

edited Feb 24 '14 at 19:09

answered Feb 24 '14 at 19:03

score 0 · Answer 3 · answered Feb 24 '14 at 20:00

The code above failed when expanding the asm(); it never even tried to assemble anything. In this case, you are trying to use the zeroth argument (%0), but you didn't give any.

Check out the GCC Inline assembler HOWTO, or read the relevant chapter of your local GCC documentation.

vonbrand

All the asm statements in the buggy code in the question contain one operand, either an input or an output. That's not what's wrong with it, it's that the constraints are wrong so it expands to something like `pcmpeqw %eax, %mm0`. (And other bugs like assuming mm0 continuity between asm statements.) – Peter Cordes Oct 06 '21 at 20:53

court · Answer 4 · 2014-04-24T09:00:48.180

0

He's right, the optimizer is changing register contents. Switching to intrinsics and using volatile to keep things a little more in place might help.

edited Apr 24 '14 at 09:00

answered Apr 24 '14 at 08:54
