When to use CMP & TEQ instructions in ARM Assembly?

Question

Why are there two separate instructions instead of one? In practice, in what kinds of situations do we need `CMP` and `TEQ`?

I know how both instructions work.

From the usage notes in ARM DDI 0100E: _"`TEQ` is used to test if two values are equal, without affecting the V flag (as `CMP` does). The C flag is also unaffected in many cases. `TEQ` is also useful for testing whether the two values have the same sign. After the comparison, the N flag is the logical Exclusive OR of the sign bits of the two operands."_ — Michael, Sep 04 '19 at 12:26
@Michael, I assume the OP is wondering why ARM implemented a `TEQ` instruction when the `CMP` instruction can also be used to compare two values (even if the flag settings differ). — Guillaume Petitjean, Sep 04 '19 at 14:04
@GuillaumePetitjean I was replying to the _"Practically in what kind of situations we need to use CMP and TEQ instructions."_-part (though it's more "can" than "need to"). — Michael, Sep 04 '19 at 14:09
@Michael. Gotcha. I did a quick test on godbolt with `arm-none-eabi-gcc`, and the code generated for both `if (a != b)` and `if (a > b)` always uses a `CMP` instruction. Indeed, it's not easy to understand the need for two separate instructions. — Guillaume Petitjean, Sep 04 '19 at 14:13
@GuillaumePetitjean Well, they don't do the same thing. `CMP` sets the flags based on `op1 - op2`, while `TEQ` sets the flags based on `op1 XOR op2`. So `CMP` can check the ordering of two values (==, >, <, etc). `TEQ`, on the other hand, can check for equality and whether the signs are the same. — Michael, Sep 04 '19 at 14:22
Sure, I understand this. But you can do what `TEQ` does with a `CMP`, right? Perhaps `TEQ` is faster on some MCUs? Or is there another reason to have both instructions? — Guillaume Petitjean, Sep 04 '19 at 14:25
How would you check if the two operands have the same sign with a single `CMP`? — Michael, Sep 04 '19 at 14:26
https://godbolt.org/z/07WwSI Would return 0 if the signs differ, or the product if it would be non-zero. Of course you could do the multiply and then just clamp to zero. However, `MUL` is actually expensive on some CPUs, so this would be faster. It is also possible to do multiple conditions at once; gcc doesn't seem to even try this, but an assembly programmer can. I.e., you can set 'N' and 'C' for different operands but test for both. Also, it might be useful in combination with `subs`, etc., which do destructive testing as opposed to `cmp`. — artless noise, Sep 04 '19 at 16:12

score 7 · Answer 1 · answered Sep 04 '19 at 16:09

In short: the two serve different purposes. `cmp` is `subs` without a destination register, while `teq` is `eors` without a destination register.

`cmp` is very straightforward: you compare two numbers A and B, then pick the condition code for the relation you want.
signed:
gt: A > B
ge: A >= B
eq: A == B
le: A <= B
lt: A < B

unsigned:
hi: A > B
hs: A >= B
eq: A == B
ls: A <= B
lo: A < B
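
As a rough illustration of how the two instructions differ, here is a minimal C model of the flag updates. The names `flags_t`, `model_cmp`, `model_teq` and `check_flag_model` are hypothetical and not part of any ARM toolchain. The sketch assumes an unshifted second operand; with a shifted operand, `TEQ` would also load C with the shifter carry-out, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ARM N, Z, C, V condition flags. */
typedef struct { bool n, z, c, v; } flags_t;

/* CMP a, b: all four flags come from a - b; the difference is discarded. */
flags_t model_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) != 0;                    /* sign bit of the difference */
    f.z = (r == 0);                          /* operands are equal         */
    f.c = (a >= b);                          /* C set means "no borrow"    */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow            */
    return f;
}

/* TEQ a, b (unshifted operand): N and Z come from a ^ b,
   while C and V keep the values they had before (passed in as `in`). */
flags_t model_teq(uint32_t a, uint32_t b, flags_t in)
{
    uint32_t r = a ^ b;
    flags_t f = in;
    f.n = (r >> 31) != 0;                    /* set when the signs differ  */
    f.z = (r == 0);                          /* operands are equal         */
    return f;
}

/* Small self-check; call it from main() to exercise the model. */
void check_flag_model(void)
{
    flags_t before = { false, false, true, true };

    /* -3 and 7 have different signs: N set, and C/V are left untouched. */
    flags_t t = model_teq((uint32_t)-3, 7u, before);
    assert(t.n && !t.z && t.c && t.v);

    /* INT32_MIN - 1 overflows as a signed subtraction: V set by CMP. */
    flags_t c = model_cmp(0x80000000u, 1u);
    assert(c.v && c.c && !c.z && !c.n);
}
```

Both models agree on Z, so either instruction can test equality. Only the `CMP` model produces the C and V values that the ordering conditions listed above depend on, and only the `TEQ` model gives an N flag that directly says whether the signs differ.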

Now consider the problem below, though:

int32_t foo(int32_t A)
{
    if (((A < 0) && ((A & 1) == 1)) || ((A >= 0) && ((A & 1) == 0)))
    {
        A += 1;
    }
    else
    {
        A -= 1;
    }

    return A;
}

In plain language, the if condition is true when A is either an odd negative number or an even non-negative number. Linaro GCC 7.4.1 at -O3 generates the mess below:

foo
        0x00000000:    CMP      r0,#0
        0x00000004:    AND      r3,r0,#1
        0x00000008:    BLT      {pc}+0x14 ; 0x1c
        0x0000000C:    CMP      r3,#0
        0x00000010:    BEQ      {pc}+0x14 ; 0x24
        0x00000014:    SUB      r0,r0,#1
        0x00000018:    BX       lr
        0x0000001C:    CMP      r3,#0
        0x00000020:    BEQ      {pc}-0xc ; 0x14
        0x00000024:    ADD      r0,r0,#1
        0x00000028:    BX       lr

People familiar with bit hacks would rewrite the if statement as below, testing whether the sign bit and bit 0 of A are equal:

int32_t bar(int32_t A)
{
    if ((A ^ (A<<31)) >= 0)
    {
        A += 1;
    }
    else
    {
        A -= 1;
    }

    return A;
}

And the results are:

bar
        0x0000002C:    EORS     r3,r0,r0,LSL #31
        0x00000030:    ADDPL    r0,r0,#1
        0x00000034:    SUBMI    r0,r0,#1
        0x00000038:    BX       lr

And finally, assembly programmers will replace the `EORS` with `teq r0, r0, lsl #31`.

It won't make the code any faster, but it doesn't need R3 as the scratch register.

Note that the code above is just a showcase: it is a separate function, where there are more than enough registers available.

In real life, however, registers are by far the scarcest resource, especially inside a loop, and even compilers will make use of the `teq` instruction in similar situations.
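
To check that the bit hack in bar really matches the original condition in foo, a small C sketch like the one below can be used. The helper names `cond_foo`, `cond_bar`, `count_mismatches` and `check_equivalence` are hypothetical. As an assumption of this sketch, and unlike the answer's code, the shift is done on `uint32_t`, because left-shifting a signed value into the sign bit is formally undefined behavior in ISO C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The original condition from foo(): odd negative or even non-negative. */
bool cond_foo(int32_t a)
{
    return ((a < 0) && ((a & 1) == 1)) || ((a >= 0) && ((a & 1) == 0));
}

/* The bit-hack condition from bar(), computed on uint32_t.
   Bit 31 of u ^ (u << 31) is (sign bit XOR bit 0); it is clear
   exactly when the signed result would be >= 0. */
bool cond_bar(int32_t a)
{
    uint32_t u = (uint32_t)a;
    return ((u ^ (u << 31)) >> 31) == 0;
}

/* Counts the inputs in [lo, hi] for which the two conditions disagree.
   The loop variable is 64-bit so that hi == INT32_MAX cannot overflow. */
long count_mismatches(int32_t lo, int32_t hi)
{
    long n = 0;
    for (int64_t a = lo; a <= hi; ++a) {
        if (cond_foo((int32_t)a) != cond_bar((int32_t)a)) {
            ++n;
        }
    }
    return n;
}

/* Small self-check; call it from main() to run the comparison. */
void check_equivalence(void)
{
    assert(count_mismatches(-1000000, 1000000) == 0);
}
```

The range in `check_equivalence` is arbitrary; passing `INT32_MIN` and `INT32_MAX` to `count_mismatches` would cover every input at the cost of a longer run.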

Summing up, there are fields such as error correction and encryption/decryption where tons of xor operations are performed, and people dealing with those problems appreciate instructions such as `teq` and know when to use them.

And always remember: never trust compilers.

I think the important point is that the benefit of `TEQ` shows in the context of the surrounding instructions, not in isolation. — artless noise, Sep 04 '19 at 16:16
@artlessnoise You have a very good point, but it's kinda hard to explain by short examples. My focus was on the nature of the `xor` operations since I had the feeling that the OP knows them only from textbooks. — Jake 'Alquimista' LEE, Sep 04 '19 at 16:25
That is a great point, so it is also [worth mentioning `tst`](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204f/Cihcdehh.html). So you have an equivalent of `subs`, `ands` and `eors` with `cmp`, `tst`, and `teq`. `ldr` is sort of similar with `pld`. All are instructions that don't update registers, but do get the side effects of what the other operation would do. But basically if you don't put restriction on register use, they don't make much sense (which goes back to my first point). — artless noise, Sep 05 '19 at 14:10
