Why does int addition through pointers take one less x86 instruction than int multiplication through pointers?

Question

I have the following C/C++ code (compiler explorer link):

void update_mul(int *x, int *amount) { 
    *x *= *amount; 
}

void update_add(int *x, int *amount) { 
    *x += *amount; 
}

Under both clang and gcc, compiling as C or as C++ with at least -O1 enabled, the above translates to this assembly:

update_mul:                             # @update_mul
        mov     eax, dword ptr [rdi]
        imul    eax, dword ptr [rsi]
        mov     dword ptr [rdi], eax
        ret
update_add:                             # @update_add
        mov     eax, dword ptr [rsi]
        add     dword ptr [rdi], eax
        ret

It seems like for add it's doing something like:

register = *amount;
*x += register;

But for multiply it's doing:

register = *x;
register *= *amount;
*x = register;

Why does the multiplication require an extra instruction over the add, or is it not required but just faster?

fwiw, you don't need pointers to see the extra `mov` : https://godbolt.org/z/YTfTKe75o — 463035818_is_not_an_ai, Aug 11 '21 at 15:55
Note also that since instructions can be executed in parallel, counting instructions (or cycles per instruction) is not a good metric of performance. So it is possible that the speed of both functions is indistinguishable. In this simple case it should be fine. — Marek R, Aug 11 '21 at 16:55

Botje · Accepted Answer · 2021-08-11T16:00:08.687

The IA-32 architecture specification (alternative single-page link) shows that there is simply no encoding for IMUL where the destination (first argument) is a memory operand:

Encoding               | Meaning
IMUL r/m8              | AX ← AL ∗ r/m byte.
IMUL r/m16             | DX:AX ← AX ∗ r/m word.
IMUL r/m32             | EDX:EAX ← EAX ∗ r/m32.
IMUL r/m64             | RDX:RAX ← RAX ∗ r/m64.
IMUL r16, r/m16        | word register ← word register ∗ r/m16.
IMUL r32, r/m32        | doubleword register ← doubleword register ∗ r/m32.
IMUL r64, r/m64        | Quadword register ← Quadword register ∗ r/m64.
IMUL r16, r/m16, imm8  | word register ← r/m16 ∗ sign-extended immediate byte.
IMUL r32, r/m32, imm8  | doubleword register ← r/m32 ∗ sign-extended immediate byte.
IMUL r64, r/m64, imm8  | Quadword register ← r/m64 ∗ sign-extended immediate byte.
IMUL r16, r/m16, imm16 | word register ← r/m16 ∗ immediate word.
IMUL r32, r/m32, imm32 | doubleword register ← r/m32 ∗ immediate doubleword.
IMUL r64, r/m64, imm32 | Quadword register ← r/m64 ∗ immediate doubleword.
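
To make the consequence of that table concrete, here is a minimal C sketch that spells out the steps the compiler has to take. The function names `update_mul_explicit` and `update_add_explicit`, the local `reg`, and the `const` qualifier on `amount` are illustrative assumptions, not part of the original code. The comments map each statement onto the clang output shown in the question; other compilers or flags may choose different registers or instruction orderings.

```c
/* Multiplication: every IMUL form above writes its result to a register,
   so the value has to be loaded, multiplied, and stored back explicitly. */
void update_mul_explicit(int *x, const int *amount) {
    int reg = *x;       /* mov  eax, dword ptr [rdi] */
    reg *= *amount;     /* imul eax, dword ptr [rsi] */
    *x = reg;           /* mov  dword ptr [rdi], eax */
}

/* Addition: ADD has a memory-destination form, so a single instruction
   performs the read-modify-write on *x. */
void update_add_explicit(int *x, const int *amount) {
    int reg = *amount;  /* mov eax, dword ptr [rsi] */
    *x += reg;          /* add dword ptr [rdi], eax */
}
```

Both versions behave the same as the originals at the C level. The difference is only in which x86 encodings are available to express them.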

Historical reason: the multi-operand forms of `imul` were new with the 186 (immediate) and 386 (`r, r/m`). `add`, by contrast, is one of the ALU instructions from the original 8086 and thus has opcodes for both the `r, r/m` and `r/m, r` forms. Unlike some other limitations / design choices, not having a memory-destination multiply is not noticeably a problem for x86. In real life you'd always want to inline tiny functions like these anyway, and often at least one operand will already be in a register. — Peter Cordes, Aug 11 '21 at 16:22
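
As a rough illustration of the inlining point, consider the sketch below. The caller `product` and its parameters `values` and `n` are hypothetical names introduced here, and `update_mul` is redeclared as `static inline` with a `const` pointer for this example only. Once the call is inlined, a compiler is free to keep `acc` in a register for the whole loop, in which case no store to memory is needed per iteration. Whether it actually does so depends on the compiler and optimization flags.

```c
#include <stddef.h>

/* Same body as the function in the question, marked inline-friendly. */
static inline void update_mul(int *x, const int *amount) {
    *x *= *amount;
}

/* Hypothetical caller: multiplies all elements together.
   After inlining, acc can live in a register, so the register-destination
   form of imul (with values[i] as a memory source) is all that is needed. */
int product(const int *values, size_t n) {
    int acc = 1;
    for (size_t i = 0; i < n; ++i) {
        update_mul(&acc, &values[i]);
    }
    return acc;
}
```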
