
In the code I am debugging, there's an assembly instruction as shown below:

pmuludq xmm6, xmm1

xmm6 = 0x3736353433323130
xmm1 = 0x7D35343332313938

If I multiply these two numbers in Python, I get the following result:

>>> hex(0x3736353433323130 * 0x7D35343332313938)
'0x1b00f1758e3c83508a9f69982a1e7280L'

However, when I debug the code, the value of the xmm6 register after the multiply is 0x0A09A5A82A1E7280.

Why is the result different? And how can I simulate this instruction using Python?

Neon Flash
  • Python uses arbitrary precision integers, that is, there's never any overflow. You'd need to handle the overflow case yourself. – Collin Dec 03 '18 at 02:32
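A quick Python 3 check of that suggestion: truncating the exact product to 64 bits models the overflow, but the result still differs from the register value in bits 32..63. Only the low 32 bits agree, which hints that `pmuludq` is not a 64x64 multiply at all. `MASK64` and the variable names are just illustrative.

```python
MASK64 = (1 << 64) - 1  # keep only the low 64 bits, like a 64-bit register would

xmm6 = 0x3736353433323130
xmm1 = 0x7D35343332313938

# Exact product, then drop everything above bit 63 to mimic overflow.
truncated = (xmm6 * xmm1) & MASK64

print(hex(truncated))                    # 0x8a9f69982a1e7280
print(truncated == 0x0A09A5A82A1E7280)   # False: only the low 32 bits match
```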

1 Answer


Look at the Operation section of the manual entry for the pseudocode: http://felixcloutier.com/x86/PMULUDQ.html.

It does two 32x32 => 64-bit (dword x dword => qword) multiplies, one in each 64-bit half of the 16-byte register. (It ignores the odd dword elements of the inputs.) You only showed 16 hex digits for the inputs, so I think you're only looking at the low qword of the input registers.

If you only care about the low 64 bits, the equivalent operation is simply:

result = (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF)

It does the same thing for the high 64 bits, using dword 2 (bits 64..95) of each input.
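Putting both halves together, here is a minimal Python 3 sketch of the 128-bit form, written from the manual's Operation pseudocode rather than tested against hardware. It assumes each XMM register is passed as a non-negative Python int below 2**128; the function name `pmuludq` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def pmuludq(dest, src):
    """Model SSE2 PMULUDQ on 128-bit values held in Python ints.

    Each 64-bit lane receives the unsigned product of the even dword of
    that lane (bits 0..31 and 64..95) from both operands; odd dwords are ignored.
    """
    result = 0
    for lane in range(2):             # lane 0 = bits 0..63, lane 1 = bits 64..127
        shift = 64 * lane
        a = (dest >> shift) & MASK32  # low dword of this lane in dest
        b = (src >> shift) & MASK32   # low dword of this lane in src
        result |= (a * b) << shift    # a 32x32 product always fits in 64 bits
    return result

# movq zero-extended both loads, so the high qwords are 0.
xmm6 = 0x3736353433323130
xmm1 = 0x7D35343332313938
xmm6 = pmuludq(xmm6, xmm1)
print(f"0x{xmm6:032x}")  # 0x00000000000000000a09a5a82a1e7280
```

Because each lane's product is at most 64 bits wide, no extra masking is needed, unlike the 64x64 case.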

Peter Cordes
  • Do you mean that I showed only 16 hex digits for the xmm6 and xmm1 registers? The binary uses a movq instruction to move a QWORD from a memory address into the xmm1 and xmm6 registers. I think that's why the upper QWORD of the XMM registers is 0. – Neon Flash Dec 03 '18 at 02:40
  • @NeonFlash: yes, edited to clarify. And yes, using `movq` does zero-extend, filling the high half of the XMM register with zero. (And `pmuludq` doesn't change that.) Usually not much point using `pmuludq` for scalar operations when you can do scalar `imul`, like `mov ecx, [rdi]` / `mov eax, [rsi]` / `imul rcx, rax`. – Peter Cordes Dec 03 '18 at 02:52
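To see that the `movq` + `pmuludq` sequence and the scalar `mov`/`imul` sequence from the comment produce the same 64-bit result, here is a rough Python 3 model. The byte buffers stand in for whatever `[rdi]` and `[rsi]` point to in the real binary (an assumption); their contents are simply the two register values stored little-endian. The helper names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

# Hypothetical memory the loads read from. x86 is little-endian, so these
# bytes load as 0x3736353433323130 and 0x7D35343332313938.
mem_a = bytes.fromhex("3031323334353637")  # stands in for [rdi]
mem_b = bytes.fromhex("383931323334357d")  # stands in for [rsi]

def movq_load(mem):
    """movq xmm, m64: load one qword; bits 64..127 of the XMM register become 0."""
    return struct.unpack_from("<Q", mem)[0]

def mov_r32_load(mem):
    """mov r32, m32: load one dword; writing a 32-bit register zero-extends to 64 bits."""
    return struct.unpack_from("<I", mem)[0]

def imul_r64(x, y):
    """imul r64, r64: keep the low 64 bits (same bits for signed and unsigned)."""
    return (x * y) & MASK64

# SIMD path: movq, movq, pmuludq (only the low lane matters; high qwords are 0).
xmm6 = movq_load(mem_a)
xmm1 = movq_load(mem_b)
simd = (xmm6 & MASK32) * (xmm1 & MASK32)

# Scalar path: mov ecx, [rdi] / mov eax, [rsi] / imul rcx, rax
rcx = mov_r32_load(mem_a)
rax = mov_r32_load(mem_b)
scalar = imul_r64(rcx, rax)

print(hex(simd), hex(scalar))  # 0xa09a5a82a1e7280 0xa09a5a82a1e7280
```

Since both scalar operands are zero-extended 32-bit values, their product always fits in 64 bits, so the `imul` truncation never discards anything here.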