I am writing a function library to provide all conventional operators and functions for signed-integer types s0128, s0256, s0512, s1024 and floating-point types f0128, f0256, f0512, f1024.
I am writing the s0128, s0256, s0512, s1024 multiply routines now, but am getting erroneous results that confuse me. I assumed I could cascade multiplies with the 64-bit imul rcx instruction (which produces a 128-bit result in rdx:rax) in the same way I can with unsigned operands and the mul rcx instruction, but the answers with imul are wrong.
I suspect there is some trick to make this work, perhaps mixing imul and mul instructions. Or is there some reason one cannot implement larger multiplies with signed multiply instructions?
So you understand the technique, I'll describe the smallest version, for s0128 operands.
                   arg2.1   arg2.0  : two 64-bit parts of s0128 operand
                   arg1.1   arg1.0  : two 64-bit parts of s0128 operand
                   ---------------
         0  out.rdx  out.rax  : output of arg1.0 * arg2.0
   out.rdx  out.rax           : output of arg1.0 * arg2.1
   -------------------------
     out.2    out.1    out.0  : sum the above intermediate results
   out.rdx  out.rax           : output of arg1.1 * arg2.0
   -------------------------
     out.2    out.1    out.0  : sum the above intermediate results
Each time the code multiplies two 64-bit values, it generates a 128-bit result in rdx:rax. Each time the code generates a 128-bit result, it sums that result into an accumulating triple of 64-bit registers with addq, adcq, adcq instructions (where the final adcq instruction only adds zero, to ensure that any carry gets propagated).
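To make the cascade concrete, here is a minimal Python model of the steps described above. It is a sketch, not the library's actual code: the function names (mul64, add_into, cascade_mul128) are hypothetical, and it assumes every partial product is formed as an unsigned 64x64 multiply, as the mul instruction would do. Since the arg1.1 * arg2.1 term is not part of the diagram, only out.0 and out.1 (the low 128 bits) should be read as meaningful.

```python
MASK64 = (1 << 64) - 1

def mul64(a, b):
    """Model of unsigned mul: 64x64 -> (high, low) halves of the 128-bit product."""
    p = (a & MASK64) * (b & MASK64)
    return p >> 64, p & MASK64

def add_into(acc, pos, hi, lo):
    """Model of add/adc/adc: add hi:lo into acc starting at limb pos.

    The third step adds zero so that a pending carry is propagated.
    Limbs beyond the end of acc are dropped (truncation).
    """
    carry = 0
    for i, word in enumerate((lo, hi, 0)):
        if pos + i >= len(acc):
            break
        total = acc[pos + i] + word + carry
        acc[pos + i] = total & MASK64
        carry = total >> 64

def cascade_mul128(a, b):
    """a, b: 128-bit two's complement bit patterns. Returns [out.0, out.1, out.2]."""
    a0, a1 = a & MASK64, (a >> 64) & MASK64
    b0, b1 = b & MASK64, (b >> 64) & MASK64
    out = [0, 0, 0]
    add_into(out, 0, *mul64(a0, b0))   # arg1.0 * arg2.0
    add_into(out, 1, *mul64(a0, b1))   # arg1.0 * arg2.1
    add_into(out, 1, *mul64(a1, b0))   # arg1.1 * arg2.0
    return out
```

In this model, multiplying the bit patterns of -3 and 5 leaves out.1 equal to all ones and out.0 equal to the low limb of -15, because the low 128 bits of a two's complement product do not depend on whether the operands are read as signed or unsigned.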
When I multiply small negative numbers by small positive numbers as a test, the result is negative, but there are one or two non-zero bits at the bottom of the upper 64-bit value in the 128-bit s0128 result. This implies to me that something isn't quite right with propagation in multiprecision signed multiplies.
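One possible explanation, offered as an assumption rather than a confirmed diagnosis of the actual code: in a multi-limb two's complement number only the top limb carries the sign, so the low limb is an unsigned quantity, while imul reads its top bit as a sign bit. The Python sketch below models that situation with hypothetical helper names (to_signed64, imul64, cascade_low128_all_imul) and shows the upper limb coming out wrong when every partial product is formed with a signed multiply.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def to_signed64(x):
    """Reinterpret a 64-bit pattern as a signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def imul64(a, b):
    """Model of one-operand imul: signed 64x64 -> 128-bit product as (rdx, rax)."""
    p = (to_signed64(a) * to_signed64(b)) & MASK128
    return p >> 64, p & MASK64

def cascade_low128_all_imul(a, b):
    """Sum the three partial products, all formed with the imul model.

    a, b are 128-bit two's complement bit patterns; returns the low 128 bits.
    """
    a0, a1 = a & MASK64, (a >> 64) & MASK64
    b0, b1 = b & MASK64, (b >> 64) & MASK64
    total = 0
    for shift, x, y in ((0, a0, b0), (64, a0, b1), (64, a1, b0)):
        hi, lo = imul64(x, y)
        total += ((hi << 64) | lo) << shift
    return total & MASK128

got = cascade_low128_all_imul((-3) & MASK128, 5)
expected = (-15) & MASK128
print(hex(got >> 64), hex(expected >> 64))
```

For -3 times 5 the model's upper limb is 0xfffffffffffffffa where 0xffffffffffffffff is expected: the product is still negative, but a few low bits of the upper limb are off, which resembles the symptom described above.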
Of course the cascade is quite a bit more extensive for s0256, s0512, s1024.
What am I missing? Must I convert both operands to unsigned, perform unsigned multiply, then negate the result if one (but not both) of the operands was negative? Or can I compute multiprecision results with the imul signed multiply instruction?
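For reference, the first alternative (take magnitudes, multiply unsigned, negate when the signs differ) can be sketched in Python as follows. The names to_signed128 and mul_s0128_via_unsigned are hypothetical, and Python's built-in integer multiply stands in for the unsigned mul cascade; this is only a model of the idea, assuming the result is truncated to 128 bits.

```python
MASK128 = (1 << 128) - 1

def to_signed128(x):
    """Reinterpret a 128-bit pattern as a signed value (for checking results)."""
    x &= MASK128
    return x - (1 << 128) if x >> 127 else x

def mul_s0128_via_unsigned(a, b):
    """a, b: 128-bit two's complement bit patterns; returns the low 128 bits."""
    a &= MASK128
    b &= MASK128
    sign_a = a >> 127
    sign_b = b >> 127
    # Two's complement negation yields the magnitude of a negative operand.
    mag_a = (-a) & MASK128 if sign_a else a
    mag_b = (-b) & MASK128 if sign_b else b
    # Stand-in for the unsigned cascade of mul instructions.
    product = (mag_a * mag_b) & MASK128
    # Negate when exactly one operand was negative.
    return (-product) & MASK128 if sign_a ^ sign_b else product
```

With this model, to_signed128(mul_s0128_via_unsigned((-3) & MASK128, 5)) evaluates to -15.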