
The "regular" registers in x86 are only 32-bit in size, so you can't use them to add two 64-bit integers (unless you do the addition in multiple steps).

But can you add two 64-bit integers natively using another way, using SSE for example?

Tom
  • Does a two-instruction sequence count as native? If so, nearly every CPU can add arbitrarily sized integers together "natively". – John Dvorak Apr 14 '19 at 15:37
  • Almost all SSE capable CPUs also support 64-bit extensions (except some Intel models in 2003-2004). What exactly are you after? – Seva Alekseyev Apr 14 '19 at 15:46
  • @SevaAlekseyev SSE goes back to the Pentium III in 1999, and Intel had CPUs without 64-bit support in its product line until at least 2008 (e.g., Intel Core Solo/Duo CPUs). – Ross Ridge Apr 15 '19 at 05:35

1 Answer


In 32 bit modes, there are four ways to do this:

  • the usual recommendation is to do the addition in two steps on general purpose registers: an `add` on the low halves, then an `adc` on the high halves so the carry propagates
  • if your CPU has an FPU, you can also use the x87 FPU to do 64 bit arithmetic. Since the x87 FPU holds a 64 bit mantissa, computations on 64 bit integers are exact as long as you don't exceed the 64 bit range and the FPU's precision control is set to full (64 bit mantissa) precision.
  • if your CPU supports at least SSE2, you can do 64 bit arithmetic on MMX registers (`paddq` on MMX registers was introduced with SSE2, not with MMX itself)
  • if your CPU supports at least SSE2, you can also do 64 bit arithmetic on XMM registers
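To make the two-step approach concrete, here is a minimal sketch in C that mimics what an `add` followed by an `adc` does on a pair of 32 bit registers. The type `u64_pair` and the function `add64_two_step` are hypothetical names chosen for illustration; a compiler targeting 32 bit x86 would normally emit the `add`/`adc` pair for you when you add two `uint64_t` values.

```c
#include <stdint.h>

/* A 64 bit value kept as two 32 bit halves, similar to how 32 bit code
   holds it in a register pair such as edx:eax. Hypothetical type name. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Two-step addition: the first statement plays the role of `add`,
   the last one the role of `adc`. */
static u64_pair add64_two_step(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;              /* like add: low halves, wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);  /* 1 if the low add wrapped (the carry flag) */
    r.hi = a.hi + b.hi + carry;      /* like adc: high halves plus the carry */
    return r;
}
```

The comparison `r.lo < a.lo` stands in for the carry flag: an unsigned sum is smaller than one of its operands exactly when the addition wrapped around.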

The fastest of these for a single 64 bit operation is probably the add/adc approach. For multiple operations, SSE2 is going to be the fastest, then MMX (if you can live with the transition penalty and being unable to use the x87 FPU while in MMX state) and lastly x87.
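As a rough illustration of the SSE2 route, the following C sketch uses the `_mm_add_epi64` intrinsic from `<emmintrin.h>`, which corresponds to `paddq` on XMM registers and adds two independent 64 bit lanes at once. The function name `add64x2_sse2` is made up for this example, and the code assumes an x86 target with SSE2 enabled (for 32 bit builds with GCC or Clang that typically means passing `-msse2`).

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Adds two pairs of 64 bit integers lane by lane.
   Each lane wraps modulo 2^64; no carry crosses between lanes. */
static void add64x2_sse2(const uint64_t a[2], const uint64_t b[2],
                         uint64_t out[2])
{
    __m128i va  = _mm_loadu_si128((const __m128i *)a);  /* unaligned 128 bit load */
    __m128i vb  = _mm_loadu_si128((const __m128i *)b);
    __m128i sum = _mm_add_epi64(va, vb);                /* paddq on XMM registers */
    _mm_storeu_si128((__m128i *)out, sum);              /* unaligned 128 bit store */
}
```

Since one instruction handles two 64 bit additions, this form pays off mainly when there are several additions to do and the results can stay in vector registers or go straight to memory.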

In 64 bit mode (long mode), you can additionally simply do 64 bit arithmetic on 64 bit general purpose registers.

Let me know if you want more details or examples.

fuz
  • If you don't need the result in integer registers, MMX or SSE2 `paddd` are very cheap, and reduce register pressure for scalar code. `add/adc` is obviously good if you want to branch on the result or something, otherwise `movq` + `paddd` are great especially on Intel Haswell and earlier (where `adc` is 2 uops). And usually you don't want memory-destination `adc` on modern Intel, so you then need 2 store instructions if the final destination is memory instead of another 64-bit operation. MMX `paddd` can even use a 64-bit memory source for the add (like scalar add/adc) – Peter Cordes Apr 14 '19 at 16:01
  • @PeterCordes How would you implement the carry between the 32-bit adds with PADDD? Or do you mean the PADDQ instruction? Also it's not clear that either PADDD or PADDQ are actually MMX instructions. Despite the fact they work with MMX registers, it appears they require SSE2: https://stackoverflow.com/a/13045166/3826372 – Ross Ridge Apr 15 '19 at 05:24
  • @RossRidge: oops, yes I meant to say **`paddq`**, not `paddd`. Too many "32"s flying around in my head, apparently :/ And yes, `paddq mm, mm/m64` was added with SSE2 in Pentium 4, according to [Appendix B of the NASM manual](https://www.nasm.us/doc/nasmdocb.html#section-B.1.7), which lists when all forms of all instructions were introduced. It wouldn't be worth emulating 32-bit carry-out with `paddd` + `pcmpgtd` + shift or something, but on modern CPUs with SSE2, using `paddq mm0, [esp+4]` is a legitimate good choice vs. `movq` + `paddq xmm0, xmm1`, if you can avoid/amortize `emms`. – Peter Cordes Apr 15 '19 at 05:38
  • Downsides: mov-elimination only works on xmm regs, not mmx, and Skylake runs some MMX instructions on fewer ports than the equivalent XMM ones. (Because it's normally obsoleted by 64-bit mode integer registers for integer work, and by SSE* for actual SIMD. And compilers rarely if ever use it for scalar in 32-bit mode.) And EMMS isn't free. – Peter Cordes Apr 15 '19 at 05:42
  • @PeterCordes Fixed! – fuz Apr 15 '19 at 08:07