According to Agner Fog's instruction tables (https://www.agner.org/optimize/instruction_tables.pdf), `LOCK CMPXCHG` and `LOCK CMPXCHG8B` have different costs on Zen 4:

  • LOCK CMPXCHG, Ops=5, Latency=9

  • LOCK CMPXCHG8B, Ops=15, Latency=10

Comparing the two in the instruction reference, I can't see any difference that would explain this.

  • 3
    `cmpxchg8b` and `cmpxchg16b` operate on pairs of registers. Plain `cmpxchg` operates on only one register. – fuz Jan 23 '23 at 12:59
  • 2
    Normally you'd only use `cmpxchg8b` in 32-bit mode; 64-bit mode makes `cmpxchg m64` available which is simpler and more efficient. If 64-bit mode didn't exist, perhaps AMD would have added more specialized hardware to speed up merging a pair of registers into a 64-bit value to CAS with, but touching more architectural registers is inherently more expensive since it's more dependency tracking. A single uop can only have a limited number of inputs. – Peter Cordes Jan 23 '23 at 13:23
  • 1
    @PeterCordes `cmpxchg8b` might still be useful e.g. in ABIs with 32 bit pointers. Saves you the hassle of joining and splitting the two parts. – fuz Jan 23 '23 at 13:25
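
To make the register-pair point from the comments concrete, here is a minimal C sketch, not taken from the thread. It does a 64-bit compare-and-swap on a packed pair of 32-bit values using C11 atomics. The names `pack`, `lo32`, `hi32` and `swap_handle` are hypothetical helpers invented for this illustration. On x86-64 a compiler would typically lower the CAS to a single `lock cmpxchg` on a 64-bit operand. In 32-bit mode it would typically need `lock cmpxchg8b`, which takes the expected value in EDX:EAX and the new value in ECX:EBX. The exact code generation depends on the compiler and target flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers: keep a 32-bit handle in the low half and a
   32-bit version tag in the high half of one 64-bit word. */
static inline uint64_t pack(uint32_t handle, uint32_t tag) {
    return ((uint64_t)tag << 32) | handle;
}
static inline uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static inline uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Atomically install new_handle and bump the tag by one.
   Returns the handle that was stored before the swap. */
static uint32_t swap_handle(_Atomic uint64_t *slot, uint32_t new_handle) {
    assert(slot);
    uint64_t old = atomic_load(slot);
    uint64_t next;
    do {
        /* On failure the CAS reloads `old`, so `next` is rebuilt
           from the freshly observed value on every retry. */
        next = pack(new_handle, hi32(old) + 1);
    } while (!atomic_compare_exchange_weak(slot, &old, next));
    return lo32(old);
}
```

In source code the two halves are joined and split with shifts either way. The difference the comments describe is at the instruction level, where `cmpxchg8b` reads and writes two register pairs while `cmpxchg m64` works on single 64-bit registers.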
