
In the Intel Intrinsics Guide, the pseudocode for the operation of _mm_insert_ps defines the following:

FOR j := 0 to 3
    i := j*32
    IF imm8[j%8]
        dst[i+31:i] := 0
    ELSE
        dst[i+31:i] := tmp2[i+31:i]
    FI
ENDFOR

The access into imm8 confuses me: IF imm8[j%8]. Since j is in the range 0..3, the modulo 8 doesn't seem to do anything. Does this signal a convention I'm not aware of? Or does % not mean "modulo" here?
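A minimal C sketch of the guide's loop, written both with and without the % 8 and compared over every imm8 value. It assumes each 32-bit element dst[i+31:i] is modeled as a uint32_t bit pattern at index j; the helper names are hypothetical and not from the guide. Because j % 8 == j for every j in 0..3, the two versions should always agree.

```c
#include <stdbool.h>
#include <stdint.h>

/* imm8[n] in the guide's notation: bit n of the immediate. */
static bool imm8_bit(uint8_t imm8, unsigned n) {
    return (imm8 >> n) & 1u;
}

/* The guide's loop as written, including the j%8 index.
   Element j of dst/tmp2 stands for bits [j*32+31 : j*32]. */
static void zero_mask_with_mod(uint32_t dst[4], const uint32_t tmp2[4], uint8_t imm8) {
    for (unsigned j = 0; j <= 3; j++) {
        dst[j] = imm8_bit(imm8, j % 8) ? 0u : tmp2[j];
    }
}

/* The same loop with the modulo dropped. */
static void zero_mask_without_mod(uint32_t dst[4], const uint32_t tmp2[4], uint8_t imm8) {
    for (unsigned j = 0; j <= 3; j++) {
        dst[j] = imm8_bit(imm8, j) ? 0u : tmp2[j];
    }
}

/* Exhaustively compares both versions over all 256 immediates. */
static bool modulo_is_noop(void) {
    const uint32_t tmp2[4] = {0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u};
    for (unsigned imm = 0; imm < 256; imm++) {
        uint32_t a[4], b[4];
        zero_mask_with_mod(a, tmp2, (uint8_t)imm);
        zero_mask_without_mod(b, tmp2, (uint8_t)imm);
        for (unsigned j = 0; j < 4; j++) {
            if (a[j] != b[j]) return false;
        }
    }
    return true;
}
```

Only imm8 bits 0..3 are ever consulted, so bits 4..7 of the immediate play no part in this step.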

Brotcrunsher
    I think it's just a coding convention that they copy-paste everywhere. Somewhere between safe for modification and cargo-cult programming. – JHBonarius Jan 28 '22 at 11:37

1 Answer


Seems like a pointless modulo.

Intel's documentation for the corresponding asm instruction, insertps, doesn't use any % modulo operations in the pseudocode. It uses ZMASK ← imm8[3:0] and then basically unrolls the part of the pseudocode that the intrinsics guide writes as a loop, with checks like

IF (ZMASK[2] = 1) THEN DEST[95:64]←00000000H
    ELSE DEST[95:64]←TMP2[95:64]

This is just showing how the low 4 bits of the immediate perform zero-masking on the 4 dword elements of the final result, after the insert of an element from another vector, or a scalar in memory.
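A hedged scalar model of the register-source form, assuming the imm8 field layout from Intel's asm pseudocode (COUNT_S = imm8[7:6], COUNT_D = imm8[5:4], ZMASK = imm8[3:0]) and the intrinsic's argument order, where a is the destination vector and b supplies the inserted element. The function name insertps_reg_model is hypothetical; it is a sketch for reasoning about the bits, not a replacement for the instruction.

```c
#include <stdint.h>

/* Hypothetical scalar model of INSERTPS with a register source. */
static void insertps_reg_model(float dst[4], const float a[4], const float b[4], uint8_t imm8) {
    unsigned count_s = (imm8 >> 6) & 3u; /* imm8[7:6]: which element of b */
    unsigned count_d = (imm8 >> 4) & 3u; /* imm8[5:4]: which slot of the result */
    unsigned zmask   = imm8 & 0xFu;      /* imm8[3:0]: ZMASK */

    /* TMP2: a copy of a with one element replaced by b[count_s]. */
    float tmp2[4] = {a[0], a[1], a[2], a[3]};
    tmp2[count_d] = b[count_s];

    /* Zero-masking: ZMASK bit j clears element j of the result. */
    for (unsigned j = 0; j < 4; j++) {
        dst[j] = ((zmask >> j) & 1u) ? 0.0f : tmp2[j];
    }
}
```

For example, imm8 = 0x98 takes b[2], writes it into slot 1, and then zeroes element 3, giving {a[0], b[2], a[2], 0}. The zero-masking runs after the insert, so a ZMASK bit can even clear the element that was just inserted.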

(There's no intrinsic for insert directly from memory; you'd need an intrinsic for movss and then hope the compiler folds that load into a memory operand for insertps. With a memory source, imm8[7:6] are ignored, just taking that scalar dword as the element to insert (that's the ELSE COUNT_S←0 in the asm pseudocode), but then everything else works the same, including the zero-masking you're asking about.)
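The memory-source behaviour described above can be sketched the same way, assuming the same field layout as the asm pseudocode. The name insertps_mem_model is hypothetical; the point is that imm8[7:6] has no effect, while COUNT_D and ZMASK work exactly as in the register form.

```c
#include <stdint.h>

/* Hypothetical scalar model of INSERTPS with a 32-bit memory source.
   imm8[7:6] is ignored (COUNT_S is treated as 0), so the inserted value
   is simply the scalar read from *src. */
static void insertps_mem_model(float dst[4], const float a[4], const float *src, uint8_t imm8) {
    unsigned count_d = (imm8 >> 4) & 3u; /* imm8[5:4]: destination slot */
    unsigned zmask   = imm8 & 0xFu;      /* imm8[3:0]: ZMASK, same as register form */

    float tmp2[4] = {a[0], a[1], a[2], a[3]};
    tmp2[count_d] = *src;

    for (unsigned j = 0; j < 4; j++) {
        dst[j] = ((zmask >> j) & 1u) ? 0.0f : tmp2[j];
    }
}
```

From C, the closest thing is something like _mm_insert_ps(v, _mm_load_ss(p), imm8) with imm8[7:6] = 0: _mm_load_ss puts *p in element 0, so selecting element 0 makes the register form equivalent to the memory form, which is what makes folding the load legal. Whether a particular compiler actually performs that fold is not guaranteed.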

Peter Cordes