`punpcklqdq` is defined as:

PUNPCKLQDQ xmm1, xmm2/m128
__m128i _mm_unpacklo_epi64(__m128i a, __m128i b)

Description: Unpack and interleave 64-bit integers from the low half of a and b, and store the results in dst.

Only the low halves of a and b are used, even though each operand is a full `__m128i`. The instruction also accepts a memory address directly as its second operand, for example: `punpcklqdq xmm0, [r0+2*r1]`.
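To make the semantics concrete, here is a minimal sketch using the intrinsic. The helper name `unpacklo64` is made up for illustration, and the unaligned loads are only there so the example works on plain arrays:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: interleave the low 64-bit halves of a and b.
   out[0] receives a[0] (low half of a), out[1] receives b[0] (low half of b).
   The high halves a[1] and b[1] are loaded but do not affect the result. */
static void unpacklo64(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i r  = _mm_unpacklo_epi64(va, vb);
    _mm_storeu_si128((__m128i *)out, r);
}
```

Note that this sketch loads all 16 bytes of `b`, just as the memory-operand form of the instruction is specified to access a 128-bit operand.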

Here is the problem: if the allocated memory covers the low half of b but not the high half, valgrind reports an error such as: Invalid read of size 16.

My question is: what further problems could this invalid read cause, given that the extra bytes are never used after the read? Does it need to be fixed?

Peter Cordes
  • It faults if it extends into an unmapped page, but that's only possible with AVX. (The SSE2 version requires an aligned operand). The asm manual allows it to only read the low 64 bits of the memory operand, but does require it to still enforce fault checking. I haven't tested to see if it causes a store-forwarding stall if the 16-byte memory operand would include a recent narrow store but the low 8 bytes don't. `movhps` is a good replacement, using only an 8-byte memory operand. (Unfortunately no replacement available for smaller elements like `punpcklbw` byte interleave.) – Peter Cordes Jun 10 '22 at 20:05
  • Related: [Invalid instruction operand when using punpcklwd with MMWORD PTR 64-bit memory operand](https://stackoverflow.com/q/72418266) re: the (poor) design choice of having a 128-bit memory operand. It could have been different from `punpckhqdq xmm, mem`. – Peter Cordes Jun 10 '22 at 20:06
  • Thanks, we had better use `movhps` instead wherever the operand might extend into unmapped memory, to avoid potential issues. – wowengineer Jun 13 '22 at 04:01
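
The `movhps` replacement suggested in the comments can be sketched with intrinsics as follows. This is an illustration, not tested against the original code: the helper name `unpacklo64_movhps` is made up, and it assumes `_mm_loadh_pi` is compiled to `movhps` (worth confirming in the generated assembly).

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE header for _mm_loadh_pi */
#include <stdint.h>

/* Hypothetical helper: same result as punpcklqdq a, [p], but the memory
   access is only 8 bytes wide.
   result[63:0]   = a[63:0]
   result[127:64] = the 8 bytes at p */
static __m128i unpacklo64_movhps(__m128i a, const void *p)
{
    __m128 r = _mm_loadh_pi(_mm_castsi128_ps(a), (const __m64 *)p);
    return _mm_castps_si128(r);
}
```

Because only 8 bytes are read, `p` can point at a buffer that ends right after the low half, which is the case that triggered the valgrind report.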

0 Answers