Is there a way to realign data that has already been loaded into SSE/AVX vector registers (say, to implement a sliding window)? Or do I need to shift the bytes myself and reload them from memory into vector registers?

JRR
  • Are you wanting to align them on a bitwise, byte-wise, or word-wise basis? – Dai Nov 07 '20 at 07:20
  • For 128-bit vectors, SSSE3 / AVX `palignr` works. For AVX2, the 2x 128-bit lane behaviour is nearly useless for this. Sometimes reloading from memory is better, though: 2/clock load throughput with no penalty if you don't cross a cache-line boundary (on Intel). – Peter Cordes Nov 07 '20 at 07:20
  • @Dai either bitwise or byte-wise – JRR Nov 07 '20 at 07:21

1 Answer

For 128-bit vectors, SSSE3 / AVX `[v]palignr xmm` extracts an arbitrary 16-byte window from a pair of registers. For AVX2 `ymm` registers, `vpalignr` works on the two 128-bit lanes separately, and that 2x 128-bit lane behaviour is nearly useless for this. See the related question "_mm_alignr_epi8 (PALIGNR) equivalent in AVX2".
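As a minimal sketch of the 128-bit case, the function below uses `_mm_alignr_epi8` (the intrinsic for `palignr`) to pull a 16-byte window out of two adjacent vectors. The name `window_palignr` and the `WINDOW_OFFSET` constant are hypothetical, chosen for illustration. The byte offset has to be a compile-time constant because `palignr` encodes it as an immediate, and the `target("ssse3")` attribute is a GCC/Clang convenience that can be dropped when building with `-mssse3`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_alignr_epi8 */

/* Hypothetical fixed window offset in bytes (0..16). It must be a
 * compile-time constant: palignr takes the shift count as an immediate. */
#define WINDOW_OFFSET 4

/* Illustrative helper: copies bytes [WINDOW_OFFSET, WINDOW_OFFSET + 16)
 * of a 32-byte source into dst without touching memory again after the
 * two initial loads. */
__attribute__((target("ssse3")))
void window_palignr(const uint8_t *src, size_t len, uint8_t dst[16])
{
    assert(len >= 32);

    __m128i lo = _mm_loadu_si128((const __m128i *)src);        /* bytes 0..15  */
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 16)); /* bytes 16..31 */

    /* Concatenate hi:lo (hi in the upper half), shift right by
     * WINDOW_OFFSET bytes, and keep the low 16 bytes. */
    __m128i win = _mm_alignr_epi8(hi, lo, WINDOW_OFFSET);

    _mm_storeu_si128((__m128i *)dst, win);
}
```

In a real sliding-window loop, `lo` and `hi` would stay in registers across iterations, and a separate `_mm_alignr_epi8` with a different constant would be needed for each offset.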

Sometimes reloading from memory is better, though: on Intel, loads have 2/clock throughput with no penalty as long as they don't cross a cache-line boundary, vs. 1/clock shuffle throughput. And the throughput / latency penalty for cache-line splits isn't terrible. If a single `palignr` is sufficient, it's usually the right choice, but for AVX2 it's usually better to do unaligned loads than to try to emulate it.
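The reload alternative can be sketched as follows, assuming the data is still available in a byte buffer. The name `window_by_reload` is hypothetical. Here the offset can be a runtime value, since it is just pointer arithmetic followed by an unaligned load. The example uses the SSE2 `_mm_loadu_si128` so it runs on any x86-64 machine; the same pattern with `_mm256_loadu_si256` gives a 32-byte window for `ymm` registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */

/* Illustrative helper: copies the 16-byte window starting at `offset`
 * by doing a fresh unaligned load instead of shuffling registers. */
void window_by_reload(const uint8_t *buf, size_t len, size_t offset,
                      uint8_t dst[16])
{
    /* The whole window must lie inside the buffer. */
    assert(offset + 16 <= len);

    /* Unaligned load: legal at any byte address; a load that crosses a
     * cache-line boundary is slower than one that does not. */
    __m128i win = _mm_loadu_si128((const __m128i *)(buf + offset));

    _mm_storeu_si128((__m128i *)dst, win);
}
```

Calling this with `offset` stepping by one byte per iteration gives the sliding window directly, at the cost of one load per step rather than one shuffle.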

Peter Cordes