
Is there an Intel SSE instruction that can load floats from (non-contiguous) evenly spaced memory addresses?

For example, given an array A = {0, 1, 2, 3, ..., n}, I would like to load into a 128-bit register at once {A[0], A[4], A[8], A[12]}, followed by {A[5], A[9], A[13], A[17]}.

Paul R
jaynp
    What are you trying to do? There may be several ways you can rewrite your algorithm to avoid gathering from non-contiguous memory. –  Apr 23 '13 at 08:36

1 Answer


SSE itself has no strided or gather load instruction. In this kind of use case you would typically load multiple contiguous vectors and then permute them into the required arrangement using e.g. pshufd or punpckldq.
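As a rough sketch of the load-then-permute approach for the stride-4 case in the question: the helper below (the name transpose_stride4 is hypothetical) does four contiguous loads and then uses the _MM_TRANSPOSE4_PS macro from xmmintrin.h. That macro expands to the floating-point shuffle/unpack counterparts of the instructions named above, not pshufd or punpckldq themselves. It assumes at least 16 readable floats at a.

```c
#include <xmmintrin.h>  /* SSE intrinsics and _MM_TRANSPOSE4_PS */

/* Hypothetical helper: reads a[0..15] and writes the four stride-4
   vectors to out, so that out[4*k + j] == a[4*j + k].
   out[0..3] is therefore {a[0], a[4], a[8], a[12]}. */
static void transpose_stride4(const float *a, float *out)
{
    /* Four contiguous, unaligned loads covering a[0..15]. */
    __m128 r0 = _mm_loadu_ps(a + 0);
    __m128 r1 = _mm_loadu_ps(a + 4);
    __m128 r2 = _mm_loadu_ps(a + 8);
    __m128 r3 = _mm_loadu_ps(a + 12);

    /* Treat the registers as rows of a 4x4 matrix and transpose in place:
       afterwards r0 = {a[0], a[4], a[8], a[12]}, r1 = {a[1], a[5], ...}. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    _mm_storeu_ps(out + 0,  r0);
    _mm_storeu_ps(out + 4,  r1);
    _mm_storeu_ps(out + 8,  r2);
    _mm_storeu_ps(out + 12, r3);
}
```

One transpose yields all four stride-4 vectors from the same 16 floats, so the cost of the shuffles is shared across them. Other strides need a different shuffle sequence.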

Note that AVX2 (Haswell and later) adds gather load instructions (e.g. vgatherdps, available as the intrinsic _mm_i32gather_ps), which might also be worth considering.
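A minimal sketch of the gather variant, assuming GCC or Clang on x86 and a CPU with AVX2. The helper name gather4 and its stride parameter are hypothetical; the target attribute is a GCC/Clang extension used here so the function builds without passing -mavx2 globally.

```c
#include <immintrin.h>  /* AVX2 intrinsics, including _mm_i32gather_ps */

/* Hypothetical helper: writes {base[0], base[stride], base[2*stride],
   base[3*stride]} to out[0..3]. Must only be called on a CPU that
   supports AVX2. */
__attribute__((target("avx2")))
static void gather4(const float *base, int stride, float *out)
{
    /* Element indices, lowest lane first. */
    __m128i idx = _mm_setr_epi32(0, stride, 2 * stride, 3 * stride);

    /* The last argument is the byte scale applied to each index;
       4 == sizeof(float). It must be a compile-time constant. */
    __m128 v = _mm_i32gather_ps(base, idx, 4);

    _mm_storeu_ps(out, v);
}
```

For the example in the question, gather4(A, 4, out) would fill out with {A[0], A[4], A[8], A[12]}. Whether this beats contiguous loads plus shuffles depends on the microarchitecture, so it is worth benchmarking both.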

Paul R