Is there an Intel SSE instruction that can load floats from non-contiguous but evenly spaced memory addresses?
For example, given an array A = {0, 1, 2, 3, ..., n}, I would like to load {A[0], A[4], A[8], A[12]} into a 128-bit register at once, followed by {A[5], A[9], A[13], A[17]}.
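
To make the desired result concrete, here is a minimal sketch that builds the same register contents from four separate scalar loads with `_mm_set_ps`. The helper name `load_strided` and its `start`/`stride` parameters are made up for illustration; this only shows the result I am after, and the question is whether a single instruction can do it.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_set_ps, _mm_storeu_ps */

/* Hypothetical helper (not a library function): packs
   a[start], a[start + stride], a[start + 2*stride], a[start + 3*stride]
   into one 128-bit register, lowest lane first.
   It performs four scalar loads, not one strided-load instruction. */
static __m128 load_strided(const float *a, size_t start, size_t stride)
{
    assert(a != NULL);
    /* _mm_set_ps takes its arguments from the highest lane to the lowest. */
    return _mm_set_ps(a[start + 3 * stride],
                      a[start + 2 * stride],
                      a[start + stride],
                      a[start]);
}
```

With this, `load_strided(A, 0, 4)` gives {A[0], A[4], A[8], A[12]} and `load_strided(A, 5, 4)` gives {A[5], A[9], A[13], A[17]}, assuming A has at least 18 elements.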