3

I need to broadcast one arbitrary element of an __m128 vector to all four lanes. For example, the second element:

__m128 a = {a0, a1, a2, a3};
__m128 b = {a1, a1, a1, a1};

I know that there are the intrinsics _mm_set1_ps(float) and _mm_broadcast_ss(float*). But these intrinsics can only load the value from a general-purpose register or from memory. Is there any way to set the scalar value from another vector register?

Alex
  • 65
  • 5

2 Answers

5

You can just use _mm_shuffle_ps like this:

b = _mm_shuffle_ps(a, a, _MM_SHUFFLE(1,1,1,1));
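
If the lane is known at compile time, the same shuffle can be wrapped in a small template. This is only a sketch: the name broadcast_lane is made up for illustration and is not part of any library. It relies on _MM_SHUFFLE(I, I, I, I) placing the same 2-bit lane index in all four fields of the immediate.

```cpp
#include <xmmintrin.h> // SSE: _mm_shuffle_ps, _MM_SHUFFLE

// Hypothetical helper (name chosen for this example only):
// copies lane I (0..3) of `a` into all four lanes of the result.
template <int I>
inline __m128 broadcast_lane(__m128 a)
{
    static_assert(I >= 0 && I <= 3, "lane index must be in [0, 3]");
    // Both sources are `a`, and every selector field is I,
    // so each output lane reads a[I].
    return _mm_shuffle_ps(a, a, _MM_SHUFFLE(I, I, I, I));
}
```

With a = {a0, a1, a2, a3}, broadcast_lane<1>(a) should give {a1, a1, a1, a1}, the same as the one-liner above.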
Ap31
  • 3,244
  • 1
  • 18
  • 25
  • Thank you. Your solution gives the same result. I don't know which one is best. – Alex Apr 17 '17 at 12:21
  • 2
    @Alex usually this one, whenever there is a difference. Using wrongly typed instructions can cost an extra bypass delay, depending on the context and the µarch. – harold Apr 17 '17 at 14:30
  • The disadvantage of this instruction is that it may require an additional register move (which however, may or may not have any impact at all) – chtz Apr 18 '17 at 13:48
  • @chtz I agree, somehow there is no `__m128` intrinsic for `pshufd` – Ap31 Apr 18 '17 at 17:26
  • okay after [some research](http://stackoverflow.com/questions/43495363/why-is-there-no-floating-point-intrinsic-for-pshufd-instruction/) turns out this answer is the way to go after all. Switching between float and integer pretty much flushes the whole pipeline – Ap31 Apr 20 '17 at 18:41
4

I think you should look at _mm_shuffle_epi32(). It is easy to use with the following helper function:

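#include <emmintrin.h> // SSE2: _mm_shuffle_epi32 and the cast intrinsics

// Broadcasts lane `index` (0..3) of `a` to all four lanes.
template <int index> inline __m128 Broadcast(const __m128 & a)
{
    static_assert(index >= 0 && index <= 3, "index must be in [0, 3]");
    // The casts only reinterpret the bits; the shuffle does the work.
    return _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(a), index * 0x55));
}

int main()
{
    __m128 a = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f); // {a0, a1, a2, a3}
    __m128 b = Broadcast<1>(a);                     // {a1, a1, a1, a1}
    (void)b; // silence the unused-variable warning
    return 0;
}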
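
The index * 0x55 trick may look opaque, so here is a small compile-time check of why it works. 0x55 is 01010101 in binary, so multiplying it by a 2-bit index (0..3) copies that index into each of the four 2-bit selector fields of the shuffle immediate, which is the same value _MM_SHUFFLE(i, i, i, i) produces. The function name replicate_index is an assumption made for this sketch.

```cpp
#include <xmmintrin.h> // _MM_SHUFFLE macro

// Hypothetical helper (illustration only): builds the shuffle immediate
// that selects the same source lane for all four destination lanes.
constexpr int replicate_index(int index)
{
    return index * 0x55; // 0x55 == 0b01010101
}

// The product matches the immediate built by _MM_SHUFFLE for every lane.
static_assert(replicate_index(0) == _MM_SHUFFLE(0, 0, 0, 0), "lane 0");
static_assert(replicate_index(1) == _MM_SHUFFLE(1, 1, 1, 1), "lane 1");
static_assert(replicate_index(2) == _MM_SHUFFLE(2, 2, 2, 2), "lane 2");
static_assert(replicate_index(3) == _MM_SHUFFLE(3, 3, 3, 3), "lane 3");
```

The identity only holds for indices 0 to 3; a larger index would overflow into the neighbouring fields.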
ErmIg
  • 3,980
  • 1
  • 27
  • 40