Subtract content of vector from scalar

Asked Sep 06 '14 at 13:49

Active Sep 06 '14 at 13:51

I am trying to optimize my code for different SIMD architectures. What is the best way to calculate the following?

For SSE:

float  s = /* something */;
__m128 v = /* calculation result */;

s -= v[0] + v[1] + v[2] + v[3];

At the moment I calculate the horizontal sum like this:

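#include <pmmintrin.h>   // SSE3: _mm_hadd_ps

__m128 sum = _mm_hadd_ps( v, v );     // (v0+v1, v2+v3, v0+v1, v2+v3)
       sum = _mm_hadd_ps( sum, sum ); // every lane = v0+v1+v2+v3

s -= _mm_cvtss_f32( sum );            // extract lane 0 as a float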

Is there some cool optimization possible?

edited Sep 06 '14 at 13:51 by Bill Lynch

asked Sep 06 '14 at 13:49 by Maik

That's 3 instructions - it's hard to see how you'd beat it. – Paul R Sep 07 '14 at 07:41

As Paul said, there is no room for optimization since you only have 4 instructions. Do you have that embedded in a loop, or is it part of a larger piece of code that you have to vectorize? If so, could you perhaps show the whole code? – a3mlord Sep 08 '14 at 20:16

Like @a3mlord said, if that's inside a loop, only do the horizontal op at the end of the loop. Horizontal ops are significantly slower than in-lane ops. – Peter Cordes Jun 09 '15 at 02:47

0 Answers