0

When compiled for x64, the following function uses the XMM0 register for parameter passing:

void foo (double const scalar)
{
    __m256d vector = _mm256_broadcast_sd(&scalar);
}

In assembly, the vbroadcastsd instruction can take a register source operand (the register form was added with AVX2). The equivalent intrinsic, however, appears to accept only a pointer to a memory operand. Is there a way to guarantee that compilers will optimise a broadcast like this so that the value is not first stored to memory?

linguamachina

2 Answers

3

I wouldn't think anyone can GUARANTEE it, but assuming you enable at least some optimisation, I'd be very disappointed if any modern compiler didn't remove unnecessary pointer indirections... I have certainly seen more intricate problems that the compiler has figured out how to simplify.

I take it you haven't looked at the generated code to determine what it does (because your question would have been phrased differently).
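One way to keep the pointer out of the source altogether is the by-value intrinsic `_mm256_set1_pd`. The sketch below is only an illustration: `broadcast_to` is a hypothetical helper name, the scalar fallback is there so the snippet still builds without AVX enabled, and which instruction the compiler actually emits remains its own decision, so check the generated assembly.

```cpp
#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics, only pulled in when AVX is enabled
#endif

// Hypothetical helper (not from the question): fills out[0..3] with `scalar`.
void broadcast_to(double scalar, double* out)
{
#if defined(__AVX__)
    // _mm256_set1_pd takes its argument by value, so the source never
    // takes the address of `scalar`.
    __m256d v = _mm256_set1_pd(scalar);
    _mm256_storeu_pd(out, v);  // unaligned store of all four lanes
#else
    // Scalar fallback so the sketch compiles when AVX is not enabled.
    for (int i = 0; i < 4; ++i)
        out[i] = scalar;
#endif
}
```

Calling `broadcast_to(2.5, buffer)` with a four-element `double` buffer leaves every element equal to 2.5 on either path.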

Mats Petersson
  • Thanks for the input... I realise there may not be such a thing as 'guarantee', but I noticed that VS2010 emits `vmovsd` to copy the double to the stack, before `vbroadcastsd` with a memory operand... I haven't tried ICC yet, but the issue made me wonder about the wider issue of getting plain old doubles and floats into SSE registers. – linguamachina Sep 10 '13 at 12:15
  • g++-4.8 does optimize away the store to memory. Interestingly enough, icc-13 doesn't, regardless of the optimization level. – us2012 Sep 10 '13 at 12:20
  • @PetrBudnik [tinyurl'ed link to GCC godbolt worksheet](http://tinyurl.com/oogjcxp). Change compilers there to see different ones. – us2012 Sep 10 '13 at 12:24
  • @headeronly: I'm afraid that's a problem with the inlining algorithm in MSVC. Most likely you can remove this memory load if you declare your function as `__vectorcall` (MSVC2013 and later only). It is quite funny that the calling convention affects the code even when the function call is inlined =) – stgatilov Sep 16 '15 at 15:50
0

If you're worried about parameter passing on the stack, then your function is likely too short or too important to end up being called as a separate function. Use

__forceinline

with visual C++ or

__attribute__((always_inline)) 

with g++.
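As a rough sketch of how the two spellings might be combined in one source file: `FORCE_INLINE` and `broadcast4` are hypothetical names chosen for this example, and the `inline` keyword is added on the g++ side because `always_inline` is normally paired with it. The scalar branch is only there so the snippet builds when AVX is not enabled.

```cpp
#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics, only pulled in when AVX is enabled
#endif

// FORCE_INLINE is a hypothetical macro name, not something a library provides.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline __attribute__((always_inline))
#endif

// Hypothetical helper: writes four copies of `scalar` to out[0..3].
// Once inlined, `scalar` need not travel through a call boundary at all.
FORCE_INLINE void broadcast4(double scalar, double* out)
{
#if defined(__AVX__)
    __m256d v = _mm256_set1_pd(scalar);  // by-value broadcast
    _mm256_storeu_pd(out, v);            // unaligned store of four lanes
#else
    // Scalar fallback so the sketch compiles when AVX is not enabled.
    for (int i = 0; i < 4; ++i)
        out[i] = scalar;
#endif
}
```

Whether forcing the inline actually removes the stack store still depends on the compiler, as the comments on the other answer suggest for MSVC, so the generated code is worth inspecting.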