When compiled for x64, the following function uses the XMM0 register for parameter passing:
#include <immintrin.h>

void foo(double const scalar)
{
    __m256d vector = _mm256_broadcast_sd(&scalar); /* copy scalar into all four lanes */
}
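In assembly, the vbroadcastsd instruction can take a register operand. The register-source form, vbroadcastsd ymm, xmm, requires AVX2; plain AVX accepts only a 64-bit memory source. The equivalent intrinsic, _mm256_broadcast_sd, appears to accept only a pointer to a memory operand. Is there a way to guarantee that compilers will optimise loads like this so that the argument is not first stored to memory?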
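For comparison, here is a minimal sketch of three ways to spell the same broadcast. It assumes GCC or Clang on x86-64: the target attribute and __builtin_cpu_supports are compiler extensions, not part of the intrinsics API, and the function names are hypothetical. _mm256_broadcastsd_pd is the AVX2 intrinsic that takes an __m128d instead of a pointer, and _mm256_set1_pd leaves the choice of instruction to the compiler. Intrinsics do not pin down the exact instruction sequence, so none of these is guaranteed to avoid a spill.

```c
#include <immintrin.h>

/* Sketch only: names are hypothetical. The target("avx2") attribute is a
   GCC/Clang extension that enables AVX2 for one function without building
   the whole file with -mavx2. Each variant writes its four lanes to out so
   that callers do not need AVX types in their own signatures. */

/* As in the question: the intrinsic takes a pointer, so the source is
   formally a memory operand; the optimizer may or may not elide the store. */
__attribute__((target("avx2")))
void broadcast_via_pointer(double const scalar, double out[4])
{
    __m256d v = _mm256_broadcast_sd(&scalar);
    _mm256_storeu_pd(out, v);
}

/* AVX2 register-source form: _mm_set_sd places scalar in the low lane of an
   __m128d, and _mm256_broadcastsd_pd copies that lane to all four lanes. */
__attribute__((target("avx2")))
void broadcast_via_register(double const scalar, double out[4])
{
    __m256d v = _mm256_broadcastsd_pd(_mm_set_sd(scalar));
    _mm256_storeu_pd(out, v);
}

/* Leaves the instruction choice entirely to the compiler. */
__attribute__((target("avx2")))
void broadcast_via_set1(double const scalar, double out[4])
{
    __m256d v = _mm256_set1_pd(scalar);
    _mm256_storeu_pd(out, v);
}
```

Callers should check __builtin_cpu_supports("avx2") before using these. Compiling the file with, for example, gcc -O2 -S and comparing the three functions is one way to see whether a particular compiler turns the pointer form into a register broadcast; the result can differ between compilers, versions and optimisation levels.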