I ran into this yesterday. I will try to give clear and simple examples that fail for me with MSVC12 (VS2013, 120) and MSVC14 (VS2015, 140). Everything is built for x64, so /arch:SSE and above is implicit.
I will reduce the issue to a simple matrix transpose example, using the predefined macro _MM_TRANSPOSE4_PS for illustration purposes. It is implemented in terms of shuffles, rather than moving low/high 8-byte blocks around.
float4x4 Transpose(const float4x4& m) {
    matrix4x4 n = LoadMatrix(m);
    _MM_TRANSPOSE4_PS(n.row[0], n.row[1], n.row[2], n.row[3]);
    return StoreMatrix(n);
}
The matrix4x4 is merely a POD struct containing four __m128 members. Everything is tidily aligned on a 16-byte boundary, even though it is somewhat implicit:
__declspec(align(16)) struct matrix4x4 {
    __m128 row[4];
};
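For anyone who wants to compile the setup, here is a minimal self-contained sketch. The definitions of float4x4, LoadMatrix, StoreMatrix and GiveMeATemporary are not shown above, so the versions below are assumptions made for illustration only: float4x4 is taken to be 16 plain floats, and the helpers use unaligned loads and stores so that nothing is assumed about the alignment of a float4x4. alignas(16) stands in for __declspec(align(16)).

```cpp
#include <cassert>
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS

// Assumed layout (not from the original code): 16 floats, row-major,
// with no particular alignment guarantee.
struct float4x4 {
    float m[4][4];
};

// alignas(16) is the standard C++11 spelling of __declspec(align(16)).
struct alignas(16) matrix4x4 {
    __m128 row[4];
};

// Assumed helper: unaligned loads work for any float4x4 address.
inline matrix4x4 LoadMatrix(const float4x4& src) {
    matrix4x4 n;
    for (int i = 0; i < 4; ++i) n.row[i] = _mm_loadu_ps(src.m[i]);
    return n;
}

// Assumed helper: unaligned stores back into a plain float4x4.
inline float4x4 StoreMatrix(const matrix4x4& n) {
    float4x4 dst;
    for (int i = 0; i < 4; ++i) _mm_storeu_ps(dst.m[i], n.row[i]);
    return dst;
}

inline float4x4 Transpose(const float4x4& m) {
    matrix4x4 n = LoadMatrix(m);
    _MM_TRANSPOSE4_PS(n.row[0], n.row[1], n.row[2], n.row[3]);
    return StoreMatrix(n);
}

// Hypothetical stand-in for GiveMeATemporary(): element (r, c) = 4 * r + c.
inline float4x4 GiveMeATemporary() {
    float4x4 t;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t.m[r][c] = static_cast<float>(4 * r + c);
    assert(t.m[3][3] == 15.0f);  // sanity check on the fill pattern
    return t;
}
```

With these stand-ins, Transpose(GiveMeATemporary()) should return a matrix whose element (r, c) equals 4 * c + r. Whether the real helpers use aligned or unaligned loads is not stated above, so this sketch is only a baseline to compare against.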
All of this fails on /O1, /O2 and /Ox:
// Doesn't work.
float4x4 resultsPlx = Transpose( GiveMeATemporary() );
// Changing Transpose to take float4x4 by value, i.e. a copy of the temporary
float4x4 Transpose(float4x4 m) { ... }
// Trying again, doesn't work.
float4x4 resultsPlx = Transpose( GiveMeATemporary() );
Curiously enough, this works:
// A constant reference to an rvalue, a temporary
const float4x4& temporary = GiveMeATemporary();
float4x4 resultsPlx = Transpose(temporary);
Same goes for pointer-based transfers, which is logical as the underlying mechanisms are the same. The relevant part of the C++11 specification is §12.2/5:
The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object to a subobject of which the temporary is bound persists for the lifetime of the reference except as specified below. A temporary bound to a reference member in a constructor’s ctor-initializer (§12.6.2 [class.base.init]) persists until the constructor exits. A temporary bound to a reference parameter in a function call (§5.2.2 [expr.call]) persists until the completion of the full expression containing the call.
This implies the temporary should survive until the end of the full expression containing the call, which is after the function returns. So, what gives? In all other cases, the variables get "optimized away", with the following exception:
Access violation reading location 0xFFFFFFFFFFFFFFFF
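The lifetime rule quoted from §12.2/5 can be checked in isolation, without any SSE involved. The sketch below uses a hypothetical Tracer type (not part of my code) that logs its construction and destruction, and compares the two cases: a temporary bound to a reference parameter, and a temporary bound to a local const reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log shared by the hypothetical Tracer type below.
static std::vector<std::string> g_log;

// Hypothetical helper type: logs when it is constructed and destroyed.
struct Tracer {
    const char* name;
    explicit Tracer(const char* n) : name(n) {
        g_log.push_back(std::string("ctor ") + name);
    }
    ~Tracer() { g_log.push_back(std::string("dtor ") + name); }
    // Non-copyable, so every log entry refers to the one original object.
    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;
};

// Takes its argument by const reference, like Transpose(const float4x4&).
void Use(const Tracer& t) { g_log.push_back(std::string("use ") + t.name); }

std::vector<std::string> RunLifetimeDemo() {
    g_log.clear();

    // Case 1: temporary bound to a reference parameter. It has to stay
    // alive until the end of the full expression, so the second operand of
    // the comma operator still runs before the destructor.
    (Use(Tracer("param")), g_log.push_back("still inside full expression"));
    g_log.push_back("next statement");

    // Case 2: temporary bound to a local const reference. Its lifetime is
    // extended to the lifetime of the reference, i.e. the enclosing scope.
    {
        const Tracer& ref = Tracer("local");
        Use(ref);
        g_log.push_back("end of scope");
    }

    assert(g_log.size() == 9);
    return g_log;
}
```

On a conforming compiler the log reads: ctor param, use param, still inside full expression, dtor param, next statement, ctor local, use local, end of scope, dtor local. In both cases the object is still alive while Use() runs, so by the lifetime rules alone neither way of calling Transpose should fail.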
The obvious solution is to prevent the user from passing temporaries directly by using pointer-based transfers, as some other libraries do, but I had hoped to make it a little more elegant, without &s clogging the view.