I've compiled your code with Clang++ to LLVM IR (without your benchmarking code) using -Ofast. Here's what you get for VariantPolyMorphism, unsurprisingly:
define void @_Z19VariantPolyMorphismv() local_unnamed_addr #2 {
ret void
}
On the other hand, PointerPolyMorphism really does execute the loop and all the calls:
define void @_Z19PointerPolyMorphismv() local_unnamed_addr #2 personality i32 (...)* @__gxx_personality_v0 {
%1 = tail call dereferenceable(16) i8* @_Znwm(i64 16) #8, !noalias !8
tail call void @llvm.memset.p0i8.i64(i8* nonnull align 16 dereferenceable(16) %1, i8 0, i64 16, i1 false), !noalias !8
%2 = bitcast i8* %1 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV7Derived, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8, !tbaa !11, !noalias !8
%3 = getelementptr inbounds i8, i8* %1, i64 8
%4 = bitcast i8* %3 to i32*
store i32 0, i32* %4, align 8, !tbaa !13, !noalias !8
%5 = load %struct.Base*, %struct.Base** getelementptr inbounds ({ { %struct.Base* } }, { { %struct.Base* } }* @_ZL3ptr, i64 0, i32 0, i32 0), align 8, !tbaa !4
store i8* %1, i8** bitcast ({ { %struct.Base* } }* @_ZL3ptr to i8**), align 8, !tbaa !4
%6 = icmp eq %struct.Base* %5, null
br i1 %6, label %7, label %8
7: ; preds = %8, %0
br label %11
8: ; preds = %0
%9 = bitcast %struct.Base* %5 to i8*
tail call void @_ZdlPv(i8* %9) #7
br label %7
10: ; preds = %11
ret void
11: ; preds = %7, %11
%12 = phi i32 [ %17, %11 ], [ 0, %7 ]
%13 = load %struct.Base*, %struct.Base** getelementptr inbounds ({ { %struct.Base* } }, { { %struct.Base* } }* @_ZL3ptr, i64 0, i32 0, i32 0), align 8, !tbaa !4
%14 = bitcast %struct.Base* %13 to void (%struct.Base*)***
%15 = load void (%struct.Base*)**, void (%struct.Base*)*** %14, align 8, !tbaa !11
%16 = load void (%struct.Base*)*, void (%struct.Base*)** %15, align 8
tail call void %16(%struct.Base* %13)
%17 = add nuw nsw i32 %12, 1
%18 = icmp eq i32 %17, 1000000
br i1 %18, label %10, label %11
}
The reason for this is that both your variables are static. This allows the compiler to infer that no code outside the translation unit has access to your variant instance. Therefore your loop doesn't have any visible effect and can be safely removed. However, although your smart pointer is static, the memory it points to could still change (as a side effect of the call to Process, for example). The compiler therefore cannot easily prove that it is safe to remove the loop, and doesn't.
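For readers without the question at hand, here is a minimal sketch of the kind of code being discussed. It is a guess reconstructed from the IR, not your actual source: the member name counter, the visitor lambda, and the virtual destructor are my assumptions (the three-entry vtable in the IR suggests the original Base has no virtual destructor; I added one so that deleting through Base* is well-defined).

```cpp
#include <memory>
#include <variant>

// Hypothetical reconstruction; names other than Base, Derived, VarDerived,
// Process, var, ptr and the two function names are assumptions.
struct Base {
    virtual ~Base() = default;   // assumption: not visible in the IR above
    virtual void Process() = 0;
};

struct Derived : Base {
    int counter = 0;             // assumed member
    void Process() override { ++counter; }
};

struct VarDerived {
    int counter = 0;             // assumed member
    void Process() { ++counter; }
};

// Both objects have internal linkage, so no other translation unit can name them.
static std::variant<VarDerived> var;
static std::unique_ptr<Base> ptr;

void VariantPolyMorphism() {
    var = VarDerived{};
    // The variant has one alternative, so the visit resolves statically.
    for (int i = 0; i < 1000000; ++i)
        std::visit([](auto& v) { v.Process(); }, var);
}

void PointerPolyMorphism() {
    ptr = std::make_unique<Derived>();
    // Each iteration reloads ptr and the vtable, then calls indirectly.
    for (int i = 0; i < 1000000; ++i)
        ptr->Process();
}
```

To look at the IR yourself, something like clang++ -std=c++17 -Ofast -S -emit-llvm on this file should do; the exact output will vary with the Clang version and standard library.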
If you remove the static from both variables, you get the following for VariantPolyMorphism:
define void @_Z19VariantPolyMorphismv() local_unnamed_addr #2 {
store i32 0, i32* getelementptr inbounds ({ { %"union.std::__1::__variant_detail::__union", i32 } }, { { %"union.std::__1::__variant_detail::__union", i32 } }* @var, i64 0, i32 0, i32 1), align 4, !tbaa !16
store i32 1000000, i32* getelementptr inbounds ({ { %"union.std::__1::__variant_detail::__union", i32 } }, { { %"union.std::__1::__variant_detail::__union", i32 } }* @var, i64 0, i32 0, i32 0, i32 0, i32 0, i32 0), align 4, !tbaa !18
ret void
}
Which, once again, isn't surprising. The variant can only contain VarDerived, so nothing needs to be computed at run time: the final state of the variant can already be determined at compile time. The difference now, though, is that some other translation unit might want to access the value of var later on, and the value must therefore be written.
Derived and VarDerived have the same interface and do the same processing, but the compiler couldn't optimize the std::unique_ptr one. That's my question. Hope it makes sense! – Dickless Oct 16 '20 at 13:39