Honestly, I would advise against using __forceinline, especially on a singleton GetInstance function, since it is not a bottleneck.
If you want performance, you must measure first. Until you have measured, assume any change is going to make the code slower.
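To make "measure first" concrete, here is a minimal timing sketch. The Service class and the TimeGetInstanceNs helper are hypothetical names made up for illustration, not code from the question, and a loop timed with std::chrono is only a rough first look; a profiler or a proper benchmark harness will give more trustworthy numbers.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical singleton standing in for the class from the question.
class Service {
public:
    static Service& GetInstance() {
        static Service instance;  // function-local static, initialized once
        return instance;
    }

    // The counter is volatile so the optimizer cannot fold the timed loop
    // into a single addition.
    void Touch() { counter_ = counter_ + 1; }
    std::uint64_t Counter() const { return counter_; }

private:
    Service() = default;
    volatile std::uint64_t counter_ = 0;
};

// Hypothetical helper: runs `iterations` calls through GetInstance() and
// returns the elapsed wall-clock time in nanoseconds.
inline std::int64_t TimeGetInstanceNs(std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        Service::GetInstance().Touch();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

Call TimeGetInstanceNs with a large iteration count in an optimized build, once with and once without the change you are evaluating, and repeat each run several times before comparing the results.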
Compiler flags will help performance much more than trying to __forceinline things by hand. The compiler knows much more than you think. Enabling LTO (link-time optimization) and PGO (profile-guided optimization), or compiling for a specific CPU architecture, can go a long way.
What if, for example, forcing the GetInstance function inline increases the binary size, puts more pressure on the instruction cache, and slows down your program? I would say it's very unlikely, but if you haven't measured, you cannot assume it won't.
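If you do want to test that, one option is to make the forced inlining switchable at build time so you can compare both variants. This is only a sketch: USE_FORCE_INLINE, MAYBE_FORCEINLINE and the Config class are hypothetical names I made up, while __forceinline (MSVC) and __attribute__((always_inline)) (GCC and Clang) are the real compiler-specific spellings.

```cpp
// USE_FORCE_INLINE is a hypothetical build switch, e.g. defined on the
// compiler command line. Without it the function is a plain inline function
// and the compiler decides for itself.
#if defined(USE_FORCE_INLINE) && defined(_MSC_VER)
#  define MAYBE_FORCEINLINE __forceinline
#elif defined(USE_FORCE_INLINE) && (defined(__GNUC__) || defined(__clang__))
#  define MAYBE_FORCEINLINE inline __attribute__((always_inline))
#else
#  define MAYBE_FORCEINLINE inline
#endif

// Hypothetical singleton used only to show where the macro goes.
class Config {
public:
    static MAYBE_FORCEINLINE Config& GetInstance() {
        static Config instance;  // function-local static, initialized once
        return instance;
    }

    int Value() const { return value_; }

private:
    Config() = default;
    int value_ = 42;
};
```

Build the program twice, with and without the switch, then compare binary size and measured run time. If you cannot see a difference, the attribute is not buying you anything.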
Also, in an OOP-designed program, which is where you usually see singletons like this, a simple GetInstance function is very unlikely to cause any significant performance problem. Allocations, memory scattering, inheritance-based polymorphism, and virtual functions are commonly overused in strongly OOP-based designs. Those are far more likely to be performance bottlenecks.