A few days ago I happened to watch this very interesting presentation by Stephan T. Lavavej, which mentions the "We Know Where You Live" (WKWYL) optimization (sorry for using the acronym in the question title; SO warned me that the question might be closed otherwise), and this excellent one by Herb Sutter on machine architecture.
Briefly, the "We Know Where You Live" optimization consists in placing the reference counters in the same memory block as the object that make_shared is creating, which results in a single memory allocation instead of two and makes shared_ptr more compact.
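One way to observe the single allocation is to pass std::allocate_shared (the allocator-aware counterpart of make_shared) an allocator that records what it hands out, then check whether the object's address falls inside that one block. This is a sketch under assumptions: RecordingAlloc, Widget, object_inside_last_block and the g_* globals are hypothetical names, and it assumes a standard library that, like libstdc++ and libc++, performs exactly one allocation in allocate_shared (the standard only recommends this).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical bookkeeping: how many blocks RecordingAlloc has handed out,
// and the address range of the most recent one.
std::uintptr_t g_last_block = 0;
std::size_t g_last_bytes = 0;
int g_allocations = 0;

// Minimal allocator that records every allocation it performs.
template <class T>
struct RecordingAlloc {
    using value_type = T;

    RecordingAlloc() = default;
    template <class U>
    RecordingAlloc(const RecordingAlloc<U>&) noexcept {}  // needed for rebinding

    T* allocate(std::size_t n) {
        std::size_t bytes = n * sizeof(T);
        void* p = ::operator new(bytes);
        g_last_block = reinterpret_cast<std::uintptr_t>(p);
        g_last_bytes = bytes;
        ++g_allocations;
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const RecordingAlloc<T>&, const RecordingAlloc<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const RecordingAlloc<T>&, const RecordingAlloc<U>&) noexcept { return false; }

struct Widget {
    int value = 42;
};

// True if the Widget owned by p lies entirely inside the last recorded block.
bool object_inside_last_block(const std::shared_ptr<Widget>& p) {
    auto obj = reinterpret_cast<std::uintptr_t>(p.get());
    return obj >= g_last_block && obj + sizeof(Widget) <= g_last_block + g_last_bytes;
}
```

With std::allocate_shared<Widget>(RecordingAlloc<Widget>{}) you would expect one recorded block that contains the Widget. With std::shared_ptr<Widget>(new Widget, std::default_delete<Widget>(), RecordingAlloc<Widget>{}) the allocator only sees the control block, and the Widget lives in a separate block obtained from new.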
Putting together what I learned from the two presentations, however, I started to wonder whether the WKWYL optimization could degrade performance when a shared_ptr is accessed by multiple threads running on different cores.
If the reference counters sit right next to the object in memory, they are more likely to be fetched into the same cache line as the object itself. If I understood the lesson correctly, this in turn makes it more likely that threads slow down by competing for that cache line even when they do not need to (false sharing).
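The intuition can be made concrete by modelling the two layouts with plain structs. This is only a sketch and not the real control block of libstdc++ or MSVC: PackedBlock, PaddedBlock, Payload and same_cache_line are hypothetical, and a 64-byte cache line is assumed (std::hardware_destructive_interference_size from <new> can replace the constant where the library provides it).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most current x86-64 and many ARM cores.
constexpr std::size_t kCacheLine = 64;

struct Payload {
    int data[4];
};

// Hypothetical model of a make_shared-style block: the counters are immediately
// followed by the object. The block is aligned to a line boundary only so that
// the check below is deterministic; a real heap block may start anywhere.
struct alignas(kCacheLine) PackedBlock {
    std::atomic<long> use_count{1};
    std::atomic<long> weak_count{1};
    Payload object{};
};

// Same counters, but the object is forced onto the next cache line.
struct PaddedBlock {
    std::atomic<long> use_count{1};
    std::atomic<long> weak_count{1};
    alignas(kCacheLine) Payload object{};
};

// True if both addresses fall in the same kCacheLine-sized line.
bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}
```

When one core writes use_count, the whole line is invalidated in the other cores' caches. In the packed layout, a thread that only reads object therefore has to refetch the line anyway; in the padded layout it does not.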
Suppose one thread needs to update the reference counter many times (e.g. when copying the shared_ptr around), while the other threads only need to access the pointed-to object: won't this slow down all of the threads by making them compete for the same cache line?
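A rough way to test this scenario is to time it with both allocation strategies. The sketch below is a hypothetical microbenchmark (run_scenario, Payload and RunResult are made-up names): one thread keeps copying and dropping the shared_ptr while the other threads only read the object. Timings depend heavily on the CPU, the compiler and where the allocator happens to place the blocks (two separate small allocations can still end up adjacent), so a single run settles nothing; compare many runs with a large iteration count.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>
#include <vector>

struct Payload {
    long values[4] = {1, 2, 3, 4};
};

struct RunResult {
    long long reader_sum;              // what the readers read in total (sanity check)
    std::chrono::nanoseconds elapsed;  // wall-clock time of the whole run
};

// One "copier" thread repeatedly copies and drops `shared`, which increments and
// decrements the reference count atomically. `readers` other threads only read
// the pointee and never touch the count.
RunResult run_scenario(const std::shared_ptr<Payload>& shared, int readers, int iterations) {
    std::atomic<long long> total{0};
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    threads.emplace_back([&] {
        for (int i = 0; i < iterations; ++i) {
            std::shared_ptr<Payload> copy = shared;  // count goes up...
        }                                            // ...and down again here
    });
    for (int r = 0; r < readers; ++r) {
        threads.emplace_back([&] {
            // volatile keeps the compiler from hoisting the loads out of the loop
            const volatile long* values = shared->values;
            long long local = 0;
            for (int i = 0; i < iterations; ++i) local += values[i % 4];
            total += local;
        });
    }
    for (auto& t : threads) t.join();

    const auto elapsed = std::chrono::steady_clock::now() - start;
    return {total.load(), std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed)};
}
```

The intended comparison is run_scenario(std::make_shared<Payload>(), 3, N) against run_scenario(std::shared_ptr<Payload>(new Payload), 3, N) for a large N, looking at elapsed across repeated runs.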
If the reference count lived elsewhere in memory, I would expect this kind of contention to be less likely.
Does this make a good argument against using make_shared() in such cases (provided it implements the WKWYL optimization, of course)? Or is there a fallacy in my reasoning?