
Over two years ago, Stephan T. Lavavej described a space-saving optimization, which he calls "We Know Where You Live" (WKWYL), that he implemented in Microsoft's implementation of std::make_shared. I know from speaking with him that Microsoft has nothing against other library implementations adopting this optimization. If you know for sure whether other libraries (e.g., those for GNU C++, Clang, and Intel C++, plus Boost for boost::make_shared) have adopted it, please contribute an answer. I don't have ready access to that many make_shared implementations, nor am I wild about digging into the bowels of the ones I have to see if they've implemented the WKWYL optimization, but I'm hoping that SO readers know the answers for some libraries off-hand. I know from looking at the code that as of Boost 1.52, the WKWYL optimization had not been implemented, but Boost is now up to version 1.55.

Note that this optimization is different from std::make_shared's ability to avoid a dedicated heap allocation for the reference count used by std::shared_ptr. For a discussion of the difference between WKWYL and that optimization, consult this question.

KnowItAllWannabe
  • If I understand correctly, this sort of optimization puts certain restrictions on the allocator behavior (namely, that allocator must leave enough unused space in the object's vicinity) - something MS can allow for in their tightly packaged development environment, but not something GCC guys would likely go for. – oakad May 27 '14 at 04:24
  • @oakad My understanding is that this is a portable library optimization that makes no assumptions about compilers or other library components (other than that they behave per the Standard). A pointer that is normally stored at runtime to handle the case when the deleter is called on a `std::shared_ptr` can be eliminated, because the template has the type information needed to cast the `void*` back to what it really is (i.e., the type passed to `std::make_shared`). In other words, the optimization consists of eliminating runtime data that can be calculated during compilation. – KnowItAllWannabe May 27 '14 at 04:32
  • Ok, so I misunderstood the concept (no wonder, considering that the linked question is misleading and the original description is only available in video form). A brief glance at the boost mailing list reveals the following resolution: "What's the point? It [ref count control block] still need to be padded to 32 for good performance and to use DWCAS". That is, removing that extra pointer was found not to be helpful. – oakad May 27 '14 at 05:43
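
The idea discussed in the comments above can be illustrated with a minimal sketch. Everything here is hypothetical: the names `ControlBlockBase`, `InplaceWithPointer`, `InplaceNoPointer`, and `Tracker` are invented for illustration and are not taken from any real library, and the sketch leaves out atomics, weak counts, and allocator support. It only shows that a control block which stores the object in place can compute the object's address from its own layout instead of keeping a separate pointer to it.

```cpp
#include <utility>  // std::forward
#include <new>      // placement new

// Hypothetical, heavily simplified control-block base
// (no atomics, no weak count, no allocator support).
struct ControlBlockBase {
    long use_count = 1;
    virtual void destroy_object() noexcept = 0;
    virtual ~ControlBlockBase() = default;
};

// Without the optimization: the object lives inside the block,
// yet the block still stores a pointer to it.
template <class T>
struct InplaceWithPointer : ControlBlockBase {
    alignas(T) unsigned char storage[sizeof(T)];
    T* ptr;  // redundant: the address is already known from the layout

    template <class... Args>
    explicit InplaceWithPointer(Args&&... args)
        : ptr(::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...)) {}

    void destroy_object() noexcept override { ptr->~T(); }
};

// With the optimization: no stored pointer. The template knows T,
// so it recomputes the address from its own storage when needed.
template <class T>
struct InplaceNoPointer : ControlBlockBase {
    alignas(T) unsigned char storage[sizeof(T)];

    template <class... Args>
    explicit InplaceNoPointer(Args&&... args) {
        ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
    }

    // Assumption: a plain cast is used here; C++17 code would
    // typically wrap this in std::launder.
    T* object() noexcept { return reinterpret_cast<T*>(storage); }

    void destroy_object() noexcept override { object()->~T(); }
};

// Hypothetical helper type that counts live instances through a pointer.
struct Tracker {
    int* live;
    explicit Tracker(int* counter) : live(counter) { ++*live; }
    ~Tracker() { --*live; }
};

// Checks that the pointer-free block still constructs and destroys its object.
inline bool no_pointer_block_destroys_object() {
    int live = 0;
    InplaceNoPointer<Tracker> block(&live);
    const bool constructed = (live == 1);
    block.destroy_object();
    return constructed && live == 0;
}
```

On typical platforms, `sizeof(InplaceNoPointer<int>)` comes out smaller than `sizeof(InplaceWithPointer<int>)` by the size of one pointer. For other types, alignment padding may absorb some or all of the saving, which is in line with the padding concern quoted from the Boost mailing list.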

1 Answer


libc++ appears to implement the optimization. See the difference between __shared_ptr_pointer and __shared_ptr_emplace in http://llvm.org/viewvc/llvm-project/libcxx/trunk/include/memory?revision=210211&view=markup.

libstdc++ also appears to implement it. See the difference between _Sp_counted_ptr and _Sp_counted_ptr_inplace in https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/bits/shared_ptr_base.h?revision=210015&view=markup.

Jeffrey Yasskin