Optimize calls to allocator when replacing the content of a shared pointer

Question

Consider this program:

#include <memory>

struct T {
    T() {}
};

void do_something(std::shared_ptr<T> ptr) {
    // Do something with ptr; might or might not leave
    // other copies of ptr in other variables of the
    // program
}

int main() {
    std::shared_ptr<T> ptr = std::make_shared<T>();
    do_something(ptr);
    // ptr might or might not be the only owner
    ptr = std::make_shared<T>();
    return 0;
}

When make_shared executes for the second time, ptr might or might not have other owners, depending on what happens at runtime in do_something. If there are none, the assignment destroys and deallocates the previously owned object at roughly the same moment as a new object of the same type is allocated and constructed. Is there any way to avoid the allocation and deallocation and construct the new object in the same memory region? (The goal is to eliminate the two calls to the allocator.)
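To make the ordering concrete, the following sketch records lifecycle events when a shared_ptr is overwritten with the result of make_shared. The Traced type and the global events log are illustrative assumptions, not part of the question. The right-hand side is evaluated first, so the new block is allocated and object 2 constructed while object 1 still occupies its own block; only then is object 1 destroyed and its memory freed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Illustrative log of lifecycle events, used to inspect the order of operations.
std::vector<std::string> events;

struct Traced {
    int id;
    explicit Traced(int i) : id(i) { events.push_back("construct " + std::to_string(id)); }
    ~Traced() { events.push_back("destroy " + std::to_string(id)); }
};

void replace_by_assignment() {
    auto ptr = std::make_shared<Traced>(1);
    // make_shared<Traced>(2) runs to completion (new block allocated, object 2
    // constructed) before operator= releases the old block and destroys object 1.
    ptr = std::make_shared<Traced>(2);
    assert(ptr->id == 2);
}   // object 2 is destroyed here, when ptr goes out of scope
```

Calling replace_by_assignment() on an empty log produces "construct 1", "construct 2", "destroy 1", "destroy 2", in that order.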

Of course I accept that the new object would be constructed after the old one is destroyed, whereas in the code above the opposite happens. So I would like something like ptr.replace<U>(args) that does the following: it decrements ptr's reference count; if the count drops to zero, there are no weak references, and U is the most derived type of the object owned by ptr, it destroys the owned object and constructs a new one from args in the same memory region, without calling the memory allocator. Otherwise it behaves like ptr = std::make_shared<U>(args).

Is there any way to perform this optimization with the current standard library?

By profiling your code, did you find out that this is really a big issue? If it's not, you should not care. — Tobias Brösamle, Apr 04 '19 at 10:43
To clarify: Do you want a (free) function that would do what you need _using `std::shared_ptr`_ or are you up for reimplementing `std::shared_ptr`? You obviously can't get `ptr.replace(args)` in the first case (you can't just add a method to it), but the "with the current standard library" doesn't mesh with the second one. — Max Langhof, Apr 04 '19 at 10:48
@MaxLanghof You're right, I was a little too quick: I'm curious to know if there is a solution that can be applied directly to the standard library, but there is no need for it to be a member function of `std::shared_ptr`. Something like `replace_shared_ptr(ptr, args)` would be fine as well. If I reimplemented `std::shared_ptr` from scratch I would know how to do it (more or less as I have described), but I am asking whether the standard library already leaves room for this optimization. — Giovanni Mascellani, Apr 04 '19 at 12:13
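A rough sketch of such a free function, using only the public std::shared_ptr interface. The names replace_shared_ptr and Point are hypothetical. Note that it swaps in a different technique from the one described in the question: instead of destroying the object and constructing a new one in place, it move-assigns a freshly built T into the existing object when use_count() == 1. It therefore assumes that T is move-assignable, that the owned object's dynamic type is exactly T (otherwise the assignment slices), that no weak_ptr observers exist (use_count() cannot detect them, and an observer would see the new value instead of expiring), and that no other thread can copy the pointer concurrently.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical helper, not a standard-library API. Approximates the requested
// replace() under the assumptions listed above (exact dynamic type T,
// no weak_ptr observers, no concurrent copies of ptr).
template <class T, class... Args>
void replace_shared_ptr(std::shared_ptr<T>& ptr, Args&&... args) {
    static_assert(std::is_move_assignable_v<T>, "T must be move-assignable");
    if (ptr && ptr.use_count() == 1) {
        // Sole strong owner: overwrite the object in its current storage,
        // so no allocator call is made.
        *ptr = T(std::forward<Args>(args)...);
    } else {
        // Empty or shared: other owners keep the old object, so allocate anew.
        ptr = std::make_shared<T>(std::forward<Args>(args)...);
    }
}

// Example payload type for the demo below.
struct Point {
    int x;
    explicit Point(int v = 0) : x(v) {}
};

void demo_replace_shared_ptr() {
    auto p = std::make_shared<Point>(1);
    Point* before = p.get();
    replace_shared_ptr(p, 2);    // unique owner: storage is reused
    assert(p.get() == before && p->x == 2);

    auto q = p;                  // now shared
    replace_shared_ptr(p, 3);    // q keeps the old object, p gets a new one
    assert(p.get() != q.get() && p->x == 3 && q->x == 2);
}
```

Because it assigns rather than destroys and reconstructs, the new value is built before the old one is overwritten, which also keeps the object valid if T's constructor throws.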

score 2 · Answer 1 · answered Apr 04 '19 at 10:43

There is no mechanism to count the number of weak_ptrs to a shared object. You can only query the strong count (via shared_ptr::use_count). Note that in multithreaded environments, this is allowed to be an approximate count (i.e. using a memory_order_relaxed load).
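A minimal illustration of the point about counts (Widget is just a placeholder type): creating a weak_ptr leaves use_count() unchanged, so a use_count() == 1 check cannot rule out weak observers.

```cpp
#include <cassert>
#include <memory>

struct Widget {};

// std::shared_ptr exposes only the strong count; there is no standard
// accessor for the number of weak_ptrs observing the same control block.
long strong_count_with_weak_observer() {
    auto owner = std::make_shared<Widget>();
    std::weak_ptr<Widget> observer = owner;
    assert(!observer.expired());   // the weak observer really exists...
    return owner.use_count();      // ...yet only the strong owner is counted
}
```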

Are you sure this is a performance bottleneck?

score 1 · Answer 2 · answered Apr 04 '19 at 11:10


Consider allocate_shared. It creates a shared_ptr with an allocator. It is possible to cache the control block of a freed shared_ptr in the allocator, and immediately reuse it in the next allocate_shared call, saving a delete and new.

I doubt that it will make much difference. In a multithreaded application, this allocator can be nontrivial to get both fast and correct.
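A minimal sketch of the caching idea: an allocator with a single cache element that falls back to std::allocator. The OneSlotCache and Payload names are hypothetical, and the cache is thread_local, so it does nothing for blocks freed on a different thread. allocate_shared rebinds the allocator to an internal control-block type, so each rebound type gets its own slot. In the question's `ptr = ...` pattern the new block is allocated before the old one is freed, so the parked block only helps on the following replacement; the demo calls reset() first to make the reuse visible.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical single-slot caching allocator. The most recently freed
// one-element block is parked in a thread_local slot (one per rebound type)
// and handed back on the next one-element allocation; everything else goes
// to std::allocator. Sketch only: a block left in the slot at thread exit
// is never returned to the system.
template <class U>
struct OneSlotCache {
    using value_type = U;

    OneSlotCache() noexcept = default;
    template <class V>
    OneSlotCache(const OneSlotCache<V>&) noexcept {}

    U* allocate(std::size_t n) {
        if (n == 1 && slot != nullptr) {
            U* p = slot;   // reuse the parked block
            slot = nullptr;
            return p;
        }
        return std::allocator<U>{}.allocate(n);
    }

    void deallocate(U* p, std::size_t n) noexcept {
        if (n == 1 && slot == nullptr) {
            slot = p;      // park the block instead of freeing it
            return;
        }
        std::allocator<U>{}.deallocate(p, n);
    }

    static inline thread_local U* slot = nullptr;
};

// All instances are interchangeable: memory from one can be freed by another.
template <class A, class B>
bool operator==(const OneSlotCache<A>&, const OneSlotCache<B>&) noexcept { return true; }
template <class A, class B>
bool operator!=(const OneSlotCache<A>&, const OneSlotCache<B>&) noexcept { return false; }

struct Payload { int value = 0; };

// Returns true if the second allocate_shared reused the block freed by reset().
bool demo_block_reuse() {
    OneSlotCache<Payload> alloc;
    auto ptr = std::allocate_shared<Payload>(alloc);
    Payload* first = ptr.get();
    ptr.reset();                                  // control block goes into the slot
    ptr = std::allocate_shared<Payload>(alloc);   // and comes straight back out
    return ptr.get() == first;
}
```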

Michael Veksler

This is not what I would like, since it would still require a couple of calls into a shared resource (this sort of "caching pre-allocator"). I would like to take advantage of knowing that two regions of the same size are going to be freed and allocated at more or less the same time. Thanks, however! – Giovanni Mascellani Apr 04 '19 at 12:15
@GiovanniMascellani I know what you mean. This is the best there is. Anyway, a simple allocator with a single cache element, that falls back to another allocator, can be mostly optimized away by the compiler (at least for the trivial deallocate-allocate case). – Michael Veksler Apr 04 '19 at 13:04
