33

In the following code, while ( !Ref.expired() ); is joyfully optimized into an infinite loop. If the line of code is changed to while ( !Ref.lock() );. everything works as expected. So two questions really:

1) How can the compiler optimize away expired when std::weak_ptr::expired() accesses a memory-fenced counter?

2) Is Ref.lock() actually safe, or could this too be optimized away?

Sample code below.

#include <iostream>
#include <memory>
#include <thread>
#include <chrono>

class A
{
public:

    A() 
    { 
        m_SomePtr = std::make_shared<bool>( false );
    }

    virtual ~A()
    {
        std::weak_ptr<bool> Ref = m_SomePtr;
        m_SomePtr.reset();

        // Spin (will be optimised into an infinite loop in release builds)
        while ( !Ref.expired() );
    }

    std::shared_ptr<bool> GetPtr() const { return m_SomePtr; }

private:
    std::shared_ptr<bool> m_SomePtr;
};

class B
{
public:
    B( std::shared_ptr<bool> SomePtr ) : m_Ref( SomePtr ) {}

    void LockPtr() { m_SomePtr = m_Ref.lock(); }
    void UnLockPtr() { m_SomePtr.reset(); }

private:
    std::shared_ptr<bool> m_SomePtr;
    std::weak_ptr<bool> m_Ref;
};

int main()
{
    std::unique_ptr<A> a( new A() );
    std::unique_ptr<B> b( new B( a->GetPtr() ) );

    b->LockPtr();

    std::cout << "Starting " << std::endl;

    std::thread first( [&]()
    {
        std::this_thread::sleep_for( std::chrono::seconds( 5 ) );
        b->UnLockPtr();
    } );

    std::thread second( [&]()
    {
        a.reset( nullptr );
    } );

    first.join();
    second.join();

    std::cout << "Complete" << std::endl;
    return 0;
}
curiousguy
  • 8,038
  • 2
  • 40
  • 58
Puff Of Hot Air
  • 331
  • 2
  • 4
  • 2
    upvoted for **joyfully** *optimized away*. are you suggesting machines and/or programs can enjoy themselves? – Walter Jan 23 '15 at 14:16
  • 14
    @Walter I had the distinct impression that the compiler was extremely pleased with itself. I, however, was extremely _displeased_. The compiler and I have a tumultuous relationship. – Puff Of Hot Air Jan 23 '15 at 14:24
  • [Seems to work with gcc](http://coliru.stacked-crooked.com/a/36ffe423f76ecf02). [Also works with clang](http://coliru.stacked-crooked.com/a/5b3827c261b54b90). Probably a visual studio bug. – interjay Jan 23 '15 at 14:26
  • signs of our techonological advancement finally approaching the Singularity :P – Karoly Horvath Jan 23 '15 at 14:26
  • 7
    @PuffOfHotAir: *That* compiler has a tumultuous relationship with everyone. – Andy Prowl Jan 23 '15 at 14:28
  • Ouch! That said, this looks like a very dangerous pattern to begin with: having a non-real-time destructor is just begging for trouble, isn’t it? – Konrad Rudolph Jan 23 '15 at 14:30
  • @interjay The compiler in this case is Visual C++ 12.0 (2013), and I also tried this in Visual C++ 11.0 (2012). I don't understand _why_ this optimization could be valid. – Puff Of Hot Air Jan 23 '15 at 14:31
  • @PuffOfHotAir Yes, I just noticed the tags. I guess it's a VC++ bug. – interjay Jan 23 '15 at 14:32
  • 1
    What gave you the idea that the counter is fenced? I can't find anything in the standard that says that it is, and gcc's libstdc++ agrees with msvc here insofar that it uses relaxed memory order for the counter (even if gcc doesn't do the loop optimization by default anymore; try it with `-faggressive-loop-optimizations`). Also, the way I read it, calling `lock()` on that `weak_ptr` introduces a data race. – Wintermute Jan 23 '15 at 14:32
  • The exact version of MSVC may be useful, as 2013 went through a lot of changes from U0 to U4. – Yakk - Adam Nevraumont Jan 23 '15 at 14:32
  • It seems to work with http://webcompiler.cloudapp.net/. – wilx Jan 23 '15 at 14:34
  • @Wintermute `weak_ptr::lock()` is thread safe, so it is either null or not. In the code above this would mean that it spins until the other thread releases the `shared_ptr` reference. – Puff Of Hot Air Jan 23 '15 at 14:37
  • @Yakk I appear to be using update 3. – Puff Of Hot Air Jan 23 '15 at 14:39
  • Where does it say that `weak_ptr::lock()` is thread safe? Not in [util.smartptr.weak.obs], where it is defined, and not in [util.smartptr.shared.const], where the `shared_ptr` constructor it uses it is defined. If it is there, I can't find it. – Wintermute Jan 23 '15 at 14:39
  • @Wintermute `shared_ptr` reference counting is thread safe and [link](http://www.cplusplus.com/reference/memory/weak_ptr/lock/) mentions that the operation is atomic. – Puff Of Hot Air Jan 23 '15 at 14:47
  • @Wintermute: this is specified [util.smartptr.weak.obs]/5. – Andy Prowl Jan 23 '15 at 14:48
  • @AndyProwl I don't see anything about thread safety there. It just says that `lock()` returns `expired() ? shared_ptr() : shared_ptr(*this)`. @PuffOfHotAir I don't know that cplusplus.com is reliable in such matters. I'm looking at the standard, and I don't see it. I may be looking in the wrong places, but it is not obvious to me that this is meant to be threadsafe. – Wintermute Jan 23 '15 at 14:53
  • @Wintermute: Right after what you quoted: `", executed atomically."` – Andy Prowl Jan 23 '15 at 15:01
  • @AndyProwl That is not in [util.smartptr.weak.obs]/5, at least not in n3290 and n3690. [util.smartptr.shared]/4 seems relevant, though, if not terribly clear. I think it indicates that data races are not introduced, but not that synchronization between threads has to be expected (i.e., no fences are required). – Wintermute Jan 23 '15 at 15:06
  • 1
    @Wintermute I do not have the standard but [link](http://wg21.cmeerw.net/lwg/issue2316) appears to discuss ambiguity in regards to `lock()`, perhaps leading to the `executed atomically`. The point is that the code is _correct_ from a thread-safe point of view (`expired()` must access a thread-safe counter), but the optimizer removes the calls leaving non-functional code. – Puff Of Hot Air Jan 23 '15 at 15:15
  • @Wintermute: I'm looking at [N4296](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf). – Andy Prowl Jan 23 '15 at 15:16
  • @AndyProwl Alas, VC2013 does not claim to support C++17. @PuffOfHotAir thinking back to libstdc++'s use of relaxed memory order, this looks to me as though the counter has to be atomic, but no synchronization is ever required (no acquire/release fences have to be involved). So how is `lock()` allowed to behave with relaxed memory order on the reference counter? I'll need a moment to think about that. – Wintermute Jan 23 '15 at 15:22
  • @Wintermute I don’t think there was an intention to change the semantics in C++17, it’s more like a documentation bug fix. The requirement for `lock`’s thread safety is implied from the thread safety of `std::shared_ptr`. In particular, [util.smartptr.shared]/4 declares that “Changes in `use_count()` do not reflect modifications that can introduce data races.” – That *must* extend to `std::weak_ptr::use_count` for consistency, and `weak_ptr` was in fact mentioned in the directly preceding sentence, alongside `shared_ptr`. – Konrad Rudolph Jan 23 '15 at 15:28
  • @KonradRudolph That may be so, but alone it is not enough. See [intro.multithread]/5, where it says '"Relaxed" atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races.' However, [intro.multithread]/6 may save the day, since it states that atomic modifications of the same object (that `lock()` does, as opposed to `expired()`) at least have to occur "in some particular total order". MSVC is technically correct that it can optimize the `expired()` loop away, though, even if that is a vexing thing for it to do. – Wintermute Jan 23 '15 at 15:59

1 Answers1

9

Your program is incorrect; the shared-ownership pointer facilities are not intended to be used for synchronization.

[intro.multithread]/24:

The implementation may assume that any thread will eventually do one of the following:
— terminate,
— make a call to a library I/O function,
— access or modify a volatile object, or
— perform a synchronization operation or an atomic operation.

std::weak_ptr::expired() is not a synchronization operation or an atomic operation; all the Standard says is that it does not introduce a data race. Since the resolution to Library defect 2316, std::weak_ptr::lock() is considered an atomic operation, so to answer 2) your code using Ref.lock() is valid as of C++14.

Now, it's true that if you were to attempt to create your own library implementation of weak_ptr using the language and library facilities, it would necessarily use the synchronization and/or atomic operation facilities, so a user-provided weak_ptr::expired() would be OK to spin on (the implementation would be obliged to ensure that the thread eventually made progress, per [intro.multithread]/2 and /25). But an implementation is not obliged to restrict its own library to the language and library facilities.

I'm not entirely sure how the compiler is optimizing away the access to expired(). I'd guess that the MSVC library is exploiting aspects of the x86 memory model that the compiler/optimizer observes are not guaranteed by the C++ memory model.

ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • 1
    `Nor is std::weak_ptr::lock() a synchronization operation or an atomic operation`: isn't this in conflict with [util.smartptr.weak.obs]/5, which (at least in N4296) explicitly says `, executed atomically`? – Andy Prowl Jan 23 '15 at 19:38
  • @AndyProwl ah, so issue 2316 got into C++14? (Yes, according to Library R87.) I agree that that changes things; I'll amend my answer. – ecatmur Jan 23 '15 at 21:04
  • 2
    @ecatmur Did any compiler actually make `lock()` nonatomic *before* that issue got added to the standard? That would be incredibly nonintuitive... – Barry Jan 23 '15 at 21:09
  • @Barry it seems highly unlikely; I'm sure that any decent library vendor would do the right thing. As STL says at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3893.html#2316, "*We all know what weak_ptr::lock() should do*". – ecatmur Jan 23 '15 at 21:13
  • 4
    Digging through the headers reveals that MSVC's stdlib uses a non-atomic read of the reference count to implement `weak_ptr::expired` and `shared_ptr::use_count` on x86. We can argue all day about whether or not it is strictly conforming, but I think it's fairly clear that it is poor QoI. – Casey Jan 23 '15 at 21:15
  • @Casey ah great, I thought it might be something like that. Note that a non-atomic read can be perfectly fine on x86 because of the Intel memory model; reads are not reordered relative to other reads. – ecatmur Jan 23 '15 at 21:18
  • I don't agree that this code is "incorrect". Semantics aside, `expired()` is intended as a thread safe operation; which means that `use_count` must be safe to update and read from multiple threads (however that is achieved). Surely then this should imply that spinning on `expired()` is perfectly valid. I would think that the MSVC stdlib should have marked the reference counter as volatile to prevent this kind of optimisation. – Puff Of Hot Air Jan 24 '15 at 07:08
  • 1
    @Puff Of Hot Air it's "safe" to read from multiple threads in that it doesn't introduce a data race; but unless the Standard says so it isn't a synchronization point which means that it isn't guaranteed to make progress. (And a thread that isn't guaranteed to make progress is invalid.) The Standard could have chosen to make `use_count` and `expired` synchronization points, and its failure to do so could be regarded as a deficiency, but to do so would impose on implementations and arguably promote unclear code i.e. using shared-ownership pointer facilities for synchronization. – ecatmur Jan 24 '15 at 07:29
  • "_The implementation may assume that any thread will eventually do one of the following_" how is that in any way relevant? – curiousguy Aug 02 '15 at 04:17
  • @curiousguy a thread that ends in a loop that does not do [one of the following] has undefined behavior, so the compiler is free to modify it to anything it likes. – ecatmur Aug 03 '15 at 08:20
  • 1
    @ecatmur But why is `while ( !Ref.expired() );` an infinite loop? Your answer is taking the conclusion as a given! – curiousguy Aug 03 '15 at 18:27
  • 2
    @curiousguy hmm, maybe I could be clearer on that. The next paragraph states: *25 - An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.* `expired` is not a value assigned by an atomic or synchronization operation, and there is no other atomic read or synchronization operation within that loop, so it is consistent with the Standard that it should not be observed to change. – ecatmur Aug 03 '15 at 19:10