
I am writing an intrusive shared pointer and I am using the C++11 `<atomic>` facilities for the reference counter. Here are the relevant fragments of my code:

//...
mutable std::atomic<unsigned> count;
//...

void
SharedObject::addReference() const
{
    std::atomic_fetch_add_explicit (&count, 1u,
        std::memory_order_consume);
}

void
SharedObject::removeReference() const
{
    bool destroy;

    destroy = std::atomic_fetch_sub_explicit (&count, 1u,
        std::memory_order_consume) == 1;

    if (destroy)
        delete this;
}

I started with memory_order_acquire and memory_order_release first, but then I convinced myself that memory_order_consume should be good enough. After further deliberation, it seems to me that even memory_order_relaxed should work.

Now, the question is whether I can use memory_order_consume for these operations, whether I could use an even weaker ordering (memory_order_relaxed), or whether I should use a stricter ordering.

wilx
  • Since the counter essentially acts as a recursive lock for the `delete` statement, I'd say that "acquire" in the `addReference` and "release" in the `removeReference` are the right orderings. But your `addReference` should also make sure that the counter wasn't zero! – Kerrek SB Apr 22 '12 at 14:41
  • @KerrekSB: The counter can be zero in `addReference()` after the object is first created before it is assigned to a `SharedPtr<>`. Acquire/release semantics seems that it should always work. But is it not possible to use weaker ordering constraint and why not? – wilx Apr 22 '12 at 14:45
  • About the zero: Suppose the refcount is 1. Now thread 1 wants to delete the object and calls subtract. If at this point thread 2 wants to *increase* the thread count, it increments zero to one, but thread 1 will go ahead and delete the object anyway. That should be avoided. – Kerrek SB Apr 22 '12 at 17:13
  • 1
    @KerrekSB: That sounds like it would be a situation with unbalanced `addReference()`/`removeReference()` or with unprotected access to instance of `SharedPtr` which I would consider undefined behaviour. `SharedPtr`s as members themselves need to be protected by mutexes etc. the same as other members of its enclosing class. – wilx Apr 22 '12 at 19:45
  • @KerrekSB "_I'd say that "acquire" in the addReference_" — an acquire paired with what release? – curiousguy Jul 29 '12 at 03:17
  • This isn't an answer, but I will note that `memory_order_consume` has almost NO applications this side of 1024-core shared memory beasts... or DEC Alphas. It offers guarantees which are customary in C++, but which a few exotic platforms can violate. Stick to acquire and release, which have sane behavior, and are no more expensive on virtually any platform you will ever use. – Cort Ammon Sep 03 '13 at 06:39

1 Answer

void
SharedObject::addReference() const
{
    std::atomic_fetch_add_explicit (&count, 1u, std::memory_order_relaxed);
}

void
SharedObject::removeReference() const
{
    if ( std::atomic_fetch_sub_explicit (&count, 1u, std::memory_order_release) == 1 ) {
         std::atomic_thread_fence(std::memory_order_acquire);
         delete this;
    }
}

You want to use `atomic_thread_fence` so that the `delete` is strictly ordered after the `fetch_sub`. Reference: the Boost.Atomic documentation's reference-counting example.

Quote from the linked text:

Increasing the reference counter can always be done with memory_order_relaxed: New references to an object can only be formed from an existing reference, and passing an existing reference from one thread to another must already provide any required synchronization.

It is important to enforce any possible access to the object in one thread (through an existing reference) to happen before deleting the object in a different thread. This is achieved by a "release" operation after dropping a reference (any access to the object through this reference must obviously have happened before), and an "acquire" operation before deleting the object.

It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded "acquire" operations when the reference counter does not yet reach zero and may impose a performance penalty.

user2k5
  • While I have already accepted the answer and implemented my intrusive pointer that way, the quote made me think a bit and here is another question. What about relaxing the decrement to `memory_order_relaxed` and making the true branch of the `if()` use `memory_order_acq_rel`? – wilx Apr 23 '12 at 07:00
  • With your `_relaxed` then `_acq_rel`, there is no strict ordering of `fetch_sub(_relaxed)` before `fence(_acq_rel)` and then `delete`. You may be interested in this [page](http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/thread_coordination.html). – user2k5 Apr 23 '12 at 07:26
  • Ok, after reading the page I think I understand that it cannot be `fetch_sub(_relaxed)`. It is because any `_relaxed` operation is not ordered against other operations at all. But what about `fetch_sub(_consume)` and `fence(_acq_rel)`? There it seems to me that `fetch_sub(_consume)` poses a compiler reordering barrier and `fence(_acq_rel)` orders all loads and stores against the rest of the machine and the following `delete` will have consistent view of the memory. Or am I wrong? – wilx Apr 23 '12 at 07:55
  • 2 points here: 1) `_consume` is for pointers, such that no operations *related to the pointer* after the `_consume` statement can be moved before it. You don't want to use `_consume` here. 2) If you are talking about `fetch_sub(_acquire)` (instead of `_consume`), an `_acquire` operation does not act as a barrier keeping the `delete` strictly after the `fetch_sub`; it orders the other way round. – user2k5 Apr 23 '12 at 08:11
  • Ok, I think I mostly get it now. I have missed this bit from [cppreference](http://en.cppreference.com/w/cpp/atomic/memory_order) "The reads from the affected memory location that carry data dependency cannot be reordered before the load; *other reads can be*. On most platforms, this affects compiler optimization only." This was very enlightening discussion for me. Thank you. – wilx Apr 23 '12 at 08:28
  • @user2k5 The principle of consume works as well for an index into an array as a pointer. It can be used for anything that can be value dependent. – curiousguy May 06 '19 at 12:06
  • @wilx "_`fence(_acq_rel)` orders all loads and stores against the rest of the machine_" No. A fence will only create ordering WRT to past and future fences or atomic operations that do acq or rel. – curiousguy May 06 '19 at 12:09