5

Please have a look at the following code:

#include <pthread.h>
#include <boost/atomic.hpp>

class ReferenceCounted {
  public:
    ReferenceCounted() : ref_count_(1) {}

    void reserve() {
      ref_count_.fetch_add(1, boost::memory_order_relaxed);
    }

    void release() {
      if (ref_count_.fetch_sub(1, boost::memory_order_release) == 1) {
        boost::atomic_thread_fence(boost::memory_order_acquire);
        delete this;
      }
    }

  private:
    boost::atomic<int> ref_count_;
};

void* Thread1(void* x) {
  static_cast<ReferenceCounted*>(x)->release();
  return NULL;
}

void* Thread2(void* x) {
  static_cast<ReferenceCounted*>(x)->release();
  return NULL;
}

int main() {
  ReferenceCounted* obj = new ReferenceCounted();
  obj->reserve(); // for Thread1
  obj->reserve(); // for Thread2
  obj->release(); // for the main()
  pthread_t t[2];
  pthread_create(&t[0], NULL, Thread1, obj);
  pthread_create(&t[1], NULL, Thread2, obj);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
}

This is somewhat similar to the Reference counting example from Boost.Atomic.

The main differences are that the embedded ref_count_ is initialized to 1 in the constructor (once the constructor is completed we have a single reference to the ReferenceCounted object) and that the code doesn't use boost::intrusive_ptr.

Please don't blame me for using delete this in the code - this is the pattern that I have in a large code base at work and there's nothing I can do about it right now.

Now this code compiled with clang 3.5 from trunk (details below) and ThreadSanitizer (tsan v2) results in the following output from ThreadSanitizer:

WARNING: ThreadSanitizer: data race (pid=9871)
  Write of size 1 at 0x7d040000f7f0 by thread T2:
    #0 operator delete(void*) <null>:0 (a.out+0x00000004738b)
    #1 ReferenceCounted::release() /home/A.Romanek/tmp/tsan/main.cpp:15 (a.out+0x0000000a2c06)
    #2 Thread2(void*) /home/A.Romanek/tmp/tsan/main.cpp:29 (a.out+0x0000000a2833)

  Previous atomic write of size 4 at 0x7d040000f7f0 by thread T1:
    #0 __tsan_atomic32_fetch_sub <null>:0 (a.out+0x0000000896b6)
    #1 boost::atomics::detail::base_atomic<int, int, 4u, true>::fetch_sub(int, boost::memory_order) volatile /home/A.Romanek/tmp/boost/boost_1_55_0/boost/atomic/detail/gcc-atomic.hpp:499 (a.out+0x0000000a3329)
    #2 ReferenceCounted::release() /home/A.Romanek/tmp/tsan/main.cpp:13 (a.out+0x0000000a2a71)
    #3 Thread1(void*) /home/A.Romanek/tmp/tsan/main.cpp:24 (a.out+0x0000000a27d3)

  Location is heap block of size 4 at 0x7d040000f7f0 allocated by main thread:
    #0 operator new(unsigned long) <null>:0 (a.out+0x000000046e1d)
    #1 main /home/A.Romanek/tmp/tsan/main.cpp:34 (a.out+0x0000000a286f)

  Thread T2 (tid=9874, running) created by main thread at:
    #0 pthread_create <null>:0 (a.out+0x00000004a2d1)
    #1 main /home/A.Romanek/tmp/tsan/main.cpp:40 (a.out+0x0000000a294e)

  Thread T1 (tid=9873, finished) created by main thread at:
    #0 pthread_create <null>:0 (a.out+0x00000004a2d1)
    #1 main /home/A.Romanek/tmp/tsan/main.cpp:39 (a.out+0x0000000a2912)

SUMMARY: ThreadSanitizer: data race ??:0 operator delete(void*)
==================
ThreadSanitizer: reported 1 warnings

The strange thing is that thread T1 does a write of size 1 to the same memory location as thread T2 when doing atomic decrement on the reference counter.

How can the former write be explained? Is it some clean-up performed by the destructor of ReferenceCounted class?

It is a false positive? Or is the code wrong?

My setup is:

$ uname -a
Linux aromanek-laptop 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ clang --version
Ubuntu clang version 3.5-1ubuntu1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix

The code is compiled like this:

clang++ main.cpp -I/home/A.Romanek/tmp/boost/boost_1_55_0 -pthread -fsanitize=thread -O0 -g -ggdb3 -fPIE -pie -fPIC

Note that on my machine the implementation of boost::atomic<T> resolves to __atomic_load_n family of functions, which ThreadSanitizer claims to understand.

UPDATE 1: the same happens when using clang 3.4 final release.

UPDATE 2: the same problem occurs with -std=c++11 and <atomic> with both libstdc++ and libc++.

Adam Romanek
  • 1,809
  • 1
  • 19
  • 36
  • I don't know what Boost means, since the C++ standard doesn't say anything about Boost, but if these were `std` atomics, then an `std::atomic_thread_fence` only synchronizes-with other atomic operations if there's a suitable intermediate access. See [atomic.fences] for details. – Kerrek SB Jun 27 '14 at 08:48
  • 1
    Try changing the fence to `ref_count_.fetch_add(0, boost::memory_order_acquire)`. – Kerrek SB Jun 27 '14 at 08:50
  • @KerrekSB, indeed, changing the fence to the `fetch_add()` you proposed makes tsan quiet. However, I'm not sure if this is just a workaround for tsan or it actually fixes a race in the code. – Adam Romanek Jun 27 '14 at 12:34
  • 1
    See my comments below. The fence plays by certain rules. You can reason about the rules if you want, or you could just not use fences. Fences are a poor primitive, since it's very unusual that you wouldn't want to synchronize on some specific object. – Kerrek SB Jun 27 '14 at 14:55
  • It would be interesting to see if std::atomic reports the same issue. – James Jun 30 '14 at 11:35
  • @James, nothing changes with std::atomic, TSan [reports the same issue](http://pastebin.com/56TLxpSR). – Adam Romanek Jun 30 '14 at 11:54

2 Answers2

3

This looks like a false positive.

The thread_fence in the release() method enforces all outstanding writes from fetch_sub-calls to happen-before the fence returns. Therefore, the delete on the next line cannot race with the previous writes from decreasing the refcount.

Quoting from the book C++ Concurrency in Action:

A release operation synchronizes-with a fence with an order of std::memory_order_acquire [...] if that release operation stores a value that's read by an atomic operation prior to the fence on the same thread as the fence.

Since decreasing the refcount is a read-modify-write operation, this should apply here.

To elaborate, the order of operations that we need to ensure is as follows:

  1. Decreasing the refcount to a value > 1
  2. Decreasing the refcount to 1
  3. Deleting the object

2. and 3. are synchronized implicitly, as they happen on the same thread. 1. and 2. are synchronized since they are both atomic read-modify-write operations on the same value. If these two could race the whole refcounting would be broken in the first place. So what is left is synchronizing 1. and 3..

This is exactly what the fence does. The write from 1. is a release operation that is, as we just discussed, synchronized with 2., a read on the same value. 3., an acquire fence on the same thread as 2., now synchronizes with the write from 1. as guaranteed by the spec. This happens without requiring an addition acquire write on the object (as was suggested by @KerrekSB in the comments), which would also work, but might be potentially less efficient due to the additional write.

Bottom line: Don't play around with memory orderings. Even experts get them wrong and their impact on performance is often negligible. So unless you have proven in a profiling run that they kill your performance and you absolutely positively have to optimize this, just pretend they don't exist and stick with the default memory_order_seq_cst.

ComicSansMS
  • 51,484
  • 14
  • 155
  • 166
  • I'm suspicious of that answer. TSan is pretty good and actually runs through the C++ memory model pretty accurately. See my above comment; I think the OP's code does indeed not synchronize. – Kerrek SB Jun 27 '14 at 08:49
  • @KerrekSB See my edit. boost:atomic is [supposed to be 100% compliant with the standard atomics](http://www.boost.org/doc/libs/1_55_0/doc/html/atomic.html). As usual with atomics, all of this is quite subtle, so I won't rule out that I interpret things wrong somewhere. – ComicSansMS Jun 27 '14 at 09:00
  • OK, thanks. See the cited standard reference - 29.8/4 sounds to me like you need, before the fence, an operation that reads the value that was being released, otherwise the fence does not synchronize with the release. – Kerrek SB Jun 27 '14 at 09:12
  • @KerrekSB I guess in that section in the Standard, `X` would be the write by the thread that performs the delete and `A` would be the write by the other thread (the one that just decreases the refcount, but does not delete). So `A` should synchronize with the fence (which is running on the same thread as `X` in this case), which is what is required to eliminate the race. – ComicSansMS Jun 27 '14 at 09:16
  • Hm... I think the write is `A`, the `ref_count_` variable is `M`, the acquire fence is `B`, and there is no `X`! – Kerrek SB Jun 27 '14 at 09:21
  • @KerrekSB Tried to elaborate a bit more. My head is spinning now, so I hope it's convincing :D – ComicSansMS Jun 27 '14 at 09:29
  • 4
    This is in fact a false positive - [it has been confirmed](https://groups.google.com/d/msg/thread-sanitizer/dZ3QiQE49ns/j7rKGHg08foJ) on the thread-sanitizer discussion group. – Adam Romanek Jul 04 '14 at 06:22
2

Just to highlight @adam-romanek's comment for others who stumble upon this, at the time of writing (March 2018) ThreadSanitizer does not support standalone memory fences. This is alluded to in the ThreadSanitizer FAQ which doesn't explicitly mention fences are supported:

Q: What synchronization primitives are supported? TSan supports pthread synchronization primitives, built-in compiler atomic operations (sync/atomic), C++ operations are supported with llvm libc++ (not very throughly [sic] tested, though).

Anon
  • 6,306
  • 2
  • 38
  • 56