
The clang ThreadSanitizer reports a data race in the following code:

#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::cout << "start!" << std::endl;
    for (size_t i = 0; i < 100000; i++) {
        std::promise<void> p;
        std::future<void> f = p.get_future();

        std::thread t = std::thread([p = std::move(p)]() mutable {
            p.set_value();
        });

        f.get();
        t.join();
    }
    std::cout << "done!" << std::endl;
    return 0;
}

I can fix the race by replacing p = std::move(p) with &p. However, I couldn't find documentation explaining whether the promise and future objects are thread-safe, or whether the order in which they are destroyed matters. My understanding was that since the promise and future communicate via a "shared state", that state should be thread-safe and destruction order shouldn't matter, but TSan disagrees. (Without TSan, the program appears to behave correctly and doesn't crash.)

Does this code actually have a potential race, or is this a TSan false positive?


You can reproduce this with Clang 9 by running the following commands in an Ubuntu 19.10 Docker container:

$ docker run -it ubuntu:eoan /bin/bash

Inside container:

# apt update
# apt install clang-9 libc++-9-dev libc++abi-9-dev

# clang++-9 -fsanitize=thread -lpthread -std=c++17 -stdlib=libc++ -O0 -g test.cpp -o test
(See test.cpp file contents above)

# ./test

Example output showing a data race (actual output varies a bit between runs):

==================
WARNING: ThreadSanitizer: data race (pid=9731)
  Write of size 8 at 0x7b2000000018 by thread T14:
    #0 operator delete(void*) <null> (test+0x4b4e9e)
    #1 std::__1::__shared_count::__release_shared() <null> (libc++.so.1+0x83f2c)
    #2 std::__1::__tuple_leaf<1ul, test()::$_0, false>::~__tuple_leaf() /usr/lib/llvm-9/bin/../include/c++/v1/tuple:170:7 (test+0x4b7d38)
    #3 std::__1::__tuple_impl<std::__1::__tuple_indices<0ul, 1ul>, std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>::~__tuple_impl() /usr/lib/llvm-9/bin/../include/c++/v1/tuple:361:37 (test+0x4b7ce9)
    #4 std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>::~tuple() /usr/lib/llvm-9/bin/../include/c++/v1/tuple:466:28 (test+0x4b7c98)
    #5 std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> >::operator()(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>*) const /usr/lib/llvm-9/bin/../include/c++/v1/memory:2338:5 (test+0x4b7c16)
    #6 std::__1::unique_ptr<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>, std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> > >::reset(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>*) /usr/lib/llvm-9/bin/../include/c++/v1/memory:2593:7 (test+0x4b7b80)
    #7 std::__1::unique_ptr<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>, std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> > >::~unique_ptr() /usr/lib/llvm-9/bin/../include/c++/v1/memory:2547:19 (test+0x4b74ec)
    #8 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> >(void*) /usr/lib/llvm-9/bin/../include/c++/v1/thread:289:1 (test+0x4b7397)

  Previous atomic read of size 1 at 0x7b2000000018 by main thread:
    #0 pthread_cond_wait <null> (test+0x4268d8)
    #1 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) <null> (libc++.so.1+0x422de)
    #2 main /test/test.cpp:61:9 (test+0x4b713c)

  Thread T14 (tid=18144, running) created by main thread at:
    #0 pthread_create <null> (test+0x425c6b)
    #1 std::__1::__libcpp_thread_create(unsigned long*, void* (*)(void*), void*) /usr/lib/llvm-9/bin/../include/c++/v1/__threading_support:336:10 (test+0x4b958c)
    #2 std::__1::thread::thread<test()::$_0, void>(test()::$_0&&) /usr/lib/llvm-9/bin/../include/c++/v1/thread:303:16 (test+0x4b6fc4)
    #3 test() /test/test.cpp:44:25 (test+0x4b6d96)
    #4 main /test/test.cpp:61:9 (test+0x4b713c)

SUMMARY: ThreadSanitizer: data race (/test/test+0x4b4e9e) in operator delete(void*)
==================
jtbandes
  • What are those line numbers? And does it report a data race if you don't make a loop of thread creation? – Nicol Bolas Nov 12 '19 at 03:04
  • I had some logging code I removed from the test program when I posted the question. As for the loop, it doesn't report a race without it, but that might just be because it's an unlikely condition. I added the loop to help trigger the race. – jtbandes Nov 12 '19 at 03:07
  • Actually yes, it seems I was able to reproduce the race even if the program doesn't have a loop (just by running it many times) – jtbandes Nov 12 '19 at 03:09
  • You still didn't say what lines those errors are found on. That is, specifically what code is being pointed to. – Nicol Bolas Nov 12 '19 at 03:30
  • Ah, I misunderstood your original question. The line number points to the `std::thread(...)` construction. – jtbandes Nov 12 '19 at 04:01
  • I've also filed a ticket on the LLVM bug tracker: https://bugs.llvm.org/show_bug.cgi?id=43984 – jtbandes Nov 12 '19 at 22:17

1 Answer

When a promise goes out of scope*, the following happens:

  • if the shared state is not ready:
    • it stores an exception of type future_error with error condition broken_promise in the shared state, then
    • makes the state ready
  • otherwise, the state is already ready

Calling get() on the future can then only throw an exception if no value was ever set on the promise before it went out of scope.

Now, it's actually pretty hard to make a promise go out of scope before the shared state has a value. Either the thread has exited via exception anyway, or you have a logic error where not all branches call promise::set_value.

Your specific code does not appear to exhibit any symptoms like this. Moving a promise simply moves ownership of the shared state to the new promise.

As for race conditions, get_future is guaranteed not to have any data races with promise::set_value and its variations. future::get is also guaranteed to wait until the shared state is ready. When a promise goes out of scope, it "releases" its shared state after making it ready, which destroys the shared state only if the promise held the last reference to it. Since you hold another reference to it (via a future), you're safe.

Now, it's always possible that the implementation has data races (by accident), but per the standard the code you posted shouldn't have any.


*Refer to [futures.state]

AndyG
  • Thanks for your answer (2 in one day)! However, I'm not sure I follow your logic exactly; perhaps my question was not specific enough. You say *"…if the shared state was not ready when the promise went out of scope…"* I see that this would lead to an exception, and non-deterministic behavior if the program might *sometimes* set a value and sometimes not. But my sample program always makes the state ready (via `set_value()`), and anyway I don't see how this is a **data race**. My question is specifically about whether the API is safe to use the way this sample program uses it. – jtbandes Feb 07 '20 at 19:15
  • To summarize, I don't see why *"calling `get()` on the future after the promise has gone out of scope is a race condition"* in my sample program. The shared state is always made ready before the promise goes out of scope. – jtbandes Feb 07 '20 at 19:16
  • @jtbandes: My apologies for not being clear. I guess race condition is not the right term here. Since the promise owns the shared state when it goes out of scope there will either be an exception or not. It's pretty deterministic. I've cleared my post up. Hope that helps. – AndyG Feb 07 '20 at 19:37
  • Thank you for clarifying! So, is your opinion that the data race reported by TSan is either a false positive or a bug in libc++, rather than a bug in my code? – jtbandes Feb 07 '20 at 21:02
  • @jtbandes Indubitably (needed a synonym for "Yes" to get past 15 char limit) – AndyG Feb 07 '20 at 21:28