ThreadSanitizer detects a data race, where is the problem?

Question

In this sample program, I'm trying to avoid using forward declaration and cyclic dependency exploiting a lambda function (called data_race)

struct B{
    int x;
    std::thread* tid;

    B(int _x){
        x = _x;
        tid = NULL;
    }

    ~B(){
        if(tid != NULL) delete tid;
    }

    void run(std::function<void(B*)> f){
        auto body = [&f, this](){
            f(this);
        };

        tid=new std::thread(body);
    }

    void wait(){
        tid->join();
    }
};

struct A{
    int x;
    std::mutex mtx;

    A(int _x){
        x = _x;
    }

    void foo(B* b){
        std::unique_lock<std::mutex> lock(mtx);
        x = b->x;
    };
};

int main(){
    A a(99);
    std::vector<B> v;

    auto data_race = [&a](B* b){ a.foo(b);};

    for(int i=0; i<10; i++){
        v.push_back(B(i));
    }

    for(int i=0; i<v.size(); i++){
        v[i].run(data_race);
    }

    for(int i=0; i<v.size(); i++){
        v[i].wait();
    }

    return 0;
}

However ThreadSanitizer detects a data race coming from the lambda function data_race. Can you help me understand why? The mutex inside A should be enough to avoid it. Also, can you help me find a solution?

EDIT: Using the forward declaration, the data race is not detected anymore, why? (Posting only the struct, without the main, sorry for the long post)

struct B;
struct A{
    int x;
    std::mutex mtx;

    A(int _x){
        x = _x;
    }

    void foo(B* b);
};

struct B{
    int x;
    std::thread* tid;

    B(int _x){
        x = _x;
        tid = NULL;
    }

    ~B(){
        if(tid != NULL) delete tid;
    }

    void run(A& a){
        auto body = [&a, this](){
            a.foo(this);
        };

        tid=new std::thread(body);
    }

    void wait(){
        tid->join();
    }
};

void A::foo(B* b){
    std::lock_guard<std::mutex> lock(mtx);
    x = b->x;
}

Assignment `x = b->x;` will cause the resulting value of `x` to come from random `B` object, even though access is properly synchronized. — user7860670, Aug 31 '19 at 19:56
Also unrelated note: Your class `B` violates the rule of 0/3/5 and will cause undefined behavior, should any `B`, after calling `run` on it, be copied/moved, or destroyed without prior call to `wait`, as could happen when modifying `v` after the `run` calls. — walnut, Aug 31 '19 at 22:14

walnut · Accepted Answer · 2019-08-31T21:52:58.177

You are passing a reference to the function-local f to the lambda body, which is invoked by the thread constructor.

This object may not be live anymore when the thread function reaches the call inside body.

To expand a bit:

A new thread created by run will execute a copy of body, which is a lambda containing a reference ([&f]) to the object f, which has the scope of the function run and will be destroyed when the main thread leaves run.

The thread function will call operator() on its reference to f in the line f(this) in body. This call is already causing undefined behavior if it should happen that run in the main thread reaches the scope end before the thread function executed this call. The data race here is, that the main thread may write access memory of f to destroy it, non-synchronized with the read access to memory of f in the spawned thread.

The intermediate function object can be avoided completely:

template<typename F>
void run(const F& f){
    auto body = [this](){
        f(this);
    };

    tid=new std::thread(body);
}

This will take the outer lambda data_race as a reference, copy this reference into body and, as long as you make sure data_race in main is out-lives all threads, the data race mentioned earlier is avoided.

Your edited code does something similar, i.e. it eliminates the object local to run. a in body will be a reference to the A defined in main which, again as long as main guarantees its lifetime extends beyond that of the threads, will not cause any issues at this point.

I'm not sure if this is the cause of the data race, but can you help me find a solution? — joyfantastic, Aug 31 '19 at 20:44
your solution worked, but I'm not really sure why...care to explain what's the problem with std::function? — joyfantastic, Aug 31 '19 at 21:30

score 0 · Answer 2 · answered Aug 31 '19 at 20:57

The data race is due to having one A object (a, declared in main) that is shared by all the different B objects, since it is passed by reference to the data_race lambda. This one A object is referenced in all those threads created in B::run, so the assignment x = b->x depends on which thread executes last, hence the data race.

ThreadSanitizer detects a data race, where is the problem?

2 Answers2