12

Compiling with gcc 4.7.2 on Ubuntu, compiled with -std=c++11 -O0 -pthread, I somehow created a deadlock in code that doesn't seem like it should ever run into that problem. I have a thread which just acquires a lock and then runs through a vector<function<void()>>, calling everything. Meanwhile, the main thread pushes std::packaged_task<int()>s onto it one-by-one and blocks on when that task's future returns. The tasks themselves are trivial (print and return).

Here is the full code. Running the app sometimes succeeds, but within a few tries will hang:

#include <iostream>
#include <future>
#include <thread>
#include <vector>
#include <functional>

std::unique_lock<std::mutex> lock() {
    static std::mutex mtx;
    return std::unique_lock<std::mutex>{mtx};
}

int main(int argc, char** argv)
{
    std::vector<std::function<void()>> messages;
    std::atomic<bool> running{true};

    std::thread thread = std::thread([&]{
        while (running) {
            auto lk = lock();
            std::cout << "[T] locked with " << messages.size() << " messages." << std::endl;
            for (auto& fn: messages) {
                fn();
            }   
            messages.clear();
        }   
    }); 

    for (int i = 0; i < 1000000; ++i) {
        std::packaged_task<int()> task([=]{
            std::cout << "[T] returning " << i << std::endl;
            return i;
        }); 

        {   
            auto lk = lock();
            messages.emplace_back(std::ref(task));
        }   

        task.get_future().get();
    }   

    running = false;
    thread.join();
}

Sample output:

[T] returning 127189
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 1 messages.
[T] returning 127190
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 1 messages.
[T] returning 127191
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 1 messages.
... hangs forever ...

What's going on? Why does the call into packaged_task::operator() hang? Where is the deadlock? Is this a gcc bug?

[update] Upon deadlock, the two threads are at:

Thread 1 (line 39 is the task.get_future().get() line):

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007feb01fe800c in __gthread_cond_wait (this=Unhandled dwarf expression opcode 0xf3
)
    at [snip]/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:879
#2  std::condition_variable::wait (this=Unhandled dwarf expression opcode 0xf3
) at [snip]/gcc-4.7.2/libstdc++-v3/src/c++11/condition_variable.cc:52
#3  0x0000000000404aff in void std::condition_variable::wait<std::__future_base::_State_base::wait()::{lambda()#1}>(std::unique_lock<std::mutex>&, std::__future_base::_State_base::wait()::{lambda()#1}) (this=0x6111e0, __lock=..., __p=...)
    at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/condition_variable:93
#4  0x0000000000404442 in std::__future_base::_State_base::wait (this=0x6111a8)
    at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:331
#5  0x00000000004060fb in std::__basic_future<int>::_M_get_result (this=0x7fffc451daa0)
    at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:601
#6  0x0000000000405488 in std::future<int>::get (this=0x7fffc451daa0)
    at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:680
#7  0x00000000004024dc in main (argc=1, argv=0x7fffc451dbb8) at test.cxx:39

and Thread 2 (line 22 is the fn() line):

#0  pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:95
#1  0x00000000004020f6 in __gthread_once (__once=0x611214, __func=0x401e68 <__once_proxy@plt>)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/gthr-default.h:718
#2  0x0000000000404db1 in void std::call_once<void (std::__future_base::_State_base::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>&, bool&), std::__future_base::_State_base* const, std::reference_wrapper<std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()> >, std::reference_wrapper<bool> >(std::once_flag&, void (std::__future_base::_State_base::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>&, bool&), std::__future_base::_State_base* const&&, std::reference_wrapper<std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()> >&&, std::reference_wrapper<bool>&&) (__once=..., __f=@0x7feb014fdc10)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/mutex:819
#3  0x0000000000404517 in std::__future_base::_State_base::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>, bool) (this=0x6111a8, __res=..., __ignore_failure=false)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:362
#4  0x0000000000407af0 in std::__future_base::_Task_state<int ()()>::_M_run() (this=0x6111a8)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:1271
#5  0x00000000004076cc in std::packaged_task<int ()()>::operator()() (this=0x7fffc451da30)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:1379
#6  0x000000000040745a in std::_Function_handler<void ()(), std::reference_wrapper<std::packaged_task<int ()()> > >::_M_invoke(std::_Any_data const&) (
    __functor=...) at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:1956
#7  0x00000000004051f2 in std::function<void ()()>::operator()() const (this=0x611290)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:2311
#8  0x000000000040232f in operator() (__closure=0x611040) at test.cxx:22
#9  0x0000000000403d8e in _M_invoke<> (this=0x611040)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:1598
#10 0x0000000000403cdb in operator() (this=0x611040)
    at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:1586
#11 0x0000000000403c74 in _M_run (this=0x611028) at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/thread:115
#12 0x00007feb01feae10 in execute_native_thread_routine (__p=Unhandled dwarf expression opcode 0xf3
) at [snip]/gcc-4.7.2/libstdc++-v3/src/c++11/thread.cc:73
#13 0x00007feb018879ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#14 0x00007feb015e569d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()
Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
Barry
  • 286,269
  • 29
  • 621
  • 977
  • `std::thread thread = std::thread` lolwut – Lightness Races in Orbit Jan 08 '15 at 22:49
  • I can reproduce this quite reliably on the same compiler with `-O0`. – Lightness Races in Orbit Jan 08 '15 at 23:10
  • @LightnessRacesinOrbit Well that's good. Hopefully it's because I'm doing something wrong rather than a compiler bug... – Barry Jan 08 '15 at 23:13
  • I've been playing with it and I don't know what's wrong; sorry. – Lightness Races in Orbit Jan 08 '15 at 23:20
  • Possibly the same issue is discussed here: [race-condition in pthread_once()?](http://stackoverflow.com/questions/10843304/race-condition-in-pthread-once) – Alexey Kukanov Jan 08 '15 at 23:42
  • 1
    @AlexeyKukanov I thought of that the second I saw this post, but I don't see the link. There's no lifetime issues like in that post, right – sehe Jan 08 '15 at 23:43
  • 1
    I can easily reproduce the lockup with g++ 4.7, but the program works fine with g++ 4.8 and g++ 4.9. It seems likely to be a compiler or library issue. – Michael Karcher Jan 09 '15 at 00:01
  • @MichaelKarcher I can reproduce in g++ 4.9.0. – Barry Jan 09 '15 at 00:28
  • I am sorry. I talked about gcc version 4.7.4 (Debian 4.7.4-3), gcc version 4.8.3 (Debian 4.8.3-13) and gcc version 4.9.1 (Debian 4.9.1-19) – Michael Karcher Jan 09 '15 at 00:36
  • And as it might be relevant too, I tested on a Core 2 Duo processor on 64-bit Linux. – Michael Karcher Jan 09 '15 at 00:38
  • Works fine for me. Your program does produce around 100 MB (!) of output that is printed to stdout, maybe the problem is a (temporary) hangup there. Do you notice a difference if you replace `std::endl` with '\n'? – StackedCrooked Jan 09 '15 at 00:40
  • reproduced with gcc 4.9.1 too, now. Just took way longer than with gcc 4.7.4, sorry for the false alert. – Michael Karcher Jan 09 '15 at 00:43
  • Actually, pthread_once blocks as it is convinced that another thread is executing the initialization function *right now*. Obviously it is not. It totally smells like a lifetime issue. – Michael Karcher Jan 09 '15 at 00:52
  • Are you *sure* you see this with 4.9.1? It looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60966 and that should be fixed in 4.9.1 ... and I can't reproduce it with 4.9.1 (but can with 4.9.0) – Jonathan Wakely Jan 09 '15 at 01:02
  • I failed to reproduce it with 4.9.1 in gnome-terminal, but reproduced it in xterm (which provides *way* faster output). I am sure I used 4.9.1 to compile. I am sure the compiler used 4.9.1 headers. But it linked in some other system-provided library (libc or libpthread, nonshared parts) that were compiled with 4.8.3. If I read the bug correctly, only the header file should matter. – Michael Karcher Jan 09 '15 at 01:17
  • @JonathanWakely: _"Interestingly, it always happens in the second thread of that type (same thread function)"_ from comment 10 matches my observations precisely. Nice! However, they don't match the OP's -.- – Lightness Races in Orbit Jan 09 '15 at 01:21
  • I tried taking calling `get_future` inside the lock, copy that future, and call `wait` on that copy, to prevent the race between `packaged_task::operator()` and `packaged_task::get_future()` (both non-const), but this didn't help with gcc-4.7 at least. – Michael Karcher Jan 09 '15 at 01:27
  • Yeah, I tried that too, but I know that in practice there isn't actually a race in GCC's implementation (and I'm trying to get the standard to agree, see http://cplusplus.github.io/LWG/lwg-active.html#2412) – Jonathan Wakely Jan 09 '15 at 01:29
  • I still can't reproduce this with 5.0 or 4.9.1, but can easily with 4.9.0 and 4.8.3, with consistent results on 4-core x86_64, 24-core x86_64 and 152-core ppc64 machines. I'm convinced it's PR 60966. – Jonathan Wakely Jan 09 '15 at 01:31
  • 1
    I just valground the version compiled with gcc 4.7, and it is indeed a lifetime issue: I can get a write to the "once"-Indicator while the task has already been destroyed. Trying to reproduce with 4.9.1 again. – Michael Karcher Jan 09 '15 at 01:34
  • 1
    I reproduced it with gcc 4.9.1 again, with the de-raced source code. Now the main thread hangs on locking the unique lock (in the next iteration) while the worker thread hangs on broadcasting the destroyed condition variable. – Michael Karcher Jan 09 '15 at 01:37
  • http://pastebin.ca/2901154 is a valgrind dump + gdb backtrace of the failed thread. I am sure I don't have headers from any g++ 4.9.0 package installed on my system. – Michael Karcher Jan 09 '15 at 01:47
  • @JonathanWakely actually, it is quite similar to PR 60966, but not identical. In PR 60966, a promise object was used, while in this case, the operator() is called directly on the packaged_task, so the extra reference creation in the promise is bypassed. – Michael Karcher Jan 09 '15 at 02:20
  • @MichaelKarcher, you're right, I need to do a similar change to `packaged_task` to prevent the shared state being destroyed until the call to `operator()` completes. I think I only need to patch the 4.8 and 4.9 branches and the GCC trunk is OK, as I moved the `notify_all()` call so it happens before the mutex is unlocked, ensuring waiters can never see the shared state become ready (and potentially destroy the shared state) while the condition_variable is still in use. – Jonathan Wakely Jan 09 '15 at 12:40
  • 3
    Should now be fixed in GCC svn by https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00520.html – Jonathan Wakely Jan 09 '15 at 15:45

2 Answers2

5

I can't explain why your code was broken, but I did find a way to fix it (storing tasks, not std::functions constructed from tasks):

#include <iostream>
#include <future>
#include <thread>
#include <vector>
#include <functional>
#include <unistd.h>

int main(int argc, char** argv)
{
    // Let's face it - your lock() function was kinda weird.
    std::mutex mtx;

    // I've changed this to a vector of tasks, from a vector
    // of functions. Seems to have done the job. Not sure exactly
    // why but this seems to be the proper way to go.
    std::vector<std::packaged_task<int()>> messages;

    std::atomic<bool> running{true};

    std::thread thread([&]{
        while (running) {
            std::unique_lock<std::mutex> l{mtx};
            std::cout << "[T] locked with " << messages.size() << " messages." << std::endl;
            for (auto& fn: messages) {
                fn();
            }
            messages.clear();
        }
    });

    for (int i = 0; i < 1000000; ++i) {
        std::packaged_task<int()> task([i]{
            std::cout << "[T] returning " << i << std::endl;
            return i;
        });

        // Without grabbing this now, if the thread executed fn()
        // before I do f.get() below, it complained about having
        // no shared state.
        std::future<int> f = task.get_future();

        {
            std::unique_lock<std::mutex> l{mtx};
            messages.emplace_back(std::move(task));
        }

        f.get();
    }

    running = false;
    thread.join();
}

At the very least, if this code also deadlocks, then it hasn't yet for me.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • If you axe the lock function, might as well go for `lock_guard<>` :) – sehe Jan 08 '15 at 23:56
  • I think your code still has the problem, but somehow it appears to work (I think it's UB, see my new answer) – sehe Jan 09 '15 at 00:03
  • `template class packaged_task` is a partial specialization. There's nothing wrong with `std::packaged_task`. – T.C. Jan 09 '15 at 00:18
  • 1
    @LightnessRacesinOrbit Partial specialization matching deduces `R` to be `int` and `ArgTypes` to be an empty pack. – T.C. Jan 09 '15 at 00:22
  • 1
    @LightnessRacesinOrbit that "problem" I reported was a misread. I'm sorry for confusing you all. Feel free to forget about this episode :S – sehe Jan 09 '15 at 00:22
  • @T.C. Right; that makes sense – Lightness Races in Orbit Jan 09 '15 at 00:23
  • Thanks for the answer. If this fixes it, I'd like to know why. For my actual use-case, the packaged tasks don't all return the same type so I couldn't do it this way (hence the `std::function`) – Barry Jan 09 '15 at 00:26
  • Can't help but wonder whether https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56492 inadvertently fixed this. – Lightness Races in Orbit Jan 09 '15 at 00:26
5

It seems that the problem is that you destroy the packaged_task possibly before operator() returns in the worker thread. This is most likely undefined behaviour. The program works fine for me if I re-aquire the mutex in the loop after waiting for the future to return a result. This serializes operator() and the destructor of the packaged_task.

Michael Karcher
  • 3,803
  • 1
  • 14
  • 25
  • Just a side note: I experienced similar problems with a boost::barrier (boost 1.53) on Visual Studio 2010. It is not enough for the main thread to continue running after taking the barrier to be sure that all other threads returned from barrier::wait and destruction of the barrier object is fine. – Michael Karcher Jan 09 '15 at 01:53
  • 2
    N.B. a `pthread_cond_t` is explicitly allowed to be destroyed as soon as you have called `pthread_cond_broadcast` to wake all waiters (even if they don't actually wake until after the [`pthread_cond_destroy`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_destroy.html) happens). The problem in this case is that I unlock the mutex before the `pthread_cond_broadcast`, which leaves a small window where the waiting thread can see the shared state become ready and destroy the task (and its shared state containing the condvar) before I have signalled the condvar. – Jonathan Wakely Jan 09 '15 at 12:42
  • @JonathanWakely Thanks guys! Replacing `return future.get();` with `future.wait(); lock(); return future.get();` fixes it for me - thanks for looking into it and providing the suggestion. I'll update compilers... whenever I can... – Barry Jan 09 '15 at 16:47