
I have made this small illustrative program that exhibits the same issue as the program I'm writing: it works fine in debug mode but segfaults in release mode. The problem seems to be that, in release mode, the ui_context is nullptr when it is called to run the work that was posted to it. I'm running Fedora 33 with g++ (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9) and clang version 11.0.0 (Fedora 11.0.0-2.fc33). Both compilers behave the same way. The Boost version is 1.75.

Code:


#include <iostream>
#include <vector>
#include <memory>
#include <chrono>
#include <thread>

#include <boost/asio.hpp>
#include <boost/signals2.hpp>

constexpr auto MAX_LOOP_COUNT = 100;

class network_client : public std::enable_shared_from_this<network_client>
{
private:
    using Signal = boost::signals2::signal<void(int)>;
public:
    network_client(boost::asio::io_context &context) : 
    strand(boost::asio::make_strand(context))
    {
        std::cout << "network client created" << std::endl;
    }
    void doNetworkWork()
    {
        std::cout << "doing network work" << std::endl;
        boost::asio::post(strand,std::bind(&network_client::onWorkComplete,shared_from_this()));
    }
    void onWorkComplete()
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::cout << "signalling completion" << " from thread id:" << std::this_thread::get_id() << std::endl;
        signal(42);
    }
    void workCompleteHandler(const typename Signal::slot_type &slot)
    {
        signal.connect(slot);
    }

private : 
    boost::asio::strand<boost::asio::io_context::executor_type> strand;
    Signal signal;
};

class network_client_producer
{
public :
    network_client_producer() : work(boost::asio::make_work_guard(context))
    {
        using run_function = boost::asio::io_context::count_type (boost::asio::io_context::*)();        
        for (int i = 0; i < 2; i++)
        {
            context_threads.emplace_back(std::bind(static_cast<run_function>(&boost::asio::io_context::run), std::ref(context)));
        }
    }
    ~network_client_producer()
    {
        context.stop();
        for(auto&& thread : context_threads)
        {
            if(thread.joinable())
            {
                thread.join();
            }
        }
    }
    using NetworkClientPtr = std::shared_ptr<network_client>;
    NetworkClientPtr makeNetworkClient()
    {
        return std::make_shared<network_client>(context);
    }

private : 
    boost::asio::io_context context;
    std::vector<std::thread> context_threads;
    boost::asio::executor_work_guard<boost::asio::io_context::executor_type> work;
};


class desktop : public std::enable_shared_from_this<desktop>
{
public:
    desktop(const boost::asio::io_context::executor_type &executor):executor(executor)
    {
    }
    void doSomeNetworkWork()
    {
        auto client = client_producer.makeNetworkClient();
        client->workCompleteHandler([self = shared_from_this()](int i){
            //post work into the UI thread
            std::cout << "calling into the uiThreadWork with index " << i << " from thread id:" << std::this_thread::get_id() << std::endl;
            boost::asio::post(self->executor, std::bind(&desktop::uiThreadWorkComplete, self, i));
        });
        client->doNetworkWork();
    }
    void showDesktop()
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
public:
    void uiThreadWorkComplete(int i)
    {
        std::cout << "Called in the UI thread with index:" << i << ", on thread id:" << std::this_thread::get_id() << std::endl;
    }
private:
    const boost::asio::io_context::executor_type& executor;
    network_client_producer client_producer;
};

int main()
{
    std::cout << "Starting application. Main thread id:"<<std::this_thread::get_id() << std::endl;
    
    int count = 0;
    boost::asio::io_context ui_context;
    auto work = boost::asio::make_work_guard(ui_context);
    /*auto work = boost::asio::require(ui_context.get_executor(),
                                     boost::asio::execution::outstanding_work.tracked);*/
    auto ui_desktop = std::make_shared<desktop>(ui_context.get_executor());

    ui_desktop->doSomeNetworkWork();

    while(true)
    {
        ui_context.poll_one();

        ui_desktop->showDesktop();

        if (count >= MAX_LOOP_COUNT)
            break;
        count++;
    }
    ui_context.stop();
    std::cout << "Stopping application" << std::endl;
    return 0;
}

Compiling it with g++ -std=c++17 -g -o main -pthread -O3 main.cpp and running it in gdb, I get this:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Starting application. Main thread id:140737348183872
[New Thread 0x7ffff7a51640 (LWP 27082)]
[New Thread 0x7ffff7250640 (LWP 27083)]
network client created
doing network work
signalling completion from thread id:140737348179520
calling into the uiThreadWork with index 42 from thread id:140737348179520

Thread 2 "main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7a51640 (LWP 27082)]
0x000000000040b7b8 in boost::asio::io_context::basic_executor_type<std::allocator<void>, 0u>::execute<std::_Bind<void (desktop::*(std::shared_ptr<desktop>, int))(int)> >(std::_Bind<void (desktop::*(std::shared_ptr<desktop>, int))(int)>&&) const (this=<optimized out>, f=...) at /usr/local/include/boost/asio/impl/io_context.hpp:309
309       io_context_->impl_.post_immediate_completion(p.p,


Compiling without any optimizations (g++ -std=c++17 -g -o main -pthread -O0 main.cpp), on the other hand, works as expected.

I tried to keep it as close as I can to the real program that actually does network IO, which is why I have that strand in there.

It's obvious that I'm doing something horribly wrong here. The question is: what is the problem? Thank you for any pointers.


2 Answers

Compile with the sanitizers enabled (-fsanitize=undefined,address):

Starting application. Main thread id:139902898299968
network client created
doing network work
signalling completion from thread id:139902399940352
calling into the uiThreadWork with index 42 from thread id:139902399940352
=================================================================
==29084==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffc8393d7c0 at pc 0x000000507ce0 bp 0x7f3d90d9e3d0 sp 0x7f3d90d9e3c8
READ of size 8 at 0x7ffc8393d7c0 thread T1
    #0 0x507cdf in boost::asio::io_context::basic_executor_type<std::allocator<void>, ...
    #1 0x507cdf in boost::asio::detail::initiate_post_with_executor<boost::asio::io_co...
    #2 0x507cdf in auto boost::asio::post<boost::asio::io_context::basic_executor_type...
    #3 0x5077cf in desktop::doSomeNetworkWork()::'lambda'(int)::operator()(int) const ...
    #4 0x518ce2 in boost::function1<void, int>::operator()(int) const /home/sehe/custo...
    #5 0x518481 in boost::signals2::detail::void_type boost::signals2::detail::call_wi...
    #6 0x518481 in boost::signals2::detail::void_type boost::signals2::detail::variadi...
    #7 0x517f43 in boost::signals2::detail::slot_call_iterator_t<boost::signals2::deta...
    #8 0x516397 in void boost::signals2::optional_last_value<void>::operator()<boost::...
    #9 0x516397 in void boost::signals2::detail::combiner_invoker<void>::operator()<bo...
    #10 0x516397 in boost::signals2::detail::signal_impl<void (int), boost::signals2::...
    #11 0x50d9d4 in network_client::onWorkComplete() /home/sehe/Projects/stackoverflow...
    #12 0x51021d in void std::_Bind<void (network_client::* (std::shared_ptr<network_c...
    #13 0x51021d in void boost::asio::asio_handler_invoke<std::_Bind<void (network_cli...
    #14 0x51021d in void boost_asio_handler_invoke_helpers::invoke<std::_Bind<void (ne...
    #15 0x51021d in boost::asio::detail::executor_op<std::_Bind<void (network_client::...
    #16 0x51188e in boost::asio::detail::strand_executor_service::invoker<boost::asio:...
    #17 0x514311 in void boost::asio::asio_handler_invoke<boost::asio::detail::strand_...
    #18 0x514311 in void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail...
    #19 0x514311 in boost::asio::detail::executor_op<boost::asio::detail::strand_execu...
    #20 0x4d8704 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::co...
    #21 0x4d70dc in boost::asio::detail::scheduler::run(boost::system::error_code&) /h...
    #22 0x523a6e in boost::asio::io_context::run() /home/sehe/custom/boost_1_75_0/boos...
    #23 0x5258ef in unsigned long std::_Bind<unsigned long (boost::asio::io_context::*...
    #24 0x5258ef in unsigned long std::__invoke_impl<unsigned long, std::_Bind<unsigne...
    #25 0x5258ef in std::__invoke_result<std::_Bind<unsigned long (boost::asio::io_con...
    #26 0x5258ef in unsigned long std::thread::_Invoker<std::tuple<std::_Bind<unsigned...
    #27 0x7f3da660bd7f  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd0d7f)
    #28 0x7f3da5f856da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #29 0x7f3da5a9671e in clone /build/glibc-S7xCS9/glibc-2.27/misc/../sysdeps/unix/sy...

Address 0x7ffc8393d7c0 is located in stack of thread T0 at offset 224 in frame
    #0 0x4cb30f in main /home/sehe/Projects/stackoverflow/test.cpp:109

  This frame has 6 object(s):
    [32, 40) 'ref.tmp.i85' (line 96)
    [64, 80) 'ref.tmp.i'
    [96, 112) 'ui_context' (line 113)
    [128, 152) 'work' (line 114)
    [192, 208) 'ui_desktop' (line 117)
    [224, 240) 'ref.tmp' (line 117) <== Memory access at offset 224 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /home/sehe/custom/boost_1_75_0/boost/asio/io_context.hpp:678:25 in boost::asio::io_context::basic_executor_type<std::allocator<void>, 0u>::basic_executor_type(boost::asio::io_context::basic_executor_type<std::allocator<void>, 0u> const&)
Shadow bytes around the buggy address:
  0x10001071faa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10001071fab0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f2 f2 f2
  0x10001071fac0: f8 f2 f2 f2 00 f2 f2 f2 00 00 f3 f3 00 00 00 00
  0x10001071fad0: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  0x10001071fae0: 00 f2 f2 f2 f8 f8 f2 f2 00 00 f2 f2 00 00 00 f2
=>0x10001071faf0: f2 f2 f2 f2 00 00 f2 f2[f8]f8 f3 f3 00 00 00 00
  0x10001071fb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10001071fb10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10001071fb20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10001071fb30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10001071fb40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
Thread T1 created by T0 here:
    #0 0x483a6a in pthread_create (/home/sehe/Projects/stackoverflow/sotest+0x483a6a)
    #1 0x7f3da660c014 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std...
    #2 0x5241d4 in void std::vector<std::thread, std::allocator<std::thread> >::_M_realloc_ins...
    #3 0x523517 in std::thread& std::vector<std::thread, std::allocator<std::thread> >::emplac...

==29084==ABORTING

There's your culprit.

Searching The Culprit

  This frame has 6 object(s):
    [32, 40) 'ref.tmp.i85' (line 96)
    [64, 80) 'ref.tmp.i'
    [96, 112) 'ui_context' (line 113)
    [128, 152) 'work' (line 114)
    [192, 208) 'ui_desktop' (line 117)
    [224, 240) 'ref.tmp' (line 117) <== Memory access at offset 224 is inside this variable

What variable is that? Apparently in the line

auto ui_desktop = std::make_shared<desktop>(ui_context.get_executor());

there's a temporary to which a reference is being kept. It must be the result of ui_context.get_executor(), because ui_desktop is named and has an "obvious" lifetime.

Sure enough, desktop declares its executor member by reference:

const boost::asio::io_context::executor_type& executor;

This is a clear error. Executors are neither services nor execution contexts; they are designed to be cheaply copyable and passed by value. The fix is trivial:

boost::asio::io_context::executor_type executor;

BONUS

As a bonus, here's a simplified version that runs the demo for half a second. Notes:

  • using a thread_pool instead of hand-rolling a flawed one
  • consider not calling .stop() on execution contexts; if you do stop them, the work guards are redundant

Live On Compiler Explorer

#include <iostream>
#include <chrono>
#include <functional>
#include <iomanip>
#include <memory>
#include <sstream>
#include <thread>

#include <boost/asio.hpp>
#include <boost/signals2.hpp>

namespace {
    using namespace std::chrono_literals;
    auto now = std::chrono::high_resolution_clock::now;
    auto elapsed = [start=now()] { return (now()-start)/1ms; };

    inline std::string thread_hash() {
        static constexpr std::hash<std::thread::id> h{};
        std::ostringstream oss;
        oss << std::hex << std::setw(2) << std::setfill('0')
            << h(std::this_thread::get_id()) % 0xff;
        return oss.str();
    }

    auto trace = [](auto const&... args) {
        std::cout << "thread #" << thread_hash() << " at t+" << std::setw(3) << elapsed() << "ms\t";
        (std::cout << ... << args) << std::endl;
    };
} // namespace

struct network_client : std::enable_shared_from_this<network_client> {
    explicit network_client(const boost::asio::any_io_executor& context) : strand(make_strand(context)) {
        trace("network client created");
    }

    void doNetworkWork() {
        trace("doing network work");
        post(strand, std::bind(&network_client::onWorkComplete, shared_from_this()));
    }

    void onWorkComplete() {
        std::this_thread::sleep_for(10ms);
        trace("signalling completion");
        signal(42);
    }

    template <typename F> void workCompleteHandler(F slot) {
        signal.connect(std::move(slot));
    }

  private:
    boost::asio::strand<boost::asio::any_io_executor> strand;
    using Signal = boost::signals2::signal<void(int)>;
    Signal signal;
};

struct network_client_producer {
    auto makeNetworkClient() {
        return std::make_shared<network_client>(context_threads.get_executor());
    }

  private : 
    boost::asio::thread_pool context_threads {2};
};

struct desktop : std::enable_shared_from_this<desktop> {
    explicit desktop(boost::asio::io_context::executor_type executor) : executor(std::move(executor)) {}
    void doSomeNetworkWork() {
        auto client = client_producer.makeNetworkClient();
        client->workCompleteHandler([this, self = shared_from_this()](int i) {
            // post work into the UI thread
            trace("calling into the uiThreadWork with index ", i);
            post(executor, std::bind(&desktop::uiThreadWorkComplete, self, i));
        });
        client->doNetworkWork();
    }

    static void showDesktop() {
        trace("showDesktop");
        std::this_thread::sleep_for(20ms);
    }

    void uiThreadWorkComplete(int i) const {
        trace("Called in the UI thread with index:", i);
    }

  private:
    boost::asio::io_context::executor_type executor;
    network_client_producer client_producer;
};

int main() {
    trace("Starting application. Main thread is #", thread_hash());

    boost::asio::io_context ui_context;
    auto work = boost::asio::make_work_guard(ui_context);
    /*auto work = boost::asio::require(ui_context.get_executor(),
                                     boost::asio::execution::outstanding_work.tracked);*/
    auto ui_desktop = std::make_shared<desktop>(ui_context.get_executor());

    ui_desktop->doSomeNetworkWork();

    for (auto deadline = now() + 0.5s; now() < deadline;) {
        ui_context.poll_one();
        ui_desktop->showDesktop();
    }

    trace("Stopping application");
    work.reset();
    ui_context.run();
    // ui_context.stop();
    trace("Bye\n");
}

Prints

thread #97 at t+  0ms   Starting application. Main thread is #97
thread #97 at t+  0ms   network client created
thread #97 at t+  1ms   doing network work
thread #97 at t+  1ms   showDesktop
thread #2d at t+ 11ms   signalling completion
thread #2d at t+ 11ms   calling into the uiThreadWork with index 42
thread #97 at t+ 21ms   Called in the UI thread with index:42
thread #97 at t+ 21ms   showDesktop
thread #97 at t+ 41ms   showDesktop
thread #97 at t+ 61ms   showDesktop
thread #97 at t+ 81ms   showDesktop
thread #97 at t+101ms   showDesktop
thread #97 at t+122ms   showDesktop
thread #97 at t+142ms   showDesktop
thread #97 at t+162ms   showDesktop
thread #97 at t+182ms   showDesktop
thread #97 at t+202ms   showDesktop
thread #97 at t+222ms   showDesktop
thread #97 at t+242ms   showDesktop
thread #97 at t+262ms   showDesktop
thread #97 at t+282ms   showDesktop
thread #97 at t+302ms   showDesktop
thread #97 at t+323ms   showDesktop
thread #97 at t+343ms   showDesktop
thread #97 at t+363ms   showDesktop
thread #97 at t+383ms   showDesktop
thread #97 at t+403ms   showDesktop
thread #97 at t+423ms   showDesktop
thread #97 at t+443ms   showDesktop
thread #97 at t+463ms   showDesktop
thread #97 at t+483ms   showDesktop
thread #97 at t+503ms   Stopping application
thread #97 at t+504ms   Bye
  • Added a demo (Live On Compiler Explorer: https://godbolt.org/z/187a6z) that includes some suggestions to simplify – sehe Jan 02 '21 at 01:45
  • Thank you, very educational. "Executors are not services nor execution contexts, and are designed to be cheaply copyable and passed by value." That I didn't know. Thanks for enlightening me. – serje Jan 02 '21 at 01:56

The problem is that your executor is a reference to a temporary object. In your main function, you call ui_context.get_executor(), which returns a temporary object. You pass the temporary to the desktop constructor, which stores a reference to it in the member variable executor. Once the auto ui_desktop = ... line in main has completed, the temporary goes out of scope and the reference held by executor dangles.
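The lifetime rule at work is plain C++, independent of Asio: a temporary bound to a const& constructor parameter lives only until the end of the full-expression containing the call. A minimal, standard-library-only sketch; token, make_token, holder_by_ref and lifetime_demo are hypothetical names standing in for the executor, get_executor(), desktop and main. It records lifetime events instead of touching the dangling reference, which would be undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records lifetime events in the order they happen.
std::vector<std::string> lifetime_log;

struct token {
    token() { lifetime_log.push_back("token constructed"); }
    ~token() { lifetime_log.push_back("token destroyed"); }
};

// Stand-in for get_executor(): every call returns a fresh temporary.
token make_token() { return token{}; }

// Mirrors the original desktop: it keeps a reference to its argument.
struct holder_by_ref {
    explicit holder_by_ref(const token& t) : t_(t) {}
    const token& t_; // dangles once the argument's temporary is destroyed
};

std::vector<std::string> lifetime_demo() {
    lifetime_log.clear();
    {
        holder_by_ref h{make_token()}; // the temporary lives only until the ';'
        lifetime_log.push_back("holder still alive");
        (void)h; // reading h.t_ here would be a stack-use-after-scope
    }
    return lifetime_log;
}
```

The log shows the token being destroyed before the holder does anything else, which is exactly the window in which desktop::executor refers to dead stack memory.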

This problem is also detected when compiling your program with address sanitization enabled (-fsanitize=address):

==24629==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffe8bee0270 at pc 0x5584b2d89b0c bp 0x7f3ac42fd970 sp 0x7f3ac42fd960
READ of size 8 at 0x7ffe8bee0270 thread T1
    #0 0x5584b2d89b0b in boost::asio::io_context::basic_executor_type<std::allocator<void>, 0u>::basic_executor_type(boost::asio::io_context::basic_executor_type<std::allocator<void>, 0u> const&) /usr/include/boost/asio/io_context.hpp:678
...

I suspect that in your debug build the temporary object effectively lives slightly longer, i.e. the stack memory it occupied is not reused immediately after the temporary goes out of scope. In the release build, more aggressive optimizations reuse that memory sooner, so the stale reference is invalidated sooner and the program crashes once the reference is accessed.

To fix this, you have to ensure that the executor returned by get_executor does not go out of scope while the ui_desktop object still holds a reference to it. For example, you could assign the result of get_executor to a variable in your main that outlives ui_desktop:

  auto executor{ui_context.get_executor()};
  auto ui_desktop = std::make_shared<desktop>(executor);
  • Wow, yes, you're right. I don't know why I thought that `context.get_executor()` would give me an executor that lives for as long as the context does. I honestly thought it was part of the context itself. But this makes sense now. – serje Jan 02 '21 at 01:16