
I am using the asio library and trying to connect to a socket asynchronously with use_future. My code works about 90% of the time but occasionally segfaults. I narrowed it down: it prints checkpoints 2 and 3 but not 4. The code is below.

Running it under lldb gives this:

Process 63910 stopped
* thread #4, stop reason = EXC_BAD_ACCESS (code=2, address=0x100072e08)
    frame #0: 0x000000010002648c XYZ`std::__1::enable_if<is_error_code_enum<asio::error::basic_errors>::value, std::__1::error_code&>::type std::__1::error_code::operator=<asio::error::basic_errors>(this=0x0000000100072e08, __e=operation_aborted) at system_error:349:20
   346              error_code&
   347          >::type
   348          operator=(_Ep __e) _NOEXCEPT
-> 349              {*this = make_error_code(__e); return *this;}
   350 
   351      _LIBCPP_INLINE_VISIBILITY
   352      void clear() _NOEXCEPT
Target 0: (XYZ) stopped.

Any ideas why this might segfault, or whether there is a fault in my logic?

struct Peer {
    std::string address;
    uint16_t    port;
};

class bar
{
    private:
        std::shared_ptr<asio::ip::tcp::endpoint> endpoint;
        std::shared_ptr<asio::io_context> context;
        std::shared_ptr<asio::ip::tcp::socket> socket;
    public:
        bool foo(Peer peer);
};

bool bar::foo(Peer peer) {

            endpoint = std::move(std::make_shared<asio::ip::tcp::endpoint>(asio::ip::make_address(peer.address), peer.port));

            context = std::move(std::make_shared<asio::io_context>());

            socket = std::move(std::make_shared<asio::ip::tcp::socket>(*context));

            std::chrono::milliseconds span(100);
            std::chrono::milliseconds zero_s(0);

            std::cout << "checkpoint 2" << std::endl;

            std::future<void> connect_status = socket->async_connect(*endpoint, asio::use_future);
            context->run_for(span);

            std::cout << "checkpoint 3" << std::endl;

            if (connect_status.wait_for(zero_s) == std::future_status::timeout)
                return false;
            
            std::cout << "checkpoint 4" << std::endl;

            connect_status.get();

            
}

You can also see a version that compiles here: https://compiler-explorer.com/z/4WesMM4nP.

Edit: Thanks to everyone who looked at this. I figured out the error in my code, and it was a pretty simple mistake: the order of destruction is the reverse of the order of construction. My program calls bar::foo repeatedly, and each call sets the socket and the context. On later calls, the move assignment to context destroys the previous io_context while the previous socket, which still depends on it, is alive. The old socket's destructor then runs at the socket assignment, against a context that no longer exists. My checkpoint-based pinpointing was inaccurate; a backtrace in lldb confirmed that this is indeed the problem. If you run into a similar problem, a refined version of my code looks like this:

            socket.reset();
            context.reset();
            endpoint.reset();

            endpoint = std::make_shared<asio::ip::tcp::endpoint>(asio::ip::make_address(peer.address), peer.port);
            context = std::make_shared<asio::io_context>();
            socket = std::make_shared<asio::ip::tcp::socket>(*context);
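
To make the ordering visible without asio, here is a minimal sketch using hypothetical stand-ins (`FakeContext`, `FakeSocket`, `Holder` are invented for illustration and are not asio types). Each destructor appends its name to a log, so the order in which the old objects die on the second call can be inspected. It only records the order; it does not reproduce the crash itself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for asio::io_context.
struct FakeContext {
    Log& log;
    explicit FakeContext(Log& l) : log(l) {}
    ~FakeContext() { log.push_back("~context"); }
};

// Hypothetical stand-in for asio::ip::tcp::socket. A real socket uses its
// context in its destructor; this one keeps its own reference to the log so
// the demo stays well-defined.
struct FakeSocket {
    Log& log;
    explicit FakeSocket(FakeContext& c) : log(c.log) {}
    ~FakeSocket() { log.push_back("~socket"); }
};

struct Holder {
    std::shared_ptr<FakeContext> context;
    std::shared_ptr<FakeSocket> socket;
};

// Same order as the original foo(): context first, then socket.
void reconnect_naive(Holder& h, Log& log) {
    h.context = std::make_shared<FakeContext>(log);       // old context dies here
    h.socket = std::make_shared<FakeSocket>(*h.context);  // old socket dies here
}

// Same order as the refined version: release the socket before its context.
void reconnect_fixed(Holder& h, Log& log) {
    h.socket.reset();
    h.context.reset();
    h.context = std::make_shared<FakeContext>(log);
    h.socket = std::make_shared<FakeSocket>(*h.context);
}

// Calls the chosen reconnect twice and returns what the second call destroyed.
Log destruction_order(bool fixed) {
    Log log;   // declared before the holder so it outlives it
    Holder h;
    auto reconnect = fixed ? &reconnect_fixed : &reconnect_naive;
    reconnect(h, log);
    assert(log.empty());  // first call has nothing to destroy
    reconnect(h, log);
    Log snapshot = log;   // copy before h's own destructor appends more
    return snapshot;
}
```

With these stand-ins, `destruction_order(false)` yields `~context` followed by `~socket`, which is the order that bites when the socket's destructor needs a live context, while `destruction_order(true)` yields `~socket` followed by `~context`.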
ILutRf7
  • Does bar::foo return something in your real code, and you just forgot to include the return statement here? If the function in your code looks the same as what is pasted here, then you have UB, because a function that is supposed to return a value does not return one. – bielu000 Jan 09 '23 at 12:13
  • Yeah, this is incomplete, there is more work that is done in bar::foo, but it doesn't seem to me that there is an error from there. – ILutRf7 Jan 09 '23 at 12:54
  • If you post code with UB, nobody can reason about it. Please make it self-contained, compiling (https://stackoverflow.com/help/minimal-reproducible-example, http://sscce.org/) – sehe Jan 09 '23 at 13:18

1 Answer

It could be that you are just not handling the exceptions from context->run (e.g. the connection is refused/reset by peer).

I got the hunch because the make_error_code will naturally be a part of constructing the system_error exception.
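
A minimal sketch of guarding against such an exception, assuming the failure surfaces as a std::system_error when the future is read. The function names are hypothetical, and a plain std::promise stands in for the asio operation so the example compiles without asio.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <system_error>

// Hypothetical stand-in for a failed async_connect: the future carries a
// std::system_error, as it might for a refused connection.
std::future<void> failed_connect() {
    std::promise<void> p;
    p.set_exception(std::make_exception_ptr(
        std::system_error(std::make_error_code(std::errc::connection_refused))));
    return p.get_future();
}

// Reads the future and converts a stored system_error into an error_code
// instead of letting the exception escape. Returns a default (success) code
// when the future holds no exception.
std::error_code wait_for_connect(std::future<void> status) {
    try {
        status.get();  // rethrows the stored exception, if any
        return {};
    } catch (const std::system_error& e) {
        return e.code();
    }
}
```

The caller can then branch on the returned code, for example by comparing it with `std::errc::connection_refused`, rather than relying on the exception propagating out of the function that drives the context.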

This should not, in itself, normally segfault. However, if you are running with dynamically linked boost::system and you are not linking against the same version of Boost during build as at runtime, the error categories might not be compatible.

It's also possible that different dynamically loaded modules are linking incompatible versions.

This is a pretty long shot, but I provide the hint just in case it sparks ideas.

sehe
  • Thank you for the insight. It was difficult to produce a self-contained, compiling example since the program is pretty long. I was able to solve the problem. I forgot a very basic idea: the order of destruction is the reverse of the order of construction. I call bar::foo repeatedly. In later calls, the old `context` is destroyed at its move assignment. This leads to errors when the destructor for the old `socket` runs, since the context it references has already been destroyed. My checkpoint-based pinpointing was inaccurate; a backtrace in lldb showed that it is indeed the socket move assignment that's the culprit. – ILutRf7 Jan 10 '23 at 02:21