4

I have written a network server class that maintains a std::set of network clients. The network clients emit a signal to the network server on disconnect (via boost::bind). When a network client disconnects, the client instance needs to be removed from the Set and eventually deleted. I would think this is a common pattern, but I am having problems that might, or might not, be specific to ASIO.

I've tried to trim down to just the relevant code:

/** NetworkServer.hpp **/
class NetworkServices : private boost::noncopyable
{
public:
    NetworkServices(void);
    ~NetworkServices(void);

private:
    void run();
    void onNetworkClientEvent(NetworkClientEvent&); 

private:
    std::set<boost::shared_ptr<const NetworkClient>> clients;
};


/** NetworkClient.cpp **/
void NetworkServices::run()
{
    running = true;

    boost::asio::io_service::work work(io_service); //keeps service running even if no operations

    // This creates just one thread for the boost::asio async network services
    boost::thread iot(boost::bind(&NetworkServices::run_io_service, this));

    while (running)
    {
        boost::system::error_code err;
        try
        {
            tcp::socket* socket = new tcp::socket(io_service);
            acceptor->accept(*socket, err);

            if (!err)
            {
                NetworkClient* networkClient = new NetworkClient(io_service, boost::shared_ptr<tcp::socket>(socket));
                networkClient->networkClientEventSignal.connect(boost::bind(&NetworkServices::onNetworkClientEvent, this, _1));

                clients.insert(boost::shared_ptr<NetworkClient>(networkClient));

                networkClient->init(); //kicks off 1st asynch_read call
            }
        }
        // etc...

    }
}

void NetworkServices::onNetworkClientEvent(NetworkClientEvent& evt)
{
    switch(evt.getType())
    {
        case NetworkClientEvent::CLIENT_ERROR :
        {
            boost::shared_ptr<const NetworkClient> clientPtr = evt.getClient().getSharedPtr();

            // ------ THIS IS THE MAGIC LINE -----
            // If I keep this, the io_service hangs. If I comment it out,
            // everything works fine (but I never delete the disconnected NetworkClient). 

            // If actually deleted the client here I might expect problems because it is the caller
            // of this method via boost::signal and bind.  However, The clientPtr is a shared ptr, and a
            // reference is being kept in the client itself while signaling, so
            // I would the object is not going to be deleted from the heap here. That seems to be the case.
            // Never-the-less, this line makes all the difference, most likely because it controls whether or not the NetworkClient ever gets deleted.
            clients.erase(clientPtr);

            //I should probably put this socket clean-up in NetworkClient destructor. Regardless by doing this,
            // I would expect the ASIO socket stuff to be adequately cleaned-up after this.

            tcp::socket& socket = clientPtr->getSocket();
            try {
                socket.shutdown(boost::asio::socket_base::shutdown_both);
                socket.close();
            }
            catch(...) {
                CommServerContext::error("Error while shutting down and closing socket.");
            }
            break;

        }
        default :
        {
            break;
        }
    }
}



/** NetworkClient.hpp **/
class NetworkClient : public boost::enable_shared_from_this<NetworkClient>, Client
{
    NetworkClient(boost::asio::io_service& io_service,
                  boost::shared_ptr<tcp::socket> socket);
    virtual ~NetworkClient(void);

    inline boost::shared_ptr<const NetworkClient> getSharedPtr() const
    {
        return shared_from_this();
    };

    boost::signal <void (NetworkClientEvent&)>    networkClientEventSignal;

    void onAsyncReadHeader(const boost::system::error_code& error,
                           size_t bytes_transferred);

};

/** NetworkClient.cpp - onAsyncReadHeader method called from io_service.run()
    thread as result of an async_read operation. Error condition usually 
    result of an unexpected client disconnect.**/
void NetworkClient::onAsyncReadHeader(  const boost::system::error_code& error,
                        size_t bytes_transferred)
{
    if (error)
    {

        //Make sure this instance doesn't get deleted from parent/slot deferencing
        //Alternatively, somehow schedule for future delete?
        boost::shared_ptr<const NetworkClient> clientPtr = getSharedPtr();

        //Signal to service that this client is disconnecting
        NetworkClientEvent evt(*this, NetworkClientEvent::CLIENT_ERROR);
        networkClientEventSignal(evt);

        networkClientEventSignal.disconnect_all_slots();

        return;
    }

I believe it's not safe to delete the client from within the slot handler because the function return would be ... undefined? (Interestingly, it doesn't seem to blow up on me though.) So I've used boost:shared_ptr along with shared_from_this to make sure the client doesn't get deleted until all slots have been signaled. It doesn't seem to really matter though.

I believe this question is not specific to ASIO, but the problem manifests in a peculiar way when using ASIO. I have one thread executing io_service.run(). All ASIO read/write operations are performed asynchronously. Everything works fine with multiple clients connecting/disconnecting UNLESS I delete my client object from the Set per the code above. If I delete my client object, the io_service seemingly deadlocks internally and no further asynchronous operations are performed unless I start another thread. I have try/catches around the io_service.run() call and have not been able to detect any errors.

Questions:

  1. Are there best practices for deleting child objects, that are also signal emitters, from within parent slots?

  2. Any ideas as to why the io_service is hanging when I delete my network client object?

kaliatech
  • 17,579
  • 5
  • 72
  • 84
  • @kaliatech what platform? What version of Asio? If you're on Linux, have you tried valgrind? – Sam Miller Dec 17 '10 at 16:23
  • Win7-x64, boost v1.43 (w/ asio v1.4.5), MSVC++ 2010. The code is meant to be cross-platform, so if I can't figure this out soon, I'll try switching my dev environment over to linux to see if the problem still exists. If so, I'll try using valgrind. I should be able to debug on Windows in MSVC 2010 just as easy, but I haven't been able to pinpoint the internal ASIO action yet, probably due to my own inexperience. – kaliatech Dec 17 '10 at 19:31
  • Are you sure your io_service has work remaining after removing the client from the set? It looks like your io_service::work object goes out of scope. – Sam Miller Dec 19 '10 at 15:23
  • The io_service::work does not go out of scope. The while(running) loop shown above does not exit until the program is shutdown. (It continues to accept new clients.) I have also wrapped my call to io_service.run to detect if and when the io_service.run returns, and it doesn't appear to. Thanks for the ideas though. My guess at this point is that I'm not fully shutting down everything required by ASIO before deleting my client (primarily async handlers), although I thought I was by calling socket.shutdown and socket.close. – kaliatech Dec 19 '10 at 16:12
  • why do you use asnchronous read/write but a synchronous accept? – Sam Miller Dec 19 '10 at 18:07
  • @Sam Miller - It seemed to make my class design cleaner. I think the async_accept is harder to deal with than other async operations due to the need to create a socket object early and then later match it to the correct async_accept handler. It can be done with boost binding, but that seemed complex. No reason other than that though. – kaliatech Dec 20 '10 at 15:01
  • I should clarify that this is mainly just my design preference. I know the ASIO examples put the async_accept handler inside the client objects directly so the socket instance is already bound to the correct client when async handler fires. It just didn't seem right to me to be creating client instances before anything had actually been accepted. I think it's okay to mix certain async/sync operations in ASIO and so I don't think is related to my current issue. – kaliatech Dec 20 '10 at 15:08
  • @kaliatech it is most likely not related, it just strikes me as odd. One of the powerful features of the Asio framework is how well it scales on multicore systems. You will not see these benefits when using an `io_service` per thread in your design. – Sam Miller Dec 20 '10 at 16:25
  • @Sam Miller - I completely agree that this design doesn't make full use of Asio's scalability. However, I wanted to make sure things work as expected with one thread before moving to multiple threads in the io_service. – kaliatech Dec 20 '10 at 18:22
  • @kaliatech have you had a chance to run under valgrind yet? – Sam Miller Dec 20 '10 at 19:07
  • Since you are accessing the clients set from multiple threads, I assume that you are protecting it using a mutex or similar? Did you remove that code from the example, in that case could you add it again? If you are not protecting it using a mutex, you really should. – villintehaspam Dec 26 '10 at 15:37
  • @villintehasspam I am currently testing with only one client connection at a time. I connect the client. Disconnect it. Then reconnect a new client a few seconds later. But you are correct that I will need to add protection code around the Set inserts/deletes for production use. – kaliatech Jan 03 '11 at 23:06
  • 1
    @kaliatech, similarly you might want to consider using boost::signals2 instead of boost::signals since signals2 is threadsafe. Something else that strikes me as odd is that you are constructing your NetworkClientEvent with *this and not this or the shared_ptr you just got. What does the NetworkClientEvent do with it? – villintehaspam Jan 06 '11 at 21:56
  • @villintehasspam Thanks for the pointer to boost::signals2. It does look to be a safer alternative. It also seems like it might help address the first part of my question regarding signal object lifetime. Regarding the NetworkClientEvent *this, you are again correct. It would be better to use a shared pointer. Thanks. – kaliatech Jan 07 '11 at 03:03

2 Answers2

1

you can store weak_ptr's in the set, so shared_ptr will be kept only by asio and released on disconnection automatically. remove corresponding weak_ptr from set in ~Client()

andy_tl
  • 21
  • 2
  • I don't think storing a weak_ptr to the NetworkClient in the Set will help keep it alive longer, and so I don't think it will help. There's also no reason for my NetworkService to reference the NetworkClient after shutting it down, which is the only benefit I think a weak_ptr would provide me. However, your answer did start me thinking things through in a different way. I think you might be correct that ASIO expects my NetworkClient to continue to exist even though I thought ASIO would be done with it. Perhaps I should try deleting my ASIO asynch handler bindings before deleting my client. – kaliatech Jan 03 '11 at 23:35
1

I've finally figured it out, and the short answer is that it was primarily a coding/threading mistake on my part. I determined this by creating a simpler standalone code example, and found that it didn't exhibit the same behavior. I should've done this in the first place and I'm sorry to have wasted anyone's time. To answer my original questions in full though:

1 - Are there best practices for deleting child objects, that are also signal emitters, from within parent slots?

No one has really answered this. I think my suggestion and code example above where I use shared_from_this works fine.

Also, as pointed out by villintehaspam in comments above, it might be better to use boost::signal2 which seemingly has better support for managing signal lifetime.

2 - Any ideas as to why the io_service is hanging when I delete my network client object

My mistake was that the destructor in my NetworkClient was triggering an operation that caused the current thread (and only thread available to handle asych IO operations) to block indefinitely. I didn't realize that was happening. New clients were still able to connect because of how I handle the acceptor in it's own thread independent of the io_service asynch operations. Of course even though I scheduled asynch operations for the new client, they never fired because the one thread I made available to the io_service was still blocked.

Thanks to everyone that took the time to look at this.

kaliatech
  • 17,579
  • 5
  • 72
  • 84