How to detect WinSock TCP timeout with BindIoCompletionCallback

Question

I am building a Visual C++ WinSock TCP server using BindIoCompletionCallback. Receiving and sending data works fine, but I can't find a good way to detect a timeout: setsockopt with SO_RCVTIMEO/SO_SNDTIMEO has no effect on nonblocking sockets, and if the peer doesn't send any data, the CompletionRoutine is never called at all.

I am thinking about using RegisterWaitForSingleObject with the hEvent field of the OVERLAPPED structure. That might work, but then the CompletionRoutine isn't needed at all. Am I still using IOCP? Is there a performance concern if I use only RegisterWaitForSingleObject and not BindIoCompletionCallback?

Update: Code Sample:

My first try:

    bool CServer::Startup() {
        ServerSocket = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0, WSA_FLAG_OVERLAPPED); // assign the members that ListeningThread reads
        ServerEvent = WSACreateEvent();
        WSAEventSelect(ServerSocket, ServerEvent, FD_ACCEPT);
        ......
        bind(ServerSocket......);
        listen(ServerSocket......);
        _beginthread(ListeningThread, 128 * 1024, (void*) this);
        ......
        ......
    }

    void __cdecl CServer::ListeningThread( void* param ) // static
    {
        CServer* server = (CServer*) param;
        while (true) {
            if (WSAWaitForMultipleEvents(1, &server->ServerEvent, FALSE, 100, FALSE) == WSA_WAIT_EVENT_0) {
                WSANETWORKEVENTS events = {};
                if (WSAEnumNetworkEvents(server->ServerSocket, server->ServerEvent, &events) != SOCKET_ERROR) {
                    if ((events.lNetworkEvents & FD_ACCEPT) && (events.iErrorCode[FD_ACCEPT_BIT] == 0)) {
                        SOCKET socket = accept(server->ServerSocket, NULL, NULL);
                        if (socket != INVALID_SOCKET) { // accept() reports failure with INVALID_SOCKET
                            BindIoCompletionCallback((HANDLE) socket, CompletionRoutine, 0);
                            ......
                        }
                    }
                }
            }
        }
    }

    VOID CALLBACK CServer::CompletionRoutine( __in DWORD dwErrorCode, __in DWORD dwNumberOfBytesTransfered, __in LPOVERLAPPED lpOverlapped ) // static
    {
        ......
        BOOL res = GetOverlappedResult(......, TRUE);
        ......
    }

    class CIoOperation {
    public:
        OVERLAPPED Overlapped;
        ......
        ......
    };

    bool CServer::Receive(SOCKET socket, PBYTE buffer, DWORD length, void* context)
    {
        if (socket != INVALID_SOCKET) {
            CIoOperation* io = new CIoOperation();
            WSABUF buf = {length, (PCHAR) buffer};
            DWORD flags = 0;
            // WSARecv returns SOCKET_ERROR; WSA_IO_PENDING just means the overlapped read was queued
            if ((WSARecv(socket, &buf, 1, NULL, &flags, &io->Overlapped, NULL) == SOCKET_ERROR) && (WSAGetLastError() != WSA_IO_PENDING)) {
                delete io;
                return false;
            } else return true;
        }
        return false;
    }

As I said, it works fine if the client is actually sending data to me: Receive doesn't block, the CompletionRoutine gets called, and the data is received. But here is the gotcha: if the client never sends any data, how can I give up after a timeout?

Since setsockopt with SO_RCVTIMEO/SO_SNDTIMEO won't help here, I thought I should use the hEvent field of the OVERLAPPED structure, which is signaled when the I/O completes. But a WaitForSingleObject/WSAWaitForMultipleEvents on that event would block the Receive call, and I want Receive to always return immediately, so I used RegisterWaitForSingleObject with a WAITORTIMERCALLBACK. It worked: the callback is called either after the timeout or when the I/O completes. But now I have two callbacks for every single I/O operation, the CompletionRoutine and the WaitOrTimerCallback:

If the I/O completed, both are called, possibly simultaneously. If the I/O did not complete, the WaitOrTimerCallback is called, and then I call CancelIoEx, which causes the CompletionRoutine to be called with ERROR_OPERATION_ABORTED. But there is a race condition here: the I/O may complete right before I cancel it, and then things get messy. All in all, it's quite complicated.

Then I realized I don't actually need BindIoCompletionCallback and the CompletionRoutine at all; I could do everything from the WaitOrTimerCallback. That may work, but here is the interesting question: I wanted to build an IOCP-based Winsock server in the first place, and I thought BindIoCompletionCallback was the easiest way to do that, since it uses the thread pool provided by Windows itself. Now I end up with a server without any IOCP code at all? Is it still IOCP? Or should I forget BindIoCompletionCallback and build my own IOCP thread pool implementation? Why?

What language are you working in? Can you provide a limited code sample? — M.Babcock, Dec 31 '11 at 06:57
You might want to look at this code: http://www.codeproject.com/KB/IP/iocp_server_client.aspx?msg=1133926 My old copy of Jeffrey Richter's "Programming Server Side Applications for Microsoft Windows 2000" is elsewhere at the moment, so I can't give you any more help :( — paulsm4, Dec 31 '11 at 06:58

Martin James · Answer 1 · 2012-01-05T17:55:58.050

What I did was to force the timeout/completion notifications to enter a critical section in the socket object. Once in, the winner can set a socket state variable and perform its action, whatever that might be. If the I/O completion gets in first, the I/O buffer array is processed in the normal way and any timeout is directed to restart by the state machine. Similarly, if the timeout gets in first, the I/O gets CancelIoEx'd and any later queued completion notification is discarded by the state engine. Because of these possible 'late' notifications, I put released sockets onto a timeout queue and only recycle them onto the socket object pool after five minutes, in a similar way to how the TCP stack itself puts its sockets into 'TIME_WAIT'.
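A minimal, portable sketch of such a critical-section-guarded state machine might look like the following. It is not the answerer's code: the names (SocketObject, IoState, TimeoutAction, OnCompletion, OnTimeout) are hypothetical, a std::mutex stands in for the Win32 critical section, and the real CancelIoEx call and buffer processing appear only as comments, so the race logic can be compiled and exercised on any platform.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical states of one socket object with at most one read in flight.
enum class IoState { Idle, ReadPending, Cancelling, Closed };

// What the timeout thread should do after firing the socket's timeout event.
enum class TimeoutAction { Restart, CancelIo, Drop };

class SocketObject {
public:
    // Called just before posting the overlapped read (WSARecv in the real server).
    void StartRead() {
        std::lock_guard<std::mutex> lock(cs_);
        assert(state_ == IoState::Idle);
        state_ = IoState::ReadPending;
    }

    // Completion notification; returns true if the received data should be processed.
    bool OnCompletion(bool aborted) {
        std::lock_guard<std::mutex> lock(cs_);
        if (state_ == IoState::ReadPending && !aborted) {
            state_ = IoState::Idle;       // completion won: process the buffer normally
            activity_ = true;             // so the next timeout restarts instead of cancelling
            return true;
        }
        if (state_ == IoState::Cancelling) {
            state_ = IoState::Closed;     // the cancelled read has drained; socket may be released
        }
        return false;                     // late or aborted notification: discard it
    }

    // Timeout notification from the timeout thread.
    TimeoutAction OnTimeout() {
        std::lock_guard<std::mutex> lock(cs_);
        if (state_ == IoState::Cancelling || state_ == IoState::Closed) {
            return TimeoutAction::Drop;
        }
        if (activity_) {
            activity_ = false;            // I/O completed since the last timeout: re-arm
            return TimeoutAction::Restart;
        }
        if (state_ == IoState::ReadPending) {
            state_ = IoState::Cancelling; // timeout won: the real server calls CancelIoEx here
            return TimeoutAction::CancelIo;
        }
        return TimeoutAction::Restart;    // idle between reads with nothing pending
    }

    IoState State() {
        std::lock_guard<std::mutex> lock(cs_);
        return state_;
    }

private:
    std::mutex cs_;                       // stands in for the Win32 CRITICAL_SECTION
    IoState state_ = IoState::Idle;
    bool activity_ = false;
};
```

Here the `activity_` flag is one way to read "any timeout is directed to restart": a completion marks the socket as active, and the next timeout only re-arms. A successful completion that arrives after the timeout has already started cancelling is discarded, exactly like an aborted one. The five-minute quarantine before recycling is not modelled.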

To do the timeouts, I have one thread that operates on FIFO delta-queues of timing-out objects, one queue for each timeout limit. The thread waits on an input queue for new objects with a timeout calculated from the smallest timeout-expiry-time of the objects at the head of the queues.

There were only a few timeouts used in the server, so I used queues fixed at compile-time. It would be fairly easy to add new queues or modify the timeout by sending appropriate 'command' messages to the thread input queue, mixed-in with the new sockets, but I didn't get that far.

Upon timeout, the thread called an event in the object which, in the case of a socket, would enter the socket object's CS-protected state machine (there was a TimeoutObject class from which the socket descended, amongst other things).

More:

I wait on the semaphore that controls the timeout thread input queue. If it's signaled, I get the new TimeoutObject from the input queue and add it to the end of whatever timeout queue it asks for. If the semaphore wait times out, I check the items at the heads of the timeout FIFO queues and recalculate their remaining interval by subtracting the current time from their timeout time. If the interval is 0 or negative, the timeout event gets called. While iterating the queues and their heads, I keep in a local the minimum remaining interval before the next timeout. When all the head items in all the queues have a positive remaining interval, I go back to waiting on the queue semaphore using the minimum remaining interval I have accumulated.

The event call returns an enumeration. This enumeration instructs the timeout thread how to handle an object whose event it has just fired. One option is to restart the timeout by recalculating the timeout time and pushing the object back onto the end of its timeout queue.

I did not use RegisterWaitForSingleObject() because it needed .NET and my Delphi server was all unmanaged, (I wrote my server a long time ago!).

That, and because, IIRC, it has a limit of 64 handles, like WaitForMultipleObjects(). My server had upwards of 23000 clients timing out. I found the single timeout thread and multiple FIFO queues to be more flexible - any old object could be timed out on it as long as it was descended from TimeoutObject - no extra OS calls/handles needed.

Hmm... so you use a dedicated thread and WaitForSingleObject on some events sequentially? Isn't that what RegisterWaitForSingleObject is designed for? — WalkingCat, Jan 05 '12 at 03:13
I only ever waited on one synchro object - the semaphore that held the count for the timeout thread input queue. I edited my answer to add some more details. — Martin James, Jan 05 '12 at 17:59
Hmm... I'll read your update carefully later, but RegisterWaitForSingleObject is a Win32 API and does not depend on .NET :) — WalkingCat, Jan 09 '12 at 09:25
Hm. If you use RegisterWaitForSingleObject does it create one thread per wait handle? — Tim Lovell-Smith, Jan 04 '14 at 07:47

Aaron Klotz · Answer 2 · 2012-01-10T23:59:12.597

The basic idea is that, since you're using asynchronous I/O with the system thread pool, you shouldn't need to check for timeouts via events because you're not blocking any threads.

The recommended way to check for stale connections is to call getsockopt with the SO_CONNECT_TIME option. This returns the number of seconds that the socket has been connected. I know that's a poll operation, but if you're smart about how and when you query this value, it's actually a pretty good mechanism for managing connections. I explain below how this is done.

Typically I'll call getsockopt in two places: one is during my completion callback (so that I have a timestamp for the last time that an I/O completion occurred on that socket), and one is in my accept thread.

The accept thread monitors my socket backlog via WSAEventSelect and the FD_ACCEPT parameter. This means that the accept thread only executes when Windows determines that there are incoming connections that require accepting. At this time I enumerate my accepted sockets and query SO_CONNECT_TIME again for each socket. I subtract the timestamp of the connection's last I/O completion from this value, and if the difference is above a specified threshold my code deems the connection as having timed out.
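The bookkeeping described above might be sketched like this. It is a model under stated assumptions, not the answerer's code: the Windows call appears only in a comment (getsockopt with SOL_SOCKET/SO_CONNECT_TIME, which reports the seconds connected, or 0xFFFFFFFF if the socket is not connected), and the connect-time value is passed in as a plain number so the logic builds on any platform. Connection, RecordCompletion, IsStale and FindStale are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// On Windows the connect-time value would be read roughly like this (shown only as a comment):
//   DWORD secs = 0; int len = sizeof(secs);
//   getsockopt(s, SOL_SOCKET, SO_CONNECT_TIME, reinterpret_cast<char*>(&secs), &len);
// SO_CONNECT_TIME reports 0xFFFFFFFF for a socket that is not connected.
constexpr std::uint32_t kNotConnected = 0xFFFFFFFFu;

// Hypothetical per-connection record kept by the server.
struct Connection {
    int id = 0;
    std::uint32_t lastIoConnectTime = 0;  // SO_CONNECT_TIME sampled at the last I/O completion
};

// Completion-callback side: remember the connect time at which I/O last completed.
inline void RecordCompletion(Connection& c, std::uint32_t connectTimeNow) {
    if (connectTimeNow != kNotConnected) c.lastIoConnectTime = connectTimeNow;
}

// Accept-thread side: idle seconds = current connect time - connect time at last completion.
inline bool IsStale(const Connection& c, std::uint32_t connectTimeNow, std::uint32_t maxIdleSeconds) {
    if (connectTimeNow == kNotConnected || connectTimeNow < c.lastIoConnectTime) return false;
    return connectTimeNow - c.lastIoConnectTime > maxIdleSeconds;
}

// Re-query every accepted socket (queryConnectTime wraps the getsockopt call) and
// return the ids of the connections that have been idle for too long.
template <class QueryConnectTime>
std::vector<int> FindStale(const std::vector<Connection>& conns, QueryConnectTime queryConnectTime,
                           std::uint32_t maxIdleSeconds) {
    std::vector<int> stale;
    for (const Connection& c : conns) {
        if (IsStale(c, queryConnectTime(c), maxIdleSeconds)) stale.push_back(c.id);
    }
    return stale;
}
```

A connection that has never completed any I/O keeps `lastIoConnectTime == 0`, so it is judged on its total connected time; that covers the client from the question that connects and then never sends anything.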

Yes, you're not blocking any threads, but the socket would still be using system resources until you notice the time lag, right? Is that really good enough to prevent you from running out of connections? — Tim Lovell-Smith, Jan 04 '14 at 07:41
