
I'm writing a Socket/Client/Server library for C#. I do a lot of cross-platform programming, and I haven't found Mono/.NET/.NET Core efficient enough for high-performance socket handling.

Since Linux's epoll has unarguably won the performance and usability "fight", I decided to use an epoll-like interface as the common API, so I'm trying to emulate it on Windows (Windows socket performance matters less to me than Linux's, but the interface does). To achieve this I call the Winsock2 and Kernel32 APIs directly via marshaling, and I use IOCP.
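For context, here is a minimal sketch of the kind of P/Invoke declarations this involves (the wrapper class name and layout are my own simplification; error handling and the completion-thread loop are omitted):

```csharp
using System;
using System.Runtime.InteropServices;

internal static class NativeIocp
{
    // Associate a socket/file handle with an I/O completion port (Kernel32).
    [DllImport("kernel32.dll", SetLastError = true)]
    internal static extern IntPtr CreateIoCompletionPort(
        IntPtr fileHandle, IntPtr existingCompletionPort,
        UIntPtr completionKey, uint numberOfConcurrentThreads);

    // Dequeue one completion packet; blocks for up to `milliseconds`.
    [DllImport("kernel32.dll", SetLastError = true)]
    internal static extern bool GetQueuedCompletionStatus(
        IntPtr completionPort, out uint bytesTransferred,
        out UIntPtr completionKey, out IntPtr overlapped,
        uint milliseconds);
}
```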

Almost everything works fine, except one thing: when I create a TCP server with Winsock and connect to it with more than 10,000 connections (from the local machine or from a remote machine over LAN, it doesn't matter), all connections are accepted without a problem. When all clients flood the server with data, the server receives every packet. But when I disconnect all clients at the same time, the server does not recognize all of the disconnection events (i.e. the 0-byte read/available); usually 1,500-8,000 clients get stuck. The completion event is never triggered, so I cannot detect the connection loss.

The server does not crash; it continues to accept new connections and everything else works as expected. Only the lost connections are never recognized.

I've read that, because overlapped I/O needs a pre-allocated read buffer, IOCP locks these buffers while a read is pending and releases the locks on completion, and that if too many events happen at the same time it cannot lock all affected buffers because of an OS limit, which causes IOCP to hang indefinitely.

I've read that the solution to this buffer-lock problem is to post the read with a zero-sized buffer and a null pointer to the buffer itself, so the pending read does not lock anything, and to use a real buffer only when there is actual data to read.
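For reference, this is roughly what such a zero-byte read looks like at the Winsock level (a sketch with my own P/Invoke shapes; the OVERLAPPED allocation and lifetime management are omitted, and `sock`/`overlappedPtr` are assumed to be set up elsewhere):

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct WSABuffer
{
    public uint Length;    // 0 for a zero-byte read
    public IntPtr Pointer; // IntPtr.Zero: nothing to lock/pin
}

static class ZeroByteRead
{
    [DllImport("ws2_32.dll", SetLastError = true)]
    static extern int WSARecv(IntPtr socket, ref WSABuffer buffer, uint bufferCount,
        out uint bytesReceived, ref uint flags, IntPtr overlapped, IntPtr completionRoutine);

    // Posts a 0-byte overlapped receive: its completion only says "data (or FIN)
    // is available"; the real read with a real buffer is issued afterwards.
    public static void Post(IntPtr sock, IntPtr overlappedPtr)
    {
        var buf = new WSABuffer { Length = 0, Pointer = IntPtr.Zero };
        uint flags = 0;
        int result = WSARecv(sock, ref buf, 1, out _, ref flags, overlappedPtr, IntPtr.Zero);
        // result == SOCKET_ERROR with WSA_IO_PENDING (997) is the normal overlapped case.
    }
}
```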

I've implemented the above workaround and it works, except for the original problem: after disconnecting many thousands of clients at the same time, a few thousand get stuck.

Of course I accept the possibility that my code is wrong, so I built a basic server with .NET's built-in SocketAsyncEventArgs class (as the official example describes), which essentially does the same thing using IOCP, and the results are the same.
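The managed repro detects a disconnect the way the official sample does: a completed receive with `BytesTransferred == 0` (or an error) means the peer closed. A trimmed sketch of that receive loop (accept logic and buffer pooling omitted; this is my simplification, not the full sample):

```csharp
using System;
using System.Net.Sockets;

class ReceiveLoop
{
    public void Start(Socket socket)
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[4096], 0, 4096);
        args.UserToken = socket;
        args.Completed += OnReceiveCompleted;
        PostReceive(socket, args);
    }

    void PostReceive(Socket socket, SocketAsyncEventArgs args)
    {
        // ReceiveAsync returns false when it completed synchronously;
        // in that case Completed is NOT raised, so handle it inline.
        if (!socket.ReceiveAsync(args))
            OnReceiveCompleted(socket, args);
    }

    void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
    {
        var socket = (Socket)args.UserToken;

        if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
        {
            // Error or graceful close (0-byte completion): this is the event
            // that goes missing for a few thousand sockets in the mass-disconnect test.
            socket.Close();
            return;
        }

        // process args.Buffer[args.Offset .. args.Offset + args.BytesTransferred)
        PostReceive(socket, args);
    }
}
```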

Everything works fine, except that when thousands of clients disconnect at the same time, a few thousand of the disconnection events (the read that completes on disconnect) are never recognized.

I know I could perform an I/O operation and check the return value to see whether the socket can still do I/O, and disconnect it if not. The problem is that in some cases I have nothing to send to the socket; I only receive data. And if I do this periodically it is almost the same as polling, which would cause high load with thousands of connections and waste CPU.
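One partial mitigation (my assumption, not something from the discussion above) is to let the TCP stack generate the probe traffic instead of the application: with keep-alives enabled, a broken connection eventually fails the pending read without any user-level polling. Note this only helps with dead peers; it does not produce the graceful 0-byte close notification that is going missing here. A sketch using the documented `SIO_KEEPALIVE_VALS` layout:

```csharp
using System;
using System.Net.Sockets;

static class KeepAlive
{
    // Enables per-socket keep-alives. The 12-byte input is the tcp_keepalive
    // structure expected by SIO_KEEPALIVE_VALS: onoff, idle time (ms), probe interval (ms).
    public static void Enable(Socket socket, uint idleMs, uint intervalMs)
    {
        var inValue = new byte[12];
        BitConverter.GetBytes(1u).CopyTo(inValue, 0);          // onoff = 1
        BitConverter.GetBytes(idleMs).CopyTo(inValue, 4);      // keepalivetime
        BitConverter.GetBytes(intervalMs).CopyTo(inValue, 8);  // keepaliveinterval
        socket.IOControl(IOControlCode.KeepAliveValues, inValue, null);
    }
}

// Usage: KeepAlive.Enable(clientSocket, idleMs: 30_000, intervalMs: 1_000);
```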

(I've tried numerous ways of closing the clients, from graceful disconnection to hard TCP socket closing, on both Windows and Linux clients; the results are always the same.)

My questions:

    • Is there any known solution to this problem?
    • Is there an efficient way to recognize a graceful TCP connection close by the remote side?
    • Can I somehow set a read timeout on an overlapped socket read?

Any help would be appreciated, thank You!

  • *IOCP locks these buffers and releases the locks on completion* - IOCP is unrelated here; it is only the way I/O completion is signaled. How do you detect a disconnect? Do you keep a read request active on the socket at all times (issued right after connect and reissued after the previous one completes without error) and detect when a read fails? Do you call `DisconnectEx` yourself after a read/write error? – RbMm Oct 15 '18 at 14:40
  • *read timeout on an overlapped socket read?* I think the only way is to remember the time of the last read and check it periodically yourself, e.g. with a timer that walks the socket list. – RbMm Oct 15 '18 at 14:48
  • Ok, true, I said it wrong: the buffer locking is not an IOCP thing. What I meant is that using an asynchronous socket with overlapped I/O combined with IOCP needs a buffer that can be locked, and that may cause a deadlock-like situation. I always have a continuous read with a null buffer, and if the socket signals pending data then I read it; if I provide a non-null buffer and the read byte count is zero, I take it as a disconnect. I always call GetLastError() for both Winsock and Kernel32, and if there is any error related to closed sockets I close the socket. Some sockets are still never signaled. – beatcoder Oct 15 '18 at 18:53
  • As I said above, using a timer to poll the sockets for available I/O or errors is not really a nice solution in an event-based system. It may work as a workaround, but I would rather find the problem than just work around it; if there is a bug in my code it may cause further malfunctions. First I'd like to know what I should do differently to get the closing (read) signal and process it properly. If I'm doing everything right and this is Winsock/Windows event/kernel weirdness, then I may use a workaround. Thank you for the answer anyway! – beatcoder Oct 15 '18 at 18:59
  • No, buffer locking has nothing to do with *asynchronous socket with overlapped I/O*. That is simply wrong: buffer locking (via MDL) **always** happens. From the device's point of view, all I/O is always asynchronous. – RbMm Oct 15 '18 at 18:59
  • *I always have a continuous read with a null buffer, and if the socket signals pending data then I read it* - why not always have a continuous read with an actual buffer, say several KB? – RbMm Oct 15 '18 at 19:01
  • In my own implementation I always have a continuous read on every socket after connect, until the first read or send error. A disconnect usually shows up as a read error. After I get an error (read or write) I call `DisconnectEx` on the socket. I also check periodically: if there has been no I/O activity on an endpoint for more than some time, I disconnect it, and if the disconnect has not completed after a few more seconds, I close it. I have no problems with this code (in the sense of endpoints lost in an indeterminate state), even on a heavily loaded server running for a long time. – RbMm Oct 15 '18 at 19:10
  • I described in my question why I use a null buffer. It has two benefits: 1) a null buffer does not get locked, even if thousands of I/O operations happen simultaneously, so the lock-count limit is never reached, which could otherwise cause deadlock-like hanging; 2) no memory is allocated and reserved for a connection that does not communicate and just stays alive. The second is only nice-to-have; the point is the first. Because all sockets have null buffers I don't accumulate a pile of locks, and I can still check whether any I/O is available and perform it. – beatcoder Oct 15 '18 at 20:08
  • I don't think there is a problem with a locked-memory limit. Even if you have, say, 1,000 connected sockets at a time and every endpoint receives into a 16 KB buffer, that is only 16 MB - not much memory. On the other hand, what about performance? You do twice as many read requests as you really need: first with a 0-byte buffer and then with an actual buffer. I think this only has a negative effect. – RbMm Oct 15 '18 at 20:17
  • Every I/O request is an expensive enough operation, and you are doubling it. – RbMm Oct 15 '18 at 20:19
  • However, this is not directly related to the original problem of getting no notification on disconnect. On the other hand, if the network broke on the client's side, your local machine may not even know that the client went away, so you need to periodically check the state of the sockets (when the last I/O happened on each) and disconnect them yourself after a time limit. I'm also interested in this problem - detecting dead or inactive clients - but I've found nothing better than keeping a socket database and checking every socket in it, say, once per second (a sketch of this approach follows these comments). I found no system support for automatic disconnect by timeout. – RbMm Oct 15 '18 at 20:30
  • `SO_RCVTIMEO` applies only to blocking sockets. That is because, after a recv request, the system waits in place via `WaitForSingleObject` for the response, and that is where the timeout stored by `SO_RCVTIMEO` (or an infinite wait) is used; this wait blocks the thread and is simply a user-mode wait timeout. For asynchronous I/O I see no such support - setting a kernel timeout for the read IRP and cancelling it on expiry. Even if it were supported, the kernel would need to allocate a timer for every read IRP and cancel it on timeout, which would also be expensive. Better to have a single timer and periodically check everything. – RbMm Oct 15 '18 at 20:38
  • Thank you for all the answers; I will definitely try these methods and post the results. It seems the problem happens only rarely (but still happens) if the sockets do not transmit data and just connect and disconnect, so this might be some buffer-flushing problem on either side. I'll investigate the situation more deeply (check all the packets) to find the exact cause of the untriggered events. I don't want to make the discussion too long, so if I find anything that works I'll post it as an answer. – beatcoder Oct 16 '18 at 12:20
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181974/discussion-between-rbmm-and-beatcoder). – RbMm Oct 16 '18 at 20:07
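
For completeness, a rough sketch of the single-timer sweep RbMm describes above (the `Connection` type, its `LastIoUtc` timestamp and `Disconnect()` method are placeholders of mine, not part of any real API):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class Connection
{
    public DateTime LastIoUtc;            // updated on every completed read/write
    public void Disconnect() { /* DisconnectEx / Close, depending on state */ }
}

class IdleSweeper
{
    // Populated when connections are accepted, removed when they are closed.
    readonly ConcurrentDictionary<IntPtr, Connection> _connections =
        new ConcurrentDictionary<IntPtr, Connection>();
    readonly TimeSpan _idleLimit = TimeSpan.FromSeconds(60);
    Timer _timer;

    public void Start()
    {
        // A single timer walks the whole socket list, e.g. once per second.
        _timer = new Timer(_ => Sweep(), null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
    }

    void Sweep()
    {
        var now = DateTime.UtcNow;
        foreach (var pair in _connections)
        {
            if (now - pair.Value.LastIoUtc > _idleLimit)
                pair.Value.Disconnect();   // force the stuck endpoint out
        }
    }
}
```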
