I'm programming a Socket/Client/Server library for C#, since I do a lot of cross-platform programming and I didn't find mono/dotnet/dotnet core efficient enough in high-performance socket handling.
Since Linux epoll unarguably won the performance and usability "fight", I decided to use an epoll-like interface as the common API, so I'm trying to emulate it on Windows (Windows socket performance is not as important to me as Linux performance, but the interface is). To achieve this I call the Winsock2 and Kernel32 APIs directly via P/Invoke marshaling, and I use IOCP.
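For context, the interop surface looks roughly like this (heavily simplified; the real code has more functions, SafeHandles and error handling, so treat these declarations as illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

static class Interop
{
    // Kernel32: create/associate a completion port and dequeue completions.
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern IntPtr CreateIoCompletionPort(
        IntPtr fileHandle, IntPtr existingCompletionPort,
        UIntPtr completionKey, uint numberOfConcurrentThreads);

    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool GetQueuedCompletionStatus(
        IntPtr completionPort, out uint bytesTransferred,
        out UIntPtr completionKey, out IntPtr overlapped, uint timeoutMs);

    // Winsock2: overlapped receive on a socket handle.
    [StructLayout(LayoutKind.Sequential)]
    public struct WSABUF
    {
        public uint len;   // buffer length
        public IntPtr buf; // pointer to the buffer
    }

    [DllImport("ws2_32.dll", SetLastError = true)]
    public static extern int WSARecv(
        IntPtr socket, ref WSABUF buffer, uint bufferCount,
        out uint bytesReceived, ref uint flags,
        IntPtr overlapped, IntPtr completionRoutine);

    public const int SOCKET_ERROR = -1;
    public const int WSA_IO_PENDING = 997;
}
```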
Almost everything works fine, except one thing: when I create a TCP server with Winsock and connect to it with more than 10,000 connections (from the local machine or from a remote machine over LAN, it does not matter), every connection is accepted without a problem. When all clients flood the server with data, there is still no problem: the server receives every packet. But when I disconnect all clients at the same time, the server does not recognize all of the disconnection events (i.e. the 0-byte read/available); usually 1500-8000 clients get stuck. The completion event never gets triggered, so I cannot detect the connection loss.
The server does not crash, it continues to accept new connections, and everything else works as expected; only the lost connections are never recognized.
I've read that, because overlapped IO needs pre-allocated read buffers, IOCP locks (pins) these buffers while a read is pending and releases the locks on completion, and if too many operations are in flight at the same time it cannot lock all of the affected buffers because of an OS limit, which can make IOCP hang for an indefinite time.
I've read that the solution to this buffer-lock problem is to post the read with a zero-sized buffer and a null pointer to the buffer itself, so the pending read does not lock anything, and to use a real buffer only when there is actual data to read.
I've implemented that workaround and it works, but it does not solve the original problem: after disconnecting many thousands of clients at the same time, a few thousand still get stuck.
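Concretely, the zero-byte read I post looks roughly like this (a sketch reusing the WSARecv/WSABUF declarations above; OVERLAPPED allocation and lifetime management are omitted):

```csharp
using System;
using System.Runtime.InteropServices;

static class ZeroByteRead
{
    // Post a zero-byte, null-buffer overlapped receive. No user buffer is
    // pinned; the completion only signals "data is readable (or the peer
    // closed)", after which a real WSARecv with a real buffer is issued.
    // 'overlapped' must point to a zeroed OVERLAPPED structure that stays
    // alive until the completion is dequeued from the port.
    public static bool Post(IntPtr socketHandle, IntPtr overlapped)
    {
        var zeroBuf = new Interop.WSABUF { len = 0, buf = IntPtr.Zero };
        uint bytesReceived = 0;
        uint flags = 0;

        int result = Interop.WSARecv(socketHandle, ref zeroBuf, 1,
                                     out bytesReceived, ref flags,
                                     overlapped, IntPtr.Zero);
        if (result == Interop.SOCKET_ERROR &&
            Marshal.GetLastWin32Error() != Interop.WSA_IO_PENDING)
        {
            return false; // immediate failure: the connection is already gone
        }

        // Otherwise a completion should be queued later; a subsequent real
        // read returning 0 bytes is what should signal the graceful close --
        // this is the event that never arrives for some of the sockets.
        return true;
    }
}
```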
Of course I allow for the possibility that my code is wrong, so I built a basic server with dotnet's built-in SocketAsyncEventArgs class (as the official example describes), which does basically the same thing using IOCP, and the results are the same.
Everything works fine, except that when thousands of clients disconnect at the same time, a few thousand of the disconnection (read-on-disconnect) events never get recognized.
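The repro is essentially the standard receive loop; a simplified sketch (the accept loop and error handling are omitted, names are illustrative):

```csharp
using System;
using System.Net.Sockets;

class ReceiveLoop
{
    // Start an asynchronous receive on an accepted socket.
    public void StartReceive(Socket socket)
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[4096], 0, 4096);
        args.UserToken = socket;
        args.Completed += OnReceiveCompleted;
        if (!socket.ReceiveAsync(args))        // completed synchronously
            OnReceiveCompleted(socket, args);
    }

    void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
    {
        var socket = (Socket)args.UserToken;

        // A graceful close should arrive here as BytesTransferred == 0.
        // For 1500-8000 of the simultaneously closed clients this callback
        // is simply never invoked again.
        if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
        {
            socket.Close();
            args.Dispose();
            return;
        }

        // ... process args.Buffer[args.Offset .. args.Offset + args.BytesTransferred) ...

        if (!socket.ReceiveAsync(args))        // post the next receive
            OnReceiveCompleted(socket, args);
    }
}
```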
I know I could do an IO operation and check the return value to see whether the socket can still perform IO, and disconnect it if not. The problem is that in some cases I have nothing to send to the socket, I only receive data; and doing such a check periodically would be almost the same as polling, causing high load and wasted CPU work with thousands of connections. Something like the check sketched below is exactly what I want to avoid.
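For reference, this is the kind of periodic check I mean (a standard Poll-based test, not my actual code):

```csharp
using System;
using System.Net.Sockets;

static class SocketChecks
{
    // Returns true if the remote side has closed the connection.
    // SelectRead becomes true when data is readable OR the connection was
    // closed; Available == 0 in that state indicates a graceful close (FIN).
    public static bool IsClosedByRemote(Socket socket)
    {
        try
        {
            return socket.Poll(0, SelectMode.SelectRead) && socket.Available == 0;
        }
        catch (SocketException) { return true; }
        catch (ObjectDisposedException) { return true; }
    }
}
```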
(I close the clients with numerous methods, from graceful disconnection to proper TCP socket closing, on both Windows and Linux clients; the results are always the same. The variants are roughly the ones sketched below.)
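```csharp
using System.Net.Sockets;

static class ClientClose
{
    // Variant 1: graceful disconnect -- send a FIN so the server's read
    // completes with 0 bytes, then close the socket.
    public static void CloseGracefully(Socket client)
    {
        client.Shutdown(SocketShutdown.Send);
        client.Close();
    }

    // Variant 2: hard/abortive close -- linger with timeout 0 sends an RST
    // instead of a FIN.
    public static void CloseAbortively(Socket client)
    {
        client.LingerState = new LingerOption(true, 0);
        client.Close();
    }
}
```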
My questions:
- Is there any known solution to this problem?
- Is there any efficient way to recognize a graceful TCP connection close by the remote side?
- Can I somehow set a read timeout on an overlapped socket read?
Any help would be appreciated, thank you!