
I have a question related to IOCP networking strategy for our C++ server application.

Our server simulates a large number of devices that communicate over UDP using short messages (less than 2K). The server is also bound by a soft real-time constraint of 70-100 milliseconds. Currently, the networking part of the application starts one thread per socket, which results in hundreds of threads. Each thread waits on its UDP socket and, when data arrives, copies it into the queue of our real-time thread.
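For reference, a minimal sketch of that hand-off, assuming a mutex-protected queue that the real-time thread drains in batches. `Datagram`, `DatagramQueue` and `kMaxPayloadBytes` are hypothetical names used only for illustration, not the actual classes in our server.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical limit taken from the description: messages are under 2K.
constexpr std::size_t kMaxPayloadBytes = 2048;

// Hypothetical message type: one received datagram plus the device it belongs to.
struct Datagram {
    int deviceId;
    std::vector<std::uint8_t> payload;
};

// Hand-off queue: many receiver threads push, one real-time thread drains.
class DatagramQueue {
public:
    void Push(Datagram d) {
        assert(d.payload.size() < kMaxPayloadBytes);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(d));
    }

    // Take everything queued so far. The lock is held only for the swap,
    // so the real-time thread processes the batch without blocking receivers.
    std::vector<Datagram> DrainAll() {
        std::vector<Datagram> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<Datagram> pending_;
};
```

The same queue would work unchanged if the producers were a few IOCP worker threads instead of one thread per socket.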

We are being asked to support more and more devices, and I was thinking that rewriting the communication module with IOCP would make our server more efficient. I developed a prototype based on code I was able to find online, but the combination of

  • WSARecvFrom (Initiates receive)
  • GetQueuedCompletionStatus
  • OnDataRecieved (A method of my class that gets called when data is copied into my buffer)

does not seem efficient at all. The gaps between data arrival on a given socket are 500-600 milliseconds.
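For reference, a minimal sketch of how that combination is usually wired together, assuming a per-operation context struct and a free-function handler. `RecvContext`, `PostReceive`, `IocpWorker` and this `OnDataReceived` are hypothetical stand-ins, not the prototype's actual code; the Winsock and IOCP calls themselves are the real APIs. The block is Windows-only and is guarded so that it compiles to nothing elsewhere.

```cpp
#ifdef _WIN32  // Windows-only sketch; link with Ws2_32.lib
#include <winsock2.h>
#include <windows.h>
#include <cassert>

// Hypothetical per-operation context. Everything WSARecvFrom writes to must
// stay alive until the completion is dequeued, so it all lives here.
struct RecvContext {
    OVERLAPPED  overlapped;  // handed to WSARecvFrom, returned by the port
    SOCKET      socket;
    WSABUF      wsaBuf;
    char        data[2048];  // messages are under 2K per the description
    sockaddr_in from;
    int         fromLen;
    DWORD       flags;
};

// Post one overlapped receive. WSA_IO_PENDING is the normal, successful case.
bool PostReceive(RecvContext* ctx) {
    assert(ctx != nullptr);
    ZeroMemory(&ctx->overlapped, sizeof(ctx->overlapped));
    ctx->wsaBuf.buf = ctx->data;
    ctx->wsaBuf.len = sizeof(ctx->data);
    ctx->fromLen = sizeof(ctx->from);
    ctx->flags = 0;
    int rc = WSARecvFrom(ctx->socket, &ctx->wsaBuf, 1, nullptr, &ctx->flags,
                         reinterpret_cast<sockaddr*>(&ctx->from), &ctx->fromLen,
                         &ctx->overlapped, nullptr);
    return rc == 0 || WSAGetLastError() == WSA_IO_PENDING;
}

// Placeholder handler: a real one would copy ctx->data[0..bytes) into the
// real-time thread's queue and return quickly.
void OnDataReceived(RecvContext* ctx, DWORD bytes) {
    (void)ctx;
    (void)bytes;
}

// Worker loop: one of these runs on each thread servicing the port.
DWORD WINAPI IocpWorker(LPVOID param) {
    HANDLE port = static_cast<HANDLE>(param);
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* ov = nullptr;
        BOOL ok = GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE);
        if (ov == nullptr) break;  // no I/O packet: shutdown post or closed port
        RecvContext* ctx = CONTAINING_RECORD(ov, RecvContext, overlapped);
        if (ok) OnDataReceived(ctx, bytes);
        PostReceive(ctx);          // re-arm right away so the socket is never idle
    }
    return 0;
}
#endif
```

Two things this sketch leaves out: each socket has to be associated with the port via `CreateIoCompletionPort` before the first `WSARecvFrom`, and `WSASendTo` completions arrive on the same port, so a real context needs an operation tag to tell sends from receives.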

I have only just started prototyping and have not profiled much yet. My questions are:

Can IOCP be used for my scenario or is it designed for high throughput only? Will WSAAsyncSelect (with hidden windows) be more efficient for my use case?

Thanks in advance, Michael

Edit:

I noticed while profiling that the problem starts with:

  • WSASendTo
  • GetQueuedCompletionStatus
  • OnDataSent

Looks like GetQueuedCompletionStatus doesn't wake up fast enough.
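One way to check this is to timestamp each operation when it is posted and again when its completion is dequeued. A minimal sketch, assuming a small helper stored next to each operation's context; `OperationTimer` is a hypothetical name, not part of the prototype.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: stamp an operation when it is posted, then read the
// elapsed time when its completion comes out of GetQueuedCompletionStatus.
class OperationTimer {
public:
    using Clock = std::chrono::steady_clock;

    void MarkPosted(Clock::time_point now = Clock::now()) { posted_ = now; }

    // Milliseconds between the last MarkPosted() and the given instant.
    double ElapsedMs(Clock::time_point now = Clock::now()) const {
        assert(now >= posted_);  // steady_clock never goes backwards
        return std::chrono::duration<double, std::milli>(now - posted_).count();
    }

private:
    Clock::time_point posted_{};
};
```

Calling `MarkPosted()` just before `WSASendTo` and `ElapsedMs()` right after `GetQueuedCompletionStatus` returns, and again when `OnDataSent` returns, would separate time spent waiting for the completion from time spent inside the handler.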

MUXCAH
  • How many threads do you have servicing the IOCP queue? Ideally you should have one per CPU. If you are targeting Windows 8+, you might consider using Winsock's [Registered I/O extensions](https://technet.microsoft.com/en-us/library/hh997032.aspx) instead: "new Windows Sockets functions added to support Winsock high-speed networking". See [New techniques to develop low-latency network apps](https://channel9.msdn.com/events/BUILD/BUILD2011/SAC-593T) – Remy Lebeau Jun 17 '17 at 05:03
  • I start 2 x the number of CPUs, as some tutorials recommend. – MUXCAH Jun 17 '17 at 05:06
  • *IOCP* is the best choice. `does not seem efficient at all.` - this only says that your code contains some errors. `Looks like GetQueuedCompletionStatus doesn't wake up fast enough.` - this is absolutely false; the error is in your code. – RbMm Jun 17 '17 at 07:46
  • `WSAAsyncSelect` is definitely worse than `GetQueuedCompletionStatus`. You need to understand how this works internally. When any I/O operation on a socket completes, the I/O manager either signals an event (if we passed one as input to the I/O operation), or inserts an APC into the thread that began the I/O (if we set an APC routine), or posts a packet (IRP) to the IOCP (if we bound the socket to an IOCP). The last variant, with IOCP, is the most efficient. `WSAAsyncSelect` (with hidden windows) only adds another level of indirection: it internally waits for the I/O to complete and then posts a window message to your window, which is the worst variant of all. – RbMm Jun 17 '17 at 09:56
  • RIO has an advantage: it uses pre-mapped buffers instead of mapping/unmapping them into the kernel (with an MDL) for every single I/O. But notification of I/O completion is still based on an event or IOCP. – RbMm Jun 17 '17 at 09:57
  • Thanks. I will start profiling the code today. – MUXCAH Jun 17 '17 at 12:50
  • IOCP is the way to notify user mode when an I/O operation completes (the other 2 native ways are event signaling (but there you have no context) and APCs (bound to a thread)). So the problem is definitely not in `GetQueuedCompletionStatus` (it dequeues a packet from the IOCP as fast as possible) but in something else (wrong program logic?) – RbMm Jun 17 '17 at 13:12
  • Is there any chance you can have all your UDP networking on the same port on the server? That way you only need 1 socket and possibly only a single thread. And the general approach to scale up a UDP socket service is to just have N additional threads doing `recvfrom` on the same socket rather than trying a complicated epoll/iocp design. Where N is 1-3x the number of cpu cores. This is all moot if each client requires a unique port. – selbie Jun 18 '17 at 16:18
  • selbie: Thanks for your response. The application works in two modes. In local mode we have all these listening on 127.0.0.1 but different ports. In network mode - we create a bunch of IPs and accept data that way. We simulate a lot of devices so each has either its own port (in local mode) or its own IP (in network mode). – MUXCAH Jun 18 '17 at 18:38
