The behavior is as follows: one server worker with 200 sockets handles about 100K echoes per second. Starting a second server worker on the same port (whether each worker gets the same number of sockets or half as many does not matter) immediately cuts the first worker's throughput to about 50% and only slightly improves the overall per-machine throughput (each worker serves around 50K echoes per second).

So the performance of a 6-core machine is approximately the same as that of a 1-core machine.

I've tried different approaches: one independent IOCP port for each worker (setting NumberOfConcurrentThreads to 1 in CreateIoCompletionPort), and one shared IOCP port for all workers (NumberOfConcurrentThreads equal to the number of workers). The performance is the same in both cases. My workers share no data, so there are no locks, etc.

I hope I'm missing something and that it's not a Windows kernel network scalability problem. I'm using Windows 7 Enterprise x64.

Of course, I expected performance to scale approximately linearly.
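To make the gap between expected and observed scaling concrete, here is a small Python sketch using the approximate figures quoted above (about 100K echoes per second for one worker, around 50K each for two). The function name `scaling_summary` is made up for illustration, and the numbers are the rough ones from this post, not precise measurements.

```python
def scaling_summary(single_worker_rate, per_worker_rate, workers):
    """Compare observed throughput against ideal linear scaling.

    Returns (total_rate, speedup, efficiency) where:
      total_rate  - combined throughput of all workers
      speedup     - total_rate relative to one worker running alone
      efficiency  - total_rate relative to ideal linear scaling
    """
    total_rate = per_worker_rate * workers
    ideal_rate = single_worker_rate * workers
    speedup = total_rate / single_worker_rate
    efficiency = total_rate / ideal_rate
    return total_rate, speedup, efficiency


# Approximate figures from the post: 100K echoes/s alone, ~50K each with two workers.
total, speedup, efficiency = scaling_summary(100_000, 50_000, 2)
print(f"total={total} echoes/s, speedup={speedup:.2f}x, efficiency={efficiency:.0%}")
```

With these inputs the combined rate stays at 100K echoes per second, so the speedup is 1.0 and the efficiency against linear scaling is 50%.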

Does anybody know about the practical scalability of IOCP across multiple cores on one machine? What should I expect as the number of active sockets increases?

Thank you!

a_m

1 Answer

The usual approach for non-NUMA systems is to have a single IOCP for all connections and a set of threads (usually tunable in size) that service the IOCP.

You can then tune the number of threads based on the number of CPUs and whether any of the work done by the threads is blocking in nature.
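As a rough illustration of that tuning, the sketch below uses a common rule of thumb (not something taken from this answer): start from the CPU count and scale up by the ratio of time a thread spends blocked to time it spends computing. The function name `suggested_pool_size` and its parameters are hypothetical, and the result is only a starting point to be refined by measurement.

```python
import math
import os


def suggested_pool_size(cpu_count, wait_time, compute_time):
    """Rule-of-thumb thread count: cpu_count * (1 + wait_time / compute_time).

    wait_time and compute_time are average per-request times in the same unit.
    With no blocking work (wait_time == 0) this is one thread per CPU.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if compute_time <= 0:
        raise ValueError("compute_time must be positive")
    if wait_time < 0:
        raise ValueError("wait_time must not be negative")
    return max(1, math.ceil(cpu_count * (1 + wait_time / compute_time)))


# Assumed example: handlers that never block, on whatever machine runs this.
cpus = os.cpu_count() or 1
print(f"{cpus} CPUs, non-blocking handlers -> {suggested_pool_size(cpus, 0, 1)} threads")
```

For example, on a 6-core machine with handlers that block for as long as they compute, this heuristic suggests 12 threads; with purely non-blocking handlers it suggests 6.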

Performance should scale well unless you have some shared resource which all connections must access at which point contention for the shared resource will affect your scalability.
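One simple way to reason about how much a shared resource can hurt is Amdahl's law, used here as an assumed model rather than anything specific to IOCP. It treats some fraction of the work as serialized (for instance, time spent contending for the shared resource) and the rest as perfectly parallel. The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup under Amdahl's law.

    serial_fraction: share of the work that cannot run in parallel (0.0 to 1.0).
    workers: number of threads or cores doing the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Assumed serial fractions, chosen only to show the shape of the curve on 6 workers.
for fraction in (0.0, 0.05, 0.5, 1.0):
    print(f"serial={fraction:.2f} -> speedup on 6 workers: {amdahl_speedup(fraction, 6):.2f}x")
```

Under this model, two workers that together deliver no more than one worker did alone correspond to a serial fraction close to 1, which points at some fully serialized step rather than a small amount of contention.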

I have some free IOCP code available here and a simple multiple client test which allows you to run thousands of concurrent connections here.

For NUMA systems things can be slightly more complex as, ideally, you want to have a single IOCP, thread pool and buffer allocator per NUMA node to keep memory accesses to the local node.

Len Holgate
  • Thanks for the response. Since you have worked on your project quite a lot, can you please provide any numbers on how your framework scales with the number of cores/workers on one machine, and with the number of sockets? Thank you very much. – a_m Apr 11 '12 at 08:43
  • I don't have figures for that to hand. We generally only do performance tests to compare one release to another and for specific custom server development for clients. – Len Holgate Apr 11 '12 at 12:00