The behavior is as follows: for example, one server worker with 200 sockets handles 100K echoes per second. Starting another server worker on the same port (with the same number of sockets per worker or half as many, it makes no difference) immediately cuts the first worker's throughput to about 50% and only slightly improves the overall throughput of the machine (each worker serves around 50K echoes per second).
So the performance of a 6-core machine is approximately the same as that of a 1-core machine.
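To express the numbers above as scaling efficiency, here is a minimal C sketch. The helper names are hypothetical (my own, not from any library). It assumes throughput is measured in echoes per second and that each worker runs on its own core. Speedup is the total throughput with n workers divided by the single-worker throughput, and efficiency is that speedup divided by n, so perfectly linear scaling gives an efficiency of 1.0.

```c
#include <assert.h>

/* Hypothetical helpers for quantifying how well throughput scales with
 * the number of workers. Throughput values are in echoes per second. */

/* Total throughput expected under perfectly linear scaling. */
double linear_throughput(double single_worker, int workers)
{
    assert(single_worker > 0.0 && workers > 0);
    return single_worker * workers;
}

/* Speedup: total throughput with n workers relative to one worker. */
double scaling_speedup(double single_worker, double total)
{
    assert(single_worker > 0.0);
    return total / single_worker;
}

/* Efficiency: speedup divided by the worker count (1.0 = linear). */
double scaling_efficiency(double single_worker, double total, int workers)
{
    assert(workers > 0);
    return scaling_speedup(single_worker, total) / workers;
}
```

With the figures above (100K echoes/s for one worker, roughly 2 × 50K for two workers), scaling_efficiency(100000.0, 100000.0, 2) returns 0.5. If six workers together still deliver roughly 100K, the efficiency drops to about 1/6, whereas linear scaling would predict linear_throughput(100000.0, 6), i.e. 600K echoes/s.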
I've tried different approaches, e.g. one independent IOCP port per worker (with NumberOfConcurrentThreads set to 1 in CreateIoCompletionPort) or one IOCP port shared by all workers (with NumberOfConcurrentThreads equal to the number of workers); the performance is the same either way. My workers share no data, so there are no locks or similar synchronization between them.
I hope I'm missing something and that this is not a scalability problem in the Windows kernel networking stack. I'm using Windows 7 Enterprise x64.
Of course, I expected performance to scale approximately linearly with the number of workers.
Does anybody know how well IOCP scales in practice across multiple cores on one machine? And what should I expect as the number of active sockets increases?
Thank you!