I've been investigating an issue where we have a single UDP server sending to multiple clients. The server is sending data out on a multicast channel and port. The clients are running on the same machine and each client opens a socket to the same port as every other client.
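For anyone trying to reproduce this, here is a minimal sketch of what one such client might look like. The group address and port are placeholders (the real values are not given here), and `build_mreq` and `open_receiver` are hypothetical helper names, not code from our application.

```python
import socket
import struct

GROUP = "239.1.1.1"  # hypothetical multicast group, not the real one
PORT = 5007          # hypothetical port, not the real one


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Open a UDP socket that can share the port with other receivers on this host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # SO_REUSEADDR allows several processes to bind the same multicast port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group asks the kernel to deliver each datagram to this socket too.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock
```

Each client would call `open_receiver()` once and then loop on `recvfrom()`; with N clients running, the OS has to hand a copy of every datagram to N sockets.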

We stagger the start of the clients. When we reach a certain number of clients, say 10, we start seeing packet drops. We've eliminated the NIC as the issue using various monitoring tools and the socket buffer size is several times larger than the message size. Our sending interval is quite large (five seconds) and the clients do nothing with the data so the rate of consumption is a non-factor. As the title says we've reproduced the issue on both Windows Server 2008 and Linux (not sure about the version).
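One thing worth double-checking is that the receive buffer is really the size that was requested, since the OS may adjust it. The sketch below reads the value back after setting it; the 4 MiB figure is only an illustrative assumption, and `effective_rcvbuf` is a hypothetical helper name.

```python
import socket


def effective_rcvbuf(requested_bytes: int) -> int:
    """Request a receive buffer size and return what the OS reports afterwards."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # The reported value may differ from the request (see note below).
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


requested = 4 * 1024 * 1024  # illustrative value, not our real setting
print(f"requested {requested} bytes, OS reports {effective_rcvbuf(requested)} bytes")
```

On Linux the kernel doubles the requested value for bookkeeping and caps it at `net.core.rmem_max`, so the number read back can be larger or much smaller than what was asked for.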

Our current theory is that the 10th client puts too much load on the OS which is copying all this data to each socket. The thing is we're only sending 500,000 bytes every five seconds, which doesn't seem like much at all.
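To put rough numbers on that theory, here is a back-of-the-envelope calculation using the figures above (500,000 bytes, five seconds, 10 clients). It assumes one kernel copy per socket plus one copy into each application buffer, as the first comment below suggests; the function names are just for illustration.

```python
BYTES_PER_INTERVAL = 500_000  # payload sent per interval
INTERVAL_SECONDS = 5          # sending interval
CLIENTS = 10                  # client count at which drops appear


def kernel_copy_bytes(payload: int, clients: int) -> int:
    """Bytes copied into socket receive buffers per interval (one copy per client)."""
    return payload * clients


def total_copy_bytes(payload: int, clients: int) -> int:
    """Adds the second copy from each socket buffer into the application buffer."""
    return 2 * kernel_copy_bytes(payload, clients)


per_interval = total_copy_bytes(BYTES_PER_INTERVAL, CLIENTS)
print(f"{per_interval} bytes copied per interval")
print(f"{per_interval / INTERVAL_SECONDS:.0f} bytes copied per second on average")
```

Averaged over the interval this is about 2 MB/s, but if the 500,000 bytes go out as a single burst, the copies are concentrated into a much shorter window than the average suggests.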

Mostly I'm posting here in the hopes that someone has seen a similar problem. I was pointed to this hotfix in my search but it did not solve the issue. Any resources for investigating the details of the OS internals which handle network traffic would be appreciated. Unfortunately I lack this kind of domain knowledge and it has been difficult to find good and detailed reading material on the subject.

user815512
  • 'Only' half a million bytes per 5s is a lot. Multiply that by 10 for the ten clients and you have 5 megabytes being copied into socket receive buffers, and from there into application buffers, which makes 10M per 5s. It adds up. – user207421 Oct 24 '13 at 21:48
  • @EJP may well be right. On Windows, maybe use IOCP to remove a layer of copying? – Martin James Oct 25 '13 at 11:06
  • Thanks for your suggestion. We found the bottleneck in a NIC driver setting. – user815512 Nov 17 '13 at 23:34
  • @user815512 what settings did you change? – vak Feb 18 '14 at 16:09
  • The ring buffer size was changed from "Auto" to the max setting. Auto apparently does some magic calculation to determine the correct size; for our use case we just maxed it out. – user815512 Feb 19 '14 at 16:56
  • @user815512 thank you, but unfortunately it didn't help in my case – vak Feb 20 '14 at 10:57
