We have a client and server application that we are currently testing on the same Windows 7 64-bit machine. They're both written in C# and use P/Invoke to call out to the Winsock2 libraries.

The application works fine overall, without any errors, and the latency for each "hop" over TCP/IP averages about 350 microseconds.

However, on occasion there are very long delays of 40 to 50 ms or more before any packets are received, and then suddenly they all arrive.

Efforts to diagnose so far:

  1. During these delays, the server continues to log that it is sending packets. It's set to send a test packet every 1 ms, and it does so for 15 to 20 ms, sometimes as much as 50 ms, before the client receives any of them.

  2. We used tcpdump to sniff packets on the loopback adapter. It shows that during the lag period there is traffic from the server port (6488) to the client port (61743) as usual.

  3. The client calls the Winsock2 select() function in a loop. Logging via a counter just before the select() call shows that it is passed the correct file descriptor. And of course this works just fine before and after the delay.

  4. Further logging immediately after the select() call shows that the fd isn't in the returned set during the delay, meaning that a read on the socket would block. During periods of transmission without delays, the logging shows it works as expected: select() returns the socket's fd, so the read doesn't block.
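
The check described in steps 3 and 4 can be sketched roughly as follows. This is only an illustration: it uses the built-in .NET `Socket.Select` rather than the P/Invoke'd Winsock2 select() the application actually calls, and `SelectGapLogger` and `MeasureMaxGapMs` are hypothetical names, not part of our code. It waits with Select(), reads when the socket is reported readable, and records the longest gap between successful reads.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;

static class SelectGapLogger
{
    // Hypothetical helper: polls one connected socket with Select() and
    // returns the longest gap (in ms) observed between successful reads.
    public static double MeasureMaxGapMs(Socket socket, int reads)
    {
        var buffer = new byte[4096];
        var clock = Stopwatch.StartNew();
        double last = clock.Elapsed.TotalMilliseconds;
        double maxGap = 0;
        int done = 0;
        while (done < reads)
        {
            // Select() removes sockets that are not ready from the list.
            var readable = new List<Socket> { socket };
            Socket.Select(readable, null, null, 1000); // timeout in microseconds
            if (readable.Count == 0) continue;         // fd not returned: a read would block
            if (socket.Receive(buffer) == 0) break;    // peer closed the connection
            double now = clock.Elapsed.TotalMilliseconds;
            maxGap = Math.Max(maxGap, now - last);
            last = now;
            done++;
        }
        return maxGap;
    }

    static void Main()
    {
        // Loopback pair on an OS-assigned port, for demonstration only.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            client.Connect(IPAddress.Loopback, port);
            using (Socket server = listener.AcceptSocket())
            {
                server.Send(new byte[] { 1 });
                Console.WriteLine("max gap ms: " + MeasureMaxGapMs(client, 1));
            }
        }
        listener.Stop();
    }
}
```

Note that the loop keeps polling until the requested number of reads completes or the peer closes, so it suits a test harness rather than production code.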

In short, the loopback adapter seems to hold these packets somewhere for a long while before finally delivering them to the receiving side.

Any further ideas or a solution?

One thought: it's often claimed that overlapped I/O works better on Windows, but that seems to matter only for scalability, if you need to listen on more than 64 sockets.

Can it be that switching to overlapped I/O will do the trick? We want to avoid that, as it will push out the project deadline and increase the budget. This should work with select() just fine.

Also, can it be that the process or thread in Windows that handles the loopback gets context switched or something and, if so, is there a way to configure it to avoid those delays?

Edit: The correct answer was to ensure that the Nagle algorithm was disabled. We thought it was disabled, but that's where the bug was. After adding GetSocketOption() to our in-house library to verify what SetSocketOption() had done, we found that NoDelay was still off. In our experiments, you must set NoDelay prior to connecting or binding a socket, or else it silently fails to have any effect.
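
For anyone using the built-in .NET classes, the set-then-verify step might look like the sketch below. It is an illustration only, not our in-house library: `NoDelayCheck` and `EnableAndVerifyNoDelay` are hypothetical names, and the loopback listener exists just to give the client something to connect to. The option is set before Connect(), following the ordering we found necessary, and then read back instead of being assumed.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class NoDelayCheck
{
    // Hypothetical helper: turns on TCP_NODELAY (disabling Nagle) and reads
    // the option back, returning true only if the stack reports it as set.
    public static bool EnableAndVerifyNoDelay(Socket socket)
    {
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
        object value = socket.GetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay);
        return Convert.ToInt32(value) != 0;
    }

    static void Main()
    {
        // Loopback listener on an OS-assigned port, for demonstration only.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            // Set and verify the option BEFORE connecting.
            bool verified = EnableAndVerifyNoDelay(client);
            client.Connect(IPAddress.Loopback, port);
            Console.WriteLine("NoDelay verified: " + verified);
        }
        listener.Stop();
    }
}
```

Reading the option back is the part that caught our bug: the set call alone gave no indication that anything was wrong.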

Many thanks to Fun Mun Pieng for the correct answer!!!

Wayne
  • JOOI, why are you P/invoking to Winsock, rather than using the built-in .NET network classes? – Will Dean Mar 16 '11 at 14:08
  • It's due to performance-testing comparisons and a dislike of the threading model .NET uses for asynchronous I/O, which incurs very costly context switching when threads fall asleep. Instead, I built a "fiber" or "continuation" styled task scheduler based on "tail recursion" that shares threads and never incurs any context switching. The primary thread never sleeps during real-time processing. It has a near-real-time timer and other real-time capabilities. It's lightning fast and scales linearly across multiple cores. The .NET team is beginning to produce some of this, but not all the pieces yet. – Wayne Mar 16 '11 at 22:25

1 Answer

I suspect this may be due to the Nagle algorithm. The following code disables it:

socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
Fun Mun Pieng
  • Thanks. I already have code to disable Nagle, though. Perhaps I need to double-check and verify that NoDelay is properly set using GetSocketOption. By the way, your line of code won't work exactly as written in my case: I wrote my own TCP library instead of using the one included with .NET. – Wayne Mar 16 '11 at 07:05
  • Okay, after implementing GetSocketOption, it turns out that NoDelay was off: experimentation now shows you have to set NoDelay prior to connecting a socket or prior to binding. So you were, in fact, correct in your answer! Thank you!! – Wayne Mar 16 '11 at 13:50