10

Our application is reading data very fast over TCP/IP sockets in Java. We are using the NIO library with non-blocking sockets and a Selector to signal readiness to read. On average, the overall processing time for reading and handling the data is sub-millisecond. However, we frequently see spikes of 10-20 milliseconds. (We are running on Linux.)
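
For context, this is roughly how such a channel is set up; a minimal sketch, assuming a single upstream connection, where the class name, host, and port are placeholders rather than the actual application:

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class FeedReader {
    public static void main(String[] args) throws Exception {
        // Connect in blocking mode, then switch to non-blocking for the read loop.
        SocketChannel channel =
                SocketChannel.open(new InetSocketAddress("feed-host", 9000)); // placeholder endpoint
        channel.configureBlocking(false);

        // Register for read-readiness so Selector.select() wakes the reader thread
        // as soon as the kernel has data queued on the socket.
        Selector selector = Selector.open();
        channel.register(selector, SelectionKey.OP_READ);

        // The select(0)/read loop itself is sketched further below.
    }
}
```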

Using tcpdump we can see the time difference between tcpdump's reading of two discrete messages and compare that with our application's times. tcpdump shows essentially no delay between the messages, whereas the application can show 20 milliseconds.

We are pretty sure this is not GC, because the GC log shows virtually no Full GC, and in JDK 6 (as I understand it) the default collector is the parallel collector, so it should not pause the application threads (except during a Full GC).

It looks almost as if there is some delay in Java's Selector.select(0) method returning the readiness to read, because at the TCP layer the data is already available to be read (and tcpdump is reading it).
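
A rough way to see whether the missing time is spent inside select(0) or in the read/dispatch path is to put System.nanoTime() probes around both. A sketch follows; the 5 ms threshold and 64 KB buffer are arbitrary, and handle() stands in for the real message handling:

```java
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class SelectTiming {
    /**
     * Select/read loop with timing probes, to see whether a 10-20 ms gap is spent
     * waiting in select() or in the read/dispatch path.
     */
    static void timedLoop(Selector selector) throws Exception {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        long lastHandled = System.nanoTime();
        while (true) {
            long beforeSelect = System.nanoTime();
            selector.select(0);                       // block until a channel is read-ready
            long afterSelect = System.nanoTime();
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    buf.clear();
                    ((SocketChannel) key.channel()).read(buf);
                    // handle(buf) omitted: application-specific message handling
                    long afterRead = System.nanoTime();
                    long gapMs = (afterRead - lastHandled) / 1000000L;
                    if (gapMs >= 5) {
                        System.out.printf("gap=%d ms, inSelect=%d ms, readAndHandle=%d ms%n",
                                gapMs,
                                (afterSelect - beforeSelect) / 1000000L,
                                (afterRead - afterSelect) / 1000000L);
                    }
                    lastHandled = afterRead;
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
```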

Additional info: at peak load we are processing about 6,000 messages per second, averaging 150 bytes per message, or roughly 900 KB per second.

Sam Goldberg
  • As @Jim Lewis said, there is likely to be some time lost to context switching, and you cannot control how Java implements NIO internally. It is entirely possible that the JVM adds some overhead that you will not be able to eliminate. That said, without seeing more data, I can't really offer a solution. – Chris K Mar 14 '11 at 18:54
  • Well - I cleaned up my unaccepted answers. I do not want anyone to think that I don't value the time they took to answer the question. – Sam Goldberg Mar 14 '11 at 22:25
  • It might help to give some details on the JVM, kernel/distro, and hardware. – Matt Mar 14 '11 at 22:41
  • @Matt: OS = Red Hat Enterprise Linux 5.4, kernel = 2.6, JVM = Java(TM) SE Runtime Environment (build 1.6.0_06-b02) - Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode), NIC: 01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20), network speed = 1 Gb/s full duplex. – Sam Goldberg Mar 14 '11 at 22:59
  • @Sam: that info should go in your question. – Chris K Mar 14 '11 at 23:26

4 Answers

4

Eden collection still incurs a stop-the-world (STW) pause, so 20 ms may be perfectly normal depending on allocation behaviour, heap size, and the size of the live set.
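
One way to check whether young-generation collections line up with the spikes, without wading through the full GC log, is to snapshot the collector counters around each spike via the standard java.lang.management API. A rough sketch; the class and method names are illustrative, and the collector names reported differ between JVMs and collectors:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcSnapshot {
    /**
     * Prints cumulative collection counts and times; call before and after a
     * latency spike and diff the numbers to see whether a collection ran in between.
     */
    static void dumpGcStats() {
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            System.out.printf("%s: count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }

    public static void main(String[] args) {
        dumpGcStats();
    }
}
```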

Matt
  • After a lot more testing and profiling, we've concluded that GC, even minor GC using ParallelGC, seems to stop everything. Pauses range from 2 ms to 20 ms. Making the code more efficient could reduce the number of GC cycles, and maybe even GC time. So this affects the latency of socket communications, and it seems there is nothing that can be done. We did test RTLinux and saw no major improvement. We started investigating Real-time Java, but didn't think it was the best avenue (cost-wise and complexity-wise). – Sam Goldberg Apr 27 '11 at 21:18
  • Most of CMS runs concurrently with the application. There is only a very small STW part, which seems to be the 20 ms you see (you could tie it back to the GC logs). If you want predictable pauses, you can look into the G1 collector, but you will have more overall pauses than with CMS. – bwawok May 24 '11 at 15:11
3

Is your Java code running under RTLinux, or some other distro with hard real-time scheduling capability? If not, 10-20 msec of jitter in the processing times seems completely reasonable, and expected.
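
A quick, NIO-independent way to get a feel for how much raw jitter a box exhibits is to measure how late a thread wakes from a short sleep. A rough sketch; the class name, 1 ms sleep, and iteration count are arbitrary, and the result mixes timer resolution with scheduling latency, but large outliers point at the scheduler:

```java
public class JitterProbe {
    public static void main(String[] args) throws InterruptedException {
        long worstMs = 0;
        for (int i = 0; i < 10000; i++) {
            long start = System.nanoTime();
            Thread.sleep(1);                                   // ask for a 1 ms nap
            long overshootMs = (System.nanoTime() - start) / 1000000L - 1;
            if (overshootMs > worstMs) {                       // report each new worst wake-up delay
                worstMs = overshootMs;
                System.out.println("new worst wake-up overshoot: " + worstMs + " ms");
            }
        }
    }
}
```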

Jim Lewis
  • I wouldn't expect 10-20 ms jitter on any modern box that isn't seriously overloaded; even a few hundred microseconds is a lot. – Matt Mar 14 '11 at 21:31
  • @Matt: As I understand it, 10 ms is a typical value for the length of a timeslice in a Linux/x86 non-real-time scheduler. So if the select() call yields the CPU, it could easily take that long for that job to get scheduled again. – Jim Lewis Mar 14 '11 at 21:50
  • I'm trying to understand your comment - we are running Red Hat Enterprise 5.4 with 2 CPUs. The machine is mainly occupied running the Java app and MySQL. Turning off database updating or other processes on the server seems to have no impact on the latency spikes. Do you think we could potentially address this by switching to an RTLinux distribution? – Sam Goldberg Mar 14 '11 at 22:33
  • AFAIK the current Linux scheduler (CFS) is not time-sliced; some detail [here](http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/#internals). So I don't believe you can assume 100 Hz interrupts mean wake-ups on 10 ms boundaries. It might mean 10 ms resolution on certain timestamps, mind you. – Matt Mar 14 '11 at 22:39
  • @Sam: It's possible that switching to RTLinux would give you more control over this latency, yes. Hard to say for sure, but your 10-20 msec latency sure sounded like scheduling jitter to me... but I'm not familiar with more modern stuff like CFS. There may be opportunities to reduce the latency by tuning the scheduling parameters and process priorities, without resorting to a full-blown RT operating system. Also, Matt's remark about the timestamp resolution is worth considering. – Jim Lewis Mar 14 '11 at 22:53
2

I had the same problem in a Java service that I work on. When sending the same request repeatedly from the client, the server would block at the same spot in the stream for 25-35 ms. Turning off Nagle's algorithm on the socket fixed this for me. This can be done by calling setTcpNoDelay(true) on the Socket. It may put more small packets on the wire, since writes are no longer coalesced while waiting for outstanding data to be acknowledged. See http://en.wikipedia.org/wiki/Nagle%27s_algorithm for more on Nagle's algorithm.
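
With the NIO channels from the question, this would look roughly as follows; on JDK 6 the option is set through the channel's underlying Socket, and the class name and endpoint below are placeholders:

```java
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

public class NoDelayExample {
    public static void main(String[] args) throws Exception {
        SocketChannel channel =
                SocketChannel.open(new InetSocketAddress("feed-host", 9000)); // placeholder endpoint
        // Disable Nagle's algorithm so small writes are sent immediately
        // instead of being held back until outstanding data is acknowledged.
        channel.socket().setTcpNoDelay(true);
        // (On Java 7+ the equivalent is:
        //  channel.setOption(java.net.StandardSocketOptions.TCP_NODELAY, Boolean.TRUE);)
        channel.configureBlocking(false);
    }
}
```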

1

From the tcpdump faq:

WHEN IS A PACKET TIME-STAMPED? HOW ACCURATE ARE THE TIME STAMPS?

In most OSes on which tcpdump and libpcap run, the packet is time stamped as part of the process of the network interface's device driver, or the networking stack, handling it. This means that the packet is not time stamped at the instant that it arrives at the network interface; after the packet arrives at the network interface, there will be a delay until an interrupt is delivered or the network interface is polled (i.e., the network interface might not interrupt the host immediately - the driver may be set up to poll the interface if network traffic is heavy, to reduce the number of interrupts and process more packets per interrupt), and there will be a further delay between the point at which the interrupt starts being processed and the time stamp is generated.

So odds are the timestamp is taken in the privileged kernel layer, and the lost 20 ms is the overhead of context-switching back to user space and into Java and the JVM's selector logic. Without more analysis of the system as a whole, I don't think it's possible to pin down the cause with confidence.

Chris K
  • Was talking about your response with the other guys in our office. They pointed out that the spikes we are seeing are up to 40 ms different from tcpdump's times. That seems too large a difference to be explained by the above. It's really the inconsistent "spiking" behavior we're trying to fix. – Sam Goldberg Mar 14 '11 at 22:35
  • Are you certain you've eliminated GC operations as a cause? What percentage of requests result in a "spike"? – Chris K Mar 14 '11 at 23:46