I am trying to tune a high-message-traffic system running on Solaris. The architecture is a large number (600) of clients which connect via TCP to a big Solaris server and then send/receive relatively small messages (0.5 to 1 KB payload) at high rates. The goal is to minimize the latency of each message processed. I suspect that the TCP stack of the server is getting overwhelmed by all the traffic. What are some commands/metrics that I can use to confirm this, and if it is true, what is the best way to alleviate this bottleneck?
2 Answers
snoop(1m) and dtrace(1m) will probably be your best friends: the first for watching and timing the traffic, the second for measuring the server's internal latency.


+1 for dtrace, it's amazing. You need to be careful though, since asking for too much data could hog resources and cause problems. Any dtrace work should be done on a non-production system first so you don't inadvertently bring down your production server. Not that I'm speaking from experience or anything.... – Milner Jul 08 '10 at 11:40
If your total message size is <1k, you may significantly ease your network load by using UDP instead of TCP. Sending a short message via TCP entails overhead at each end: setting up the session/connection state, the three-way handshake to open the connection, and the connection close to tear it down.
With UDP, you'll send a single packet with your message and rely on the network's "best effort" delivery. You'll probably want to code the client with a timeout and retransmit in case a packet is lost, but if your request and response each fit within the MTU, you can get by with less than a quarter of the actual packets.
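The timeout-and-retransmit pattern described above might look something like this minimal Python sketch (the function name, timeout, and retry count are illustrative, not from the original answer; a loopback echo server stands in for the real Solaris server):

```python
import socket
import threading

def udp_request(sock, server_addr, payload, timeout=0.5, retries=3):
    """Send a datagram and wait for a reply, retransmitting on timeout."""
    sock.settimeout(timeout)
    for _ in range(retries):
        sock.sendto(payload, server_addr)
        try:
            reply, _ = sock.recvfrom(2048)
            return reply
        except socket.timeout:
            continue  # request (or reply) lost; retransmit
    raise TimeoutError("no reply after %d attempts" % retries)

# Demo: a one-shot loopback echo server in a background thread.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # OS picks a free port
server_addr = server.getsockname()

def echo_once():
    data, peer = server.recvfrom(2048)
    server.sendto(data, peer)

threading.Thread(target=echo_once, daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
reply = udp_request(client, server_addr, b"ping")
print(reply)
```

The whole exchange is two datagrams with no handshake or teardown, which is where the packet savings over per-message TCP connections come from; the retry loop restores just enough reliability for a request/response workload.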
