3

I'm writing a Java server (java.net.Socket, java.net.ServerSocket, java.io.ObjectOutputStream, java.io.ObjectInputStream) and I know I'm going to have limited bandwidth allocated for it.

I've written a decorator object for my output and input streams so I can count how many bytes go through it for profiling purposes. But this won't give me any indication of the amount of overhead I'm using for the connection.
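
For reference, a byte-counting decorator of the kind described above might look roughly like this. It is only a sketch: the class name `CountingOutputStream` and its `getCount()` method are hypothetical, not the asker's actual code. It counts only the bytes the application hands to the stream, which is why TCP/IP header overhead never shows up in the total.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

// Hypothetical decorator: counts application-level bytes written through it.
final class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate the whole range at once instead of FilterOutputStream's
        // default byte-at-a-time loop, then count it.
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayOutputStream stands in for socket.getOutputStream().
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        try (ObjectOutputStream oos = new ObjectOutputStream(counter)) {
            oos.writeObject("hello");
        }
        // Includes the serialization stream header, not just the 5 characters.
        System.out.println("bytes written: " + counter.getCount());
    }
}
```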

I don't anticipate it will be much, but I'd like to prepare for it. I'm not going to try to optimize it, I just want to know how much it will be for logistical reasons (how much bandwidth must I request, etc.)

I can't be the first person to try to get this information, but I can't seem to find good resources on the overhead of Java Sockets and TCP/IP in general. (Perhaps that's because there's nothing noteworthy to find... If we're on the order of kb per minute, it's really not much of a concern, but I'd still like to know!)

Thanks!

Mike Pennington
corsiKa
  • Do you have limited bandwidth because the server is on a slow connection or because you are trying to keep the network / server from getting overloaded (or stay under a cap)? – AngerClown Apr 15 '11 at 00:22
  • The primary concern is keeping bandwidth costs to a minimum. – corsiKa Apr 15 '11 at 05:18
  • If you are bandwidth limited, don't use ObjectOutputStream; it's very verbose unless you are very careful. (Generally this doesn't matter.) An application-specific serialization is likely to be several times smaller. – Peter Lawrey Apr 15 '11 at 07:24

2 Answers

5

This question is challenging to answer with the information we have right now... for instance, what are you calling 'overhead'? Is it only TCP ACK packets, or all packet overhead (for instance Ethernet, IP and TCP headers) for anything other than your data payload?

How many connections per minute? What is the average data transfer, per connection? If there are many very short-lived connections, your overhead requirements go up (due to 3-way handshake, and connection close requirements)... you could also have high overhead if the clients don't read much data, but many clients keep the connections open for days at a time.
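
To put a rough number on the short-lived-connection case, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: three segments to open a connection, four to close it, and 40 bytes of IPv4 plus TCP header per segment with no options and no link-layer framing. Real SYN segments usually carry options and so are larger, and the class and method names are hypothetical.

```java
// Hypothetical estimate of per-connection setup/teardown cost.
final class ConnectionOverhead {
    static final int HANDSHAKE_SEGMENTS = 3; // SYN, SYN-ACK, ACK
    static final int TEARDOWN_SEGMENTS = 4;  // FIN, ACK, FIN, ACK (assumed; can be fewer)
    static final int HEADER_BYTES = 40;      // 20 IPv4 + 20 TCP, no options (assumed)

    // Header bytes spent just opening and closing one connection.
    static long bytesPerConnection() {
        return (long) (HANDSHAKE_SEGMENTS + TEARDOWN_SEGMENTS) * HEADER_BYTES;
    }

    // Same cost scaled by an assumed connection rate.
    static long bytesPerMinute(long connectionsPerMinute) {
        return bytesPerConnection() * connectionsPerMinute;
    }

    public static void main(String[] args) {
        System.out.println("per connection: " + bytesPerConnection() + " bytes");
        System.out.println("at 60 connections/min: " + bytesPerMinute(60) + " bytes/min");
    }
}
```

This ignores ACKs for data, retransmissions, and keep-alives, so treat it as a lower bound for the connection bookkeeping alone.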

Honestly, you're 50x better off modeling this in a lab and making some assumptions about hit rate per minute and concurrent clients... that will give you some ballpark numbers. Play around with limiting the bandwidth afforded to the application to the maximum your budget would allow... then start backing off... you can throttle bandwidth by using WANem on a dual-port Linux machine.

Getting lab results like this is far better than theoretical calculations.

HTH, \mike (who spends all day testing network gear)

Mike Pennington
  • +1 for lab approach. I have some time before the app goes live (obviously) so I guess I can hold off the profiling until a little later. It's easier to explain to an engineer than it is to a suit, though! "What do you MEAN, you can't profile it?" – corsiKa Apr 14 '11 at 21:28
  • I guess I'm not sure I understand the difference between profiling and this... please help fill in the gaps... what data would you get from profiling that you won't get by instrumenting the application as you have and running lab simulations? – Mike Pennington Apr 14 '11 at 21:31
  • It's not the what, it's the when. Other teams would like to know now the strain of the app. – corsiKa Apr 14 '11 at 21:36
  • Hehe, sometimes you can punt the ball right back at them... ask them to give you the numbers for concurrent clients, and average number of connections per minute. If they're in a position to give you those numbers (i.e. sales / marketing), then model the data on what they said... vary parameters by +/- 30%, then if real-life doesn't match what they told you to model, at least you have some CYA... – Mike Pennington Apr 14 '11 at 21:47
  • It was difficult to choose between the two of these answers. @Tim actually answered the question I was originally asking, but now I realize that this was the more useful of the two. – corsiKa Apr 14 '11 at 21:50
4

TCP overhead varies based on a number of factors, but is typically around 5% at full capacity.

Basically each "packet" has 20 bytes of IPv4 header (40 bytes for IPv6) plus 20-32 bytes of TCP header. Packet sizes vary based on the network devices and conditions, but are often in the neighborhood of 1500 bytes.
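
The arithmetic behind those figures can be sketched as follows. The header sizes come from the paragraph above; the class and method names are hypothetical, and link-layer framing (Ethernet headers and so on) is left out. Note that the percentage depends on what you divide by: headers as a share of all bytes on the wire, or headers relative to the payload alone.

```java
// Hypothetical helper for per-packet header overhead.
final class PacketOverhead {
    // Header bytes as a fraction of everything sent (payload + headers).
    static double shareOfWire(int payloadBytes, int headerBytes) {
        return (double) headerBytes / (payloadBytes + headerBytes);
    }

    // Header bytes as a fraction of the payload alone.
    static double relativeToPayload(int payloadBytes, int headerBytes) {
        return (double) headerBytes / payloadBytes;
    }

    public static void main(String[] args) {
        int minHeaders = 20 + 20; // IPv4 + TCP without options
        int maxHeaders = 20 + 32; // IPv4 + TCP at the upper end quoted above

        // A full 1500-byte packet leaves 1460 bytes of payload after 40 bytes of headers.
        System.out.printf("full packet: %.1f%% of wire bytes%n",
                100 * shareOfWire(1500 - minHeaders, minHeaders));

        // A small 100-byte payload with the larger header.
        System.out.printf("100-byte payload: %.1f%% of wire bytes, %.1f%% relative to payload%n",
                100 * shareOfWire(100, maxHeaders),
                100 * relativeToPayload(100, maxHeaders));
    }
}
```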

This page has some detail: http://sd.wareonearth.com/~phil/net/overhead/

In my opinion you can completely ignore keep-alives, as they are only used when the connection is idle anyway.

Tim Sylvester
  • So if my packets only have 100 bytes of information on average, my overhead may be as high as 52%, I see. – corsiKa Apr 14 '11 at 21:37
  • Yes, but keep in mind that you have no control over the packet size. Many small send calls may be combined into one by Nagle's algorithm, and large sends may be split up arbitrarily either at the sending system or by routers and switches between the source and destination. – Tim Sylvester Apr 14 '11 at 22:11
  • So with my Java socket, if I go `out.flush()` I'm not guaranteed to generate a packet that gets sent over the network? – corsiKa Apr 14 '11 at 22:30
  • A `flush()` only guarantees that data passed to the stream is handed off to the next level down. Internally, the JVM will eventually end up calling some operating system service like the BSD-style `send()` API, or equivalent, and decisions about combining or splitting data into packets happen at an even lower level than that. – Tim Sylvester Apr 14 '11 at 23:39
  • I shouldn't say you have *no* control, however. You can use `java.net.Socket.setTcpNoDelay()` to tell the TCP/IP stack that you want to trade higher overhead for lower latency, and even without that if you send one byte and wait a while (milliseconds) it *will* send a one-byte packet. The `setTcpNoDelay()` option will reduce or eliminate the time that the TCP/IP stack will wait between send calls before sending a packet over the wire, but it's still just a suggestion, the final determination is out of your hands and may vary from one packet to the next. – Tim Sylvester Apr 14 '11 at 23:39
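
As a minimal sketch of the `setTcpNoDelay()` call discussed in the comments above: the class and method names below are hypothetical, and a loopback connection to an ephemeral port is used only so the example is self-contained. Setting the option and flushing still only hands the byte to the operating system; how it ends up packetized remains the stack's decision.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: enable TCP_NODELAY on a loopback connection.
final class NoDelayDemo {
    // Returns true if the option reads back as enabled and the byte arrived.
    static boolean connectWithNoDelay() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // Ask the stack not to hold back small writes (disables Nagle's algorithm).
            client.setTcpNoDelay(true);

            OutputStream out = client.getOutputStream();
            out.write(42);
            out.flush(); // hands the byte to the OS; packetization is decided below this level

            int received = accepted.getInputStream().read();
            return client.getTcpNoDelay() && received == 42;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY enabled and byte delivered: " + connectWithNoDelay());
    }
}
```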