Can TCP fragmentation be eliminated using a Pipe?

Question

TCP network messages can be fragmented. But fragmented messages are difficult to parse, especially when data types longer than a byte are transferred. For example buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer.

Parsing would be much easier if multiple Channels could be recombined on the fly. So I thought of sending all data through a java.nio.channels.Pipe.

// count total length (buffers is a ByteBuffer[])
int length = 0;
for (ByteBuffer buffer : buffers) {
  length += buffer.remaining();
}

// write to pipe
Pipe pipe = Pipe.open();
pipe.sink().write(buffers);

// read back from pipe
ByteBuffer complete = ByteBuffer.allocateDirect(length);
if (pipe.source().read(complete) != length) {
  System.out.println("Fragmented!");
}

But will this be guaranteed to fill up the buffer completely? Or could the Pipe introduce fragmentation again? In other words, will the body of the condition ever be reached?

Charlie · Answer 1 · 2013-12-12T16:59:57.453

TCP fragmentation has little to do with the problem you are experiencing. The TCP stack at the source of the stream divides messages that are too large for a single packet into multiple packets, and when they arrive and are reassembled, the chunks you read may not line up with the boundaries of the longs you are expecting.

Regardless, you are treating what amounts to a byte array (a ByteBuffer) as an input stream. You are telling the JVM to read 'the rest of what is in the buffer' into a ByteBuffer. Meanwhile, the second half of your long is now inside the network buffer. The ByteBuffer you are now trying to read through will never have the rest of that long.

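Consider using a Scanner to read longs; it will block until a long can be read. Note that this only helps if the longs are sent as text, since Scanner parses textual tokens rather than raw bytes.

Scanner scanner = new Scanner(socket.getChannel());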
scanner.nextLong();

Also consider using a DataInputStream to read longs, although I can't tell if it blocks until a whole long is read based on the documentation.

DataInputStream dis = new DataInputStream(socket.getInputStream());
dis.readLong();
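
On the blocking question: as far as I can tell from the DataInputStream documentation, readLong() keeps reading until it has all eight bytes and throws an EOFException if the stream ends first. The sketch below tries this out without a network. TrickleStream and readLongFrom are hypothetical names made up for the demonstration; the stream hands out at most one byte per read call to imitate a heavily fragmented connection.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class ReadLongDemo {
    /** Hypothetical test stream: returns at most one byte per bulk read. */
    static final class TrickleStream extends InputStream {
        private final InputStream in;
        TrickleStream(InputStream in) { this.in = in; }
        @Override public int read() throws IOException { return in.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return len == 0 ? 0 : in.read(b, off, 1);
        }
    }

    /** Reads one big-endian long from the given bytes, delivered one byte at a time. */
    static long readLongFrom(byte[] wire) {
        try (DataInputStream dis =
                 new DataInputStream(new TrickleStream(new ByteArrayInputStream(wire)))) {
            return dis.readLong(); // loops internally until 8 bytes have arrived
        } catch (IOException e) {
            // An EOFException lands here when fewer than 8 bytes are available
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = ByteBuffer.allocate(Long.BYTES).putLong(0x0102030405060708L).array();
        System.out.println(Long.toHexString(readLongFrom(wire)));
    }
}
```

This only covers blocking streams; it does not help with a non-blocking channel registered with a Selector.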

If you have control over the server, consider using flush() to prevent your packets from getting buffered and sent 'fragmented', or use an ObjectOutputStream/ObjectInputStream as a more convenient way to do IO.

Isn't the whole point of Channels and Buffers to replace the old `Stream`-API? So falling back to an `InputStream` seems somewhat deprecated to me, especially when I want to use asynchronous channels with a `Selector`. Object serialization classes will add avoidable overhead compared to sending raw basic data types. — XZS, Dec 12 '13 at 01:37
I added a suggestion to use a Scanner which can be used with channels. — Charlie, Dec 12 '13 at 17:00

user207421 · Answer 2 · 2013-12-12T01:12:15.627

No. A Pipe is intended to be written by one thread and read by another. There is an internal buffer of only 4k. If you write more than that to it you will stall.

They really aren't much use at all actually, except as a demonstration.

I don't understand this:

"For example buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer."

What second buffer? You should be using the same receive buffer for the life of the channel. Make it an attachment to the SelectionKey so you can find it when you need it.
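
A minimal sketch of the single-buffer idea, assuming fixed-width longs on the wire. LongAccumulator, feed and the capacity of 64 are invented for illustration. In a real selector loop the put(fragment) call would be a channel.read(buffer), and the accumulator (or its buffer) would be the object attached to the SelectionKey.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LongAccumulator {
    // One buffer for the life of the connection; the capacity is an arbitrary choice
    private final ByteBuffer buffer = ByteBuffer.allocate(64);

    /** Appends a received fragment and returns every long that is now complete. */
    List<Long> feed(byte[] fragment) {
        buffer.put(fragment);          // stand-in for channel.read(buffer)
        buffer.flip();                 // switch to reading what has accumulated
        List<Long> complete = new ArrayList<>();
        while (buffer.remaining() >= Long.BYTES) {
            complete.add(buffer.getLong()); // cannot underflow: 8 bytes are present
        }
        buffer.compact();              // keep a partial long at the front for next time
        return complete;
    }

    public static void main(String[] args) {
        byte[] wire = ByteBuffer.allocate(16).putLong(7L).putLong(-1L).array();
        LongAccumulator acc = new LongAccumulator();
        // The first long is split after 5 bytes, as TCP is free to do
        System.out.println(acc.feed(Arrays.copyOfRange(wire, 0, 5)));
        System.out.println(acc.feed(Arrays.copyOfRange(wire, 5, 16)));
    }
}
```

Because compact() carries leftover bytes over to the next read, a long that straddles two reads is never parsed until all of it has arrived.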

I also don't understand this:

"Parsing would be much easier if multiple Channels could be recombined on the fly"

Surely you mean multiple buffers, but the basic idea is to only have one buffer in the first place.

edited Dec 12 '13 at 01:12

answered Dec 11 '13 at 22:21

user207421


Using the same buffer would be the most simple strategy, indeed. But when the message length is unknown until a terminating symbol appears on the stream, I cannot allocate the buffer beforehand and have to fall back allocating a new buffer as soon as the first one is full. – XZS Dec 11 '13 at 23:40
When TCP fragments a message, it may fragment after any byte, also in between a `long` pushed on the stream. This is why `getLong()` could fail with a `BufferUnderflowException` if the `long` was not entirely received. – XZS Dec 11 '13 at 23:41
@XZS You know what the maximum length message is, surely? I don't know what your second comment is supposed to be telling me that I don't already know, except that it proves you can't use two buffers, because the `long` may get split between them. The basic issue is that you have to keep reading, into the same buffer, until you have everything you need, and then parse it. – user207421 Dec 12 '13 at 01:09
No, the length of the message is not known. A null byte is placed on the stream to signal its end. Until this symbol is encountered, more data has to be read. How can I allocate a single buffer big enough to store the whole message when I do not know how long it has to be? – XZS Dec 12 '13 at 01:29
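
One possible way to handle the unknown-length case raised in these comments is to keep a single logical buffer and replace it with a larger one when it fills up, so a message is always contiguous by the time it is parsed. This is only a sketch under assumptions: NullTerminatedFramer, feed, ensureRoom and the starting capacity of 8 are invented here, and a zero byte is assumed never to appear inside a message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class NullTerminatedFramer {
    private ByteBuffer buffer = ByteBuffer.allocate(8); // deliberately tiny start size

    /** Appends a fragment; returns each complete message without its 0 terminator. */
    List<byte[]> feed(byte[] fragment) {
        ensureRoom(fragment.length);
        buffer.put(fragment);            // stand-in for channel.read(buffer)
        buffer.flip();
        List<byte[]> messages = new ArrayList<>();
        int start = buffer.position();
        for (int i = start; i < buffer.limit(); i++) {
            if (buffer.get(i) == 0) {    // absolute get: does not move the position
                byte[] msg = new byte[i - start];
                buffer.get(msg);         // copy the message body
                buffer.get();            // skip the terminator
                messages.add(msg);
                start = buffer.position();
            }
        }
        buffer.compact();                // keep the unterminated tail for the next call
        return messages;
    }

    /** Doubles the capacity until the pending bytes plus `needed` more fit. */
    private void ensureRoom(int needed) {
        if (buffer.remaining() >= needed) return;
        int capacity = buffer.capacity();
        while (capacity - buffer.position() < needed) capacity *= 2;
        ByteBuffer bigger = ByteBuffer.allocate(capacity);
        buffer.flip();
        bigger.put(buffer);              // carry over the bytes received so far
        buffer = bigger;
    }

    public static void main(String[] args) {
        NullTerminatedFramer framer = new NullTerminatedFramer();
        framer.feed("hel".getBytes(StandardCharsets.US_ASCII));
        for (byte[] m : framer.feed("lo, world\0ab".getBytes(StandardCharsets.US_ASCII))) {
            System.out.println(new String(m, StandardCharsets.US_ASCII));
        }
    }
}
```

Each returned array can be wrapped with ByteBuffer.wrap(msg) and parsed with getLong() without a value being split across buffers. A production version would probably also cap the maximum capacity.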
