Read bytes from Java NIO socketchannel until marker is reached

Question

I'm looking for an efficient way to read bytes from a socket channel using Java NIO. The task is simple and I have a working solution, but I'm looking for a cleaner and more efficient one. Here's the scenario:

Data is read from a socket channel
This data is a UTF-8 encoded string
Every line is terminated by \r\n; its length is not known up front
After every line read, I want to do something with the message

My solution reads the data byte by byte and compares every byte to my marker (the line feed \n, which has the value 10 in UTF-8). Here's the code:

ByteBuffer res = ByteBuffer.allocate(512);
boolean completed = false;
try {
    while (true) {
        ByteBuffer tmp = ByteBuffer.allocate(1);
        if(soc.read(tmp) == -1) {
             break;
        }

        // reset the buffer position to index 0 so the byte just read can be fetched
        tmp.rewind();
        byte cur = tmp.get();
        res.put(cur);

        // have we read newline?
        if (cur == 10) {
            doSomething(res);
            res.clear();
        }
    }

} catch(Exception ex) {
     handle(ex);
}

Even though this does the job, there may be a better way that doesn't need a per-byte comparison on every iteration.

Thanks for your help!

score 5 · Accepted Answer · answered Jul 17 '15 at 17:50

The way I would do it is to read as much as is available, up to, say, 32 KB, and then copy the data byte by byte to another buffer, e.g. a StringBuilder. If there is data left in the buffer from the last read, you can keep using the buffer until it is all consumed, at which point you read more data.

Note: each system call is expensive. It could take 2-5 microseconds. This doesn't sound like much, but it adds up when you make millions of calls: reading 1 MB one byte at a time means about a million calls, i.e. roughly 2-5 seconds.
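The approach above can be sketched roughly as follows. This is only an illustration under some assumptions: the channel is in blocking mode (a non-blocking channel can return 0 from read and would make this loop spin), the class and method names (`ChunkedLineReader`, `readLines`) are made up for the example, and a `ByteArrayOutputStream` is swapped in for the StringBuilder so that the UTF-8 bytes of a line are decoded in one go once the line is complete.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original posts.
final class ChunkedLineReader {
    private static final byte LF = 10; // '\n', the last byte of "\r\n"

    /**
     * Reads the channel until end-of-stream and returns every LF-terminated
     * line without its "\r\n". Bytes after the last LF are dropped at EOF.
     * Assumes a blocking channel and chunkSize > 0.
     */
    static List<String> readLines(ReadableByteChannel channel, int chunkSize) throws IOException {
        List<String> lines = new ArrayList<>();
        ByteBuffer chunk = ByteBuffer.allocate(chunkSize);            // reused for every read
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of the unfinished line

        while (channel.read(chunk) != -1) {    // one system call per chunk, not per byte
            chunk.flip();                      // switch from filling to draining
            while (chunk.hasRemaining()) {
                byte b = chunk.get();
                if (b == LF) {
                    lines.add(decode(pending));
                    pending.reset();
                } else {
                    pending.write(b);
                }
            }
            chunk.clear();                     // fully drained, ready for the next read
        }
        return lines;
    }

    // Decodes one complete line, dropping a trailing '\r' if present.
    private static String decode(ByteArrayOutputStream line) {
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```

Since `SocketChannel` implements `ReadableByteChannel`, a call could look like `ChunkedLineReader.readLines(socketChannel, 32 * 1024)`. Decoding only whole lines avoids splitting a multi-byte UTF-8 character across two reads, because the byte 10 never occurs inside a multi-byte sequence.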

Peter Lawrey


OK, I've changed the code so that it uses larger buffers to reduce the number of system calls. I use two buffers of the same size. On every loop iteration I read from my socket into buffer A, then loop over that buffer and copy all bytes into buffer B. When I reach my marker, I process buffer B and reallocate it to ensure that shorter messages don't pick up bytes from longer earlier ones. If the marker does not fall at the end of a read, the next iteration simply appends to buffer B, so I don't have to care about remainders. – Patze Jul 17 '15 at 20:27
If you `write(byteBuffer)` from one buffer to another it can be 8x faster or more than doing it byte by byte. – Peter Lawrey Jul 18 '15 at 16:48
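A rough sketch of such a bulk copy, under some assumptions: `ByteBuffer` itself has no `write` method, so the sketch uses its bulk operation `put(ByteBuffer)`; the class and method names (`BulkLineCopy`, `drain`) are invented for the example; and the destination buffer is assumed to be large enough for the longest line (otherwise `put` throws `BufferOverflowException`). The marker is located with absolute `get(int)` calls, then the whole span is copied in a single call.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original posts.
final class BulkLineCopy {
    /**
     * Moves everything readable in src (already flipped) into line, one span
     * per put(ByteBuffer) call, and returns the lines completed on the way.
     * An unfinished tail stays in line for the next call.
     */
    static List<String> drain(ByteBuffer src, ByteBuffer line) {
        List<String> out = new ArrayList<>();
        while (src.hasRemaining()) {
            // Locate the next LF; absolute get(int) does not move the position.
            int end = src.position();
            while (end < src.limit() && src.get(end) != '\n') {
                end++;
            }
            boolean found = end < src.limit();
            int stop = found ? end + 1 : end;   // include the LF when found

            // Narrow the limit so that put(ByteBuffer) copies only [position, stop).
            int savedLimit = src.limit();
            src.limit(stop);
            line.put(src);                      // bulk copy, advances both positions
            src.limit(savedLimit);

            if (found) {
                line.flip();                    // limit = length of this line only
                String s = StandardCharsets.UTF_8.decode(line).toString();
                // s always ends with '\n' here; strip "\r\n" or a lone "\n".
                out.add(s.endsWith("\r\n") ? s.substring(0, s.length() - 2)
                                           : s.substring(0, s.length() - 1));
                line.clear();
            }
        }
        return out;
    }
}
```

The scan for the marker is still a per-byte comparison; what the bulk `put` saves is the per-byte copy into the second buffer.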

score 0 · Answer 2 · answered Jul 18 '15 at 07:45

Here's the code of my final solution.

ByteBuffer res = ByteBuffer.allocate(maxByte);
while (true) {
    ByteBuffer tmp = ByteBuffer.allocate(maxByte);

    int bytesRead = clientSocket.read(tmp);
    if (bytesRead == -1) {
        break;
    }

    // rewind ByteBuffer to get it back to start
    tmp.rewind();

    for (int i = 0; i < bytesRead; i++) {
        byte cur = tmp.get(i);
        res.put(cur);
        if (cur == marker) {
            processMessage(res);
            res = ByteBuffer.allocate(maxByte);
        }
    }

    // a short read is treated as the end of the message, so leave the loop
    // (tmp was allocated with maxByte bytes)
    if (bytesRead < maxByte) {
        break;
    }
}

Patze


You don't need to allocate a new `tmp` buffer every time around the loop. You should `flip()` the buffer before `get()`, and `compact()` it afterwards, instead of `rewind()`. You don't need to re-allocate `res` on success: just `clear()` it. – user207421 Jul 18 '15 at 08:30
What's the better choice, compacting after every get() or after reaching the marker? My problem with clear() is that it only "clears" the buffer logically. I've run into the problem that my messages have variable length: if the following message is shorter than the previous one, I process "old" data from previous iterations without being able to recognize it. I haven't found a way to avoid this without reallocating. – Patze Jul 18 '15 at 09:13
1. `compact()` after every `flip()`. 2. Your problem is that you weren't calling `compact()`. There is zero difference between allocating a new `ByteBuffer` and `clear()` on an existing one of the same capacity except that `clear()` is many times as efficient. – user207421 Jul 18 '15 at 11:02
OK, I've replaced `tmp.rewind()` with `tmp.flip(); tmp.compact()` and `res = ByteBuffer.allocate(maxByte);` with `res.clear()`. I'm back to the old problem... I know I'm clearly missing something; I might need to flip the other buffer too. Can you give me an example with my code? – Patze Jul 18 '15 at 11:58
It has to be flip, get, compact, in that order. – user207421 Jul 18 '15 at 12:20
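One way the flip, get, compact order might look when applied to the loop from the answer, as a sketch only. Assumptions: the channel is blocking, no message is longer than `maxByte`, and the wrapper names (`FlipGetCompact`, `readMessages`) as well as returning the messages in a list instead of calling `processMessage` are inventions for the example. The part relevant to the stale-data problem is that `res` is flipped before it is processed, so its limit marks the end of the current message and older bytes beyond it are never read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the loop from the answer.
final class FlipGetCompact {
    /** Returns each message including its terminating marker byte. */
    static List<String> readMessages(ReadableByteChannel clientSocket, int maxByte, byte marker)
            throws IOException {
        List<String> messages = new ArrayList<>();
        ByteBuffer tmp = ByteBuffer.allocate(maxByte); // allocated once, reused
        ByteBuffer res = ByteBuffer.allocate(maxByte); // allocated once, reused

        while (clientSocket.read(tmp) != -1) {
            tmp.flip();                        // 1. flip: limit = bytes read, position = 0
            while (tmp.hasRemaining()) {
                byte cur = tmp.get();          // 2. get: relative read, advances the position
                res.put(cur);
                if (cur == marker) {
                    res.flip();                // limit = length of this message only
                    messages.add(StandardCharsets.UTF_8.decode(res).toString());
                    res.clear();               // logical reset; old bytes stay beyond later limits
                }
            }
            tmp.compact();                     // 3. compact: keep unread bytes, back to write mode
        }
        return messages;
    }
}
```

Because the inner loop drains `tmp` completely, `compact()` behaves like `clear()` here; it only makes a difference when the loop stops before all bytes have been consumed.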
