As other answers have already pointed out, you have to take the position of the buffer into account, since the `read` method updates it. Code that does so looks like this:
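```java
// check the counter first, so that no chunk is read and then discarded
while (readCount < 68 && (byteRead = readableByteChannel.read(buffer)) > 0) {
    // the bytes read so far are array()[arrayOffset() .. arrayOffset() + position());
    // the String constructor takes (offset, length), so the length is position()
    sb.append(new String(buffer.array(),
        buffer.arrayOffset(), buffer.position(), "UTF-8"));
    buffer.clear();
    readCount++;
}
```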
Note that in your specific case, `arrayOffset()` will always be zero, but it is better to write the code so that it doesn't break when you change something in the buffer allocation code.
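To see why `arrayOffset()` matters, here is a small sketch (the class and method names are made up for illustration) using a heap buffer created with `slice()`, whose `arrayOffset()` is not zero. The bytes written so far occupy the backing array from index `arrayOffset()` up to, but excluding, `arrayOffset() + position()`, and since the `String` constructor takes an offset and a *length*, the length argument is just `position()`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ArrayOffsetDemo {
    // A slice of a larger buffer shares the same array but starts at index 3 of it.
    static int sliceOffset() {
        ByteBuffer whole = ByteBuffer.allocate(16);
        whole.position(3);
        return whole.slice().arrayOffset();
    }

    // Writes "hi" into such a slice and converts the written bytes back to a String.
    static String readSlice() {
        ByteBuffer whole = ByteBuffer.allocate(16);
        whole.position(3);
        ByteBuffer slice = whole.slice();             // arrayOffset() == 3
        slice.put("hi".getBytes(StandardCharsets.UTF_8));
        // offset into the shared array, then the number of bytes written
        return new String(slice.array(), slice.arrayOffset(), slice.position(),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(sliceOffset()); // 3
        System.out.println(readSlice());   // hi
    }
}
```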
But this code is still broken. When the data contains a multi-byte UTF-8 sequence, it may happen that the first bytes of that sequence are read in one operation and the remaining bytes in the next one. Creating `String` instances from these incomplete sequences produces invalid characters (typically the replacement character U+FFFD). Besides that, you are creating these `String` instances just to copy their contents into a `StringBuilder`, which is quite inefficient.
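A minimal sketch of the problem, assuming the text "café" (where "é" is encoded as the two bytes `C3 A9`) arrives in two chunks that are each turned into a `String` on their own. `SplitSequenceDemo` and `decodeInChunks` are illustrative names only; the sketch uses the `String(byte[], int, int, Charset)` constructor, which replaces malformed input with U+FFFD.

```java
import java.nio.charset.StandardCharsets;

class SplitSequenceDemo {
    // Encodes text as UTF-8, splits the bytes after splitAt bytes and converts
    // each chunk to a String independently, as the loop in the question does.
    static String decodeInChunks(String text, int splitAt) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String first = new String(bytes, 0, splitAt, StandardCharsets.UTF_8);
        String second = new String(bytes, splitAt, bytes.length - splitAt,
                StandardCharsets.UTF_8);
        return first + second;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is 5 bytes: 63 61 66 C3 A9
        System.out.println(decodeInChunks("caf\u00e9", 3)); // café (split between characters)
        System.out.println(decodeInChunks("caf\u00e9", 4)); // "caf" + two U+FFFD (split inside é)
    }
}
```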
So, to do it correctly, you should do something like:
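```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

int readCount = 0;
final int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is); // is: your InputStream
while (readCount < 68 && readableByteChannel.read(buffer) > 0) {
    buffer.flip(); // switch the byte buffer from filling to draining
    // decode as much as fits; an incomplete trailing sequence stays in buffer
    while (dec.decode(buffer, cBuffer, false).isOverflow()) {
        cBuffer.flip();
        sb.append(cBuffer);
        cBuffer.clear();
    }
    buffer.compact(); // keep the undecoded bytes, make room for the next read
    readCount++;
}
// end of input: decode the remaining bytes
// (a malformed result, e.g. a truncated sequence, is not handled here)
buffer.flip();
for (boolean more = true; more; ) {
    more = dec.decode(buffer, cBuffer, true).isOverflow();
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
}
dec.flush(cBuffer); // last step of the CharsetDecoder protocol (writes nothing for UTF-8)
cBuffer.flip();
sb.append(cBuffer);
```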
Note how both `ReadableByteChannel` and `CharsetDecoder` process the buffers using their positions and limits. All you have to do is use `flip` and `compact` correctly, as shown in the documentation of `compact`.
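The crucial step is that `decode(..., false)` leaves an incomplete trailing sequence in the byte buffer, and `compact()` moves it to the front, so the next read appends the missing bytes right after it. The following sketch (the class name is hypothetical) simulates two reads that split "aé" (`61 C3 A9`) inside the two-byte sequence for "é":

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class CompactDemo {
    static String decodeAcrossSplit() {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(8);
        CharBuffer out = CharBuffer.allocate(8);

        in.put((byte) 0x61).put((byte) 0xC3); // first "read": 'a' plus half of 'é'
        in.flip();
        dec.decode(in, out, false);           // decodes 'a', leaves 0xC3 unconsumed
        in.compact();                         // moves 0xC3 to index 0, ready for more input

        in.put((byte) 0xA9);                  // second "read": the rest of 'é'
        in.flip();
        dec.decode(in, out, true);            // now C3 A9 decodes to 'é'
        dec.flush(out);

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeAcrossSplit()); // aé
    }
}
```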
The only exception is appending to the `StringBuilder`, as that is not an NIO operation. `StringBuilder.append` reads all remaining characters of the `CharBuffer` but does not advance its position, so we use `clear()` there rather than `compact()`.
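A quick check of that behavior, in an illustrative class: `StringBuilder.append(CharSequence)` copies the remaining characters of a `CharBuffer` through the `CharSequence` interface, whose methods are relative to the position but never change it.

```java
import java.nio.CharBuffer;

class AppendDemo {
    // Appends a flipped CharBuffer to a StringBuilder and reports what is left in the buffer.
    static int remainingAfterAppend(StringBuilder sb) {
        CharBuffer cb = CharBuffer.allocate(8);
        cb.put("abc");
        cb.flip();            // position 0, limit 3
        sb.append(cb);        // copies "abc"
        return cb.remaining(); // still 3: the position was not advanced
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        System.out.println(remainingAfterAppend(sb)); // 3
        System.out.println(sb);                       // abc
    }
}
```

Calling `compact()` at this point would therefore keep the already appended characters in the buffer, and they would be appended a second time.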
Note that this code still does not deal with certain (unavoidable) error conditions: since you stop after an arbitrary number of `read`s, it is always possible that you cut in the middle of a multi-byte UTF-8 sequence.
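What happens at such a cut depends on the decoder's error action. This sketch (illustrative names) decodes input that ends after the first byte of `C3 A9`. With the default `REPORT` action, `decode(..., true)` returns a malformed `CoderResult`, which the final loop above treats like the end of decoding, silently dropping the dangling byte; with `CodingErrorAction.REPLACE`, the byte becomes U+FFFD instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class TruncatedInputDemo {
    // "caf" followed by only the first byte of 'é' (C3 A9)
    static final byte[] TRUNCATED = { 'c', 'a', 'f', (byte) 0xC3 };

    // Default action (REPORT): the final decode call reports malformed input.
    static boolean isReportedAsMalformed() {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        CoderResult r = dec.decode(ByteBuffer.wrap(TRUNCATED), CharBuffer.allocate(8), true);
        return r.isMalformed();
    }

    // REPLACE action: the dangling byte is turned into U+FFFD.
    static String decodeWithReplacement() {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE);
        CharBuffer out = CharBuffer.allocate(8);
        dec.decode(ByteBuffer.wrap(TRUNCATED), out, true);
        dec.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isReportedAsMalformed()); // true
        System.out.println(decodeWithReplacement()); // "caf" followed by U+FFFD
    }
}
```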
But the JRE already implements this rather complicated logic, and if you give up the idea of cutting after a certain number of bytes, you can use it:
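```java
import java.io.Reader;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

int readCount = 0;
final int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is);
// the Reader decodes the bytes, including sequences split across reads
Reader reader = Channels.newReader(readableByteChannel, "UTF-8");
while (readCount < 68 && reader.read(cBuffer) > 0) {
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
    readCount++;
}
```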
Now this code limits the reading to at most 256 × 68 = 17,408 characters rather than bytes (a single `read` may return fewer than 256), but for UTF-8 encoded data, this only makes a difference when there are multi-byte sequences, which you apparently didn't care about before.
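To illustrate the difference between the two kinds of limit, here is a short sketch with an arbitrary example string and hypothetical method names: `String.length()` counts UTF-16 `char`s, while UTF-8 needs one to three bytes per `char` (and four bytes for a surrogate pair).

```java
import java.nio.charset.StandardCharsets;

class CharsVersusBytes {
    static int charCount(String s) {
        return s.length();
    }

    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "na\u00efve \u20ac"; // "naïve €": ï takes 2 bytes, € takes 3
        System.out.println(charCount(s) + " chars, " + utf8ByteCount(s) + " bytes"); // 7 chars, 10 bytes
    }
}
```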
Finally, since you apparently have an `InputStream` in the first place, you don't need the `ReadableByteChannel` detour at all:
```java
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

int readCount = 0;
final int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
while (readCount < 68 && reader.read(cBuffer) > 0) {
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
    readCount++;
}
```
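Wrapped into a self-contained method, the `InputStreamReader` variant could look like the sketch below. `LimitedReader`, `readLimited` and its parameters are hypothetical names; the limits of 256 and 68 from the question become arguments. The helper does not close the stream, on the assumption that the caller owns it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

class LimitedReader {
    // Reads at most maxReads chunks of up to bufferSize chars each from a UTF-8 stream.
    static String readLimited(InputStream is, int bufferSize, int maxReads) throws IOException {
        StringBuilder sb = new StringBuilder();
        CharBuffer cBuffer = CharBuffer.allocate(bufferSize);
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        // check the counter first so that no chunk is read and then thrown away
        for (int readCount = 0; readCount < maxReads && reader.read(cBuffer) > 0; readCount++) {
            cBuffer.flip();
            sb.append(cBuffer);
            cBuffer.clear();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLimited(new ByteArrayInputStream(data), 256, 68)); // café au lait
    }
}
```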
This might look like “not NIO code”, but `Reader`s are still the canonical way of reading character data, even with NIO; there is no replacement. The method `Reader.read(CharBuffer)` was missing in the first release of NIO, but was added in Java 5.