Receiving a UTF-8 string using InputStreamReader from a Socket?

Question

I'm trying to receive a string from a device using this code:

        byte[] buf = new byte[4];
        int read = inFromDevice.read(buf);
        Logger.getLogger(Utill.class.getName() + " DEBUG_ERR01").log(Level.INFO, "Bytes read: {0}", read);
        int msgLength = ByteBuffer.wrap(buf).getInt();
        Logger.getLogger(Utill.class.getName() + " DEBUG_ERR01").log(Level.INFO, "Message length: {0}", msgLength);
        Reader r = new InputStreamReader(inFromDevice);
        char[] cb = new char[msgLength];
        int actualCharsRead = r.read(cb);
        Logger.getLogger(Utill.class.getName() + " DEBUG_ERR01").log(Level.INFO, "Actual chars read: {0} char array length: {1}", new Object[]{actualCharsRead, cb.length});
        String msgText = String.valueOf(cb, 0, cb.length);
        Logger.getLogger(Utill.class.getName() + "Messages Loggining recieve: ").log(Level.INFO, msgText);
        return msgText;

Here, inFromDevice is an InputStream obtained from a Socket accepted by a ServerSocket.

The code works and returns messages most of the time, but sometimes I get messages shorter than msgLength (which is wrong according to the protocol).

An example from the log: `Actual chars read: 1020 char array length: 1391`

I think the problem is external, caused by the network or by the device itself, but I'd like some expert insight on this. Are there any known problems in Java that could cause this?

If you specifically want UTF-8, why didn't you tell [`InputStreamReader`](https://docs.oracle.com/javase/8/docs/api/java/io/InputStreamReader.html#constructor.summary) that? — Andreas, Apr 09 '19 at 19:52
I bet `msgLength` is in *bytes*, so why are you expecting *char* count to be same as *byte* count, if message contains non-ASCII characters and encoding is UTF-8. You do know how UTF-8 works, right? — Andreas, Apr 09 '19 at 19:54
@Andreas no the protocol specifies that the first 4 bytes are the number of UTF-8 characters being sent. I didn't tell the InputStreamReader about UTF-8 because it's the default. — alibttb, Apr 09 '19 at 21:53

score 2 · Accepted Answer · answered Apr 09 '19 at 20:01

An InputStreamReader will only block until it can read one character into the buffer, or detect EOF. There is no guarantee that the buffer will be filled.

If your protocol indicates the length of the string being sent, the receiver needs to loop, tracking the number of characters remaining, until all have been read.
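A minimal sketch of such a loop (not part of the original answer). The class name `CharLoopReader` and the method `readChars` are hypothetical, and the sketch assumes the protocol's "character" count corresponds to Java `char` values, which would not hold for characters outside the Basic Multilingual Plane:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class CharLoopReader {

    /**
     * Reads exactly count chars from r, looping because a single
     * read() call may return fewer chars than requested.
     * Throws EOFException if the stream ends before count chars arrive.
     */
    static String readChars(Reader r, int count) throws IOException {
        char[] cb = new char[count];
        int off = 0;
        while (off < count) {
            // Ask only for the chars still missing; n may be smaller than that.
            int n = r.read(cb, off, count - off);
            if (n < 0) {
                throw new EOFException("Stream ended after " + off + " of " + count + " chars");
            }
            off += n;
        }
        return new String(cb);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the socket stream: 11 chars, two of them non-ASCII.
        byte[] payload = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        Reader r = new InputStreamReader(new ByteArrayInputStream(payload), StandardCharsets.UTF_8);
        System.out.println(readChars(r, 11));
    }
}
```

The charset is passed explicitly here because, on Java versions before 18, an `InputStreamReader` created without one uses the platform default charset, which is not guaranteed to be UTF-8.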

erickson

This seems logical, I'll try it. Do you suggest a clean way to loop, or should I use some other kind of reader? @erickson – alibttb Apr 09 '19 at 21:56
@alibttb You could do something like `CharBuffer expected = CharBuffer.wrap(cb); while (cb.hasRemaining()) r.read(expected);` You should do something similar when reading the data for the `ByteBuffer` that holds the message length. – erickson Apr 10 '19 at 02:48
you mean `CharBuffer expected = CharBuffer.wrap(cb); while (expected.hasRemaining()) { r.read(expected); }` @erickson – alibttb Apr 11 '19 at 18:13
@alibttb Yes, sorry for the typo. You have it correct. – erickson Apr 11 '19 at 18:15
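
Putting the comments above together, a hedged sketch of the `CharBuffer` loop plus a fully read length prefix might look like the following. The class `FramedMessageReader` and its method names are hypothetical. The sketch assumes a 4-byte big-endian length prefix that counts Java `char` values, and it adds an end-of-stream check, since `read` returns -1 at EOF and the bare loop would otherwise never terminate:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class FramedMessageReader {

    /** Reads the 4-byte length prefix; readFully blocks until all 4 bytes arrive. */
    static int readLength(InputStream in) throws IOException {
        byte[] buf = new byte[4];
        // DataInputStream does no buffering of its own, so wrapping here is safe.
        new DataInputStream(in).readFully(buf); // throws EOFException on a short header
        return ByteBuffer.wrap(buf).getInt();   // ByteBuffer is big-endian by default
    }

    /** Fills a CharBuffer of charCount chars, failing if the stream ends early. */
    static String readText(Reader r, int charCount) throws IOException {
        CharBuffer expected = CharBuffer.allocate(charCount);
        while (expected.hasRemaining()) {
            if (r.read(expected) < 0) {
                throw new EOFException("Stream ended with " + expected.remaining() + " chars missing");
            }
        }
        expected.flip(); // switch the buffer from writing to reading
        return expected.toString();
    }

    /** Reads one length-prefixed message (assumed framing: int length, then UTF-8 text). */
    static String readMessage(InputStream in) throws IOException {
        int charCount = readLength(in);
        Reader r = new InputStreamReader(in, StandardCharsets.UTF_8);
        return readText(r, charCount);
    }
}
```

One caveat with this shape: the `InputStreamReader` documentation notes that more bytes may be read ahead from the underlying stream than the current read needs. If several messages arrive on the same socket, a fresh reader per message could therefore consume the start of the next message, so this sketch is only safe as written for one message per stream.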
