Java: Why does my String get corrupted over UDP?

Question

I'm sending a UDP datagram whose only content is a single string, and I'm creating the packet like this:

String content = ALIVE + "," + clusterName + "," + nodeName + "," + location;
byte[] data = content.getBytes();
packet = new DatagramPacket(data, data.length);

The problem is that when it arrives, it has some weird binary data at the end that can't be displayed as characters (Sublime Text just shows several NUL symbols).

String data = new String(packet.getData());

I extract the individual fields with a StringTokenizer (delimiter ","), and I have worked around the problem for now by appending another "," before sending. Still, I would like to know: where does this data come from?

Maybe it is related to this? http://stackoverflow.com/questions/8229064/how-to-get-rid-of-the-empty-remaining-of-the-buffer – Ravi, Apr 20 '15 at 19:19

score 2 · Accepted Answer · answered Apr 20 '15 at 17:55

Heed carefully the answers advising you to specify character encoding explicitly on both ends. Their advice is excellent.

However, if the character data is received accurately but for the addition of some junk at the end, then your issue is unlikely to arise from a character encoding mismatch. More likely it arises from incorrect use of DatagramPacket by the receiver.

DatagramPacket provides a fixed-length buffer for messages, and the getData() method returns that buffer. If it is longer than the message most recently received in it, then the tail end will contain data unrelated to that message. After receiving a message, you must use the packet's getLength() method to determine how many of the bytes in the buffer correspond to the message.
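
To illustrate the point about getLength(), here is a minimal sketch rather than the asker's actual receiver. The class name, the helper simulateReceived(), the 64-byte buffer size and the sample field values are all assumptions made for the example; the helper only imitates the state a packet is in after DatagramSocket.receive() has filled it, so the snippet runs without a network.

```java
import java.net.DatagramPacket;
import java.nio.charset.StandardCharsets;

class DatagramDecodeSketch {

    // Decode only the bytes that belong to the message most recently
    // stored in the packet, not the whole backing buffer.
    static String decode(DatagramPacket packet) {
        return new String(packet.getData(), packet.getOffset(),
                          packet.getLength(), StandardCharsets.UTF_8);
    }

    // Hypothetical helper: builds a packet that looks like one filled by
    // DatagramSocket.receive(), i.e. a large buffer with a short message
    // at the front and the length field set to the message size.
    static DatagramPacket simulateReceived(String message, int bufferSize) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[bufferSize];
        System.arraycopy(payload, 0, buffer, 0, payload.length);
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        packet.setLength(payload.length);
        return packet;
    }

    public static void main(String[] args) {
        DatagramPacket packet = simulateReceived("ALIVE,cluster1,node1,rack7", 64);

        // Decoding the whole buffer keeps the unused tail (NUL characters here).
        String wholeBuffer = new String(packet.getData(), StandardCharsets.UTF_8);
        System.out.println(wholeBuffer.length()); // 64

        // Decoding getLength() bytes yields just the message.
        System.out.println(decode(packet)); // ALIVE,cluster1,node1,rack7
    }
}
```

In a real receiver the same decode() call would follow socket.receive(packet). If the packet object is reused for several receives, its length should be reset to the buffer size with setLength() before each call, because receive() overwrites it with the size of the last message.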

Thanks, this worked perfectly. (String data = new String(packet.getData(), 0, packet.getLength());) — Lasse Meyer, Apr 21 '15 at 18:15

bmargulies · Answer 2 · 2015-04-20T17:34:22.037

score 1

Never, ever, call the no-argument String.getBytes() or the String constructor that takes only a byte[].

Always pass an explicit character set on both sides.

As your code is currently written, the sender can generate bytes of one encoding, and the reader can (mis)interpret them as some other encoding, producing trash of all flavors.

You might have other problems, as well.

edited Apr 20 '15 at 17:34

answered Apr 20 '15 at 17:31

Good advice, but it does not answer the question. – John Bollinger Apr 20 '15 at 17:32
Oh? Are you sure the OP isn't suffering from data corruption from mismatched charsets? – bmargulies Apr 20 '15 at 17:33
No, I am not sure (though I find it doubtful from the description). Neither are you. The edit improved the answer, though. – John Bollinger Apr 20 '15 at 17:36
Encoding wasn't the problem (although in the future it could have been one), but I guess specifying the charset on both sides is always better. Thanks for the tip. – Lasse Meyer Apr 21 '15 at 18:27

Brian Agnew · Answer 3 · 2015-04-20T17:40:11.577

You're converting from characters to bytes at one end, and from bytes to characters at the other. All well and good, but you're not specifying the character encodings in use, and if those are mismatched, the byte/character conversion will not work properly.

You have two options:

specify the conversions with the appropriate character set
enforce the default encoding used by the JVM using the confusingly named -Dfile.encoding JVM parameter.

I would prefer the first option, since you may not have control over how your code is executed, or where (e.g. if your code is lifted into a library for use elsewhere).
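
A rough sketch of the first option, assuming UTF-8 is the charset both ends agree on (any charset works as long as sender and receiver use the same one). The class and method names and the sample string are made up for the illustration; the non-ASCII character is written as a Unicode escape so that the source file's own encoding does not affect the result.

```java
import java.nio.charset.StandardCharsets;

class CharsetRoundTripSketch {

    // Sender side: characters to bytes with an explicit charset.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: bytes to characters with the same explicit charset,
    // using only the first `length` bytes of the buffer.
    static String decode(byte[] data, int length) {
        return new String(data, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "ALIVE,cluster1,node1,Z\u00fcrich";
        byte[] wire = encode(original);

        // Same charset on both ends: the text survives the round trip.
        String matched = decode(wire, wire.length);
        System.out.println(matched.equals(original)); // true

        // Different charset on the receiving end: the two UTF-8 bytes of
        // the u-umlaut are misread as two separate ISO-8859-1 characters.
        String mismatched = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(mismatched.equals(original)); // false
    }
}
```

With pure ASCII content the two charsets happen to produce the same bytes, which is why a mismatch like this can go unnoticed until the first non-ASCII name shows up.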

score 1 · Answer 4 · answered Apr 20 '15 at 17:43

You can specify the character set explicitly like this:

byte[] data = content.getBytes(StandardCharsets.UTF_8);

Bhavin Panchani
