
I have been using protobuf for a few weeks now, but I still keep getting exceptions when parsing protobuf messages in Java.

I use C++ to create my protobuf messages and send them with Boost sockets to a server socket where the Java client is listening. The C++ code for transmitting the message is this:

boost::asio::streambuf b;
std::ostream os(&b);

// Wrap the streambuf so protobuf can write into it.
ZeroCopyOutputStream *raw_output = new OstreamOutputStream(&os);
CodedOutputStream *coded_output = new CodedOutputStream(raw_output);

// Length-prefix the payload so the receiver knows where the message ends.
coded_output->WriteVarint32(agentMessage.ByteSize());
agentMessage.SerializeToCodedStream(coded_output);

// Deleting the coded/raw streams flushes any buffered bytes into b.
delete coded_output;
delete raw_output;

boost::system::error_code ignored_error;

boost::asio::async_write(socket, b.data(), boost::bind(
        &MessageService::handle_write, this,
        boost::asio::placeholders::error));

As you can see, I write the length of the message with WriteVarint32, so the Java side should know, via parseDelimitedFrom, how far to read:

AgentMessage agentMessage = AgentMessageProtos.AgentMessage    
                                .parseDelimitedFrom(socket.getInputStream());

But it's no help; I keep getting these kinds of exceptions:

Protocol message contained an invalid tag (zero).
Message missing required fields: ...
Protocol message tag had invalid wire type.
Protocol message end-group tag did not match expected tag.
While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

It is important to know that these exceptions are not thrown on every message. Only a fraction of the messages I receive trigger them; most work out just fine. Still, I would like to fix this, since I do not want to drop any messages.

I would be really grateful if someone could help me out or share their ideas.


Another interesting fact is the number of messages I receive. A total of 1,000 messages in 2 seconds is normal for my program; in 20 seconds, about 100,000, and so on. I artificially reduced the number of messages sent, and when only 6-8 messages are transmitted, there are no errors at all. So might this be a buffering problem on the Java client socket side?

Out of, let's say, 60,000 messages, 5 are corrupted on average.

Konrad Reiche
  • Maybe a silly question but is there any way you've left padding / oversized-buffers in the data, rather than trimming any surplus? – Marc Gravell Jul 02 '11 at 15:45
  • (this error could definitely be easily caused by spare zeros) – Marc Gravell Jul 02 '11 at 15:46
  • @Marc-Gravell: What would be an oversized buffer? Actually, I don't get what you think might cause this. Maybe you could point out where I should look for this? Btw. I've added a couple of other exceptions I receive as well. – Konrad Reiche Jul 02 '11 at 15:53
  • A classic would be using a buffer to copy data between streams, but copying more than the correct amount in the final copy (i.e. copying 1050 bytes via a 512 buffer is 2 full buffers and 26 bytes in the third; if you copy all 512 bytes in the third you are copying garbage). Another similar example would be overwriting an existing *file* with *less* data hence leaving some garbage. But anything that involves a staging buffer that might be larger than the actual data (for that operation) could conceivably end up writing invalid data which could contain zeros. – Marc Gravell Jul 02 '11 at 16:00
  • With the extra messages you have added, ***all*** of those are classic for stream corruption, as described above. – Marc Gravell Jul 02 '11 at 16:01
  • Okay so far, but I don't see where this could possibly happen in my code. The only buffered data is shown in the code above. I don't use buffers in any part. But maybe the C++ code is the one which is flawed? – Konrad Reiche Jul 02 '11 at 16:02
  • Very hard to say; first thing I'd do is verify that the bytes I think I am sending are ***exactly*** the bytes I am receiving. Then check that those bytes are valid at both ends. – Marc Gravell Jul 02 '11 at 16:05
  • 1
    Something else I've seen people do ***way too often*** - is use an Encoding to try to get the BLOB as a string, then decode. Guaranteed corruption there (should use base-64 or hex etc if you want a string, not a UTF/codepage/etc) – Marc Gravell Jul 02 '11 at 16:06
  • @Marc-Gravell I added another observation to my original question, maybe this indicates something that this occurs so seldom? – Konrad Reiche Jul 05 '11 at 06:44
  • 1
    "Java client socket side" - well, I would more readily expect simply a bug in the stream processing code, especially if async/threaded. Could be either read or write, in that case... reducing the rate will obviously remove a lot of accidental collisions caused by incorrect async. – Marc Gravell Jul 05 '11 at 07:09
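The stream-copy bug described in the comments above can be sketched like this (a hypothetical copy loop, not from the question's code; the class and method names are mine):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class CopyLoop {
    // Correct: write only the n bytes actually read on each pass.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[512];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);   // right: honors the partial final read
            // out.write(buf);      // wrong: appends stale buffer bytes (often zeros)
        }
    }

    public static void main(String[] args) throws IOException {
        // 1050 bytes = 2 full 512-byte buffers plus a 26-byte tail,
        // as in the comment's example.
        byte[] data = new byte[1050];
        Arrays.fill(data, (byte) 1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), out);
        System.out.println(out.size()); // prints 1050, not 1536
    }
}
```

Writing the whole buffer on the final pass is exactly the kind of bug that injects trailing zeros, which protobuf then reports as "invalid tag (zero)".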

2 Answers


[I'm not really a TCP expert, this may be way off]

Problem is, Java's TCP Socket read(byte[] buffer) will return after reading whatever data has already arrived, which may be mid-message (mid protobuf message, I mean). The parser will then choke and throw an InvalidProtocolBufferException.

Any protobuf parsing call uses CodedInputStream internally (src here), which, in case the source is an InputStream, relies on read() -- and, consequently, is subject to the TCP socket issue.

So, when you stuff big amounts of data through your socket, some messages are bound to be split in two frames -- and that's where they get corrupted.

I'm guessing that when you lower the message transfer rate (as you said, to 6-8 messages), each frame gets sent before the next piece of data is put into the stream, so each message gets its very own TCP frame, i.e. none get split and you get no errors. (Or maybe the errors are just rare, and at a low rate you simply need more time to see them.)

As for the solution, your best bet is to handle the buffering yourself: read a byte[] from the socket, probably using readFully() instead of read(), because the former blocks until either there's enough data to fill the buffer or EOF is encountered, which makes it resistant to the mid-message frame boundary problem. Then ensure the buffer holds enough data to be parsed into a whole message, and feed it to the parser.

Also, there's some good read on the subject in this Google Groups topic -- that's where I got the readFully() part.
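A minimal sketch of that approach, using only java.io: read the varint length prefix byte by byte (this matches protobuf's base-128 varint wire format), then readFully() exactly that many bytes. The final protobuf call is left as a comment, since it needs the generated AgentMessage class; everything else runs as-is.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class FramedReader {
    // Decode a base-128 varint, one byte at a time.
    static int readVarint32(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1) throw new EOFException("stream ended inside varint");
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("malformed varint");
    }

    // Read one length-prefixed frame. readFully() blocks until the
    // whole message has arrived, even if it spans several TCP segments.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int len = readVarint32(in);
        byte[] buf = new byte[len];
        in.readFully(buf);  // unlike read(), never returns a partial message
        // AgentMessage msg = AgentMessage.parseFrom(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Simulate two frames arriving back-to-back in one stream:
        // a 5-byte payload, then a 2-byte payload.
        byte[] stream = {5, 1, 2, 3, 4, 5, 2, 9, 9};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        System.out.println(readFrame(in).length + " " + readFrame(in).length); // prints 5 2
    }
}
```

Wrapping the socket's InputStream in a DataInputStream and looping over readFrame() gives each parse call a complete, correctly delimited buffer regardless of how TCP chunked the bytes.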

Ivan Bartsov

I am not familiar with the Java API, but I wonder how Java deals with a uint32 value denoting the message length, because Java only has signed 32-bit integers. A quick look at the Java API reference told me an unsigned 32-bit value is stored in a signed 32-bit variable. So how is the case handled where an unsigned 32-bit value denotes the message length? Also, there seems to be support for varint signed integers in the Java implementation; they are called ZigZag32/64. AFAIK, the C++ version doesn't know about such encodings. So maybe the cause of your problem is related to these things?
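For reference, the ZigZag encoding mentioned above differs from a plain varint only in how the value is mapped before the base-128 encoding step. A hand-rolled sketch (the class and method names are mine; protobuf libraries ship their own equivalents):

```java
public class ZigZag {
    // ZigZag maps signed ints to unsigned ones so small negative
    // values also get short varint encodings: 0,-1,1,-2,2 -> 0,1,2,3,4.
    static int zigzag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    public static void main(String[] args) {
        System.out.println(zigzag32(0));   // prints 0
        System.out.println(zigzag32(-1));  // prints 1
        System.out.println(zigzag32(1));   // prints 2
        System.out.println(zigzag32(150)); // prints 300
    }
}
```

A length prefix written with WriteVarint32 is a plain (non-ZigZag) varint, so both sides agree as long as the reader also decodes a plain varint; a mismatch between the two mappings would corrupt every message, not just a few.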

Jonny Dee
  • Maybe, but I guess this would then occur every time, and since these exceptions are only triggered sometimes, I am not sure. – Konrad Reiche Jul 02 '11 at 23:22