
I'm having a strange issue with serialization of repeated double fields in C++ protobuf. For practice I chose time series data and tried to serialize/deserialize it in my app. I reproduced the error in one .cpp file (see the full gist). The core idea of reading and writing protobuf files is shown here; I took it from the examples:

// codedOut / codedIn are CodedOutputStream / CodedInputStream pointers
// shared across all calls (they are defined elsewhere in the gist).
void writeMessage(::google::protobuf::Message &message) {
    google::protobuf::uint32 size = message.ByteSize();
    char buffer[size];  // VLA: a GCC extension, not standard C++
    if (!message.SerializeToArray(buffer, size)) {
        cerr << "Failed to serialize message: \n" << message.DebugString();
        terminate();
    }
    // Length-delimited framing: varint size, then the raw message bytes.
    codedOut->WriteVarint32(size);
    codedOut->WriteRaw(buffer, size);
}

bool readMessage(::google::protobuf::Message &message) {
    google::protobuf::uint32 size;
    if (!codedIn->ReadVarint32(&size)) {  // treated as clean EOF
        return false;
    }
    char buffer[size];  // VLA again

    if (!codedIn->ReadRaw(buffer, size)) {
        cerr << "Can't do ReadRaw of message size " << size << "\n";
        terminate();
    }
    message.ParseFromArray(buffer, size);
    return true;
}

For 1-20 messages it works fine, but if I try to read 50 or more, the last message comes out corrupted: ReadRaw returns false. If I ignore ReadRaw's return value, the message contains a repeated field array with missing values and nulls. The serialization stage seems to be OK; I've checked everything there.

Could you please tell me: am I doing something wrong?

You can get the full gist here: https://gist.github.com/alexeyche/d6af8a43d346edc12868

To reproduce the error, run:

protoc -I. --cpp_out=. ./time_series.proto
g++ main.cpp time_series.pb.cc -std=c++11 -L/usr/local/lib -lprotobuf -I/usr/local/include
./a.out synthetic_control_TRAIN out.pb

The synthetic_control_TRAIN file with the time series data can be downloaded here: https://yadi.sk/d/gxZy8JSvcjiVD

My system: g++ 4.8.1, Ubuntu 12.04, libprotobuf 2.6.1

alexeyche
  • You can use `protoc --decode` to decode your data in the shell. That should at least give an idea whether the problem is on the encoding or on the decoding side. – jpa Nov 17 '14 at 17:20

1 Answer

How big is your data? For security, CodedInputStream defaults to a limit of 64MiB, after which it will refuse to read more data. You can increase the limit using CodedInputStream::SetTotalBytesLimit(), but a better solution is to simply read each message using a fresh CodedInputStream. This class is fast to construct and destroy, so just allocate it on the stack, read one message, and then let it go out of scope. (Don't re-allocate the underlying ZeroCopyInputStream, though.)

BTW, it looks like you're trying to emulate the parseDelimitedFrom() format that exists in Protobuf-Java but not Protobuf-C++, but your code as-written is not very efficient: you're making an unnecessary copy of each message on the stack. Consider using my code from this StackOverflow answer.

Kenton Varda