
Context

I am writing an event-driven application server in C++. I want to use Google Protocol Buffers for moving my data. Since the server is event-driven, the connection handler API is basically a callback function that tells me when another buffer of N bytes has arrived from the client.

Question

My question, as a complete beginner with protobuf, is this: is it possible to somehow coax protobuf into accepting the many buffers that make up one complete message, so that it works as a "stream parser", rather than waiting for all the data to arrive in a temporary buffer first?

In other words, I want this:

// Event API. May be called multiple times for each protobuf message.
bool data_arrived_from_client(const void *buf, size_t len) {
    my_protobuf.parse_part(buf, len); // THIS IS THE GROSSLY SIMPLIFIED SEMANTIC OF WHAT I WANT
    message_size += len;
    if (message_size >= COMPLETE_BUFFER_SIZE) { // all bytes of the message have arrived
        use_complete_protobuf();
        return true;
    }
    return false;
}

...instead of this:

// Event API. May be called multiple times for each protobuf message.
bool data_arrived_from_client(const void *buf, size_t len) {
    my_temp_buffer.append_data(buf, len);
    message_size += len;
    if (message_size >= COMPLETE_BUFFER_SIZE) { // all bytes of the message have arrived
        my_protobuf.ParseFromArray(my_temp_buffer.data_ptr(), my_temp_buffer.size());
        use_complete_protobuf();
        return true;
    }
    return false;
}

Answer

Answers with complete code are especially appreciated!

Mr. Developerdude
  • Is `data_arrived_from_client()` supposed to be supplied with complete message contents? Otherwise you'll need a _transport layer_ that completes message contents from the received stream for you (e.g. by prefixed message content length or so). – πάντα ῥεῖ Feb 23 '14 at 21:31
  • That is sort of the point of this question. It does NOT contain the data for complete messages, but a small part of it. – Mr. Developerdude Feb 23 '14 at 21:35

2 Answers


No, this is not possible.

The Protobuf parser is a recursive descent parser, meaning quite a bit of its state is stored on the stack. This makes it fast, but it means there's no way to pause the parser in the middle except to pause the whole thread. If your app is non-blocking, you'll just have to buffer bytes until you have a whole message to parse.

That said, this isn't as bad as it sounds. Remember that the final parsed representation of the message (i.e. the in-memory message object) is much larger than the wire representation. So you are hardly wasting memory on buffering compared to what you're going to do with it later. In fact, holding off on parsing until you have all the data may save memory, since you aren't holding on to a large half-parsed object that's just sitting there waiting for data to arrive.
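
A minimal sketch of the buffering approach, assuming the client sends each message with a 4-byte big-endian length prefix (the kind of framing suggested in the comments on the question). The `FrameAssembler` class, the header layout and the size cap are assumptions made for illustration and are not part of protobuf. The handler is the place where `ParseFromArray` would be called on the complete payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Assumed framing: [4-byte big-endian payload length][payload bytes].
constexpr std::size_t kHeaderSize = 4;

// Hypothetical helper: collects chunks of any size and hands every
// complete payload to a handler exactly once.
class FrameAssembler {
public:
    using Handler = std::function<void(const std::uint8_t* data, std::size_t size)>;

    explicit FrameAssembler(Handler on_message,
                            std::size_t max_size = 64u * 1024u * 1024u)
        : on_message_(std::move(on_message)), max_size_(max_size) {
        assert(on_message_ && "a message handler is required");
    }

    // Call from the event callback with each chunk that arrives.
    // Returns false if a frame announces a payload above max_size;
    // the caller is expected to drop the connection in that case.
    bool feed(const void* buf, std::size_t len) {
        const auto* p = static_cast<const std::uint8_t*>(buf);
        pending_.insert(pending_.end(), p, p + len);

        std::size_t offset = 0;
        while (pending_.size() - offset >= kHeaderSize) {
            const std::uint8_t* h = pending_.data() + offset;
            const std::uint32_t body_len =
                (std::uint32_t{h[0]} << 24) | (std::uint32_t{h[1]} << 16) |
                (std::uint32_t{h[2]} << 8) | std::uint32_t{h[3]};
            if (body_len > max_size_) return false;
            if (pending_.size() - offset - kHeaderSize < body_len) break;  // wait for more bytes

            // Complete payload: this is where
            // my_protobuf.ParseFromArray(data, size) would go.
            on_message_(h + kHeaderSize, body_len);
            offset += kHeaderSize + body_len;
        }
        // Drop the bytes of all frames that were delivered.
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(offset));
        return true;
    }

private:
    Handler on_message_;
    std::size_t max_size_;
    std::vector<std::uint8_t> pending_;
};
```

In the question's terms, `data_arrived_from_client(buf, len)` would simply forward to `feed(buf, len)`. The handler must not call `feed` on the same object, because the pointer it receives refers to the internal buffer.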

Kenton Varda
  • Hi Kenton, is this still true of the Protobuf parser version 3? – user1158559 Sep 23 '16 at 16:19
  • @user1158559 My understanding is that proto3 does not change anything about the low-level parsing. Mainly it adds (and removes) various high-level language features. That said, proto3 was developed after I left Google and I haven't kept close track of it. – Kenton Varda Sep 23 '16 at 21:30
  • Can `repeated` fields be read independently? – daohu527 Apr 11 '22 at 06:17

Yes, this is possible. I've done it in JavaScript, and the design could be ported to C++.

https://github.com/chrisdew/protostream

fadedbee
  • This is exactly what I want. Why can't repeated fields be read iteratively? – daohu527 Apr 11 '22 at 06:16
  • Assuming you mean "why can't protobuf decoders read repeated fields iteratively", that is a question for the authors of those decoders. If you would like help with protostream please raise an issue at https://github.com/chrisdew/protostream/issues and I'll see if I can help. – fadedbee Apr 11 '22 at 08:40
  • I use Python, not JS, so this repository is not suitable for me. I read the protobuf source code; internally it reads iteratively, but that is not exposed to developers. – daohu527 Apr 11 '22 at 09:04
  • In that case you could port this code to Python. Protostream is simply a minimal protobuf decoder that only knows how to find a tag and the length of the following data. Are you in charge of the upstream protobuf messages? (It's logically impossible to decode concatenated non-repeating elements from a stream, as non-repeating protobuf elements do not have a well-defined end, hence the need for a "wrapper".) – fadedbee Apr 11 '22 at 11:21
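
To make the "find a tag and the length of the following data" idea concrete, here is a rough C++ sketch of such a minimal scanner. It is not a port of protostream; `scan_field`, `FieldSpan` and `Scan` are hypothetical names. It relies only on the documented wire format: a tag is a varint holding `(field_number << 3) | wire_type`, and wire type 2 is followed by a varint length. Deprecated group wire types are rejected here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Scan { kField, kNeedMore, kError };

struct FieldSpan {
    std::uint32_t field_number = 0;
    std::uint32_t wire_type = 0;
    std::size_t payload_offset = 0;  // where the value bytes start in the buffer
    std::size_t payload_size = 0;    // number of value bytes
    std::size_t total_size = 0;      // tag + optional length prefix + value
};

// Decodes one base-128 varint. Returns the bytes consumed,
// 0 if the input is truncated, or -1 if it exceeds 10 bytes.
inline int read_varint(const std::uint8_t* p, std::size_t n, std::uint64_t* out) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n && i < 10; ++i) {
        value |= static_cast<std::uint64_t>(p[i] & 0x7F) << (7 * i);
        if ((p[i] & 0x80) == 0) {
            *out = value;
            return static_cast<int>(i + 1);
        }
    }
    return n >= 10 ? -1 : 0;
}

// Tries to locate one complete top-level field at the start of p[0..n).
// kNeedMore means "keep buffering and call again with more bytes".
inline Scan scan_field(const std::uint8_t* p, std::size_t n, FieldSpan* out) {
    assert(out != nullptr);
    std::uint64_t tag = 0;
    const int tag_len = read_varint(p, n, &tag);
    if (tag_len < 0) return Scan::kError;
    if (tag_len == 0) return Scan::kNeedMore;
    if ((tag >> 3) == 0 || (tag >> 3) > 0xFFFFFFFFu) return Scan::kError;

    std::size_t pos = static_cast<std::size_t>(tag_len);
    std::uint64_t size = 0;
    switch (tag & 7) {
        case 0: {  // varint: the value is the varint itself
            std::uint64_t ignored = 0;
            const int len = read_varint(p + pos, n - pos, &ignored);
            if (len < 0) return Scan::kError;
            if (len == 0) return Scan::kNeedMore;
            size = static_cast<std::uint64_t>(len);
            break;
        }
        case 1: size = 8; break;  // fixed 64-bit
        case 5: size = 4; break;  // fixed 32-bit
        case 2: {                 // length-delimited: varint length, then bytes
            const int len = read_varint(p + pos, n - pos, &size);
            if (len < 0) return Scan::kError;
            if (len == 0) return Scan::kNeedMore;
            pos += static_cast<std::size_t>(len);
            break;
        }
        default: return Scan::kError;  // groups (3, 4) not handled in this sketch
    }
    if (size > n - pos) return Scan::kNeedMore;

    out->field_number = static_cast<std::uint32_t>(tag >> 3);
    out->wire_type = static_cast<std::uint32_t>(tag & 7);
    out->payload_offset = pos;
    out->payload_size = static_cast<std::size_t>(size);
    out->total_size = pos + static_cast<std::size_t>(size);
    return Scan::kField;
}
```

With a wrapper message whose only content is a `repeated` message field, each `kField` result with wire type 2 delimits one complete inner message. Its payload could then be passed to `ParseFromArray`, and `total_size` bytes dropped from the front of the receive buffer.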