GoLang Protobuf: How to send multiple messages using the same tcp connection?

Question

I am using GoLang protobuf for encoding (and decoding) messages that are sent through a single tcp connection.

The .proto struct

message Prepare{
   int64 instance = 1;
   int64 round = 2;
   int64 nodeId = 3;
}

Then I use the protoc tool to generate the corresponding stubs.

This is how I write the contents to the wire.

func (t *Prepare) Marshal(wire io.Writer) {

    data, err := proto.Marshal(t)
    if err != nil {
        panic(err)
    }
    _, err = wire.Write(data)
    if err != nil {
        panic(err)
    }
}

And this is how I read and unmarshal on the receiver side.

func (t *Prepare) Unmarshal(wire io.Reader) error {
    data := make([]byte, 8*1024*1024) 
    length, err := wire.Read(data)
    if err != nil {
        panic(err)
    }
    err = proto.Unmarshal(data[:length], t)
    if err != nil {
        panic(err)
    }
    return nil
}

If a new TCP connection is opened for each protobuf message, the above approach works fine. But when a single TCP connection is used to transmit multiple messages (a persistent connection), unmarshalling fails with the error proto: invalid field number

This problem occurs because protobuf messages sent over a single connection carry no message boundaries: TCP delivers a plain byte stream. So after length, err := wire.Read(data), the data buffer can contain bytes corresponding to 1) multiple protobuf messages, or 2) partial protobuf messages.

The protobuf documentation mentions the following as a solution.

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)

While this is an intuitive method, it boils down to a chicken-and-egg problem. The length of the byte array written to the wire (len(data), where data, err := proto.Marshal(t)) is not fixed, and it is not known how many bytes are needed to represent that number. So the same problem reappears: how can the receiver know how many bytes belong to the length field, when it does not know the size of that field in advance?

Any suggestions for this?

Thanks

I cannot use gRPC because I need fire and forget kind of transmission (in contrast to request-response pair architecture used in gRPC and streaming gRPC) — Pasindu Tennage, Aug 03 '21 at 12:01

maja · Accepted Answer · 2021-08-03T13:01:54.043

I would recommend using gRPC, but you already stated you don't want that. I can also recommend sending simple UDP packets, since UDP doesn't need a connection at all.

If you want to stick to your current approach, the solution is simple though: after marshalling the protobuf message to a byte array, you know its length. It's len(data), and that's the value you want to write first. The actual number of bytes written by wire.Write() will be the same. If not, there was a problem with the connection and the message was only written partially, so the receiver can't unmarshal it anyway.

When receiving, first read the length, prepare a buffer with the correct size or, even better, make a LimitedReader and unmarshal from there.

The number of bytes should be encoded as a fixed-width integer. You can use either a 32-bit or a 64-bit value, and you also need to decide between little and big endian. Which one you use is irrelevant, as long as the size and endianness are the same on the sender and receiver side.

Take a look at https://pkg.go.dev/encoding/binary and the functions defined on ByteOrder:

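binary.LittleEndian.PutUint64(buf, uint64(len(data))) // sender: buf is an 8-byte slice, written before data
length := int64(binary.LittleEndian.Uint64(buf))      // receiver: buf holds the 8 bytes read first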

Of course, if there is even a simple bug, or the length is off by only one byte, the stream loses synchronization and all the remaining data is effectively useless. By sending each message as a dedicated UDP datagram, you can avoid this issue.

On the receiver side, the receiver first needs to read the "number of bytes" to allocate for the protobuf message. To read the "number of bytes", the receiver should first know how many bytes are used for the "number of bytes" itself, and this is where I am stuck – Pasindu Tennage, Aug 03 '21 at 12:53
For example, if the "number of bytes" is at most 255, the receiver can first read one byte and then cast it to an int to get the number. If the "number of bytes" is greater than 255, how can the receiver know how many bytes correspond to that "number of bytes"? – Pasindu Tennage, Aug 03 '21 at 12:56
You always encode the length as a 32-bit value. Or 64-bit to be safe. See my updated answer. – maja, Aug 03 '21 at 13:02

score 0 · Answer 2 · answered Aug 03 '21 at 14:38

Elaborating on the above answer for the exact scenario mentioned in the question:

func (t *Prepare) Marshal(wire io.Writer) {
    data, err := proto.Marshal(t)
    if err != nil {
        panic(err)
    }
    lengthWritten := len(data)
    var b [16]byte
    bs := b[:16]
    binary.LittleEndian.PutUint64(bs, uint64(lengthWritten))
    _, err = wire.Write(bs)
    if err != nil {
        panic(err)
    }
    _, err = wire.Write(data)
    if err != nil {
        panic(err)
    }
}

func (t *Prepare) Unmarshal(wire io.Reader) error {

    var b [16]byte
    bs := b[:16]

    // Read the length header first; stop on error instead of decoding garbage.
    _, err := io.ReadFull(wire, bs)
    if err != nil {
        panic(err)
    }
    numBytes := binary.LittleEndian.Uint64(bs)

    data := make([]byte, numBytes)
    length, err := io.ReadFull(wire, data)
    if err != nil {
        panic(err)
    }
    err = proto.Unmarshal(data[:length], t)
    if err != nil {
        panic(err)
    }
    return nil
}

Pasindu Tennage

[PutUint64](https://cs.opensource.google/go/go/+/refs/tags/go1.16.6:src/encoding/binary/binary.go;l=82) will only ever use 8 bytes (so you are writing more data than you need to). You can also just make a slice - see [this example](https://pkg.go.dev/encoding/binary#example-ByteOrder-Put). – Brits Aug 03 '21 at 20:58
I'm not sure I understood correctly. Instead of var b [16]byte; bs := b[:16], should I only allocate var b [8]byte; bs := b[:8]? If so, my approach with b [16]byte; bs := b[:16] writes 8 additional bytes for each message? Is that what you meant? – Pasindu Tennage Aug 03 '21 at 21:46
Exactly - see [this in the playground](https://play.golang.org/p/lqRHB7JE4GY) (but using `make` is cleaner). Your current code will always write 8 extra `0` bytes per message. – Brits Aug 03 '21 at 21:57