The Protobuf definitions look like this:

syntax = "proto3";

message HugeMessage {
    // omitted
}

message Request {
    string name = 1;
    HugeMessage payload = 2;
}

In one scenario, I receive a HugeMessage from somebody and want to pack it with additional fields, then transmit the result to someone else. To do that, I have to Unmarshal the HugeMessage binary into a Go structure, pack it into a Request, and Marshal it again. Because HugeMessage is so large, the cost of the Unmarshal and Marshal is unaffordable. Can I reuse the HugeMessage binary without changing the protobuf definitions?

func main() {
    // Receive it from a file or the network; the source is not important.
    bins, _ := os.ReadFile("hugeMessage.dump")
    var message HugeMessage
    _ = proto.Unmarshal(bins, &message) // slow
    // Generated Go fields are exported, and message fields are pointers.
    request := Request{
        Name:    "xxxx",
        Payload: &message,
    }
    requestBinary, _ := proto.Marshal(&request) // slow
    // Send it.
    _ = os.WriteFile("request.dump", requestBinary, 0644)
}

2 Answers

The short answer is: no, there is no simple or standard way to achieve this.

The most obvious strategy is to do as you currently have - unmarshal the HugeMessage, set it into Request, then marshal again. The golang protobuf API surface doesn't really provide a means to do much beyond that - with good reason.

That said, there are ways to achieve what you're looking to do. But these aren't necessarily safe or reliable, so you have to weigh that cost vs the cost of what you have now.

One way to avoid the unmarshal is to take advantage of the way a message is normally serialized:

message Request {
    string name = 1;
    HugeMessage payload = 2;
}

.. is equivalent to

message Request {
    string name = 1;
    bytes payload = 2;
}

.. where payload contains the result of calling Marshal(...) against some HugeMessage.

So, if we have the following definitions:

syntax = "proto3";

message HugeMessage {
  bytes field1 = 1;
  string field2 = 2;
  int64 field3 = 3;
}

message Request {
  string name = 1;
  HugeMessage payload = 2;
}

message RawRequest {
  string name = 1;
  bytes payload = 2;
}

The following code:

req1, err := proto.Marshal(&pb.Request{
    Name: "name",
    Payload: &pb.HugeMessage{
        Field1: []byte{1, 2, 3},
        Field2: "test",
        Field3: 948414,
    },
})
if err != nil {
    panic(err)
}

huge, err := proto.Marshal(&pb.HugeMessage{
    Field1: []byte{1, 2, 3},
    Field2: "test",
    Field3: 948414,
})
if err != nil {
    panic(err)
}

req2, err := proto.Marshal(&pb.RawRequest{
    Name:    "name",
    Payload: huge,
})
if err != nil {
    panic(err)
}

fmt.Printf("equal? %t\n", bytes.Equal(req1, req2))

outputs equal? true

Whether this "quirk" is entirely reliable isn't clear, and there are no guarantees it will continue to work indefinitely. The RawRequest type also has to fully mirror the Request type, which isn't ideal.

Another alternative is to construct the message in a more manual fashion, i.e. using the protowire package - again, haphazard, caution advised.

nj_
You're right, I think. The same conclusion appears in this post: [Fast Encoding Using Pre-serialization](https://blog.najaryan.net/posts/partial-protobuf-encoding/). – Miss Yoimiya's puppy Nov 22 '22 at 09:33

In short, it can be done with protowire, and it isn't hard as long as the reused structure isn't complex.

I asked this question not long ago, and I finally worked it out, inspired by @nj_'s post. According to the encoding chapter of the protobuf documentation, a protocol buffer message is a series of field-value pairs, and the order of those pairs doesn't matter. That suggests an obvious idea: do what the generated marshalling code would do, build the embedded field by hand, and append it to the end of the request.
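To illustrate the "series of field-value pairs" idea, here is a minimal sketch that walks the top-level fields of a serialized message using only the standard library. It relies on protobuf varints having the same base-128 layout that `encoding/binary` uses for `Uvarint`. The names `field`, `listFields`, and `errMalformed` are hypothetical helpers, and the sample bytes are a hand-written stand-in, not output from the code in this post.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// field describes one top-level field-value pair of a serialized message.
type field struct {
	Number   uint64 // field number from the .proto definition
	WireType uint64 // 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit
	Size     int    // size of the value in bytes, excluding tag and length prefix
}

var errMalformed = errors.New("malformed protobuf message")

// listFields walks the top-level pairs without decoding nested messages.
func listFields(msg []byte) ([]field, error) {
	var fields []field
	for len(msg) > 0 {
		// Each pair starts with a varint tag: (fieldNumber << 3) | wireType.
		tag, n := binary.Uvarint(msg)
		if n <= 0 {
			return nil, errMalformed
		}
		msg = msg[n:]
		f := field{Number: tag >> 3, WireType: tag & 7}
		switch f.WireType {
		case 0: // varint: the value is the varint itself
			_, vn := binary.Uvarint(msg)
			if vn <= 0 {
				return nil, errMalformed
			}
			f.Size = vn
		case 1: // fixed 64-bit
			f.Size = 8
		case 2: // length-delimited: strings, bytes, embedded messages
			length, ln := binary.Uvarint(msg)
			if ln <= 0 || length > uint64(len(msg)-ln) {
				return nil, errMalformed
			}
			msg = msg[ln:]
			f.Size = int(length)
		case 5: // fixed 32-bit
			f.Size = 4
		default: // groups (wire types 3 and 4) are not handled in this sketch
			return nil, errMalformed
		}
		if f.Size > len(msg) {
			return nil, errMalformed
		}
		msg = msg[f.Size:]
		fields = append(fields, f)
	}
	return fields, nil
}

func main() {
	// Hand-written stand-in for a Request: name = "xxxx", payload = 2 opaque bytes.
	sample := []byte{0x0a, 0x04, 'x', 'x', 'x', 'x', 0x12, 0x02, 0x08, 0x01}
	fields, err := listFields(sample)
	if err != nil {
		panic(err)
	}
	for _, f := range fields {
		fmt.Printf("field %d, wire type %d, %d value bytes\n", f.Number, f.WireType, f.Size)
	}
}
```

Note that the walker never looks inside the payload: from the outside, an embedded message is just a length-delimited run of bytes.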

In this situation, we want to reuse the HugeMessage in Request, so the key-value pair for the field would be 2:{${HugeMessageBinary}}. The code (slightly different) could be:

func binaryEmbeddingImplementation(messageBytes []byte, name string) (requestBytes []byte, err error) {
    // 1. Create a request with every field set except the payload, and marshal it.
    request := protodef.Request{
        Name: name,
    }
    requestBytes, err = proto.Marshal(&request)
    if err != nil {
        return nil, err
    }
    // 2. Manually append the payload to the request using protowire.
    requestBytes = protowire.AppendTag(requestBytes, 2, protowire.BytesType) // on the wire, an embedded message looks the same as a bytes field
    requestBytes = protowire.AppendBytes(requestBytes, messageBytes)
    return requestBytes, nil
}
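To make the wire layout concrete, the following sketch reproduces what the two protowire calls above append, using only the standard library (`binary.AppendUvarint` requires Go 1.19 or later). The helper name `appendLengthDelimited` and the two-byte stand-in for the serialized HugeMessage are assumptions for illustration; in real code protowire is the safer choice.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLengthDelimited appends one length-delimited field (wire type 2):
// a varint tag, a varint length, then the raw value bytes.
func appendLengthDelimited(dst []byte, fieldNumber uint64, value []byte) []byte {
	dst = binary.AppendUvarint(dst, fieldNumber<<3|2)   // tag = (number << 3) | wire type
	dst = binary.AppendUvarint(dst, uint64(len(value))) // length prefix
	return append(dst, value...)                        // value copied as-is, never decoded
}

func main() {
	// Stand-in for an already-serialized HugeMessage received from elsewhere.
	hugeMessageBytes := []byte{0x08, 0x01}

	request := appendLengthDelimited(nil, 1, []byte("xxxx"))      // name = 1
	request = appendLengthDelimited(request, 2, hugeMessageBytes) // payload = 2

	fmt.Printf("% x\n", request) // prints "0a 04 78 78 78 78 12 02 08 01"
}
```

The payload bytes are only copied, which is why this approach avoids the cost of decoding and re-encoding the embedded message.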

You only need to supply the field number, the wire type, and the bytes; that's all. For comparison, the common approach looks like this:

func commonImplementation(messageBytes []byte, name string) (requestBytes []byte, err error) {
    // Decode the payload only so that it can be embedded in the request.
    var message protodef.HugeMessage
    if err := proto.Unmarshal(messageBytes, &message); err != nil { // slow
        return nil, err
    }
    request := protodef.Request{
        Name:    name,
        Payload: &message,
    }
    return proto.Marshal(&request) // slow
}

Some benchmark results:

$ go test -bench=a -benchtime 10s ./pkg/                               
goos: darwin
goarch: arm64
pkg: pbembedding/pkg
BenchmarkCommon-8             49         288026442 ns/op
BenchmarkEmbedding-8         201         176032133 ns/op
PASS
ok      pbembedding/pkg 80.196s

package pkg

import (
    "github.com/stretchr/testify/assert"
    "golang.org/x/exp/rand"
    "google.golang.org/protobuf/proto"
    "pbembedding/pkg/protodef"
    "testing"
)

var hugeMessageSample = receiveHugeMessageFromSomewhere()

func TestEquivalent(t *testing.T) {
    requestBytes1, _ := commonImplementation(hugeMessageSample, "xxxx")
    requestBytes2, _ := binaryEmbeddingImplementation(hugeMessageSample, "xxxx")
    // The two encodings are not guaranteed to be byte-for-byte equal. Strictly, they should be
    // compared as decoded messages rather than as raw bytes; see
    // https://developers.google.com/protocol-buffers/docs/encoding#implications
    // Comparing bytes is a lazy shortcut.
    assert.NotEmpty(t, requestBytes1)
    assert.Equal(t, requestBytes1, requestBytes2)
    var request protodef.Request
    err := proto.Unmarshal(requestBytes1, &request)
    assert.NoError(t, err)
    assert.Equal(t, "xxxx", request.Name)
}

// Actually a mock: builds a HugeMessage holding 1 GiB of random data.
func receiveHugeMessageFromSomewhere() []byte {
    buffer := make([]byte, 1024*1024*1024)
    _, _ = rand.Read(buffer)
    message := protodef.HugeMessage{
        Data: buffer,
    }
    res, _ := proto.Marshal(&message)
    return res
}

func BenchmarkCommon(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := commonImplementation(hugeMessageSample, "xxxx")
        if err != nil {
            panic(err)
        }
    }
}

func BenchmarkEmbedding(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := binaryEmbeddingImplementation(hugeMessageSample, "xxxx")
        if err != nil {
            panic(err)
        }
    }
}