I've got two simple code blocks that have vastly different performance:
```cpp
// `data`, `memcpy_dest`, and `BUFFER_LEN` are set up elsewhere (see the linked example).
void testProto() {
    demo::Person* person = new demo::Person();
    person->set_data(data, BUFFER_LEN);
}

void testMemcpy() {
    demo::Person* person = new demo::Person();
    memcpy(memcpy_dest, data, BUFFER_LEN);
}
```
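For reference, the numbers come from a timing loop along these lines (a simplified sketch with placeholder names; the exact harness is in the linked example):

```cpp
#include <chrono>
#include <cstdio>

// Simplified timing sketch (placeholder names, not the exact code from the
// linked example): run a test function many times and print the elapsed time.
template <typename Fn>
void benchmark(const char* label, Fn fn, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto end = std::chrono::steady_clock::now();
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::printf("%s: %lld us for %d iterations\n",
                label, static_cast<long long>(us), iterations);
}

// Usage, assuming the buffers are already set up:
//   benchmark("testProto",  testProto,  100000);
//   benchmark("testMemcpy", testMemcpy, 100000);
```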
The proto file looks like this:
```proto
message Person {
    bytes data = 1;
}
```
According to the Protobuf encoding docs, setting length-delimited data should be little more than copying the bytes plus a few header bytes. Why does the first function take 5-10x longer than the second?
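To spell that expectation out, my mental model of `set_data(ptr, len)` is roughly "grow a buffer, copy the bytes", something like this naive sketch (my own illustration, not the actual protobuf-generated code):

```cpp
#include <cstring>
#include <string>

// Naive mental model only -- NOT the real protobuf-generated code.
// I'd expect setting a bytes field to cost about one resize plus one memcpy.
void naive_set_data(std::string& backing, const void* src, std::size_t len) {
    backing.resize(len);                 // allocate or reuse the backing buffer
    std::memcpy(&backing[0], src, len);  // bulk-copy the payload
}
```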
I made a full, easy-to-run example here.
Additional notes/context:
- FlatBuffers, an alternative to protobufs, does not have this problem
- Here's my attempt at using a debugger. I can't step below the `Set` method.
- The reason this performance matters to me is that I'm converting some high-throughput/low-latency networking code over to protobufs. Since I'm running code like the above multiple times per packet, protobufs significantly hurt performance.
- I'm compiling with -O3, but even at -O0 there's still a huge performance difference
- Function call overhead is not the problem, because the slowdown scales with the size of the data; function calls would add only a constant overhead.
- I've tried a variety of ways to ensure that the memcpy is not optimized away (-O0, using the array), and I'm pretty confident it isn't; see the sketch after this list.
- I tried malloc inside of `testMemcpy`. That slowed things down a bit, but the proto version is still at least 5x worse.
- I tried this on a MacBook M1 and an Ubuntu Intel machine
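To make the point about dead-code elimination concrete, this is the kind of guard I have in mind (a sketch with placeholder buffer names and sizes, not the exact code from the linked example): reading the destination back through a volatile sink keeps the compiler from treating the memcpy as dead.

```cpp
#include <cstddef>
#include <cstring>

// Placeholder buffers and size (the real ones live in the linked example).
constexpr std::size_t kBufferLen = 4096;
unsigned char src_data[kBufferLen];
unsigned char memcpy_dest[kBufferLen];

// Volatile sink: assigning to it is an observable side effect, so the
// compiler cannot elide the memcpy that produced the value being read.
volatile unsigned char sink;

void testMemcpyGuarded() {
    std::memcpy(memcpy_dest, src_data, kBufferLen);
    sink = memcpy_dest[kBufferLen - 1];  // force a use of the copied bytes
}
```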