I am using FlatBuffers in Go to send an array of 10000 floats over TCP between two ports on my local machine. I am sending the same message in a loop that does only that. The rate I achieve is only approximately 2 ms per message, whereas in C++ I achieve approximately 140 microseconds per message. I have the following schema for my FlatBuffers messages:
namespace MyModel;
table Features {
  data:[float32];
}
root_type Features;
and then in the Go code I have builder := flatbuffers.NewBuilder(1024)
and conn, err := net.Dial("tcp", endPoint)
After a few other things, I have the following in the sending loop:
builder.Reset()
MyModel.FeaturesStartDataVector(builder, nFloat32s)
for i := nFloat32s - 1; i >= 0; i-- {
    builder.PrependFloat32(data[i])
}
featuresData := builder.EndVector(nFloat32s)
MyModel.FeaturesStart(builder)
MyModel.FeaturesAddData(builder, featuresData)
features := MyModel.FeaturesEnd(builder)
builder.Finish(features)
msg := builder.FinishedBytes()
msgLen := make([]byte, 4)
flatbuffers.WriteUint32(msgLen, uint32(len(msg)))
conn.Write(msgLen)
conn.Write(msg)
The number of messages and their contents are received correctly by a Python program, but it is 14x slower than when I benchmarked a C++ sender with the data also being received by the same Python program. I am using nFloat32s = 100000.
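To rule out the two Write calls per message as a cause, I could combine the length prefix and the payload into one write, for example with net.Buffers from the standard library. A rough sketch of what I mean (error handling kept minimal):
bufs := net.Buffers{msgLen, msg}
if _, err := bufs.WriteTo(conn); err != nil { // writes both slices in one call instead of two conn.Write calls
    log.Fatal(err)
}
I have not measured whether the two writes matter at all, though.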
Profiling shows that most of the time goes into PrependFloat32:
(pprof) top5 -cum
Showing nodes accounting for 2850ms, 61.29% of 4650ms total
Dropped 5 nodes (cum <= 23.25ms)
Showing top 5 nodes out of 18
flat flat% sum% cum cum%
0 0% 0% 4600ms 98.92% main.main
550ms 11.83% 11.83% 4600ms 98.92% main.run
0 0% 11.83% 4600ms 98.92% runtime.main
1140ms 24.52% 36.34% 3640ms 78.28% github.com/google/flatbuffers/go.(*Builder).PrependFloat32
1160ms 24.95% 61.29% 1790ms 38.49% github.com/google/flatbuffers/go.(*Builder).Prep
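One idea I had from reading the profile: FeaturesStartDataVector already calls StartVector, which as far as I can tell reserves space for the whole vector up front, and the Go runtime also exports Place* methods that skip the per-element space check done by Prep. So I am wondering whether the inner loop could be written like this instead (an untested sketch; I have not confirmed that using PlaceFloat32 this way is supported):
MyModel.FeaturesStartDataVector(builder, nFloat32s) // this just calls builder.StartVector(4, nFloat32s, 4)
for i := nFloat32s - 1; i >= 0; i-- {
    builder.PlaceFloat32(data[i]) // writes the float without a per-element Prep call
}
featuresData := builder.EndVector(nFloat32s)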
Can I make this faster?
(Of course, for such flat data I could just use raw sockets, but later on I will add more complexity to the message.)
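One more thing I am unsure about: flatbuffers.NewBuilder(1024) is much smaller than the finished message, so the buffer has to grow while the first message is built. As far as I can tell builder.Reset() keeps the grown buffer, so this should only affect the first message, but I could also size the builder up front, for example:
builder := flatbuffers.NewBuilder(nFloat32s*4 + 64) // 4 bytes per float plus a rough guess at the table overhead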