
I am using flatbuffers in Go to send an array of 10000 floats over TCP between two ports on my local machine. I send the same message in a loop that does nothing else. The rate I achieve is only about 2ms per message, whereas in C++ I achieve about 140 microseconds per message. I have the following schema for my flatbuffers messages:

    namespace MyModel;

    table Features {
      data:[float32];
    }

    root_type Features;

Then in the Go code I have `builder := flatbuffers.NewBuilder(1024)` and `conn, err := net.Dial("tcp", endPoint)`, and after a few other things the sending loop contains:

    builder.Reset()

    // Build the data vector back to front, as flatbuffers requires.
    MyModel.FeaturesStartDataVector(builder, nFloat32s)
    for i := nFloat32s - 1; i >= 0; i-- {
        builder.PrependFloat32(data[i])
    }
    featuresData := builder.EndVector(nFloat32s)
    MyModel.FeaturesStart(builder)
    MyModel.FeaturesAddData(builder, featuresData)
    features := MyModel.FeaturesEnd(builder)

    builder.Finish(features)
    msg := builder.FinishedBytes()

    // 4-byte length prefix so the receiver knows where each message ends.
    msgLen := make([]byte, 4)
    flatbuffers.WriteUint32(msgLen, uint32(len(msg)))

    conn.Write(msgLen)
    conn.Write(msg)

The number of messages received and their contents are correct, as checked by a Python program. But this is 14x slower than when I benchmarked a C++ sender whose data was received by the same Python program. I am using `nFloats = 100000`.

Profiling shows that `PrependFloat32` accounts for most of the time (3640ms of 4650ms cumulative):

    (pprof) top5 -cum
    Showing nodes accounting for 2850ms, 61.29% of 4650ms total
    Dropped 5 nodes (cum <= 23.25ms)
    Showing top 5 nodes out of 18
          flat  flat%   sum%        cum   cum%
             0     0%     0%     4600ms 98.92%  main.main
         550ms 11.83% 11.83%     4600ms 98.92%  main.run
             0     0% 11.83%     4600ms 98.92%  runtime.main
        1140ms 24.52% 36.34%     3640ms 78.28%  github.com/google/flatbuffers/go.(*Builder).PrependFloat32
        1160ms 24.95% 61.29%     1790ms 38.49%  github.com/google/flatbuffers/go.(*Builder).Prep

Can I make this faster?

(Of course, for such flat data I could just use raw sockets, but later on I will add more complexity to the message.)

snow_abstraction
  • You always call `PrependFloat32()`, which also has to check and ensure buffer size. You could ensure size yourself by calling `Prep(flatbuffers.SizeFloat32, nFloat32s*flatbuffers.SizeFloat32)`, and then add each individual `float32` with `PlaceFloat32()`. – icza Oct 31 '19 at 11:24
  • @icza Thanks. That cut the per message time from 2ms to around 420 micros. – snow_abstraction Oct 31 '19 at 19:31
  • @icza you should write this up as an answer so it can be accepted – Michael Ramos Apr 08 '20 at 15:39
  • @rambossa I ended up doing this: https://gitlab.com/snow_abstraction/benchmark_feature_transfer/-/blob/master/flat/flatbuffers/go_fast_sender/src/sender/sender.go#L91 I originally had something closer to icza's suggestion, but after reading flatbuffers' source, I came up with what I linked to. – snow_abstraction Apr 08 '20 at 19:26
  • @snow_abstraction thank you, super helpful currently. Could even write and accept your own answer. – Michael Ramos Apr 08 '20 at 20:40

2 Answers


For anyone curious about the solution in the GitLab code linked in snow_abstraction's comment, the question uses:

    for i := nFloat32s - 1; i >= 0; i-- {
        builder.PrependFloat32(data[i])
    }

versus the linked code:

    for i := nFloat32s - 1; i >= 0; i-- {
        builder.PlaceFloat32(data[i])
    }

`PlaceFloat32` is faster because, as the comment in the linked code explains, "MyModel.FeaturesStartDataVector allocates enough space so skip the extra checks that an idiomatic call to build.PrependFloat32(data[i]) would entail."

The flatbuffers source code confirms that `PrependFloat32` calls `Prep` to do alignment and sizing checks, which are redundant here because of the prior call to `MyModel.FeaturesStartDataVector`, which calls `StartVector`, which in turn calls `Prep`. Since `Prep` has already been called to reserve space for the whole array, there is no need to call it again to bounds-check every individual `float32` written to the array.

bain

What @icza says is worth trying. Beyond that, maybe Go has some kind of array copy function that could add all the floats at once, though for that you'd need to add some kind of CreateFloatVector function to the builder. There is already CreateByteVector: https://github.com/google/flatbuffers/blob/521e255ad9656a213971b30ba1beeec395b2e27e/go/builder.go#L343

Aardappel
  • Thanks. That is definitely doable, since Go has the built-in `copy`, and @icza's suggestion alone cut the per-message time from 2ms to around 420 micros. That said, I was hoping for a solution without extending flatbuffers. Maybe I'll have to submit a PR to create CreateFloat32Vector. – snow_abstraction Oct 31 '19 at 19:30