I am experimenting with FlatBuffers at my company as a replacement for raw structs. The classes we need to serialize are fairly large, and I have noticed that the overhead of FlatBuffers serialization is more than we can afford when running debug builds.
I reproduced my finding with the following simple test program (the data type is similar to the one in our production code):
#include "stdafx.h"
#include <flatbuffers/flatbuffers.h>
#include "footprints_generated.h"
#include <vector>
#include <iostream>
#include <chrono>
using namespace Serialization::Dummy::FakeFootprints;
flatbuffers::FlatBufferBuilder builder;
flatbuffers::Offset<XYZData> GenerateXYZ()
{
    return CreateXYZData(builder,
        1.0, 2.0, 3.0, 4.0, 5.0,
        6.0, 7.0, 8.0, 9.0, 10.0,
        11.0, 12.0, 13.0, 14.0, 15.0,
        16.0, 17.0, 18.0, 19.0, 20.0);
}
flatbuffers::Offset<Fake> GenerateFake()
{
    std::vector<flatbuffers::Offset<XYZData>> vec;
    for (int i = 0; i < 512; i++)
    {
        vec.push_back(GenerateXYZ());
    }
    auto XYZVector = builder.CreateVector(vec);
    return CreateFake(builder,
        1.0, 2.0, 3.0, 4.0, 5.0,
        6.0, 7.0, 8.0, 9.0, 10.0,
        XYZVector);
}
int main()
{
    auto start = std::chrono::steady_clock::now();
    for (auto i = 0; i < 1000; i++)
    {
        auto fake = GenerateFake();
    }
    auto end = std::chrono::steady_clock::now();
    auto diff = end - start;
    std::cout << std::chrono::duration<double, std::milli>(diff).count() << " ms" << std::endl;
    std::string dummy;
    std::cin >> dummy;
}
This takes around 40 seconds to run on my PC in debug (approx. 400 ms in release). I'm looking for any way to improve performance in the debug build. Profiling showed that most of the time is spent in std::vector code, so I tried setting _ITERATOR_DEBUG_LEVEL to zero, but that did not result in any significant performance increase.