My FlatBuffers schema file dict.fbs looks like this:
namespace fbs;

table Dict {
  entries:[DictEntry];
}

table DictEntry {
  key:string (key);
  value:string;
}

root_type Dict;
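The C++ code below uses the header that flatc generates from this schema; assuming default output naming, that header would be produced with the following command, which emits dict_generated.h:

flatc --cpp dict.fbs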
Now, according to the documentation, you can emulate a dictionary in FlatBuffers with a sorted vector and binary search, like this:
#include <fstream>
#include <string>
#include <vector>
#include "dict_generated.h"  // generated by flatc from dict.fbs

using namespace fbs;

flatbuffers::FlatBufferBuilder builder(1024);
std::string key, value;
std::ifstream infile(argv[1]);
std::string outfile(argv[2]);

// Each pair of input lines (key, value) becomes one DictEntry table.
std::vector<flatbuffers::Offset<DictEntry>> entries;
while (std::getline(infile, key) && std::getline(infile, value)) {
  entries.push_back(CreateDictEntryDirect(builder, key.c_str(), value.c_str()));
}

// Sort the entries by their key field so the vector can be binary-searched later.
auto vec = builder.CreateVectorOfSortedTables(&entries);
auto dict = CreateDict(builder, vec);
builder.Finish(dict);
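Continuing from the snippet above, writing the buffer out and reading it back for a lookup would look roughly like this. This is a minimal sketch under a few assumptions: the generated header is dict_generated.h, "somekey" is just a placeholder key, and <iostream> and <iterator> are included in addition to the headers above.

// Write the finished buffer to disk (presumably this is the file that ends up at 111MB).
std::ofstream out(outfile, std::ios::binary);
out.write(reinterpret_cast<const char*>(builder.GetBufferPointer()), builder.GetSize());
out.close();

// Read it back; LookupByKey does a binary search over the key-sorted vector.
std::ifstream in(outfile, std::ios::binary);
std::string buf((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
auto dict = fbs::GetDict(buf.data());
if (auto entry = dict->entries()->LookupByKey("somekey")) {
  std::cout << entry->key()->str() << " -> " << entry->value()->str() << "\n";
}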
My original word list is 32MB on disk. For each word in this list I have a normalized key and a corresponding value. It would seem logical for the serialized FlatBuffers dict to be roughly twice the size on disk, say 64MB, but in reality the output is 111MB.
Can I optimize this schema to be more compact? What blows up the output to almost 4 times the size?