I am testing a FlatBuffers serialization implementation, but I am seeing a much larger ratio of serialized size to raw data size than I expected. I realize that the format is designed to allow backward compatibility and that alignment considerations cause some amount of bloat. However, once built, the buffer is approximately 2x the size of the raw data that I am putting into it. That seems large to me, and I suspect it is related to how I have structured my schema. Here is the schema that I would ideally use; it allows for flexibility and maps naturally onto the type of information that I am trying to represent.

// IDL file

namespace Data;

// Structs \\

struct Position {
  x :short;
  y :short;
  z :short;
}

// Tables \\

table Interaction {
  pos    :Position;
  value  :uint;
}

table Event {
  interactions :[Interaction]; // 1-3 interactions are typical in a given event, but could be as high as 30
  id           :ubyte=255;
  time1        :uint;
  time2        :ulong;
}

table Packet {
  events1 :[Event];       // 1000s or more are typical in a given Packet
  events2 :[OtherEvent1]; // Other events that would be defined but occur much less frequently than events1
  events3 :[OtherEvent2]; // Other events that would be defined but occur much less frequently than events1
}

root_type Packet;

Is this 2x wire size expected based on how I have structured this schema? Is it possibly just inevitable because of the small number of fields in a given table and the large number of elements in the vectors? I have tried to reduce alignment issues by artificially making every variable type the same size (uint), and I have tried bypassing the Interaction table and directly making the Event table have a vector of Position structs (which would take away some of the backward compatibility that I am looking for if I need to make changes in the future). The best I have been able to get the ratio down to is 1.7x. Is that a reasonable amount of extra data?
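For concreteness, the kind of struct-based variant I alluded to above looks roughly like this (the exact layout is illustrative; since structs cannot gain or lose fields later, this is where the compatibility trade-off comes from):

struct Interaction {
  pos   :Position;
  value :uint;  // 2 bytes of padding follow pos so this field is 4-byte aligned
}

table Event {
  interactions :[Interaction]; // structs are stored inline: no per-element offset or vtable
  id           :ubyte=255;
  time1        :uint;
  time2        :ulong;
}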

jaws20

1 Answer

Yes, there is overhead from alignment, indirect offsets, vtables and a few other things. You're best off reading https://google.github.io/flatbuffers/flatbuffers_internals.html to get an understanding of these, which will help in designing the smallest possible representation.
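As a rough illustration with the schema above (exact numbers depend on alignment, field ordering and vtable sharing), each Interaction carries about 10 bytes of payload but costs roughly twice that on the wire once the per-element bookkeeping is counted:

  raw payload per Interaction:  3 * short + uint                          = 10 bytes
  serialized (approximate):     4-byte offset in the [Interaction] vector
                              + 4-byte vtable offset at the start of the table
                              + 6-byte pos + 4-byte value (+ padding)
                              ≈ 18-20 bytes

The vtables themselves are deduplicated, so they cost little once many identical Interactions and Events have been written, but the 32-bit offsets and per-vector length prefixes are paid every time, which is why many small tables inside large vectors inflate the buffer.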

Aardappel
    Thanks for pointing me to that document. I feel like I looked at all of the other documentation except that file. It was very helpful for understanding what is going on under the hood. Ultimately, the fact that my data is many groups of a small amount of data, coupled with the fact that full 32-bit uints are used for offsets and vector size info, is causing the bulk of the 'extra' data I was seeing. I have come up with some alternatives to mitigate this. Thanks! – jaws20 Dec 22 '20 at 21:27