
I need to write more than 2 GB to an instance of the FlatBufferBuilder class. The following question on Stack Overflow briefly covers this topic. I am looking for options to work around the limit by following the advice in the linked question, which is to create a sequence of FlatBuffers.

// FlatBuffers native table whose vector can grow to ~3.5 million elements,
// which consequently exceeds the 2 GB buffer limit.

struct ValidationReportT : public flatbuffers::NativeTable
{
  std::vector<flatbuffers::unique_ptr<ValidationDefectT>> defects{};
};

The code snippet below is my attempt to create a sequence of FlatBufferBuilder instances.

void createReport(std::vector<flatbuffers::FlatBufferBuilder>& fbbDefectsVec)
{ 

    ValidationReportT report;          // report already has all defects populated
    ValidationReportT partialReport;   // partialReport captures blocks of max_elems defects, is packed into a FlatBufferBuilder, then its defects vector is cleared

    if (report.defects.size())
    {
        int size       = report.defects.size();
        auto elem_size  = sizeof(*report.defects[0]);
        float32_t max_elems  = FLATBUFFERS_MAX_BUFFER_SIZE / (float32_t)elem_size;
        auto fbbsNeeded = size < max_elems ? 1 : (int)ceil(size / max_elems);

        fbbDefectsVec.resize(fbbsNeeded);
        int offset = 0;
        int idx    = 0;

        size_t start = offset;
        size_t end = offset + max_elems;     //Loop over blocks of max elems

        while (size > max_elems)
        {
            for (size_t i = start; i < end; i++)
            {
                partialReport.defects.push_back(std::move(report.defects[i]));
            }
            offset += max_elems;
            size -= max_elems;

            fbbDefectsVec[idx].Finish(
                validation::ValidationReport::Pack(fbbDefectsVec[idx], &partialReport));
            idx++;
            start = offset;
            end = offset + max_elems;
            partialReport.defects.clear();
        }

        if(size > 0)
        {
            //Copy the remaining defects
            //Set the loop bounds
            if(max_elems >= report.defects.size()) //All defects can be fit into a flatbuffer vector
            {
                start = 0;
                end = report.defects.size();
            }
            else
            {
                start = offset;  // resume where the block loop left off
                end = report.defects.size();
            }
            for(size_t i = start; i < end; i++)
            {
                partialReport.defects.push_back(std::move(report.defects[i]));
            }
            fbbDefectsVec[idx].Finish(
                validation::ValidationReport::Pack(fbbDefectsVec[idx], &partialReport));
        }
    }
}

Even though partialReport holds no more than the maximum element count allowed by FLATBUFFERS_MAX_BUFFER_SIZE, I still hit the size-limit assertion:

Assertion `size() < FLATBUFFERS_MAX_BUFFER_SIZE' failed.
Aborted

on this particular line:

fbbDefectsVec[idx].Finish(
                    validation::ValidationReport::Pack(fbbDefectsVec[idx], &partialReport));

Why is this so, and how do I work around it?

  • I don't know anything about flatbuffers; I'm guessing it's some kind of serialization library. I suspect `sizeof(*report.defects[0])` may not directly correspond to the size in bytes of the serialized representation of `*report.defects[0]`, in the same way that `sizeof(std::vector)` is a small compile-time constant even though the vector may contain arbitrarily large amount of data at runtime. What does `ValidationDefectT` look like? – Igor Tandetnik Oct 10 '22 at 14:04
  • Thanks a lot. sizeof(*report.defects[0]) not corresponding to the size in bytes of the serialized representation of *report.defects[0] is precisely the problem. This perhaps could be the answer to this question. – Abhishek Kusnoor Oct 11 '22 at 05:30
  • @IgorTandetnik I am successful in creating the sequence of flatbuffers by hardcoding the maximum number of elements per flatbuffer instance. However, I am trying to optimize it further by verifying that the deserialized content does not in fact overflow the 2GB limit before calling ::Pack. Any clues here? Thanks in advance! – Abhishek Kusnoor Oct 12 '22 at 12:43
  • Well, apparently you can get the current size of the buffer with `FlatBufferBuilder::GetSize`. So you could just pack it until its size is within some safety margin from the max size (so that the next object could overshoot the limit). If you wish to be more precise, you could serialize each object into a temporary buffer just to get its serialized size, and then serialize again into the "real" buffer. That would of course hurt performance. – Igor Tandetnik Oct 12 '22 at 13:40
  • Aren't the two options you suggested the same? GetSize can only be called on a 'Finished' flatbuffer instance, which would mean the data is already serialized. If I were to go with the first option and detect whether the size is within some safety margin, I would have to serialize it first, right? Which is the same as the second option, 'if you wish to be more precise'? Seems I am a bit confused.. – Abhishek Kusnoor Oct 12 '22 at 18:27
  • I didn't realize that. I thought `GetSize` could be called at any time, to obtain the size of the buffer accumulated so far. Like I said, I'm not familiar with the library, I just had a quick look at the documentation. – Igor Tandetnik Oct 12 '22 at 20:28

0 Answers