
Recently I tried using FlatBuffers in C++. FlatBuffers seems to rely heavily on type punning through things like reinterpret_cast. This makes me a little uncomfortable, because I've learned that it is undefined behavior in many cases.

For example, this Rect struct in an .fbs file:

struct Rect {
    left:int;
    top:int;
    right:int;
    bottom:int;
}

turns into this C++ code for reading it from a table:

  const xxxxx::Rect *position() const {
    return GetStruct<const xxxxx::Rect *>(VT_POSITION);
  }

and the definition of GetStruct simply uses reinterpret_cast.

My questions are:

  1. Is this really undefined behavior in C++?
  2. In practice, will this kind of usage actually be problematic?

Update:

The buffer may simply come from the network or from disk. I don't know whether it makes a difference if the buffer actually comes from memory written by a writer in the same C++ program.

But the writer's auto-generated method is:

  void add_position(const xxxxx::Rect *position) {
    fbb_.AddStruct(Char::VT_POSITION, position);
  }

which in turn calls two further methods (linked in the original post), and those also use reinterpret_cast.

Willy
  • That depends a bit on the definition of `VT_POSITION` and how the data that is there got there. – 1201ProgramAlarm May 13 '20 at 04:25
  • @1201ProgramAlarm `VT_POSITION` is an auto-generated enum constant that simply acts as an offset, I think: `enum FlatBuffersVTableOffset FLATBUFFERS_VTABLE_UNDERLYING_TYPE { VT_POSITION = 4, VT_CANDIDATE = 6 };` – Willy May 13 '20 at 04:31
  • This can't be answered based on the code posted. The code would be valid if an object of the type exists at the location. – M.M May 13 '20 at 05:04
  • @M.M I'm asking about typical FlatBuffers usage, where the buffer is read from the network or from disk. – Willy May 13 '20 at 05:10

2 Answers


I didn't analyze the whole FlatBuffers source code, but I couldn't find where these objects are created: I see no new-expression that would create an object of the type P points to here:

  template<typename P> P GetStruct(voffset_t field) const {
    auto field_offset = GetOptionalFieldOffset(field);
    auto p = const_cast<uint8_t *>(data_ + field_offset);
    return field_offset ? reinterpret_cast<P>(p) : nullptr;
  }

So, it seems that this code does have undefined behavior.

However, this is only true up to and including C++17. C++20 introduces implicit-lifetime types (for example, scalar types and aggregates). If the type P points to is an implicit-lifetime type, this code can be well defined, provided that the same memory region is always accessed through a type that doesn't violate the type-punning rules (for example, it is always accessed through the same type).
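For comparison, here is a minimal sketch of a read that does not depend on an object already existing in the buffer: it copies the bytes into a real object with memcpy, which is well defined for trivially copyable types in every standard version. This is not what FlatBuffers does. The Rect below is a hypothetical stand-in for the generated struct, ReadRect is an invented helper name, and the sketch assumes the buffer's byte order matches the host's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the generated struct (not FlatBuffers' own code).
struct Rect {
  std::int32_t left;
  std::int32_t top;
  std::int32_t right;
  std::int32_t bottom;
};

static_assert(std::is_trivially_copyable<Rect>::value,
              "memcpy into Rect requires a trivially copyable type");
static_assert(sizeof(Rect) == 16, "unexpected padding in Rect");

// Copies sizeof(Rect) bytes starting at `offset` into a real Rect object.
// No Rect has to exist inside `buf`, and `buf + offset` need not be aligned.
// Assumption: the bytes are already in the host's byte order.
inline Rect ReadRect(const unsigned char* buf, std::size_t buf_size,
                     std::size_t offset) {
  assert(buf != nullptr);
  assert(offset <= buf_size && buf_size - offset >= sizeof(Rect));
  Rect out;
  std::memcpy(&out, buf + offset, sizeof(Rect));
  return out;
}
```

The trade-off is that the caller gets a copy rather than a pointer into the buffer, so this gives up the zero-copy access that the generated accessors provide.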

geza

I think both of your questions are answered by the "FlatBuffers: Use in C++" page:

Direct memory access

As you can see from the above examples, all elements in a buffer are accessed through generated accessors. This is because everything is stored in little endian format on all platforms (the accessor performs a swap operation on big endian machines), and also because the layout of things is generally not known to the user.
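To illustrate why a little-endian wire format needs an accessor at all, here is a minimal sketch that reads a 32-bit little-endian value on any host. Note that it assembles the value from individual bytes instead of loading and swapping as the quoted paragraph describes, so it is a different technique with the same result. ReadLE32 is a hypothetical helper name, not part of FlatBuffers.

```cpp
#include <cassert>
#include <cstdint>

// Reads a 32-bit little-endian value from four bytes.
// Works on both little- and big-endian hosts because it never reinterprets
// the bytes as an integer; it combines them with shifts.
inline std::uint32_t ReadLE32(const unsigned char* p) {
  assert(p != nullptr);
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Because it only reads unsigned char values, this form also avoids any alignment or aliasing question.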

For structs, layout is deterministic and guaranteed to be the same across platforms (scalars are aligned to their own size, and structs themselves to their largest member), and you are allowed to access this memory directly by using sizeof() and memcpy on the pointer to a struct, or even an array of structs.
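The layout rule in that paragraph (scalars aligned to their own size, the struct aligned to its largest member) can be worked through mechanically. The sketch below only illustrates the stated rule. It is not FlatBuffers' implementation, and StructLayout, AlignUp and ComputeLayout are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of laying out a struct made only of scalars.
struct StructLayout {
  std::vector<std::size_t> offsets;  // byte offset of each member
  std::size_t size;                  // total size including tail padding
  std::size_t align;                 // alignment of the whole struct
};

// Rounds n up to the next multiple of a (a must be non-zero).
inline std::size_t AlignUp(std::size_t n, std::size_t a) {
  return (n + a - 1) / a * a;
}

// Applies the rule: each scalar is aligned to its own size, and the struct
// is aligned (and padded) to its largest member.
inline StructLayout ComputeLayout(const std::vector<std::size_t>& scalar_sizes) {
  StructLayout layout{{}, 0, 1};
  std::size_t cursor = 0;
  for (std::size_t s : scalar_sizes) {
    assert(s == 1 || s == 2 || s == 4 || s == 8);  // scalar sizes only
    cursor = AlignUp(cursor, s);
    layout.offsets.push_back(cursor);
    cursor += s;
    if (s > layout.align) layout.align = s;
  }
  layout.size = AlignUp(cursor, layout.align);
  return layout;
}
```

For the Rect from the question (four 4-byte ints) this gives offsets 0, 4, 8, 12, a size of 16 and an alignment of 4. A struct with members of 1, 4 and 2 bytes would get offsets 0, 4, 8 and a size of 12.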

These paragraphs guarantee that – given a valid flatbuffer – all memory accesses are valid, as the memory at that specific location will match the expected layout.

If you are processing untrusted flatbuffers, you first need to use the verifier functions to ensure the flatbuffer is valid:

This verifier will check all offsets, all sizes of fields, and null termination of strings to ensure that when a buffer is accessed, all reads will end up inside the buffer.
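To make concrete what "all reads will end up inside the buffer" means, here is a hand-written sketch of two such checks: a range check and a null-termination check. It is not the FlatBuffers verifier and does not use its API. RangeInsideBuffer and StringIsTerminated are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// True when the byte range [offset, offset + length) lies entirely inside a
// buffer of buf_size bytes. Written so that no addition can overflow.
inline bool RangeInsideBuffer(std::size_t buf_size, std::size_t offset,
                              std::size_t length) {
  return offset <= buf_size && length <= buf_size - offset;
}

// True when a string of `length` bytes at `offset` fits in the buffer and is
// followed, still inside the buffer, by a zero terminator byte.
inline bool StringIsTerminated(const unsigned char* buf, std::size_t buf_size,
                               std::size_t offset, std::size_t length) {
  assert(buf != nullptr);
  if (!RangeInsideBuffer(buf_size, offset, length)) return false;
  if (offset + length >= buf_size) return false;  // no room for terminator
  return buf[offset + length] == 0;
}
```

A struct accessor would be safe to call on untrusted input only once a check like RangeInsideBuffer(buf_size, offset, sizeof(the struct)) has passed for its field.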

Botje