Is there a better way to do object serialization for this class?

Question

I am reading someone else's code, and there is a serialized version of this class:

struct ObjectInfo
{
    int32_t m_typeId;
    string m_objectName;
    vector<int32_t> m_haveKeysId; 
    map<int32_t,double> m_objectFeatures;
    
    ObjectInfo():m_typeId(-1),m_objectName("")
    {
        m_objectFeatures.clear();
        m_haveKeysId.clear();
    }
};

The binary version of it is the following:

struct ObjectInfo_B
{
    int32_t m_typeId;
    int32_t m_objectNamePos;
    
    int32_t m_startIndex;
    int32_t m_endIndex;

    int32_t m_haveKeysIdStartIndex;
    int32_t m_haveKeysIdEndIndex;
    
    ObjectInfo_B()
    {
        m_typeId = -1;
        m_objectNamePos = 0;
        m_startIndex = -1;
        m_endIndex = -1;
        m_haveKeysIdStartIndex = -1;
        m_haveKeysIdEndIndex = -1;
    }
};

Then there is a vector of ObjectInfo:

vector<ObjectInfo> *objectsVec;
ObjectInfo_B *bObjects;

...

The conversion code looks like this:

startIndex = 0;
int32_t curBufferSize = 0;
for(size_t i = 0;i<objectsVec->size();i++)
{
    bObjects[i].m_typeId = (*objectsVec)[i].m_typeId;
    bObjects[i].m_objectNamePos = curBufferSize;
    
    strcpy(m_objectNameBuffer+curBufferSize,(*objectsVec)[i].m_objectName.c_str());
    curBufferSize += (*objectsVec)[i].m_objectName.size() + 1;
    
    bObjects[i].m_startIndex = startIndex;
    bObjects[i].m_endIndex = startIndex + (*objectsVec)[i].m_objectFeatures.size();
    startIndex = bObjects[i].m_endIndex;

    bObjects[i].m_haveKeysIdStartIndex = haveKeyStartIndex;
    bObjects[i].m_haveKeysIdEndIndex = haveKeyStartIndex +(*objectsVec)[i].m_haveKeysId.size();

...

fwrite((char*)bObjects,sizeof(ObjectInfo_B),wcount,output);

This seems very complicated, and I am not familiar with serialization. Is there an easier way to do it in C++? A quick search suggests that the library below can do similar things, but can it perform the conversion above in a much simpler way?

https://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/index.html

Have you considered converting this to JSON? Or how about [this](https://developers.google.com/protocol-buffers)? — PaulMcKenzie, Dec 09 '21 at 04:16
Binary serialization is notoriously hard and usually not portable (there is no guarantee on exact memory layouts). You are better off serializing to some text format (like XML, JSON, or YAML). If you need to store a lot of data, look for a library that can do it, or you end up having to write the code yourself, with specifications like: an int takes 4 bytes and is stored little-endian, a string starts with an integer stating its size followed by UTF-8 encoded characters, etc. — Pepijn Kramer, Dec 09 '21 at 04:40
@PepijnKramer Without fully understanding the author's intent, I don't want to convert it into JSON. This code loads plain text data into ObjectInfo, converts it to ObjectInfo_B and serialize it into binary format file as output. — marlon, Dec 09 '21 at 04:58
*Is there a better way to do object serialization for this class?* -- Closing as opinion-based. If the way you have works, and others have the opinion that the "better way" is to use a library or a well-known format, then... — PaulMcKenzie, Dec 09 '21 at 05:02
I don't know the intent either ;) But the binary version written is fragile, and may only be read back later if all compiler settings etc. (operating system, target cpu etc) are kept exactly the same. — Pepijn Kramer, Dec 09 '21 at 05:02
Also, byte alignment of the members is not enforced in your sample code with `#pragma pack(1)` or similar compiler-specific syntax. The author didn't care that this wouldn't work across platforms -- so again, what is meant by "better way"? — PaulMcKenzie, Dec 09 '21 at 05:09
@PaulMcKenzie By 'better way' I meant an easier way. This code seems complicated enough to be hard to understand. — marlon, Dec 09 '21 at 05:18
Oh, as code goes this isn't hard. It's maybe a bit too "C"-like for my taste (strcpy, fwrite), but functional. Talk yourself through it line by line and you will see that it copies the relevant information stored in the objects into simpler structs (of only std::int32_t) that can be dumped to disk. So the code isn't actually serializing objects, just the data. — Pepijn Kramer, Dec 09 '21 at 05:47
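
The kind of explicit format mentioned in the comments above (a 4-byte little-endian integer, a string stored as its size followed by its bytes) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `put_u32_le`, `get_u32_le`, `put_string`, and `get_string` are hypothetical, and the layout is not the format used by the code in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit value as 4 bytes, least significant byte first,
// independent of the host CPU's byte order.
void put_u32_le(std::vector<unsigned char>& out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
}

// Read a 32-bit little-endian value starting at `pos` and advance `pos`.
std::uint32_t get_u32_le(const std::vector<unsigned char>& in, std::size_t& pos)
{
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int shift = 0; shift < 32; shift += 8)
        v |= static_cast<std::uint32_t>(in[pos++]) << shift;
    return v;
}

// A string is stored as its byte length followed by its raw bytes.
void put_string(std::vector<unsigned char>& out, const std::string& s)
{
    put_u32_le(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Read a length-prefixed string starting at `pos` and advance `pos`.
std::string get_string(const std::vector<unsigned char>& in, std::size_t& pos)
{
    const std::uint32_t len = get_u32_le(in, pos);
    assert(pos + len <= in.size());
    std::string s(in.begin() + pos, in.begin() + pos + len);
    pos += len;
    return s;
}
```

Because every byte is placed explicitly, a file written this way does not depend on struct padding or on the endianness of the machine that wrote it.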

score 0 · Answer 1 · answered Dec 09 '21 at 04:40

The main question is why you want to keep two different versions of the same class. If the main purpose of the binary version is to provide binary I/O, I'd recommend writing a corresponding I/O function. Here's a simple example.
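
A minimal sketch of such an I/O function, written directly against `ObjectInfo`, might look like the following. The helper names (`write_raw`, `read_raw`, `write_object`, `read_object`, `round_trip`) and the layout (host byte order, 32-bit counts before each container) are assumptions made for illustration; this is not the file format produced by the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct ObjectInfo
{
    std::int32_t m_typeId = -1;
    std::string m_objectName;
    std::vector<std::int32_t> m_haveKeysId;
    std::map<std::int32_t, double> m_objectFeatures;
};

// Hypothetical helpers: copy the raw bytes of a trivially copyable value
// (host byte order, so not portable across architectures).
template <typename T>
void write_raw(std::ostream& os, const T& value)
{
    os.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

template <typename T>
T read_raw(std::istream& is)
{
    T value{};
    is.read(reinterpret_cast<char*>(&value), sizeof(T));
    return value;
}

// Each container is written as a 32-bit element count followed by its elements.
void write_object(std::ostream& os, const ObjectInfo& obj)
{
    write_raw(os, obj.m_typeId);
    write_raw(os, static_cast<std::uint32_t>(obj.m_objectName.size()));
    os.write(obj.m_objectName.data(),
             static_cast<std::streamsize>(obj.m_objectName.size()));
    write_raw(os, static_cast<std::uint32_t>(obj.m_haveKeysId.size()));
    for (std::int32_t id : obj.m_haveKeysId)
        write_raw(os, id);
    write_raw(os, static_cast<std::uint32_t>(obj.m_objectFeatures.size()));
    for (const auto& kv : obj.m_objectFeatures)
    {
        write_raw(os, kv.first);
        write_raw(os, kv.second);
    }
}

ObjectInfo read_object(std::istream& is)
{
    ObjectInfo obj;
    obj.m_typeId = read_raw<std::int32_t>(is);

    const auto nameLen = read_raw<std::uint32_t>(is);
    obj.m_objectName.resize(nameLen);
    if (nameLen > 0)
        is.read(&obj.m_objectName[0], static_cast<std::streamsize>(nameLen));

    const auto keyCount = read_raw<std::uint32_t>(is);
    for (std::uint32_t i = 0; i < keyCount; ++i)
        obj.m_haveKeysId.push_back(read_raw<std::int32_t>(is));

    const auto featureCount = read_raw<std::uint32_t>(is);
    for (std::uint32_t i = 0; i < featureCount; ++i)
    {
        // Read key and value in separate statements to fix the read order.
        const auto key = read_raw<std::int32_t>(is);
        const auto value = read_raw<double>(is);
        obj.m_objectFeatures[key] = value;
    }
    return obj;
}

// Usage: write an object to an in-memory binary stream and read it back.
ObjectInfo round_trip(const ObjectInfo& obj)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    write_object(buffer, obj);
    assert(buffer.good());
    return read_object(buffer);
}
```

With functions like these, the same calls work on a `std::ofstream` or `std::ifstream` opened with `std::ios::binary`, and no second struct is needed.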

If you are dealing with legacy code, giving ObjectInfo_B a constructor that takes an ObjectInfo makes the code more readable:

ObjectInfo_B(const ObjectInfo& obj) {
    // copy all member variables according to your code snippet
    m_typeId = obj.m_typeId;
    // ...
}

Then the serialization part looks like this:

vector<ObjectInfo> objectsVec;  // Note: removed pointer
vector<ObjectInfo_B> bObjects;  // also using a std::vector

for (const ObjectInfo& obj : objectsVec) {
    // construct an ObjectInfo_B from obj in place at the end of the vector
    bObjects.emplace_back(obj);
}
// ...
// Pointer to the contiguous objects, e.g. for fwrite
ObjectInfo_B* bObject_ptr = bObjects.data();

Again, based on the information provided, the second class seems redundant and should be removed. A function that writes ObjectInfo in a binary format is sufficient.

But the definitions of the two classes are very different. In the second one, what's the purpose of m_startIndex and m_endIndex? I guess there should be an easier way to achieve the same purpose without damaging the author's intent. This code loads plain text data into ObjectInfo, converts it to ObjectInfo_B and serializes it into a binary file as output. — marlon, Dec 09 '21 at 04:55
I assume that m_startIndex and m_endIndex are used to save the addresses of the binary version of the map. If these are ever required, they can be calculated on-the-fly. My guess is that a better I/O handling makes them unnecessary. — Yoke, Dec 09 '21 at 20:22
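
One plausible reading of m_startIndex and m_endIndex, based on how the loop in the question computes them (the end index is the start index plus the map size, and the next object starts where the previous one ended), is that they are half-open ranges into a single flattened array of features. The sketch below illustrates that idea only; `Range`, `FlatFeatures`, `flatten`, and `features_of` are hypothetical names, and where the original code actually stores the feature data is not shown in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Half-open range [start, end) into the flattened entries.
struct Range
{
    std::int32_t start;
    std::int32_t end;
};

struct FlatFeatures
{
    std::vector<std::pair<std::int32_t, double>> entries;  // all maps, back to back
    std::vector<Range> ranges;                             // one range per object
};

// Concatenate every map into one array and record each map's range,
// using a running offset like startIndex in the question.
FlatFeatures flatten(const std::vector<std::map<std::int32_t, double>>& maps)
{
    FlatFeatures flat;
    std::int32_t startIndex = 0;
    for (const auto& m : maps)
    {
        const std::int32_t endIndex =
            startIndex + static_cast<std::int32_t>(m.size());
        flat.ranges.push_back({startIndex, endIndex});
        flat.entries.insert(flat.entries.end(), m.begin(), m.end());
        startIndex = endIndex;
    }
    return flat;
}

// Recover the i-th object's map from the flattened form.
std::map<std::int32_t, double> features_of(const FlatFeatures& flat, std::size_t i)
{
    assert(i < flat.ranges.size());
    const Range r = flat.ranges[i];
    return std::map<std::int32_t, double>(flat.entries.begin() + r.start,
                                          flat.entries.begin() + r.end);
}
```

Under this reading, the fixed-size records hold only offsets, while the variable-length data lives in separate contiguous buffers that can each be written with one call.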

score 0 · Answer 2 · answered Dec 09 '21 at 06:03

A sketch of the direction I would take to refactor the code:

#include <cstdint>
#include <iostream>
#include <fstream>
#include <vector>

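// Placeholder for the original ObjectInfo; only the member used below is shown.
struct B
{
    std::int32_t m_typeId = -1;
};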

struct ObjectInfo_B
{
    std::int32_t m_typeId = -1;
    std::int32_t m_objectNamePos = -1;
    
    std::int32_t m_startIndex = -1;
    std::int32_t m_endIndex = -1;

    std::int32_t m_haveKeysIdStartIndex = -1;
    std::int32_t m_haveKeysIdEndIndex = -1;

};

auto get_data_from_objects(const std::vector<B>& objects)
{
    std::int32_t startIndex{0};
    std::int32_t curBufferSize{0};
    std::vector<ObjectInfo_B> object_infos(objects.size());
    
    for (std::size_t n = 0; n < objects.size(); n++)
    {
        auto& object_info = object_infos[n];
        const auto& object = objects[n];
        
        object_info.m_typeId = object.m_typeId;
        object_info.m_objectNamePos = curBufferSize;
        // ....
    }
    
    return object_infos;
}


std::ostream& operator<<(std::ostream& os, const std::vector<ObjectInfo_B>& object_infos)
{
    for(const ObjectInfo_B& object_info : object_infos)
    {
        // write the raw bytes of each record; open the stream in binary mode
        os.write(reinterpret_cast<const char*>(&object_info), sizeof(ObjectInfo_B));
    }
    return os;
}


void save_to(std::ostream& os, const std::vector<B>& objects)
{
   auto object_infos = get_data_from_objects(objects);
   os << object_infos;
}
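
To complement the refactoring sketch above, here is a rough illustration of how raw records of this shape could be written and read back through a stream. The record count written in front of the data is an assumption added for this example (the question's code writes `wcount` records and does not show how the reader learns the count), and `write_records` and `read_records` are hypothetical names. The bytes are in host order, so the result is readable only on a compatible platform.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

struct ObjectInfo_B
{
    std::int32_t m_typeId = -1;
    std::int32_t m_objectNamePos = -1;
    std::int32_t m_startIndex = -1;
    std::int32_t m_endIndex = -1;
    std::int32_t m_haveKeysIdStartIndex = -1;
    std::int32_t m_haveKeysIdEndIndex = -1;
};

// Dumping raw bytes is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<ObjectInfo_B>::value,
              "ObjectInfo_B must be trivially copyable to be written as raw bytes");

// Write a 32-bit record count followed by the records themselves.
void write_records(std::ostream& os, const std::vector<ObjectInfo_B>& records)
{
    assert(records.size() <= std::numeric_limits<std::uint32_t>::max());
    const auto count = static_cast<std::uint32_t>(records.size());
    os.write(reinterpret_cast<const char*>(&count), sizeof(count));
    os.write(reinterpret_cast<const char*>(records.data()),
             static_cast<std::streamsize>(records.size() * sizeof(ObjectInfo_B)));
}

// Read the count, then the records; returns an empty vector on failure.
std::vector<ObjectInfo_B> read_records(std::istream& is)
{
    std::uint32_t count = 0;
    is.read(reinterpret_cast<char*>(&count), sizeof(count));
    if (!is)
        return {};

    std::vector<ObjectInfo_B> records(count);
    is.read(reinterpret_cast<char*>(records.data()),
            static_cast<std::streamsize>(records.size() * sizeof(ObjectInfo_B)));
    if (!is)
        records.clear();
    return records;
}
```

The same functions work with a `std::ofstream` or `std::ifstream`, provided the file is opened with `std::ios::binary`.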
