
I'm trying to serialize and deserialize raw C pointers and their data, using the example below. It seems to serialize just fine, but I am unsure how to make it deserialize: it just crashes with a memory access violation exception when I deserialize it. I suppose it is because it doesn't know how to deserialize it, but where do I specify that?

Using a vector is not an option: with very large amounts of primitive data it is painfully slow.

#include <stdint.h>
#include <string>
#include <iostream>
#include <fstream>
#pragma warning (push) 
#pragma warning( disable : 4244 ) 
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/array.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#pragma warning (pop) 

struct Monkey
{
    uint32_t num;
    float* arr;

};


namespace boost
{
    namespace serialization
    {
        template<class Archive>
        void serialize(Archive & ar, Monkey& m, const unsigned int version)
        {
            ar & m.num;
            ar & make_array<float>(m.arr, m.num);
        }
    }
}


int main(int argc, char* argv[])
{
    const char* name = "monkey.txt";

    {
        Monkey m;
        m.num = 10;
        m.arr = new float[m.num];
        for (uint32_t index = 0; index < m.num; index++)
            m.arr[index] = (float)index;

        std::ofstream outStream(name, std::ios::out | std::ios::binary | std::ios::trunc);
        boost::archive::binary_oarchive oar(outStream);
        oar << (m);
    }

    Monkey m;
    std::ifstream inStream(name, std::ios::in | std::ios::binary);     
    boost::archive::binary_iarchive iar(inStream);
    iar >> (m);   // crashes here: m.arr is an uninitialized pointer, not an allocated buffer

    return 0;
}
KaiserJohaan

2 Answers


I heartily recommend you use std::array or std::vector here, because... you messed this up :)

For starters, Monkey doesn't initialize its members. So, loading ends up doing a load_binary to whatever pointer value m.arr happened to have. How would you expect the deserialization to "know" that you needed to allocate memory for that? You need to tell it:

    template<class Archive>
    void serialize(Archive & ar, Monkey& m, const unsigned int version)
    {
        ar & m.num;
        if (Archive::is_loading::value)
        {
            assert(m.arr == nullptr);
            m.arr = new float[m.num];
        }
        ar & make_array<float>(m.arr, m.num);
    }

Now, let's make Monkey a bit less unsafe (by adding initialization and destruction, and, perhaps most importantly, prohibiting copy semantics):

struct Monkey
{
    uint32_t num;
    float* arr;

    Monkey() : num(0u), arr(nullptr) {}

    Monkey(Monkey const&) = delete;
    Monkey& operator=(Monkey const&) = delete;
    ~Monkey() { delete[] arr; }
};

Now, you can see it work:

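#include <cassert>
#include <cstdint>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <fstream>
#pragma warning(disable: 4244)
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/array.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>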

struct Monkey
{
    uint32_t num;
    float* arr;

    Monkey() : num(0u), arr(nullptr) {}

    Monkey(Monkey const&) = delete;
    Monkey& operator=(Monkey const&) = delete;
    ~Monkey() { delete[] arr; }
};

namespace boost
{
    namespace serialization
    {
        template<class Archive>
        void serialize(Archive & ar, Monkey& m, const unsigned int version)
        {
            ar & m.num;
            if (Archive::is_loading::value)
            {
                assert(m.arr == nullptr);
                m.arr = new float[m.num];
            }
            ar & make_array<float>(m.arr, m.num);
        }
    }
}

int main(int argc, char* argv[])
{
    const char* name = "monkey.txt";
    {
        Monkey m;
        m.num = 10;
        m.arr = new float[m.num];
        for (uint32_t index = 0; index < m.num; index++)
            m.arr[index] = (float)index;

        std::ofstream outStream(name, std::ios::out | std::ios::binary | std::ios::trunc);
        boost::archive::binary_oarchive oar(outStream);
        oar << (m);
    }

    Monkey m;
    std::ifstream inStream(name, std::ios::in | std::ios::binary);
    boost::archive::binary_iarchive iar(inStream);
    iar >> (m);

    std::copy(m.arr, m.arr + m.num, std::ostream_iterator<float>(std::cout, ";"));
}

Prints

0;1;2;3;4;5;6;7;8;9;


aledalgrande
sehe
    I started to write my own answer but @sehe put all the code and all the issues already. I can only add my summary, just rephrasing what is in here: your real problem is not the deserialization per se but the memory management. You did not specify how the struct Monkey allocates, copies or deallocates the memory it uses. Sehe provided two clean ways to solve this memory management issue (either use vector or add/delete copy ctor, dtor, etc.). – Michael Simbirsky Dec 15 '13 at 22:04

While deserializing, m.arr does not point to an array of 10 floats; it is just an uninitialized float* that points to no allocated memory.

Make Monkey::arr a std::vector<float> instead of a float*. Boost.Serialization supports the common C++ standard library containers out of the box, as long as you include the matching header (for std::vector, boost/serialization/vector.hpp).

Oswald
  • How/where do I allocate this memory, given that m.num could be any arbitrary number that I don't know until I deserialize it? – KaiserJohaan Dec 15 '13 at 17:02
  • I don't know. But you could make `Monkey::arr` an `std::vector` instead of a `float*`. Boost serialization knows how to serialize and deserialize all containers from the C++ standard library. – Oswald Dec 15 '13 at 17:06
  • I can't do that, unfortunately; as I wrote before the code snippet, I am dealing with big amounts of data (lots of 3D meshes and image data) and it is too slow to deserialize it (minutes!) – KaiserJohaan Dec 15 '13 at 17:53
  • I cannot think of any reason why deserializing into an array should be any faster than deserializing into a `std::vector` (except for extremely poor coding on the side of Boost, but that's rather unlikely). If the code is fast at the moment, that is because you do not deserialize the whole array. – Oswald Dec 15 '13 at 22:33
  • I completely concur with Oswald. It's a myth that deserializing into a vector would be slower. A complete myth, since the library will _even_ guarantee that `.reserve(n)` is called with the actual dimensions, before loading the data. And POD types get loaded using an optimized copy (the equivalent of `memcpy`) – sehe Dec 16 '13 at 14:38
  • The reason for not using vector was that I had issues with vector iterators in debug builds only, which didn't seem to go away in VS2012 no matter whether you tried to turn the debug flags off. Release builds had acceptable speeds. – KaiserJohaan Dec 16 '13 at 14:49