
I'm trying to store objects, including vectors of objects, in a binary file.

Here's part of the code that loads from the file:

```cpp
template <class T> void read(T* obj,std::ifstream * file) {
    file->read((char*)(obj),sizeof(*obj));
    file->seekg(int(file->tellg())+sizeof(*obj));
}

void read_db(DB* obj,std::ifstream * file) {
    read<DB>(obj,file);
    for(int index = 0;index < obj->Arrays.size();index++) {
        std::cin.get(); //debugging
        obj->Arrays[0].Name = "hi"; //debugging
        std::cin.get(); //debugging
        std::cout << obj->Arrays[0].Name;
        read<DB_ARRAY>(&obj->Arrays[index],file);
        for(int row_index = 0;row_index < obj->Arrays[index].Rows.size();row_index++) {
            read<DB_ROW>(&obj->Arrays[index].Rows[row_index],file);
            for(int int_index = 0;int_index < obj->Arrays[index].Rows[row_index].i_Values.size();int_index++) {
                read<DB_VALUE<int>>(&obj->Arrays[index].Rows[row_index].i_Values[int_index],file);
            }
        }
    }
}
```

And here are the DB and DB_ARRAY classes:

```cpp
class DB {
public:
    std::string Name;
    std::vector<DB_ARRAY> Arrays;
    DB_ARRAY * operator[](std::string);
    DB_ARRAY * Create(std::string);
};
class DB_ARRAY {
public:
    DB* Parent;
    std::string Name;
    std::vector<DB_ROW> Rows;
    DB_ROW * operator[](int);
    DB_ROW * Create();
    DB_ARRAY(DB*,std::string);
    DB_ARRAY();
};
```

So the first argument to the read_db function has the correct values, and the vector Arrays on the object has the correct size. However, if I index any element of obj->Arrays, it throws an access violation exception:

```cpp
std::cout << obj->Arrays[0].Name; // error
std::cout << &obj->Arrays[0]; // no error
```

The latter always prints the same address. So when I save an object cast to char*, does it also save the object's address?

Coding Mash
  • Ever considered using boost::serialization for doing this? – count0 Oct 11 '12 at 14:49
    The `read` function alone is hair-raising on two counts: 1) you shouldn't seek, since reading already advances the read pointer. 2) You shouldn't serialize C++ objects by just dumping their binary representation. – Kerrek SB Oct 11 '12 at 14:51
  • How should I serialize it then? – user1499944 Oct 11 '12 at 14:53
  • To elaborate on Kerrek's point: the reason you shouldn't serialize C++ objects (in particular, any non-POD type, or any type containing pointers/references) is that when you serialize a pointer, you just serialize the raw pointer value, not the actual data being pointed to. When you load up that pointer again, it won't point to the same thing, since processes run in different virtual address spaces. Non-POD types can also contain unexpected data in them like vtables and virtual base class pointers. – Adam Rosenfield Oct 11 '12 at 15:02
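A small sketch that makes the pointer point concrete for `std::vector`. The helper `elements_live_inside` is hypothetical, not part of the question's code. It checks whether a vector's first element lies inside the bytes of the vector object itself, which are exactly the bytes that `read((char*)obj, sizeof(*obj))` copies:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Returns true if the first element of v is stored inside the bytes of the
// vector object itself. On mainstream implementations this is false: the
// object holds only bookkeeping (typically three pointers), and the elements
// live in a separate heap allocation.
bool elements_live_inside(const std::vector<int>& v) {
    if (v.empty())
        return false;  // no element storage to locate
    const char* obj_begin = reinterpret_cast<const char*>(&v);
    const char* obj_end = obj_begin + sizeof(v);
    const char* first = reinterpret_cast<const char*>(v.data());
    std::less<const char*> before;  // total order, even for unrelated pointers
    return !before(first, obj_begin) && before(first, obj_end);
}
```

With 3 elements or 100000, `sizeof` the vector is the same and the elements are outside the object. So dumping a `DB` writes only the pointer values held by `Arrays` (and by `Name`, unless the implementation happens to store a short string inline). Reading those values back in a later run yields pointers to memory the new process never allocated, which is consistent with the access violation in the question.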

2 Answers


As various commenters pointed out, you cannot simply serialize a (non-POD) object by saving and restoring its memory.

The usual way to implement serialization is to implement a serialization interface on the classes. Something like this:

```cpp
struct ISerializable {
    virtual std::ostream& save(std::ostream& os) const = 0;
    virtual std::istream& load(std::istream& is) = 0;
};
```

You then implement this interface in your serializable classes, recursively calling save and load on any members referencing other serializable classes, and writing out any POD members. E.g.:

```cpp
class DB_ARRAY : public ISerializable {
public:
    DB* Parent;
    std::string Name;
    std::vector<DB_ROW> Rows;
    DB_ROW * operator[](int);
    DB_ROW * Create();
    DB_ARRAY(DB*,std::string);
    DB_ARRAY();

    virtual std::ostream& save(std::ostream& os) const
    {
        // serialize out members (Name, then each of Rows);
        // don't save Parent: a pointer is only meaningful in this process
        return os;
    }

    virtual std::istream& load(std::istream& is)
    {
        // unserialize members in the same order, then re-link Parent
        return is;
    }
};
```
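A minimal sketch of how the placeholder comments in `save` and `load` might be filled in. It assumes simplified stand-ins for the question's classes: `DB_ROW` holds only `i_Values`, and `DB_ARRAY` has no `Parent` pointer, which would be re-linked after loading rather than read from the file. The helpers `write_u32`, `read_u32`, `write_string`, `read_string` and `round_trip` are hypothetical, and the format (4-byte little-endian counts, length-prefixed strings) is an arbitrary choice for illustration:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct ISerializable {
    virtual ~ISerializable() = default;
    virtual std::ostream& save(std::ostream& os) const = 0;
    virtual std::istream& load(std::istream& is) = 0;
};

// Hypothetical helpers: an unsigned 32-bit value as 4 bytes, least significant first.
void write_u32(std::ostream& os, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        os.put(static_cast<char>((v >> (8 * i)) & 0xFFu));
}

std::uint32_t read_u32(std::istream& is) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        const std::istream::int_type c = is.get();
        if (c == std::istream::traits_type::eof())
            throw std::runtime_error("unexpected end of stream");
        v |= static_cast<std::uint32_t>(c) << (8 * i);
    }
    return v;
}

// Strings are stored as a length prefix followed by the raw characters.
void write_string(std::ostream& os, const std::string& s) {
    write_u32(os, static_cast<std::uint32_t>(s.size()));
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

std::string read_string(std::istream& is) {
    std::string s(read_u32(is), '\0');
    if (!s.empty() && !is.read(&s[0], static_cast<std::streamsize>(s.size())))
        throw std::runtime_error("truncated string");
    return s;
}

// Simplified stand-ins for the question's classes (no Parent, no helpers).
struct DB_ROW : ISerializable {
    std::vector<int> i_Values;

    std::ostream& save(std::ostream& os) const override {
        write_u32(os, static_cast<std::uint32_t>(i_Values.size()));
        for (int v : i_Values)
            write_u32(os, static_cast<std::uint32_t>(v));  // assumes 32-bit int
        return os;
    }

    std::istream& load(std::istream& is) override {
        i_Values.assign(read_u32(is), 0);
        for (int& v : i_Values)
            v = static_cast<int>(read_u32(is));  // two's complement round trip
        return is;
    }
};

struct DB_ARRAY : ISerializable {
    std::string Name;
    std::vector<DB_ROW> Rows;

    std::ostream& save(std::ostream& os) const override {
        write_string(os, Name);
        write_u32(os, static_cast<std::uint32_t>(Rows.size()));
        for (const DB_ROW& row : Rows)
            row.save(os);  // recurse into each nested serializable member
        return os;
    }

    std::istream& load(std::istream& is) override {
        Name = read_string(is);
        Rows.assign(read_u32(is), DB_ROW());
        for (DB_ROW& row : Rows)
            row.load(is);
        return is;
    }
};

// Usage: save to an in-memory stream and load a fresh copy back.
DB_ARRAY round_trip(const DB_ARRAY& original) {
    std::stringstream buffer;
    original.save(buffer);
    DB_ARRAY copy;
    copy.load(buffer);
    return copy;
}
```

With real files, the streams would be opened in binary mode, e.g. `std::ofstream out("db.bin", std::ios::binary)` (the file name is only an example). After loading, the caller would set each array's parent pointer itself.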

As count0 pointed out, boost::serialization is also a great starting point.

Daniel Gehriger
  • But can I still serialize the member types of the object (std::string, int, etc.) by just casting them to char*? – user1499944 Oct 11 '12 at 15:42
  • @user1499944 - no, the rule that you can't serialize non-PODs by saving and restoring their memory applies recursively. – Pete Becker Oct 11 '12 at 15:45
  • @user1499944: as Pete said, no. In fact, you cannot even serialize a simple `char*` member just by copying it to the stream. You would only store the pointer value, not the actual data it points to! The correct way would be to store the character string referenced by the `char*` (or the `std::string`). Even then, just copying the memory referenced by the `char*` up to the first `\0` character may not be sufficient: it could be using [double-nulls](http://goo.gl/UVlkR)! A more sensible way is to store it as a Pascal string, that is, first storing the string length, followed by the actual data. – Daniel Gehriger Oct 11 '12 at 20:01
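A minimal sketch of the Pascal-string idea from the last comment. The 4-byte little-endian length prefix is an assumed choice, and `write_pascal_string`, `read_pascal_string`, `encode` and `decode` are hypothetical names. Because the length is stored explicitly, strings containing `\0` (such as double-null-terminated lists) survive the round trip:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Pascal-style string: a 4-byte length (least significant byte first), then
// exactly that many characters. No terminator is involved.
void write_pascal_string(std::ostream& os, const std::string& s) {
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    for (int i = 0; i < 4; ++i)
        os.put(static_cast<char>((n >> (8 * i)) & 0xFFu));
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

std::string read_pascal_string(std::istream& is) {
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
        const std::istream::int_type c = is.get();
        if (c == std::istream::traits_type::eof())
            throw std::runtime_error("truncated length prefix");
        n |= static_cast<std::uint32_t>(c) << (8 * i);
    }
    std::string s(n, '\0');
    if (n > 0 && !is.read(&s[0], static_cast<std::streamsize>(n)))
        throw std::runtime_error("truncated string data");
    return s;
}

// Usage: encode to bytes and decode them again via in-memory streams.
std::string encode(const std::string& s) {
    std::ostringstream out;
    write_pascal_string(out, s);
    return out.str();
}

std::string decode(const std::string& bytes) {
    std::istringstream in(bytes);
    return read_pascal_string(in);
}
```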

What is the format of the binary data in the file? Until you specify that, we can't tell you how to write it. Basically, you have to specify a format for all of your data types (except char), then write code that writes out that format byte by byte (or generates it into a buffer); and on the other side, code that reads it in byte by byte and reconstructs the values. The C++ standard says nothing (or very little) about the size and representation of the data types, except that sizeof(char) must be 1, and that unsigned char must be a pure binary representation over all of the bits. And on the machines I have access to today (Sun Sparc and PCs), only the character types have a common representation. As for the more complex types, the memory used in the value representation might not even be contiguous: the bitwise representation of a std::vector, for example, is usually three pointers, with the actual values in the vector being found somewhere else entirely.

The functions istream::read and ostream::write are designed for reading data into a buffer for manual parsing, and writing a pre-formatted buffer. The fact that you need to use a reinterpret_cast to use them otherwise should be a good indication that it won't work.
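A sketch of this approach for a list of integers: define the byte format first, build it into a buffer, and only then hand the buffer to `ostream::write` (and the reverse with `istream::read`). The format here (a 4-byte little-endian element count, then 4-byte little-endian two's-complement values) and all function names are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// One possible on-disk format for a signed 32-bit integer: two's complement,
// 4 bytes, least significant byte first, whatever the host CPU uses.
void put_int32_le(std::vector<char>& out, std::int32_t value) {
    const std::uint32_t u = static_cast<std::uint32_t>(value);  // well-defined, modulo 2^32
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((u >> (8 * i)) & 0xFFu));
}

std::int32_t get_int32_le(const std::vector<char>& in, std::size_t& pos) {
    if (in.size() < pos + 4)
        throw std::runtime_error("not enough bytes");
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i)  // via unsigned char so bytes >= 0x80 are not sign-extended
        u |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    pos += 4;
    return static_cast<std::int32_t>(u);  // two's complement; guaranteed since C++20
}

// ostream::write as intended: the whole buffer is already in the file format.
void write_ints(std::ostream& os, const std::vector<std::int32_t>& values) {
    std::vector<char> buf;
    put_int32_le(buf, static_cast<std::int32_t>(values.size()));  // element count first
    for (std::int32_t v : values)
        put_int32_le(buf, v);
    os.write(buf.data(), static_cast<std::streamsize>(buf.size()));
}

// istream::read as intended: pull raw bytes into a buffer, then parse them.
std::vector<std::int32_t> read_ints(std::istream& is) {
    std::vector<char> header(4);
    if (!is.read(header.data(), 4))
        throw std::runtime_error("missing element count");
    std::size_t pos = 0;
    const std::int32_t count = get_int32_le(header, pos);
    if (count < 0)
        throw std::runtime_error("corrupt element count");
    std::vector<char> body(static_cast<std::size_t>(count) * 4);
    if (count > 0 && !is.read(body.data(), static_cast<std::streamsize>(body.size())))
        throw std::runtime_error("truncated data");
    std::vector<std::int32_t> values;
    pos = 0;
    for (std::int32_t i = 0; i < count; ++i)
        values.push_back(get_int32_le(body, pos));
    return values;
}

// Usage: write to an in-memory stream and read back; a std::ofstream or
// std::ifstream opened with std::ios::binary is used the same way.
std::vector<std::int32_t> round_trip(const std::vector<std::int32_t>& values) {
    std::stringstream buffer;
    write_ints(buffer, values);
    return read_ints(buffer);
}
```

Because every byte is produced by shifts and masks, the file layout is the same whether the program runs on a little-endian PC or a big-endian Sparc.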

James Kanze