I am trying to serialize a large (geometric) graph structure with the boost serialization library.
I store my graph as an adjacency list, that is, my structure is as follows:
class Node {
double x,y;
std::vector<Node*> adjacent_nodes;
...
}
class Graph {
std::vector<Node*> nodes;
...
}
Now with > 10k nodes my problem is, that when I start to serialize (save) my graph, it will recursively call the serialization of all of those nodes before returning, since the graph is connected.
To be more precise, when serializing the Graph it will start by serializing the first node in the "nodes" vector. While doing so it needs to serialize "adjacent_nodes" of the first nodes, where e.g. the second node is contained.
Therefore it needs to serialize the second node before returning the serialization of the first node and so on.
I found this thread from 2010, where someone explained the exact same problem. However, they did not come to a working solution there.
Any help would be greatly appreciated.
My structure in more detail:
class Node {
double x,y;
std::vector<Node*> adjacent_nodes;
public:
inline double get_x() const { return x; }
inline double get_y() const { return y; }
inline std::vector<Node*> const& get_adjacent_nodes() const { return adjacent_nodes; }
Node (double x, double y):x(x),y(y) {}
void add_adjacent(Node* other) {
adjacent_nodes.push_back(other);
}
private:
Node() {}
friend class boost::serialization::access;
template <class Archive>
void serialize(Archive &ar, const unsigned int) {
ar & x;
ar & y;
ar & adjacent_nodes;
}
};
class Simple_graph {
std::vector<Node*> nodes;
void add_edge(int firstIndex, int secondIndex) {
nodes[firstIndex]->add_adjacent(nodes[secondIndex]);
nodes[secondIndex]->add_adjacent(nodes[firstIndex]);
}
public:
/* methods to get the distance of points, to read in the nodes, and to generate edges */
~Simple_graph() {
for (auto node: nodes) {
delete node;
}
}
private:
friend class boost::serialization::access;
template <class Archive>
void serialize(Archive &ar, const unsigned int) {
ar & nodes;
}
};
Edit: To add some suggestions made in the above mentioned thread, citing Dominique Devienne:
1) save all the nodes without their topology info on a first pass of the vector, thus recording all the "tracked" pointers for them, then write the topology info for each, since then you don't "recurse", you only write a "ref" to an already serialized pointer.
2) have the possibility to write a "weak reference" to a pointer, which only adds the pointer to the "tracking" map with a special flag saying it wasn't "really" written yet, such that writing the topology of a node that wasn't yet written is akin to "forward references" to those neighboring nodes. Either the node will really be written later on, or it never will, and I suppose serialization should handle that gracefully.
#1 doesn't require changes in boost serialization, but puts the onus on the client code. Especially since you have to "externally" save the neighbors, so it's no longer well encapsulated, and writing a subset of the surface's nodes become more complex.
#2 would require seeking ahead to read the actual object when encountering a forward reference, and furthermore a separate map to know where to seek for it. That may be incompatible with boost serialization (I confess to be mostly ignorant about it).
Can any of those proposals be implemented by now?