I'm writing scripts that read graphs in GEXF format, add nodes and edges, and write them back to GEXF. My problem is that `write_gexf` gives the edges I add ids that are already used by the edges I read in.

For instance, suppose I read in a graph G with a single edge.

>>> import networkx as nx
>>> G = nx.read_gexf('first.gexf')
>>> G.edges(data=True)
[('0','1', {'id': '0'})]

and then I add an edge and write the graph to gexf:

>>> G.add_edge(1,2)
>>> G.edges(data=True)
[('0','1', {'id': '0'}), (1,2, {})]
>>> nx.write_gexf(G,'second.gexf')

Now if I read in 'second.gexf', I get two edges with 'id' equal to '0'.

>>> H = nx.read_gexf('second.gexf')
>>> H.edges(data=True)
[('0','1', {'id': '0'}), ('1','2', {'id': '0'})]

Is there a way to avoid this?

Rob

1 Answer

The NetworkX GEXF writer generates an edge id (integers starting at 0) when none is specified. Since you added the second edge (1, 2) without an id, it was assigned id 0, which collides with the id of your first edge.

It might be a bug, and it certainly causes a problem for your use case. One workaround is to explicitly set an edge id when you add the edge.

In [1]: import networkx as nx

In [2]: G = nx.read_gexf('first.gexf')

In [3]: G.edges(data=True)
Out[3]: [('1', '0', {'id': '0'})]

In [4]: G.add_edge(1,2,id=1)

In [5]: G.edges(data=True)
Out[5]: [('1', '0', {'id': '0'}), (2, 1, {'id': 1})]
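If you add many edges, setting each id by hand gets tedious. Below is a minimal sketch of a helper that fills in the missing ids in one go before writing. The name `assign_missing_edge_ids` is made up for illustration and is not part of NetworkX. It assumes edge ids live under the 'id' key, as in the output above, and it stores new ids as strings to match what `read_gexf` returns.

```python
from itertools import count


def assign_missing_edge_ids(edges):
    """Give every edge that lacks an 'id' attribute an unused one, in place.

    `edges` is an iterable of (u, v, data) triples whose `data` dicts are the
    live edge-attribute dicts (hypothetical helper, not a NetworkX function).
    """
    edges = list(edges)
    # First pass: collect the ids already in use, compared as strings so
    # that '1' and 1 count as the same id.
    used = {str(data["id"]) for _, _, data in edges if "id" in data}
    candidates = count()  # 0, 1, 2, ...
    # Second pass: hand out the smallest unused integers to the rest.
    for _, _, data in edges:
        if "id" not in data:
            new_id = str(next(candidates))
            while new_id in used:
                new_id = str(next(candidates))
            data["id"] = new_id
            used.add(new_id)
    return edges


# Stand-in for G.edges(data=True) after reading the file and adding an edge.
demo = [("0", "1", {"id": "0"}), (1, 2, {})]
assign_missing_edge_ids(demo)
print(demo)  # [('0', '1', {'id': '0'}), (1, 2, {'id': '1'})]
```

With a plain `nx.Graph`, you would call `assign_missing_edge_ids(G.edges(data=True))` just before `nx.write_gexf(G, 'second.gexf')`. This relies on `G.edges(data=True)` yielding the graph's own attribute dicts, so the change is visible to the writer.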
Aric
  • This is a frustrating feature of `write_gexf`. It seems like using json or even pickle is safer than constantly saving and writing graphs to gexf. – Rob Nov 13 '13 at 21:02
  • Unfortunately the GEXF format requires an edge id. So if you don't add one, somehow one has to be generated. It might be possible to keep track of the existing edge ids so there isn't a collision, but I'm not sure if that can be done in one pass. – Aric Nov 13 '13 at 22:20
  • Yeah. I guess this boils down to a feature I wish `write_gexf` handled so I didn't have to. – Rob Nov 15 '13 at 16:58