
I have a DataFrame of possible network connections with the columns A, B, Count and some_attribute, i.e. df = pd.DataFrame(columns=["A", "B", "Count", "some_attribute"]). Each row represents a connection like this:

  • A has a connection with B
  • This connection occurred "Count" times
  • This connection has a specific attribute (i.e. a specific type of contact)

I want to export this DataFrame to the GraphML format. It works fine using the following code:

import networkx as nx
G = nx.Graph()
G.add_weighted_edges_from(df[["A", "B", "Count"]].values)
nx.write_graphml(G, "my_graph.graphml")

This code results in a graphml file with the correct graph, which I can use with Gephi. Now I want to add an attribute:

G = nx.Graph()
G.add_weighted_edges_from(df[["A", "B", "Count"]].values, attr=df["some_attribute"].values)
nx.write_graphml(G, "my_graph.graphml")

Whenever I try to add attributes in this code, it becomes impossible to write it to a graphml file. With this code, I get the following error message:

NetworkXError: GraphML writer does not support <class 'numpy.ndarray'> as data values.

I found related articles (like this one), but they didn't provide any solution for this problem. Does anyone have a solution for adding attributes to a graphml file using networkx so I can use them in Gephi?

Guido
  • doesn't `attr=df["some_attribute"]` work? – EdChum Nov 01 '16 at 17:39
  • No, it doesn't. It will give the same error, except that numpy is replaced by Series. – Guido Nov 02 '16 at 10:46
  • It appears that 'some_attribute' is a field with type numpy.ndarray which is not a known graphml type. Is it an array or a single number? If it is a single number you could try to convert it to an integer or float first. Those types are both useable in graphml. – Aric Nov 08 '16 at 15:02
  • reshaping the data is probably your best bet, and/or serializing to string & deserializing if it's really supposed to be a list and not a simple type (you could try pickle or yaml if you want reconstitution, or json/msgpack should always work). – Corley Brigman Nov 08 '16 at 15:47
  • Could you give an example of what you mean? – Guido Nov 09 '16 at 10:09
  • Guido - I assume your question was to @CorleyBrigman (if you don't include the name like I just did and it's not the person asking the question, he/she won't get any notices) – Joel Nov 11 '16 at 05:09
  • networkx doesn't support lists as attributes. As @Kevin said (and I upvoted) `add_weighted_edges_from` assigns the same attributes to all edges; it doesn't unroll it. You could use `add_edges_from`, it takes a list of `(u, v, d)` where `d` is the attribute dictionary for that edge only. So you'd need something like `add_edges_from([(u,v,{'weight': w, 'attr': a}) for u,v,w,a in df[['A', 'B', 'Count', 'some_attribute']] ])` (did not check code but it should be something like that) – Corley Brigman Nov 11 '16 at 17:10
  • Thanks @CorleyBrigman -- just need `.values` -- edited the answer to reflect your suggestion. – Kevin Nov 11 '16 at 18:03
  • As a sidenote: what also helped for me was to not encode the columns as UTF-8 strings – Guido Nov 14 '16 at 11:44
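
The serialization idea Corley Brigman mentions in the comments above could look roughly like this. It is only a minimal sketch: the `tags` attribute and its values are made up for illustration, JSON is just one of the formats he lists, and `nx.generate_graphml` / `nx.parse_graphml` are used so nothing is written to disk.

```python
import json

import networkx as nx

G = nx.Graph()
# Hypothetical list-valued attribute, stored as a JSON string so the
# GraphML writer only ever sees a plain str.
G.add_edge("a", "b", weight=3, tags=json.dumps(["phone", "email"]))

# Build the GraphML document in memory instead of writing a file.
graphml_text = "\n".join(nx.generate_graphml(G))

# Round trip: parse the GraphML back and decode the string into a list.
H = nx.parse_graphml(graphml_text)
tags = json.loads(H["a"]["b"]["tags"])
print(tags)
```

Gephi would still show the attribute as a single string, so this only helps if something downstream decodes it again.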

1 Answer


Assuming the following example DataFrame:

import pandas as pd
df = pd.DataFrame({'A': [0,1,2,0,0],
                   'B': [1,2,3,2,3],
                   'Count': [1,2,5,1,1],
                   'some_attribute': ['red','blue','red','blue','red']})

   A  B  Count some_attribute
0  0  1      1            red
1  1  2      2           blue
2  2  3      5            red
3  0  2      1           blue
4  0  3      1            red

Using the code from the question to build the graph:

import networkx as nx    
G = nx.Graph()
G.add_weighted_edges_from(df[["A","B", "Count"]].values, attr=df["some_attribute"].values)

When inspecting an edge, it turns out that the whole numpy array, df['some_attribute'].values, gets assigned as an attribute to every edge:

print (G.edge[0][1])
print (G.edge[2][3])
{'attr': array(['red', 'blue', 'red', 'blue', 'red'], dtype=object), 'weight': 1}
{'attr': array(['red', 'blue', 'red', 'blue', 'red'], dtype=object), 'weight': 5}

If I understand your intent correctly, you want each edge's attribute to be the corresponding value from the df['some_attribute'] column.
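
To make the shared-array problem easy to spot before calling write_graphml, something like the following might help. It is a rough sketch: `non_scalar_edge_attrs` is a hypothetical helper written for this answer (not part of networkx), and the `SIMPLE_TYPES` tuple is an assumption about which value types are safe, so check it against your networkx version.

```python
import networkx as nx
import numpy as np

# Assumption: plain Python scalars are safe to write; anything else is flagged.
SIMPLE_TYPES = (str, int, float, bool)


def non_scalar_edge_attrs(G):
    """Hypothetical helper: list (u, v, key, type name) for every edge
    attribute whose value is not a plain Python scalar."""
    return [
        (u, v, key, type(value).__name__)
        for u, v, data in G.edges(data=True)
        for key, value in data.items()
        if not isinstance(value, SIMPLE_TYPES)
    ]


G = nx.Graph()
colors = np.array(["red", "blue"], dtype=object)
# The keyword argument is stored as-is on every edge, not split per edge.
G.add_weighted_edges_from([(0, 1, 1), (1, 2, 2)], attr=colors)

problems = non_scalar_edge_attrs(G)
print(problems)
```

Every edge shows up in the result because each one holds a reference to the same full array.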

You may find it easier to create your Graph using nx.from_pandas_dataframe(), especially since you already have data formatted in a DataFrame object.

G = nx.from_pandas_dataframe(df, 'A', 'B', ['Count', 'some_attribute'])

print (G.edge[0][1])
print (G.edge[2][3])
{'Count': 1, 'some_attribute': 'red'}
{'Count': 5, 'some_attribute': 'red'}
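
A note for readers on newer networkx releases (2.x and later): as far as I know, from_pandas_dataframe was renamed to from_pandas_edgelist there and the G.edge attribute is gone. Under that assumption, a rough equivalent of the snippet above would be:

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 0, 0],
                   'B': [1, 2, 3, 2, 3],
                   'Count': [1, 2, 5, 1, 1],
                   'some_attribute': ['red', 'blue', 'red', 'blue', 'red']})

# Name used by newer networkx releases; source and target columns come
# first, then the columns to copy onto each edge.
G = nx.from_pandas_edgelist(df, 'A', 'B', edge_attr=['Count', 'some_attribute'])

# Per-edge attribute dictionaries are looked up through G.edges[u, v].
print(G.edges[0, 1])
print(G.edges[2, 3])
```

Each edge now carries only its own row's values rather than the whole column.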

Writing to a file was no problem:

nx.write_graphml(G,"my_graph.graphml")

One caveat: I'm not a regular Gephi user, so there may be another way to solve the following. When I loaded the file with 'Count' as the edge attribute, Gephi didn't recognize the edge weights by default. So I renamed the column from 'Count' to 'weight', wrote the file again, and saw the following when I loaded it into Gephi:

df.columns=['A', 'B', 'weight', 'some_attribute']
G = nx.from_pandas_dataframe(df, 'A', 'B', ['weight', 'some_attribute'])
nx.write_graphml(G,"my_graph.graphml")

[Screenshot of the graph in Gephi]
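
If you would rather not overwrite all column names at once, the rename can be done for the one column only. This is a sketch under a few assumptions: it builds the edges with add_edges_from so it does not depend on which pandas helper your networkx version ships, it casts values to plain Python types to be on the safe side, and it uses `nx.generate_graphml` to check the result in memory rather than in Gephi.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 0, 0],
                   'B': [1, 2, 3, 2, 3],
                   'Count': [1, 2, 5, 1, 1],
                   'some_attribute': ['red', 'blue', 'red', 'blue', 'red']})

# Rename only the 'Count' column; the other columns keep their names.
renamed = df.rename(columns={'Count': 'weight'})

G = nx.Graph()
G.add_edges_from(
    (int(u), int(v), {'weight': int(w), 'some_attribute': str(a)})
    for u, v, w, a in zip(renamed['A'], renamed['B'],
                          renamed['weight'], renamed['some_attribute'])
)

# Inspect the GraphML text: the edge attribute should be declared as "weight".
graphml_text = "\n".join(nx.generate_graphml(G))
print('attr.name="weight"' in graphml_text)
```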

Hope this helps and that I understood your question correctly.

Edit

Per Corley's comment above, you can use the following if you choose to use add_edges_from.

G.add_edges_from([(u,v,{'weight': w, 'attr': a}) for u,v,w,a in df[['A', 'B', 'Count', 'some_attribute']].values ])

There is no significant performance difference between the two approaches, but I find from_pandas_dataframe more readable. A quick timing on one million edges:

import numpy as np

df = pd.DataFrame({'A': np.arange(0,1000000),
                   'B': np.arange(1,1000001),
                   'Count': np.random.choice(range(10), 1000000, replace=True),
                   'some_attribute': np.random.choice(['red','blue'], 1000000, replace=True,)})

%%timeit
G = nx.Graph()
G.add_edges_from([(u,v,{'weight': w, 'attr': a}) for u,v,w,a in df[['A', 'B', 'Count', 'some_attribute']].values ])

1 loop, best of 3: 4.23 s per loop

%%timeit
G = nx.Graph()
G = nx.from_pandas_dataframe(df, 'A', 'B', ['Count', 'some_attribute'])

1 loop, best of 3: 3.93 s per loop
Kevin