
I'm doing a Social Network Analysis of this dataset with NetworkX and I want to make a degree and closeness centrality analysis.

The graph I obtain is undirected (graph.is_directed() returns False). Node 1 has degree 593, yet its row in the edges data has 0 as both target and weight. Since the graph is undirected, I expect node 1 to be the most central node, but it isn't, and I don't understand why (the dataset is based on the animated series The Simpsons, so I bet I know who the most central character is).

I'm afraid the analysis ends up unreliable this way.

---edit

This is the code where I import and create the graph.

import pandas as pd
import networkx as nx

dfN=pd.read_csv('gdrive/My Drive/SNA/simpsonsNodes.csv')
dfE=pd.read_csv('gdrive/My Drive/SNA/simpsonsEdges.csv')
df = pd.merge(left=dfN, right=dfE, left_on="Id", right_on='Source', how='outer').drop(['Id', 'Type'], axis=1)
df.columns = ['Name', 'Source', 'Target', 'Weight']
df = df.fillna(0)
df = df.astype({'Source':'int', 'Target':'int', 'Weight':'int'})
df

graph = nx.from_pandas_edgelist(df, 'Source', 'Target', edge_attr='Weight', create_using=nx.Graph() )
print(graph.is_directed())

I dropped two columns: Id because it is redundant, and Type because it's "Undirected" for every row, so I don't really need it.

I used df = df.fillna(0) because the row for the node with id = 1 had Source, Target and Weight as NaN, so I converted those to 0 and then used df.loc[0,"Source"]=1 to set its source to 1.

Fio
  • The fact that the graph is undirected and node 1 has no outgoing edges seems contradictory... – SultanOrazbayev Sep 06 '22 at 13:16
  • The nodes of an undirected graph do NOT have incoming and outgoing edges. The edges are all undirected. – ravenspoint Sep 06 '22 at 13:40
  • Yes I know, I mean the dataset has source, target and weight columns, node 1 has target and weight 0, it shouldn’t affect the analysis because, again, is_directed() returns False but apparently it does and I don’t understand why – Fio Sep 06 '22 at 15:15
  • I edited the question so maybe now it’s not ambiguous – Fio Sep 06 '22 at 15:24
  • I haven't loaded it to check for sure, but a scroll through the edges csv suggests there aren't any edges listed with target _or_ weight equal to zero. The node labels start at 1, and the weights are positive integers (episode co-appearance counts?). – Ben Reiniger Sep 06 '22 at 17:05
  • In nodes.csv there's a node whose id is 1, but in edges.csv it only shows up as a target. So when I build the dataframe for the graph and combine nodes and edges into name, source, target and weight columns, node 1 ends up with source, target and weight as NaN. I just convert NaN to 0 and set source to 1 (because I know that is its id). Basically, edges.csv has no info on node 1 except as a target of other nodes – Fio Sep 06 '22 at 19:24
  • @Fio but it's meant to be an undirected graph. The edges file just seems to use the convention of `source>target`. Node 1 has lots of edges, just only ever with it as the `target`. I suspect your ingestion from the csv to the undirected graph is incorrect somewhere. Can you provide that part of the code? – Ben Reiniger Sep 06 '22 at 20:06
  • @BenReiniger I edited the question to add the code. I thought the same about the conversion, but .is_directed() still returns False and nothing changes if I use to_undirected() – Fio Sep 06 '22 at 20:29
  • I've made an answer about the graph ingestion, and ran betweenness_centrality and got reasonably expected results. Were your unintuitive results from a different centrality measure? – Ben Reiniger Sep 06 '22 at 22:21

1 Answer


Homer's NaN edge row is an artifact of the outer merge and the asymmetry of the edges file: Homer is node 1, so he appears as a target plenty often, but never as a source. Dropping the artifact row, or doing an inner merge, should take care of that. As it stands, you have artificially created a new node labeled 0 with just one edge joining it to Homer; since that edge also has weight 0, it probably won't affect any algorithms that take weight into account.
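To make the artifact concrete, here is a minimal sketch on toy data. The three-node frames and the `Label` column name are assumptions for illustration only, not the actual contents of the CSV files.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical data; "Label" is an assumed column name).
nodes = pd.DataFrame({"Id": [1, 2, 3], "Label": ["Homer", "Marge", "Bart"]})
edges = pd.DataFrame({"Source": [2, 3, 3], "Target": [1, 1, 2], "Weight": [5, 4, 3]})

outer = pd.merge(nodes, edges, left_on="Id", right_on="Source", how="outer")
inner = pd.merge(nodes, edges, left_on="Id", right_on="Source", how="inner")

# Node 1 never appears as a Source, so the outer merge keeps it as a row of NaNs.
# fillna(0) would then turn that row into a bogus edge to a node labeled 0.
artifact_rows = outer[outer["Source"].isna()]
print(artifact_rows)

# The inner merge keeps only rows that correspond to real edges.
print(len(outer), len(inner))
```

On this toy data the outer merge has one more row than the inner merge, and that extra row is the one that becomes the spurious edge after `fillna(0)`.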

Doing the merge at all is a little odd to me: you only end up with names associated with the sources, and not the targets. Anyway, the extra name column doesn't modify the graph in a significant way.

I ran the betweenness centrality algorithm (oops, without weights), and Homer does indeed end up the most central, with a relative score of 0.259 (Marge and Bart at 0.177 and 0.176, then Lisa at 0.155, then a sharp dropoff to 0.022 for...Lenny?...). With weights, the scores are changed, but the order among the top four is the same.
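For reference, a rough sketch of how the unweighted and weighted runs can be compared. The five-node graph is made up, not the Simpsons data. One caveat worth keeping in mind: NetworkX uses the `weight` argument of `betweenness_centrality` to compute weighted shortest paths, so weights are interpreted as distances, which may not be what you want if they are co-appearance counts.

```python
import networkx as nx

# Hypothetical toy graph: (source, target, weight) triples, not the real dataset.
toy_edges = [(2, 1, 5), (3, 1, 4), (3, 2, 3), (4, 1, 2), (5, 4, 1)]
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from(toy_edges, weight="Weight")

# Unweighted: every edge counts as one hop.
unweighted = nx.betweenness_centrality(toy_graph)

# Weighted: "Weight" is treated as an edge length when finding shortest paths.
weighted = nx.betweenness_centrality(toy_graph, weight="Weight")

for node in sorted(unweighted, key=unweighted.get, reverse=True):
    print(node, round(unweighted[node], 3), round(weighted[node], 3))
```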

Ben Reiniger
  • Ok thank you, I get what the problem is but I'm still not able to fix it: how can I take char_names and link them to the nodes to display them? It was the main reason why I (wrongly) made the merge in the first place – Fio Sep 07 '22 at 03:00
  • @Fio The merging almost worked, you just need to do another to add the target names, and use inner instead of outer. – Ben Reiniger Sep 07 '22 at 13:57
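A minimal sketch of what the last comment describes, assuming the nodes file has `Id` and `Label` columns (the `Label` name and the toy rows are assumptions; adjust to the real headers). It does two inner merges, one for source names and one for target names, then also stores the names as a node attribute so they can be used for display.

```python
import networkx as nx
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical data and column names).
nodes = pd.DataFrame({"Id": [1, 2, 3, 4], "Label": ["Homer", "Marge", "Bart", "Lisa"]})
edges = pd.DataFrame(
    {"Source": [2, 3, 3, 4], "Target": [1, 1, 2, 1], "Weight": [5, 4, 3, 2]}
)

# First inner merge attaches the source name, second attaches the target name.
source_names = nodes.rename(columns={"Id": "Source", "Label": "SourceName"})
target_names = nodes.rename(columns={"Id": "Target", "Label": "TargetName"})
df = edges.merge(source_names, on="Source", how="inner")
df = df.merge(target_names, on="Target", how="inner")

graph = nx.from_pandas_edgelist(
    df, "Source", "Target", edge_attr="Weight", create_using=nx.Graph()
)

# Keep the names on the nodes themselves, e.g. for labels when drawing.
nx.set_node_attributes(graph, nodes.set_index("Id")["Label"].to_dict(), "name")

degree = nx.degree_centrality(graph)
closeness = nx.closeness_centrality(graph)
top = max(degree, key=degree.get)
print(graph.nodes[top]["name"], degree[top], closeness[top])
```

Because no `fillna(0)` is needed here, no node 0 is created, and the `name` attribute can be passed to drawing code via something like `nx.get_node_attributes(graph, "name")`.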