I'm trying to calculate Adamic-Adar similarity for a network that has two types of nodes. I'm only interested in the similarity between nodes that have outgoing connections. Nodes with incoming connections act as connectors, and I'm not interested in similarity between them.
Data size and characteristics:
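To make the goal concrete, a minimal pure-Python sketch of the intended quantity might look like the following. It assumes that two source nodes are similar when they point to the same connector, and that each shared connector contributes 1 / log(number of distinct nodes pointing to it), using the natural logarithm. The function name `adamic_adar_sources` and the toy edge list are hypothetical and only serve as illustration; edge weights are ignored, and the result may not match igraph's implementation in every detail.

```python
import math
from itertools import combinations

def adamic_adar_sources(edges):
    """Return {(u, v): score} for pairs of nodes that have no incoming edges."""
    out_nbrs = {}  # node -> set of nodes it points to
    in_nbrs = {}   # node -> set of nodes pointing to it
    for src, dst in edges:
        out_nbrs.setdefault(src, set()).add(dst)
        out_nbrs.setdefault(dst, set())
        in_nbrs.setdefault(dst, set()).add(src)
        in_nbrs.setdefault(src, set())

    # Source nodes: in-degree 0 (the nodes of interest in the question).
    sources = sorted(n for n in out_nbrs if not in_nbrs[n])

    scores = {}
    for u, v in combinations(sources, 2):
        shared = out_nbrs[u] & out_nbrs[v]
        # A shared connector has at least two incoming neighbours, so log > 0.
        score = sum(1.0 / math.log(len(in_nbrs[z])) for z in shared)
        if score > 0:  # keep only pairs with a non-zero similarity
            scores[(u, v)] = score
    return scores

# Toy bipartite graph: d* are source nodes, p* are connectors (made-up data).
edges = [("d1", "p1"), ("d2", "p1"), ("d2", "p2"), ("d3", "p2"), ("d3", "p1")]
scores = adamic_adar_sources(edges)
for (u, v), s in sorted(scores.items()):
    print(u, v, round(s, 4))
```

The dictionary holds each unordered pair once, in a long (node, node, value) layout, which is roughly what I want to end up with in the CSV file.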
> summary(g)
IGRAPH DNW- 3852 24478 --
+ attr: name (v/c), weight (e/n)
Prototype code in Python 2.7:
import glob
import os
import pandas as pd
from igraph import *
os.chdir("data/")
for file in glob.glob("*.graphml"):
    print(file)
    g = Graph.Read_GraphML(file)
    indegree = Graph.degree(g, mode="in")
    g['indegree'] = indegree
    dev = g.vs.select(indegree == 0)
    m = Graph.similarity_inverse_log_weighted(dev.subgraph())
    df = pd.melt(m)
    df.to_csv(file.split("_only.graphml")[0] + "_similarity.csv", sep=',')
There is something wrong with this code, because dev is of length 1 and m is 0.0, so it doesn't work as expected.
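One possible cause, shown as a plain-Python sketch rather than a confirmed diagnosis: `indegree` is an ordinary list, so `indegree == 0` is evaluated by Python before igraph ever sees it, and the result is a single boolean instead of an element-wise mask. The in-degree values below are made up for illustration.

```python
# Hypothetical in-degrees for a five-node graph (illustrative values only).
indegree = [0, 3, 0, 2, 0]

# Comparing a list with an int yields one boolean, not a per-element mask.
mask = (indegree == 0)
print(mask)  # False

# bool is a subclass of int, so False also behaves like the integer 0.
print(isinstance(mask, int), mask == 0)  # True True

# An element-wise selection has to be built explicitly.
sources = [i for i, d in enumerate(indegree) if d == 0]
print(sources)  # [0, 2, 4]
```

If `select` interprets that `False` as the vertex index 0, it would return exactly one vertex, which would match the length-1 result. In python-igraph, a keyword filter such as `g.vs.select(_indegree=0)` may be the closer counterpart of `V(g)[indegree==0]` in R.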
Hint
I have working code in R, but it seems I'm unable to rewrite it in Python (which I'm doing for the sake of performance, since the networks are huge). Here it is:
# make sure g is your network
indegree <- degree(g, mode="in")
V(g)$indegree <- indegree
dev <- V(g)[indegree==0]
m <- similarity.invlogweighted(g, dev)
x.m <- melt(m)
colnames(x.m) <- c("dev1", "dev2", "value")
x.m <- x.m[x.m$value > 0, ]
write.csv(x.m, file = sub(".csv", "_similarity.csv", filename))