I am using the Jaccard coefficient to predict links in a network and then computing the ROC AUC score of my predictions. My code works, but it gives me a different score on every run, because each run randomly chooses a different set of edges to hide from the training graph. Let's say I want to compute 1000 prediction scores, store them, and then take the average of those scores. What would I need to add or change in my code?
INPUT
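import random
import networkx as nx
from sklearn import metrics
from sklearn.metrics import roc_auc_score
#Remove 20% of the edges
proportion_edges = .2
# random.sample() needs a sequence (Python 3.11+ rejects set-like views), so convert the EdgeView to a list
edge_subset = random.sample(list(G.edges()), int(proportion_edges * G.number_of_edges()))
#Create a copy of the graph and remove the edges
G_train = G.copy()
G_train.remove_edges_from(edge_subset)
#Make prediction using Jaccard Coefficient (scores every non-edge of G_train)
pred_jaccard = list(nx.jaccard_coefficient(G_train))
# Store both orientations: in an undirected graph a hidden edge may come back as (v, u) instead of (u, v)
removed = set(edge_subset) | {(v, u) for (u, v) in edge_subset}
score_jaccard, label_jaccard = zip(*[(s, (u, v) in removed) for (u, v, s) in pred_jaccard])
#Compute the ROC AUC Score for Jaccard Coefficient
fpr_jaccard, tpr_jaccard, _ = metrics.roc_curve(label_jaccard, score_jaccard)
auc_jaccard = roc_auc_score(label_jaccard, score_jaccard)
auc_jaccard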
OUTPUT
0.6926406926406927
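One way to get many scores is to wrap a single train/predict/score run in a function, call it in a loop, keep every AUC in a list, and average the list. The sketch below assumes `G` is an undirected, simple networkx graph; the helper names `jaccard_auc` and `average_jaccard_auc` are hypothetical, and `nx.karate_club_graph()` in the usage line is only a stand-in for your own graph. Passing a `seed` is optional and just makes the whole batch of runs repeatable.

```python
import random
from statistics import mean

import networkx as nx
from sklearn.metrics import roc_auc_score


def jaccard_auc(G, proportion_edges=0.2, rng=random):
    """Hide a random fraction of G's edges and return the Jaccard-coefficient ROC AUC."""
    # Pick the edges to hide; list() because random.sample needs a sequence
    edge_subset = rng.sample(list(G.edges()), int(proportion_edges * G.number_of_edges()))
    G_train = G.copy()
    G_train.remove_edges_from(edge_subset)
    # Both orientations, so (u, v) and (v, u) count as the same undirected edge
    removed = set(edge_subset) | {(v, u) for (u, v) in edge_subset}
    scores, labels = [], []
    # With no ebunch, jaccard_coefficient scores every non-edge of G_train
    for u, v, s in nx.jaccard_coefficient(G_train):
        scores.append(s)
        labels.append((u, v) in removed)
    return roc_auc_score(labels, scores)


def average_jaccard_auc(G, n_runs=1000, proportion_edges=0.2, seed=None):
    """Run jaccard_auc n_runs times; return (list of all AUC scores, their mean)."""
    # A dedicated, optionally seeded generator makes the whole experiment repeatable
    rng = random.Random(seed)
    auc_scores = [jaccard_auc(G, proportion_edges, rng) for _ in range(n_runs)]
    return auc_scores, mean(auc_scores)


G = nx.karate_club_graph()  # stand-in graph; replace with your own G
auc_scores, avg_auc = average_jaccard_auc(G, n_runs=1000, seed=42)
print(avg_auc)
```

`auc_scores` keeps all 1000 individual values, so you can also look at their spread (for example with `statistics.stdev(auc_scores)`) rather than only the mean.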