I have a dataset representing a directed graph. The first column is the source node, the second column is the target node, and we can ignore the third column (essentially a weight). So for example:
0 1 3
0 13 1
0 37 1
0 51 1
0 438481 1
1 0 3
1 4 354
1 10 2602
1 11 2689
1 12 1
1 18 345
1 19 311
1 23 1
1 24 366
...
What I would like to do is append the out-degree for each node. For example, if I just added the out-degree for node 0, I would have:
0 1 3 5
0 13 1 5
0 37 1 5
0 51 1 5
0 438481 1 5
1 0 3
...
I have some code that does this, but it is extremely slow because I am using a for loop:
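    import numpy as np

    def save_degrees(X):
        # Append a zero-filled column that will hold each row's out-degree.
        # (np.int was removed in NumPy 1.24; the builtin int is equivalent.)
        new_col = np.zeros(X.shape[0], dtype=int)
        X = np.column_stack((X, new_col))
        # One entry per distinct source node, with its number of outgoing edges.
        node_ids, degrees = np.unique(X[:, 0], return_counts=True)
        # This is the slow part: one full scan of column 0 per unique node.
        for node_id, deg in zip(node_ids, degrees):
            indices = X[:, 0] == node_id
            X[indices, -1] = deg
        return X

    train_X = np.load('data/train_X.npy')
    train_X = save_degrees(train_X)
    np.save('data/train_X_degrees.npy', train_X)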
Is there a more efficient way to build this data structure?
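For reference, here is a minimal sketch of one vectorized direction I am considering. It assumes `np.unique` with `return_inverse=True` can map every row back to its node's count, so the Python loop disappears. The function name `append_out_degrees` and the small `edges` array are made up for illustration and are not my real data:

```python
import numpy as np

def append_out_degrees(X):
    """Return a copy of X with the source node's out-degree as a new last column."""
    # counts[k] is the out-degree of the k-th unique source node;
    # inverse[i] is the position of row i's source node among the unique nodes.
    _, inverse, counts = np.unique(
        X[:, 0], return_inverse=True, return_counts=True
    )
    # Indexing counts by inverse broadcasts each node's degree to all of its rows.
    return np.column_stack((X, counts[inverse]))

# Hypothetical toy input modeled on the first rows of the example above.
edges = np.array([
    [0, 1, 3],
    [0, 13, 1],
    [0, 37, 1],
    [0, 51, 1],
    [0, 438481, 1],
    [1, 0, 3],
    [1, 4, 354],
])
result = append_out_degrees(edges)
print(result)
```

On this toy input the five rows for node 0 get a degree of 5 and the two rows for node 1 get a degree of 2. I have not benchmarked this against the full dataset, so I do not know whether the sort inside `np.unique` becomes the new bottleneck.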