I have a dataframe of users and their friends that looks like:
user_id | friend_id
1 3
1 4
2 3
2 5
3 4
I want to write a function in python
to compute the number of mutual friends for each pair:
user_id | friend_id | num_mutual
1 3 1
1 4 1
2 3 0
2 5 0
3 4 1
Currently I have:
def find_mutual(df):
num_mutual = []
for i in range(len(df)):
user, friend = df.loc[i, 'user_id'], df.loc[i, 'friend_id']
user_list = df[df.user_id == user].friend_id.tolist() + df[df.friend_id == user].user_id.tolist()
friend_list = df[df.user_id == friend].friend_id.tolist() + df[df.friend_id == friend].user_id.tolist()
mutual = len(list(set(user_list) & set(friend_list)))
num_mutual.append(mutual)
return num_mutual
It works fine for small datasets, but I'm running it on a dataset with millions of rows. It takes forever to run everything. I know it's not the ideal way to find the count. Is there a better algorithm in Python? Thanks in advance!