I have a 4 x 4 numpy ndarray called 'sim' that holds the similarity values between 4 items (a, b, c, d):
array([[ 1. , 0. , 0.5547002 , 0.73960026],
[ 0. , 1. , 0. , 0.66666667],
[ 0.5547002 , 0. , 1. , 0.33333333],
[ 0.73960026, 0.66666667, 0.33333333, 1. ]])
dataset_u is a list that contains [a,b,c,d]. The following code sorts each row of the array and then identifies the top-3 items (related_count) for each of the items a, b, c, d based on their similarity values.
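import numpy as np
import pandas as pd

related_count = 3
dataidx = np.asarray(dataset_u)  # a,b,c,d
# column indices of each row, ordered by descending similarity
indices = np.argsort(-sim, axis=1)
# first column is the item itself, followed by all items ranked by similarity
result = np.hstack((dataidx[:, None], dataidx[indices]))
m1 = result.shape[0]
# keep the first column, drop each item from its own ranked list
mask = np.c_[[True] * m1, result[:, 1:] != result[:, 0, None]]
final_mat = result[mask].reshape(m1, -1)
dfdownload = pd.DataFrame(final_mat[:, 1:related_count], index=final_mat[:, 0])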
dfdownload:
How can I modify the above code so that it only considers values >= 0.5 before sorting the array? For example, for item 'a' the expected related items are 'd' and 'c', whereas for item 'b' the only related item is 'd' (0.66666667).
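To make the expected behaviour concrete, here is a minimal sketch of one possible approach, not necessarily the best one: set the diagonal and every value below the cutoff to -inf before sorting, then blank out any ranked slot that still points at a -inf score. The function name related_items, the threshold variable and the use of None for empty slots are my own assumptions for illustration, and I assume the items are the strings 'a' to 'd'.

```python
import numpy as np
import pandas as pd

sim = np.array([
    [1.0, 0.0, 0.5547002, 0.73960026],
    [0.0, 1.0, 0.0, 0.66666667],
    [0.5547002, 0.0, 1.0, 0.33333333],
    [0.73960026, 0.66666667, 0.33333333, 1.0],
])
dataset_u = ["a", "b", "c", "d"]  # assumed to be plain string labels
related_count = 3
threshold = 0.5  # cutoff taken from the question


def related_items(sim, labels, top_n, threshold):
    """Hypothetical helper: rank related items per row, ignoring weak ones."""
    labels = np.asarray(labels, dtype=object)
    scores = np.array(sim, dtype=float)       # work on a copy
    np.fill_diagonal(scores, -np.inf)         # an item is not related to itself
    scores[scores < threshold] = -np.inf      # discard values below the cutoff
    # column indices ordered by descending score, truncated to top_n
    order = np.argsort(-scores, axis=1)[:, :top_n]
    # True where the ranked slot refers to a score that survived the cutoff
    kept = np.take_along_axis(scores, order, axis=1) > -np.inf
    picked = np.where(kept, labels[order], None)
    return pd.DataFrame(picked, index=labels)


df = related_items(sim, dataset_u, related_count, threshold)
print(df)
```

Each row of df holds the qualifying items in descending order of similarity, with the remaining slots left empty, so a row can be read with something like df.loc["a"].dropna().tolist().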