Scale and compute cosine similarity of co-occurrence matrix

Asked Feb 22 '17 at 08:27

Active Feb 22 '17 at 08:27

Viewed 771 times

I have a symmetric co-occurrence matrix (1877 x 1877). I treat the columns as features and compute the cosine distance between them. Before that, I scale the matrix (center each component to its mean and scale it to unit variance).

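from sklearn import preprocessing
from sklearn.metrics import pairwise_distances
# mymatrix: the 1877 x 1877 symmetric co-occurrence matrix
# Standardize each column to zero mean and unit variance
X_scaled = preprocessing.scale(mymatrix)
# Pairwise cosine distances between the rows of X_scaled
dist = pairwise_distances(X_scaled, metric="cosine")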

My questions:

1. Should I scale the co-occurrence data before computing the cosine distance/similarity? The figure below shows the histogram of the values in the actual matrix: the x-axis is the co-occurrence value, and the y-axis is the number of times that value appears in the matrix.
2. The code above returns distances greater than 1 and less than 0. How can I ensure that the cosine distance values lie between 0 and 1? Should I apply a min-max scaler to the dist matrix?
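
To make the range question concrete, here is a minimal sketch on a small made-up matrix (the name `toy` and the random counts are assumptions for illustration, not the actual 1877 x 1877 data). It compares cosine distances computed on raw non-negative counts with those computed after `preprocessing.scale`. Cosine distance is 1 minus cosine similarity, so once centering introduces negative entries the similarity can drop below 0 and the distance can exceed 1, up to 2.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.metrics import pairwise_distances

# Hypothetical stand-in for the real matrix: small, symmetric, non-negative counts.
rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(6, 6))
toy = (counts + counts.T).astype(float)

# Raw non-negative counts: cosine similarity lies in [0, 1], so the distance does too.
dist_raw = pairwise_distances(toy, metric="cosine")

# Standardized columns: entries can be negative, so the similarity can be negative
# and the distance (1 - similarity) can be larger than 1.
dist_scaled = pairwise_distances(preprocessing.scale(toy), metric="cosine")

print("raw:   ", dist_raw.min(), dist_raw.max())
print("scaled:", dist_scaled.min(), dist_scaled.max())
```

If distances in [0, 1] are required, one option suggested by this sketch is to compute them on the uncentered counts. Whether that suits the real data is a judgment call that the toy example cannot settle.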

kitchenprinzessin

0 Answers