Two ways of computing a t-SNE plot with cosine similarity give different plots, but the methods seem the same

Question

I have been looking at this for the past hour but cannot seem to find the problem... I have a list of articles, and I want to see which of them are similar to each other.

I did this by computing the cosine similarities between the TF-IDF vectors of the articles and making a t-SNE plot of the result. I tried two ways of doing this, and to my surprise the plots are very different from each other. I cannot tell which one is correct.

In the examples, tfdoc is the TF-IDF matrix of the articles.
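For readers who want to reproduce this, here is a minimal sketch of how a matrix like tfdoc might be built, assuming scikit-learn's TfidfVectorizer with its default settings. The articles list is hypothetical placeholder text, not the asker's data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the real list of articles
articles = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock markets fell sharply today",
    "markets rallied after the stock report",
]

vectorizer = TfidfVectorizer()             # default norm="l2": each row has unit length
tfdoc = vectorizer.fit_transform(articles)  # sparse matrix, shape (n_articles, n_terms)
print(tfdoc.shape)
```

Because the rows are L2-normalised by default, the cosine similarity between two rows is just their dot product.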

from sklearn.metrics.pairwise import cosine_similarity
from sklearn import manifold

X = cosine_similarity(tfdoc, tfdoc)
model = manifold.TSNE(random_state=1, metric="precomputed")
Y = model.fit_transform(X)
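A quick check of what this first snippet hands to t-SNE: with metric="precomputed", TSNE reads X as a matrix of pairwise distances, but cosine_similarity returns 1 on the diagonal (each article compared with itself) and larger values for more similar pairs. The toy vectors below are hypothetical stand-ins for TF-IDF rows.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy vectors standing in for rows of tfdoc
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],  # nearly the same direction as docs[0]
    [0.0, 0.0, 1.0],  # orthogonal to docs[0]
])

X = cosine_similarity(docs, docs)
print(np.diag(X))         # self-similarity is ~1, whereas a self-distance should be 0
print(X[0, 1] > X[0, 2])  # the similar pair gets the LARGER value
```

Read as distances, these values put each article at distance 1 from itself and put near-duplicates farther apart than unrelated articles.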

When plotted, this results in:

But when I use this code:

from sklearn.manifold import TSNE

tsne = TSNE(random_state=1, metric="cosine")

embs = tsne.fit_transform(tfdoc)
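With metric="cosine", TSNE computes pairwise cosine distances from the raw vectors itself, and cosine distance is defined as 1 minus cosine similarity. A small sketch confirming that relationship with scikit-learn's pairwise helpers, using hypothetical toy vectors in place of tfdoc:

```python
import numpy as np
from sklearn.metrics.pairwise import (
    cosine_distances,
    cosine_similarity,
    pairwise_distances,
)

# Hypothetical toy vectors standing in for rows of tfdoc
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

sim = cosine_similarity(docs)
dist = cosine_distances(docs)
# The generic helper with metric="cosine" yields the same distance matrix
dist_generic = pairwise_distances(docs, metric="cosine")

print(np.allclose(dist, 1 - sim))      # distance = 1 - similarity
print(np.allclose(dist, dist_generic))
print(np.diag(dist))                   # self-distance is 0
```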

It results in:

Does anyone know what exactly the difference is here?

Thanks in advance!!

score 0 · Answer 1 · answered Jun 14 '22 at 19:35


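The first version passes cosine similarity, whereas the second uses cosine distance. With metric="precomputed", TSNE treats the input matrix as pairwise distances, so feeding it similarities inverts the relationship: the most similar articles (similarity close to 1) are treated as the farthest apart. Cosine distance is 1 minus cosine similarity, so a larger cosine distance means a smaller cosine similarity. To make the first version behave like the second, pass 1 - cosine_similarity(tfdoc, tfdoc) (equivalently, sklearn.metrics.pairwise.cosine_distances(tfdoc)) instead of the similarity matrix.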
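A minimal sketch of the corrected first approach, assuming scikit-learn: cosine distances, not similarities, go into the precomputed metric. init="random" is set explicitly because recent scikit-learn versions default to init="pca", which is rejected together with metric="precomputed". The random matrix and perplexity=10 are hypothetical choices for a tiny example (perplexity must be smaller than the number of samples); substitute tfdoc and your own settings.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
# Hypothetical stand-in for the TF-IDF matrix: 30 non-negative "documents"
tfdoc = rng.random((30, 50))

# Distances: 0 on the diagonal, larger values = less similar articles
D = cosine_distances(tfdoc)

model = TSNE(random_state=1, metric="precomputed", init="random", perplexity=10)
Y = model.fit_transform(D)
print(Y.shape)  # one 2-D point per article
```

The resulting layout should tell the same story as the metric="cosine" version, though it need not match it point for point, since the initialisation and optimisation details can differ.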


James LI


