I have been looking at this for the past hour but cannot seem to find the problem... I have a list of articles and I want to see which of them are similar to each other.
I have done this by computing the cosine similarities between the TF-IDF vectors of the articles and making a t-SNE plot of the result. I have done this in two ways, but what surprised me is that the plots are very different from each other, and I do not see which one is correct.
In the examples, tfdoc is the TF-IDF matrix of the articles.
from sklearn.metrics.pairwise import cosine_similarity
from sklearn import manifold
X = cosine_similarity(tfdoc, tfdoc)
model = manifold.TSNE(random_state=1, metric="precomputed")
Y = model.fit_transform(X)
When plotted, this results in:
But when I use this code:
from sklearn.manifold import TSNE
tsne = TSNE(random_state=1, metric="cosine")
embs = tsne.fit_transform(tfdoc)
It results in:
Does anyone know what exactly the difference between these two approaches is?
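One thing worth checking, as a minimal sketch rather than a confirmed diagnosis: as far as I understand the scikit-learn docs, metric="precomputed" expects a matrix of distances, whereas cosine_similarity returns similarities (1.0 on the diagonal, larger meaning closer). The snippet below puts both variants on the same footing by passing cosine distances instead. The random tfdoc_demo matrix is only a hypothetical stand-in for tfdoc, and init="random" is my assumption, since recent scikit-learn releases appear to reject PCA initialisation with a precomputed matrix.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_distances, cosine_similarity

# Hypothetical stand-in for tfdoc: 60 "documents" x 20 "terms",
# non-negative like a TF-IDF matrix.
rng = np.random.default_rng(0)
tfdoc_demo = rng.random((60, 20))

S = cosine_similarity(tfdoc_demo)  # similarities: 1.0 on the diagonal
D = cosine_distances(tfdoc_demo)   # distances (1 - similarity): 0.0 on the diagonal

# Variant 1: hand t-SNE the distance matrix, not the similarity matrix.
Y_pre = TSNE(random_state=1, metric="precomputed", init="random").fit_transform(D)

# Variant 2: let t-SNE compute the cosine distances itself.
Y_cos = TSNE(random_state=1, metric="cosine", init="random").fit_transform(tfdoc_demo)

print(Y_pre.shape, Y_cos.shape)
```

With distances passed in, both variants should be working from the same pairwise information, so any remaining difference between the plots would have to come from something else.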
Thanks in advance!!