I am using TSNE to plot a trained word2vec model (created with gensim):
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Collect every vocabulary word and its vector
labels = []
tokens = []
for word in model.wv.vocab:
    tokens.append(model.wv[word])
    labels.append(word)

# Project the vectors down to 2D
tsne_model = TSNE(perplexity=40, n_components=2, init='pca', n_iter=2500, random_state=23)
new_values = tsne_model.fit_transform(tokens)

x = []
y = []
for value in new_values:
    x.append(value[0])
    y.append(value[1])

# Scatter each point and label it with its word
plt.figure(figsize=(50, 50))
for i in range(len(x)):
    plt.scatter(x[i], y[i])
    plt.annotate(labels[i],
                 xy=(x[i], y[i]),
                 xytext=(5, 2),
                 textcoords='offset points',
                 ha='right',
                 va='bottom')
plt.show()
The built-in gensim method 'most_similar', e.g.
model.wv.most_similar(positive=['word'], topn=20)
outputs the 20 words most similar to 'word'. In the same way, I would like to plot only the most similar words (n=20) of a given word. Any advice on how to modify the plot to do that?
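To make the goal concrete, here is a minimal sketch of the selection step I have in mind. It relies on `most_similar` returning a list of `(word, similarity)` tuples. `collect_neighbors` is a hypothetical helper name, not part of gensim, and `keyed_vectors` is assumed to be the `model.wv` object from the code above.

```python
def collect_neighbors(keyed_vectors, word, topn=20):
    """Return (labels, tokens) for `word` and its `topn` most similar words.

    `keyed_vectors` is assumed to behave like gensim's `model.wv`:
    it supports `most_similar(positive=[...], topn=...)` and `kv[word]`.
    """
    # most_similar yields (word, similarity) pairs; only the words are needed
    neighbors = keyed_vectors.most_similar(positive=[word], topn=topn)
    # Keep the query word itself so it shows up in the plot as well
    labels = [word] + [neighbor for neighbor, _score in neighbors]
    tokens = [keyed_vectors[label] for label in labels]
    return labels, tokens
```

The idea would be to replace the loop over the whole vocabulary with `labels, tokens = collect_neighbors(model.wv, 'word')` and leave the rest of the plotting code unchanged. One caveat I am aware of: with only 21 points, `perplexity=40` is probably too large, since scikit-learn expects the perplexity to be smaller than the number of samples, so a lower value (for example 5) may be needed.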