I'm a PhD student in digital humanities and quite new to programming.
I have a problem that has been driving me crazy for the past month. I'm trying to visualize a doc2vec model (Python, gensim library) in the Embedding Projector in TensorBoard, but I'm not getting what I expect.
I'm sure I'm missing something really basic here. To sum up:
- If I pick a random vector in TensorBoard, its most similar vectors are completely different from the ones my model returns. Is that because of the dimensionality reduction, or something else?
- Many vectors have a cosine similarity higher than one, and I really don't understand what I'm doing wrong. Someone told me that my vectors may not be normalized, but I thought Gensim already does that, doesn't it?
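For reference, the bound in the second point can be checked with a minimal sketch in plain Python. The vectors are made up and the helper names are hypothetical; nothing here comes from my model or from gensim. Cosine similarity divides the dot product by both vector lengths, so it stays within [-1, 1], whereas a raw dot product of unnormalized vectors can be much larger than 1.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both lengths; always within [-1, 1]."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


def unit_normalize(vec):
    """Scale a vector to length 1 so that dot product equals cosine."""
    norm = l2_norm(vec)
    return [x / norm for x in vec]


# Made-up vectors for illustration only (not taken from any model).
v1 = [3.0, 4.0]
v2 = [6.0, 8.0]

raw_dot = dot(v1, v2)                    # 50.0, far above 1
cosine = cosine_similarity(v1, v2)       # same direction, so close to 1
normalized_dot = dot(unit_normalize(v1), unit_normalize(v2))
```

If a tool reports "cosine" values above 1, one possible explanation under these assumptions is that it is computing something like `raw_dot` on vectors that were never scaled to unit length.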
Here is the code I'm using to generate the embeddings. I also tried changing the code slightly to take the vectors directly from "KeyedVectors", but nothing changed.
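from gensim.scripts import word2vec2tensor
from gensim.models.doc2vec import Doc2Vec

# Load the trained Doc2Vec model from disk
doc2vec_model = Doc2Vec.load("doc2vec4.d2v")

# Export only the document (doctag) vectors, not the word vectors, in word2vec format
doc2vec_model.save_word2vec_format('doc_tensor.w2v', doctag_vec=True, word_vec=False)

# Convert the word2vec file into the TSV files used by the Embedding Projector (IPython magic)
%run "C:..word2vec2tensor.py" -i doc_tensor.w2v -o my_plot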
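One way to rule out the conversion step would be to write the projector's two tab-separated files by hand. The following is only a sketch under stated assumptions: the vectors are already available as a tag-to-vector mapping, and `vectors_by_tag`, `write_projector_tsv`, and the file names are hypothetical, not part of gensim or of my model. It writes one row of tab-separated numbers per vector and one label per row in the same order; a single-column metadata file has no header row.

```python
import csv
import os
import tempfile


def write_projector_tsv(vectors_by_tag, vectors_path, metadata_path):
    """Write one vector per line (tab-separated) and one label per line.

    Row i of the metadata file labels row i of the vectors file.
    Tags are assumed to contain no tabs or newlines.
    Returns the number of rows written.
    """
    with open(vectors_path, "w", newline="", encoding="utf-8") as vec_file, \
            open(metadata_path, "w", newline="", encoding="utf-8") as meta_file:
        vec_writer = csv.writer(vec_file, delimiter="\t", lineterminator="\n")
        for tag, vector in vectors_by_tag.items():
            vec_writer.writerow([repr(float(x)) for x in vector])
            meta_file.write(f"{tag}\n")
    return len(vectors_by_tag)


# Made-up data for illustration only; real tags and vectors would come from the model.
vectors_by_tag = {
    "doc_a": [0.1, 0.2, 0.3],
    "doc_b": [0.4, 0.5, 0.6],
}

out_dir = tempfile.mkdtemp()
vectors_path = os.path.join(out_dir, "vectors.tsv")
metadata_path = os.path.join(out_dir, "metadata.tsv")
rows_written = write_projector_tsv(vectors_by_tag, vectors_path, metadata_path)
```

Writing the files this way makes it easy to compare a few rows against the vectors the model itself returns before loading anything into the projector.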
What am I doing wrong here? Thanks in advance.