I'm looking for a way to dinamically add pre-trained word vectors to a word2vec gensim model.
I have a pre-trained word2vec model in a txt (words and their embedding) and I need to get Word Mover's Distance (for example via gensim.models.Word2Vec.wmdistance) between documents in a specific corpus and a new document.
To prevent the need to load the whole vocabulary, I would want to load only the subset of the pre-trained model's words that are found in the corpus. But if the new document has words that are not found in the corpus but they are in the original model vocabulary add them to the model so they are considered in the computation.
What I want is to save RAM, so possible things that would help me:
- Is there a way to add the word vectors directly to the model?
- Is there a way to load to gensim from a matrix or another object? I could have that object in RAM and append to it the new words before loading them in the model
- I don't need it to be on gensim, so if you know a different implementation for WMD that gets the vectors as input that would work (though I do need it in Python)
Thanks in advance.