I would like to import a pre-trained word2vec dictionary (in binary format) into spaCy to vectorize some text. I am able to import the vectors with gensim through:
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('PubMed-shuffle-win-2.bin', binary=True)
Then I initialize a blank spacy nlp object and get the words associated with each index:
import spacy
nlp = spacy.blank('en')
# index2word lists the vocabulary in the same order as the rows of model.syn0
keys = list(model.index2word)
Then set the vectors for the nlp object:
nlp.vocab.vectors = spacy.vocab.Vectors(data=model.syn0, keys=keys)
I am able to get to this stage without any problems. However, how can I save this nlp object and load it back into spaCy so that I can vectorize new text as efficiently as possible?
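For reference, here is a minimal sketch of the round trip I have in mind. It assumes that `nlp.to_disk` and `spacy.load` preserve the vectors attached to the vocab. The three toy words and the random 4-dimensional array are hypothetical stand-ins for `model.index2word` and `model.syn0`, so that it runs without the PubMed file:

```python
import tempfile

import numpy as np
import spacy
from spacy.vectors import Vectors

# Hypothetical toy data standing in for model.index2word and model.syn0.
keys = ["alpha", "beta", "gamma"]
data = np.random.rand(len(keys), 4).astype("float32")

nlp = spacy.blank("en")
nlp.vocab.vectors = Vectors(data=data, keys=keys)

# Write the pipeline (including vocab and vectors) to a directory,
# then load it back as a fresh nlp object.
with tempfile.TemporaryDirectory() as model_dir:
    nlp.to_disk(model_dir)
    nlp_loaded = spacy.load(model_dir)

# Look up a single word vector and a document vector with the reloaded object.
vector = nlp_loaded.vocab.get_vector("alpha")
doc_vector = nlp_loaded("alpha beta").vector
```

With the real model, the temporary directory would be replaced by a permanent path, so the conversion from the word2vec binary only has to happen once. I have not checked whether this is the most efficient route for a vocabulary as large as the PubMed one.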