
I would like to import a pre-trained word2vec dictionary (in binary format) into spacy to vectorize some text.

I am able to import the vectors with gensim through:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('PubMed-shuffle-win-2.bin', binary=True)

Then I initialize a blank spacy nlp object and get the words associated with each index:

import spacy

nlp = spacy.blank('en')
keys = []
for idx in range(len(model.index2word)):
    keys.append(model.index2word[idx])

Then set the vectors for the nlp object:

nlp.vocab.vectors = spacy.vocab.Vectors(data=model.syn0, keys=keys)
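As a quick sanity check that the vectors are actually picked up when vectorizing text, something like the following should work ('protein' is just an example token I'm assuming is in the PubMed vocabulary):

doc = nlp("protein expression")
print(doc[0].has_vector)    # True if the token was found in the imported vectors
print(doc[0].vector.shape)  # should match the word2vec dimensionality
print(doc.vector.shape)     # doc vector = average of the token vectors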

I am able to get to this stage without any problems. However, I am wondering how to save this nlp object and load it back into spacy so I can vectorize new text as efficiently as possible.
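What I have in mind is something like the sketch below (the path is just a placeholder), but I'm not sure whether nlp.to_disk / spacy.load is the right or most efficient way to persist and reuse the vectors:

# save the pipeline, including the vocab and its vectors
nlp.to_disk('/tmp/pubmed_spacy_model')   # hypothetical output directory

# later, in a new session
import spacy
nlp = spacy.load('/tmp/pubmed_spacy_model')
doc = nlp("some new biomedical text")
vec = doc.vector

Is there a faster way to do this than reloading the full vector table from disk each time?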

Ferran
  • This was answered here: [https://stackoverflow.com/questions/42094180/spacy-how-to-load-google-news-word2vec-vectors](https://stackoverflow.com/questions/42094180/spacy-how-to-load-google-news-word2vec-vectors) – Romain Nov 19 '19 at 12:28
