Import word2vec vectors in binary format into spacy

Asked Aug 16 '19 at 16:41

Active Sep 08 '19 at 06:17

Viewed 409 times

I would like to import a pre-trained word2vec dictionary (in binary format) into spacy in order to vectorize some text.

I am able to import the vectors with gensim through:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('PubMed-shuffle-win-2.bin', binary=True)

Then I initialize a blank spacy nlp object and get the words associated with each index:

import spacy
nlp = spacy.blank('en')
keys = []
for idx in range(len(model.index2word)):
    keys.append(model.index2word[idx])

Then set the vectors for the nlp object:

nlp.vocab.vectors = spacy.vocab.Vectors(data=model.syn0, keys=keys)

I am able to get to this stage without any problems. However, how can I save this nlp object and load it back into spacy so that I can vectorize new text as efficiently as possible?
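
For reference, here is a minimal sketch of the save/load round trip I have in mind. It assumes `nlp.to_disk` and `spacy.load` are the right serialization calls for this. The three words and their 4-dimensional vectors are hypothetical stand-ins for the gensim model, and `nlp.vocab.set_vector` is used in place of assigning a `Vectors` object only to keep the sketch small; I have not checked how well it scales to a full PubMed vocabulary.

```python
import tempfile

import numpy as np
import spacy

# Hypothetical stand-in for the gensim model: three words with 4-dimensional vectors.
words = ["protein", "gene", "cell"]
data = np.arange(12, dtype="float32").reshape(3, 4)

# Attach one vector per word to a blank English pipeline.
nlp = spacy.blank("en")
for word, vector in zip(words, data):
    nlp.vocab.set_vector(word, vector)

# Save the pipeline (including its vectors) to a directory, then load it back.
with tempfile.TemporaryDirectory() as model_dir:
    nlp.to_disk(model_dir)
    nlp_loaded = spacy.load(model_dir)

# Look up a vector by word, and vectorize new text with the reloaded pipeline.
restored = nlp_loaded.vocab.get_vector("gene")
doc_vector = nlp_loaded("gene cell").vector
```

In real use the directory would be a permanent path rather than a temporary one, so that `spacy.load` can be called in a later session without touching gensim again.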

Ferran

This was answered here: [https://stackoverflow.com/questions/42094180/spacy-how-to-load-google-news-word2vec-vectors](https://stackoverflow.com/questions/42094180/spacy-how-to-load-google-news-word2vec-vectors) – Romain Nov 19 '19 at 12:28
