19

I have trained my own word2vec model in gensim and am trying to load it in spaCy. As I understand it, I first need to save the model to disk and then load it through spaCy's init-model, but I can't figure out exactly how.

gensimmodel
Out[252]:
<gensim.models.word2vec.Word2Vec at 0x110b24b70>

import spacy
spacy.load(gensimmodel)

OSError: [E050] Can't find model 'Word2Vec(vocab=250, size=1000, alpha=0.025)'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Subigya Upadhyay
  • The binary solution has been answered here: [https://stackoverflow.com/questions/42094180/spacy-how-to-load-google-news-word2vec-vectors](https://stackoverflow.com/questions/42094180/spacy-how-to-load-google-news-word2vec-vectors) – Romain Nov 19 '19 at 12:30

3 Answers

26

Train and save your model in plain-text format:

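import os

from gensim.test.utils import common_texts
from gensim.models import Word2Vec

# Make sure the output directory exists before saving.
os.makedirs("./data", exist_ok=True)

# Note: gensim 4.x renamed the `size` parameter to `vector_size`.
model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)
# Save only the word vectors in the plain-text word2vec format.
model.wv.save_word2vec_format("./data/word2vec.txt")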

Gzip the text file:

gzip ./data/word2vec.txt

This produces a ./data/word2vec.txt.gz file.

Run the following command (spaCy v2 syntax):

python -m spacy init-model en ./data/spacy.word2vec.model --vectors-loc ./data/word2vec.txt.gz

Load the vectors using:

nlp = spacy.load('./data/spacy.word2vec.model/')
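If the gzip command-line tool is not available (for example on Windows), the compression step can also be done from Python with the standard library. The following is only a sketch: the helper names `gzip_file` and `read_word2vec_header` are made up for illustration and are not part of gensim or spaCy. The header check relies on the plain-text word2vec format starting with a line of the form "vocab_size vector_size".

```python
import gzip
import shutil


def gzip_file(src_path, dst_path=None):
    """Compress src_path into dst_path (default: src_path + '.gz') and return dst_path."""
    dst_path = dst_path or src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        # Stream the file in chunks so large vector files are not read into memory at once.
        shutil.copyfileobj(src, dst)
    return dst_path


def read_word2vec_header(gz_path):
    """Return (vocab_size, vector_size) from the first line of a gzipped word2vec text file."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as handle:
        vocab_size, vector_size = handle.readline().split()
    return int(vocab_size), int(vector_size)
```

Calling `gzip_file("./data/word2vec.txt")` would write `./data/word2vec.txt.gz` next to the original. Unlike the `gzip` command, this sketch leaves the uncompressed file in place. `read_word2vec_header` gives a quick sanity check of the vocabulary size and vector dimension before the file is handed to spaCy.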
Paavo Pohndorff
hbot
  • The last command didn't work for me, since spaCy interpreted the 'en' parameter as a file path. What worked was simply running `nlp = spacy.load('./data/spacy.word2vec.model/')` as suggested in [spacy docs](https://spacy.io/models) – mrapacz May 31 '19 at 18:15
  • The bridge that solved my problem was the line `model.wv.save_word2vec_format("./data/word2vec.txt")` – Chris Mar 06 '20 at 13:55
  • Doesn't work for me! I followed your steps but get the following error when running the `python -m spacy ...` command: `FileNotFoundError: [Errno 2] No such file or directory: 'data/spacy.word2vec.model'` – Matt Nov 03 '20 at 19:28
  • Fixed it by changing the path here: `w2v_model.wv.save_word2vec_format("word2vec.txt", binary=False)` and by adjusting the spacy command to reflect the change in path: `python3 -m spacy init-model en spacy.word2vec.model --vectors-loc word2vec.txt.gz`. I then read in the standard model along with my new vectors: `nlp = spacy.load('en_core_web_sm', vectors='spacy.word2vec.model')` – Matt Nov 04 '20 at 08:34
  • the 'init-model' flag was changed to init, see doc https://spacy.io/api/cli#init-model – alex Mar 19 '21 at 22:26
4

As explained here, you can import custom word vectors that were trained using Gensim, fastText, or Tomas Mikolov's original word2vec implementation by creating a model with:

wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init-model en your_model --vectors-loc cc.la.300.vec.gz

Then you can load your model with nlp = spacy.load('your_model') and use it.

Also see the similar question that was answered here.

Ali Zarezade
2

All of these answers are for an older version of spaCy. In the latest version, the command has changed to:

python -m spacy init vectors [OPTIONS] LANG VECTORS_LOC OUTPUT_DIR

You can learn more about the options by typing python -m spacy init --help in your command prompt.
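To make the difference between the two command forms concrete, here is a small sketch that builds the argument list for either one. The helper name `build_vectors_command`, its `spacy_major` parameter, and the example paths are assumptions made for illustration. The sketch only assembles the command and does not run it.

```python
import sys


def build_vectors_command(lang, vectors_loc, output_dir, spacy_major=3):
    """Return the argv list for converting word vectors with the spaCy CLI."""
    base = [sys.executable, "-m", "spacy"]
    if spacy_major >= 3:
        # Newer form: init vectors LANG VECTORS_LOC OUTPUT_DIR
        return base + ["init", "vectors", lang, vectors_loc, output_dir]
    # Older form from the earlier answers: init-model LANG OUTPUT_DIR --vectors-loc FILE
    return base + ["init-model", lang, output_dir, "--vectors-loc", vectors_loc]


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own vectors file and output directory.
    cmd = build_vectors_command("en", "./data/word2vec.txt.gz", "./data/spacy.word2vec.model")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the conversion, provided spaCy is installed in the same Python environment. Note that the argument order differs between the two forms: the output directory comes last in the newer one.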