15

I want to use a pre-trained word2vec model, but I don't know how to load it in Python.

This file is a MODEL file (703 MB). It can be downloaded here:
http://devmount.github.io/GermanWordEmbeddings/

Abdulrahman Bres
Vahid SJ

4 Answers

28

To load the model:

import gensim

# Load pre-trained Word2Vec model.
model = gensim.models.Word2Vec.load("modelName.model")

Now you can continue training the model as usual. If you also want to save it and retrain it several times, here is what you should do:

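# `sentences` is a placeholder for your own tokenized corpus (a list of lists of words)
model.train(sentences, total_examples=len(sentences), epochs=model.epochs)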
"""
Only call init_sims(replace=True) once you are done training.
If `replace` is set, the original vectors are discarded and only
the normalized ones are kept, which saves a lot of memory, but the
model cannot be trained any further afterwards.
Note: init_sims() is deprecated in gensim 4.0 and later.
"""
model.init_sims(replace=True)

# save the model for later use
# for loading, call Word2Vec.load()

model.save("modelName.model")
AbtPst
  • I get this error: File "C:\...\Python\Python35\lib\site-packages\gensim\utils.py", line 911, in unpickle return _pickle.loads(f.read()) _pickle.UnpicklingError: invalid load key, '6'. – Vahid SJ Oct 01 '16 at 16:47
  • `_pickle.UnpicklingError: invalid load key, '3'.` It looks like `.load_word2vec_format()` can help in some cases. – mrgloom Sep 25 '17 at 21:52
  • `gensim.models.KeyedVectors.load_word2vec_format` works fine – beyondfloatingpoint Jul 16 '19 at 15:38
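The `invalid load key, '6'` error in the comments is consistent with the file not being a pickle at all. `Word2Vec.load()` expects a file written by gensim's own `save()`, whereas files in the original word2vec format start with a plain-text header line giving the vocabulary size and the vector dimension, so the unpickler would trip over the first digit. Below is a minimal sketch of a check for that header, using only the standard library. The helper name `read_word2vec_header` is made up for illustration and is not part of gensim.

```python
def read_word2vec_header(path):
    """Return (vocab_size, vector_size) if the file starts with a
    word2vec-style header line, otherwise None."""
    with open(path, "rb") as f:
        # The header is short, so cap the read instead of scanning a huge file.
        first_line = f.readline(64)
    parts = first_line.split()
    # A word2vec header is exactly two integers followed by a newline.
    if (
        first_line.endswith(b"\n")
        and len(parts) == 2
        and all(p.isdigit() for p in parts)
    ):
        return int(parts[0]), int(parts[1])
    return None
```

If the helper returns a tuple, `KeyedVectors.load_word2vec_format` is the loader to try. If it returns `None`, the file may be a gensim-native save, and `Word2Vec.load()` is the better guess. The check does not look inside compressed files such as `.bin.gz`.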
6

Use KeyedVectors to load the pre-trained model.

from gensim.models import KeyedVectors

word2vec_path = 'path/GoogleNews-vectors-negative300.bin.gz'
w2v_model = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
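Large files such as the GoogleNews vectors can take a while to load and use a lot of memory. A rough sketch of one way to reduce that, assuming gensim is installed: `load_word2vec_format` accepts a `limit` argument that stops reading after the first N words. The wrapper name `load_vectors` is only illustrative.

```python
def load_vectors(path, limit=None):
    """Load word2vec-format vectors; `limit` caps how many words are read."""
    # Imported inside the function so this module can still be imported
    # on a machine where gensim is not installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)
```

For example, `w2v_model = load_vectors(word2vec_path, limit=200000)` reads only the first 200,000 entries. After that, `w2v_model['king']` returns the vector for a word in the vocabulary, and `w2v_model.most_similar('king', topn=5)` lists its nearest neighbours.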
Nilani Algiriyage
2

I used the same model in my code and since I couldn't load it, I asked the author about it. His answer was that the model has to be loaded in binary format:

gensim.models.KeyedVectors.load_word2vec_format(w2v_path, binary=True)

This worked for me, and I think it should work for you, too.
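To make "binary format" concrete: in the original word2vec binary layout, the file opens with a text header (vocabulary size and vector dimension), and each entry is then the word, a space, and the vector stored as raw 4-byte floats. The following is a small standard-library sketch of a reader for that layout, meant only to illustrate what `binary=True` implies. It is not how gensim implements loading, the function name `read_word2vec_binary` is made up, and little-endian floats are an assumption.

```python
import struct


def read_word2vec_binary(path):
    """Read a word2vec binary file into a dict of word -> tuple of floats."""
    vectors = {}
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <vector_size>\n"
        vocab_size, vector_size = (int(x) for x in f.readline().split())
        for _ in range(vocab_size):
            # The word is the run of bytes up to the next space.
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("file ended inside a word")
                if ch != b"\n":  # skip the newline left after the previous vector
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8")
            # The vector follows as vector_size raw float32 values
            # (little-endian is assumed here).
            raw = f.read(4 * vector_size)
            vectors[word] = struct.unpack(f"<{vector_size}f", raw)
    return vectors
```

A pure-Python reader like this is far too slow for a 703 MB model. For real use, `KeyedVectors.load_word2vec_format(w2v_path, binary=True)` remains the call to make.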

0

I ran into the same issue and downloaded GoogleNews-vectors-negative300 from Kaggle. I saved and extracted the file on my desktop. Then I ran this code in Python and it worked well:

from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format(r'C:/Users/juana/descktop/archive/GoogleNews-vectors-negative300.bin', binary=True)