I am trying to load a .bin embedding file with gensim, but every loading method I found in the gensim documentation fails with an error.

Method 1

import gensim.models.keyedvectors as word2vec

model=word2vec.KeyedVectors.load_word2vec_format('Health_2.5reviews.s200.w10.n5.v10.cbow.bin', binary=True, unicode_errors='ignore')

Method 2

from gensim.models import KeyedVectors

filename='Health_2.5reviews.s200.w10.n5.v10.cbow.bin'

model=KeyedVectors.load_word2vec_format(filename, binary=True, unicode_errors='ignore')

Methods 1 and 2 both gave the error

"UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: invalid start byte"

Method 3

from gensim.models import Word2Vec

filename='Health_2.5reviews.s200.w10.n5.v10.cbow.bin'

model=Word2Vec.load(filename)

Method 3 gave the error

UnpicklingError: invalid load key, '\xbc'.
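
Both errors complain about the byte 0xbc at the very start of the input, so it may help to look at how the file begins before trying another loader. Below is a minimal sketch that uses only the standard library. The function name `sniff_embedding_file` and its return labels are made up for illustration and are not part of gensim. It assumes that a word2vec-format file (text or binary) starts with an ASCII header line of the form `<vocab_size> <vector_size>`, and that a file written by gensim's native `save()` is a pickle using protocol 2 or later, which starts with the byte 0x80.

```python
import pickle
import re


def sniff_embedding_file(path, probe_bytes=64):
    """Guess the on-disk format of an embedding file from its first bytes.

    Hypothetical helper for illustration only; not a gensim API.
    """
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes)

    if not head:
        return "empty"

    # word2vec C format (text or binary) begins with an ASCII header line:
    # "<vocab_size> <vector_size>\n"
    if re.match(rb"\d+ \d+\r?\n", head):
        return "word2vec-format"

    # Pickle protocol 2+ begins with the PROTO opcode 0x80 followed by the
    # protocol number.
    if (
        len(head) > 1
        and head[0] == 0x80
        and head[1] in range(2, pickle.HIGHEST_PROTOCOL + 1)
    ):
        return "pickle"

    # Anything else: neither loader above is likely to accept the file.
    return "unknown"


if __name__ == "__main__":
    # Filename taken from the question; adjust the path as needed.
    print(sniff_embedding_file("Health_2.5reviews.s200.w10.n5.v10.cbow.bin"))
```

If this reports "word2vec-format", `KeyedVectors.load_word2vec_format` is the loader to try; if it reports "pickle", `Word2Vec.load` is. A result of "unknown" would be consistent with both errors above and would suggest the file is damaged, truncated, or stored in some other format.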

Mr. NLP
  • Possible duplicate of [UnicodeDecodeError error when loading word2vec](https://stackoverflow.com/questions/50573054/unicodedecodeerror-error-when-loading-word2vec) – amanb Mar 25 '19 at 07:02
  • I tried Word2Vec.load() method also, but I got another error **UnpicklingError: invalid load key, '\xbc'.** – Mr. NLP Mar 25 '19 at 11:25
  • Refer: https://stackoverflow.com/questions/44022180/unpickling-error-while-using-word2vec-load – amanb Mar 25 '19 at 12:08
  • Thanks @amanb. I will check. – Mr. NLP Mar 25 '19 at 14:57
  • How were the models saved, or from where were they sourced? Are you sure the models haven't been corrupted/truncated? – gojomo Mar 25 '19 at 16:03
  • @gojomo I think, the model is corrupted. I'm in the process of generating embeddings again with gensim. – Mr. NLP Mar 26 '19 at 02:41
