I am working on a cross-cultural language study involving English and Indonesian participants.

For the English participants, I can successfully load the pre-trained word2vec vectors trained on the Google News corpus (file: GoogleNews-vectors-negative300.bin).

However, I cannot load the pre-trained model for Indonesian (file: id.bin, source: https://github.com/Kyubyong/wordvectors).

Here is the working code:

import gensim
from gensim import models
from gensim.models import Word2Vec
import math
import sys
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

model = gensim.models.word2vec.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

Here is the code that does not work:

import gensim
from gensim import models
from gensim.models import Word2Vec
import math
import sys
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

model = gensim.models.word2vec.Word2Vec.load_word2vec_format('id.bin', binary=True)

What is the correct way to do this?

1 Answer

Use load() instead of load_word2vec_format(). load_word2vec_format() reads files in the format of the original word2vec tool, which is what Google's released vectors use. id.bin was saved by gensim in its own format, so it has to be read with load().

import gensim

model = gensim.models.word2vec.Word2Vec.load('id.bin')
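
If it is unclear which of the two formats a file is in, a quick look at its first bytes can help. The following is only a rough sketch, not part of gensim: the function name guess_word2vec_file_kind and its return labels are made up for illustration. It assumes that a word2vec-format file starts with an ASCII header line of the form "<vocab_size> <vector_size>", and that a gensim-saved model is a pickle stream, which for pickle protocol 2 and later starts with the byte 0x80.

```python
import re


def guess_word2vec_file_kind(path):
    """Heuristically guess how a word-vector file was saved.

    Hypothetical helper for illustration only; the labels it returns
    are not gensim terminology.
    """
    with open(path, "rb") as f:
        head = f.read(64)

    # Assumption: the original word2vec format (text or binary) begins
    # with an ASCII header line "<vocab_size> <vector_size>\n".
    if re.match(rb"\d+ \d+\r?\n", head):
        return "word2vec-format"

    # Assumption: gensim's native save() writes a pickle; pickle
    # protocol 2 and later starts with the byte 0x80.
    if head[:1] == b"\x80":
        return "gensim-native"

    return "unknown"
```

A result of "gensim-native" for id.bin would be consistent with needing load(), while "word2vec-format" would point to load_word2vec_format(). This is a heuristic and can be wrong for unusual files.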
Rob Bricheno