Why does the code I used to load the pre-trained word2vec model from the Google News corpus (English) not work for the Google News corpus (Indonesian)?

Question

I am working on a cross-cultural language study involving English and Indonesian participants.

For the English participants, I can successfully load the pre-trained word2vec model from the Google News corpus (file: GoogleNews-vectors-negative300.bin).

However, I cannot load the corresponding model for the Indonesian language (file: id.bin, file source: https://github.com/Kyubyong/wordvectors).

Here is the working code:

import gensim
from gensim import models
from gensim.models import Word2Vec
import math
import sys
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

model = gensim.models.word2vec.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

Here is the code that does not work:

import gensim
from gensim import models
from gensim.models import Word2Vec
import math
import sys
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

model = gensim.models.word2vec.Word2Vec.load_word2vec_format('id.bin', binary=True)

What is the correct way to do this?

score 0 · Answer 1 · answered Nov 06 '18 at 14:18

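Use load() instead of load_word2vec_format(). load_word2vec_format() reads files in the format written by Google's original word2vec tool, such as GoogleNews-vectors-negative300.bin. It cannot read a model that was saved by gensim itself, which is the case for id.bin; a model like that has to be opened with load().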

import gensim

model = gensim.models.word2vec.Word2Vec.load('id.bin')
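If it is unclear which of the two loaders a given .bin file needs, a quick look at its first bytes can help. The sketch below is only a heuristic and uses nothing but the standard library. It assumes that files from the original word2vec tool start with a plain-text "<vocab_size> <vector_size>" header line, and that gensim's save() writes a Python pickle at protocol 2 or later (which begins with the byte 0x80). The function name guess_vector_file_format is made up for this illustration and is not part of gensim.

```python
import re


def guess_vector_file_format(path):
    """Heuristically guess how a word-vector file was written.

    Returns "word2vec-c" if the file starts with the "<vocab_size> <vector_size>"
    header line used by the original word2vec tool, "pickle" if it starts with
    the opcode used by pickle protocol 2 and later, and "unknown" otherwise.
    This is a hypothetical helper for illustration, not a gensim function.
    """
    with open(path, "rb") as f:
        head = f.read(64)

    # Pickle protocol 2+ always begins with the PROTO opcode, byte 0x80.
    if head.startswith(b"\x80"):
        return "pickle"

    # The word2vec C format begins with two ASCII integers and a newline.
    first_line = head.split(b"\n", 1)[0]
    if re.fullmatch(rb"\d+ \d+\s*", first_line):
        return "word2vec-c"

    return "unknown"
```

A result of "pickle" suggests trying Word2Vec.load(path), while "word2vec-c" suggests load_word2vec_format(path, binary=True). In newer gensim releases the latter is called as gensim.models.KeyedVectors.load_word2vec_format.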

answered Nov 06 '18 at 14:18

Rob Bricheno

