3

I am doing a project where I need pre-trained vectors from the skip-gram model. I have heard there is also a variant called the skip-n-gram model which gives better results.

I am wondering what I would need to train the models myself, since I just need them to initialize the embedding layer of my model.

I have searched quite a bit but didn't find good examples, so I need your suggestions. Where can I get such a pre-trained model, or is there no pre-trained model for this?


1 Answer

6

You can train your own word vectors if you have enough data. This can be done using gensim, which provides very simple yet powerful APIs for topic modelling and word embeddings.
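For example, a minimal training sketch might look like the following (the toy corpus and hyperparameters are placeholders; note that older gensim versions use size instead of vector_size):

from gensim.models import Word2Vec

# toy corpus: in practice, a list of tokenized sentences from your own data
sentences = [
    ['the', 'airport', 'has', 'restricted', 'access'],
    ['the', 'aeroway', 'leads', 'to', 'the', 'airport'],
]

# sg=1 selects the skip-gram architecture (sg=0 would be CBOW)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# the trained vectors live in model.wv
print(model.wv['airport'])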

But if you want to use already trained word2vec models, you can use the word2vec model released by Google. It’s 1.5GB and includes word vectors for a vocabulary of 3 million words and phrases that they trained on roughly 100 billion words from a Google News dataset.

You can load this model with gensim. Download the trained word2vec model and use the following code to get started.

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

from gensim.models.keyedvectors import KeyedVectors

words = ['access', 'aeroway', 'airport']

# path to the downloaded Google News model file
path_to_model = 'GoogleNews-vectors-negative300.bin'

# load the pre-trained model (binary=True because the Google News model is in binary format)
model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)

# extract the word vector for a single word
print(model[words[0]])  # 300-dimensional vector representing 'access'

Result vector:

[ -8.74023438e-02  -1.86523438e-01 .. ]

Please note that your system may freeze while loading such a huge model.
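Since the goal in the question is to initialize an embedding layer, here is a rough sketch of how the loaded vectors could be turned into an initial embedding matrix. The vocabulary, dimensions, and variable names below are assumptions for illustration, not part of the original answer; the exact wiring into your model depends on your framework.

import numpy as np

# your own model's vocabulary, mapping word -> index (assumed to exist already)
vocab = {'access': 0, 'aeroway': 1, 'airport': 2}

embedding_dim = 300  # the Google News vectors are 300-dimensional

# initialise randomly so out-of-vocabulary words still get some vector
embedding_matrix = np.random.uniform(-0.05, 0.05, (len(vocab), embedding_dim))

for word, idx in vocab.items():
    if word in model:  # copy the pre-trained vector when the word is in the model
        embedding_matrix[idx] = model[word]

# embedding_matrix can now be used to initialise the weights of an embedding layer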

  • No @MSaifulBari, I don't have much idea about that, but I would suggest taking a look at [fasttext](https://fasttext.cc); I think they have implemented n-gram models. – Harman Oct 23 '17 at 13:09
  • 1
    @Digao each word is translated into an n-dimensional embedding vector where each row can be understood as a feature. Similar (or semantically similar) words will have these features very near to each other. You can find this out by calculating the cosine similarity between different word vectors. – Harman Apr 18 '18 at 11:05
  • @Harman but how about the columns of this vector? Is the number of columns the same as the number of words in the training (in this case 100 billion)? I mean, does the first number in this vector (-8.74023438e-02) refer to the probability that the first word in the training belongs to the skip-gram? – Digao Apr 18 '18 at 12:05
  • 2
    @Digao each word embedding is a vector of size 300*1. Hence, it will just have feature information for that particular word. here, (-8.74023438e-02) doesn't refer to the probability, it's one of the 300 features. This is a good [tutorial](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/) to look at if you're interested. – Harman Apr 19 '18 at 07:35
  • 1
    according to the link you sent, the vector would be 1*300, not 300*1, right? And the vector @MSaifulBari is looking for is the middle layer of the network, right? – Digao Apr 19 '18 at 11:17
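
As mentioned in the comments above, similarity between word vectors is measured with cosine similarity, and this can be checked directly with gensim. A small sketch, assuming the Google News model loaded earlier (the word choices are arbitrary):

# cosine similarity between two word vectors
print(model.similarity('airport', 'aeroway'))

# the five words most similar to 'airport' by cosine similarity
print(model.most_similar('airport', topn=5))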