Questions tagged [word2vec]

This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
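
The two architectures differ in what they predict. The minimal pure-Python sketch below shows the training examples each would derive from one tokenized sentence. It is not the tool's own code. The function names and the fixed `window` parameter are illustrative assumptions (the original tool, for instance, shrinks the window at random for each position), and no vectors are trained.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (target, context) example per context word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the whole context window is used to predict the center word."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


sentence = "the quick brown fox jumps".split()
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

Skip-gram turns every (target, context) pair into its own prediction, while CBOW combines the whole context window into a single input to predict the center word. This is why skip-gram produces more training examples per sentence.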

Word2vec uses distributed representations of text to capture similarities among concepts. For example, a well-trained model can capture that Paris and France are related the same way Berlin and Germany are (capital and country), and not the same way Madrid and Italy are.
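
The "same way" relation is usually tested with vector arithmetic: vec(Paris) − vec(France) + vec(Germany) should land closest, by cosine similarity, to vec(Berlin). The sketch below uses hand-made toy vectors rather than vectors learned by word2vec, purely to show the arithmetic. With a trained gensim model, a comparable query is `model.wv.most_similar(positive=['paris', 'germany'], negative=['france'])`, which also normalizes the vectors. That query assumes those lowercase tokens are in the model's vocabulary.

```python
import math

# Hand-made toy vectors (NOT learned by word2vec): dimension 0 stands for
# "is a capital", the remaining dimensions for France, Germany, Italy, Spain.
VECTORS = {
    "france":  [0, 1, 0, 0, 0],
    "paris":   [1, 1, 0, 0, 0],
    "germany": [0, 0, 1, 0, 0],
    "berlin":  [1, 0, 1, 0, 0],
    "italy":   [0, 0, 0, 1, 0],
    "rome":    [1, 0, 0, 1, 0],
    "spain":   [0, 0, 0, 0, 1],
    "madrid":  [1, 0, 0, 0, 1],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def offset(a, b):
    """vec(a) - vec(b): the 'relation' that leads from b to a."""
    return [x - y for x, y in zip(VECTORS[a], VECTORS[b])]


def analogy(a, b, c):
    """Answer 'a is to b as ? is to c' with vec(a) - vec(b) + vec(c)."""
    query = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(query, VECTORS[w]))


print(analogy("paris", "france", "germany"))                         # berlin
print(cosine(offset("paris", "france"), offset("berlin", "germany")))  # 1.0
print(cosine(offset("paris", "france"), offset("madrid", "italy")))    # about 0.58
```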

This has a very broad range of potential applications: knowledge representation and extraction; machine translation; question answering; conversational systems; and many others.

The original paper by Mikolov et al. can be found on arXiv.

2274 questions

145 votes, 14 answers

How to calculate the sentence similarity using word2vec model of gensim with python

According to the Gensim Word2Vec, I can use the word2vec model in gensim package to calculate the similarity between 2 words. e.g. trained_model.similarity('woman', 'man') 0.73723527 However, the word2vec model fails to predict the sentence…

python gensim word2vec

asked Mar 02 '14 at 16:04 by zhfkt
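
A common baseline for this question is to average the vectors of each sentence's in-vocabulary words and compare the two averages by cosine similarity. It is one option among several, not something the question prescribes. The pure-Python sketch below assumes `word_vectors` is a dict-like mapping from word to vector, and the toy vectors are made up. In gensim, `model.wv.n_similarity(words1, words2)` computes a comparable mean-vector cosine for two lists of in-vocabulary words.

```python
import math


def sentence_vector(tokens, word_vectors):
    """Average the vectors of the tokens that are in the vocabulary."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no in-vocabulary words: no meaningful vector
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sentence_similarity(s1, s2, word_vectors):
    """Cosine similarity of the averaged word vectors of two sentences."""
    v1 = sentence_vector(s1.lower().split(), word_vectors)
    v2 = sentence_vector(s2.lower().split(), word_vectors)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


# Made-up 3-dimensional vectors, for illustration only.
toy = {
    "woman": [0.9, 0.1, 0.0],
    "man":   [0.8, 0.2, 0.0],
    "walks": [0.1, 0.9, 0.1],
    "runs":  [0.2, 0.7, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "fell":  [0.1, 0.3, 0.8],
}

print(sentence_similarity("the woman walks", "a man runs", toy))
print(sentence_similarity("the woman walks", "stock fell", toy))
```

Averaging ignores word order and lets frequent, uninformative words dilute the result, so treat it as a baseline.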

117 votes, 3 answers

word2vec: negative sampling (in layman's terms)?

I'm reading the paper below and I have some trouble understanding the concept of negative sampling. http://arxiv.org/pdf/1402.3722v1.pdf Can anyone help, please?

machine-learning nlp word2vec

asked Jan 09 '15 at 12:31 by Andy K
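
In plain terms, negative sampling replaces the expensive "which of all V words is the context?" softmax with a few cheap yes/no questions. The true context word should score high against the target, and k randomly drawn "noise" words should score low. The sketch below computes that per-pair loss, −log σ(u_c·v_t) − Σ_k log σ(−u_k·v_t). Here v_t is the target's input vector and u_c, u_k are output (context) vectors. Noise words are drawn from the unigram distribution raised to the 3/4 power, as in the original word2vec. Everything else is a toy assumption: the counts, the 2-dimensional vectors, the function names, and excluding the true context word when sampling. No gradient updates are shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power, renormalized to probabilities."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def sample_negatives(dist, k, exclude, rng):
    """Draw k noise words (with replacement), never the true context word."""
    words = [w for w in dist if w != exclude]
    return rng.choices(words, weights=[dist[w] for w in words], k=k)


def neg_sampling_loss(v_target, u_context, u_negatives):
    """-log sigma(u_c . v_t) - sum_k log sigma(-u_k . v_t) for one pair."""
    loss = -math.log(sigmoid(dot(u_context, v_target)))
    for u_neg in u_negatives:
        loss -= math.log(sigmoid(-dot(u_neg, v_target)))
    return loss


# Toy corpus counts: the 3/4 power shrinks "the" and boosts rare "quantum".
counts = {"the": 1000, "cat": 50, "sat": 40, "mat": 30, "quantum": 2}
dist = noise_distribution(counts)
rng = random.Random(0)

# Training pair (target="cat", context="sat") with k=2 noise words.
negs = sample_negatives(dist, k=2, exclude="sat", rng=rng)
v = {"cat": [0.5, 0.1]}  # input (target) vectors
u = {"the": [-0.2, 0.1], "cat": [0.0, 0.3], "sat": [0.6, 0.2],
     "mat": [0.1, -0.4], "quantum": [-0.5, 0.5]}  # output (context) vectors
print(negs)
print(neg_sampling_loss(v["cat"], u["sat"], [u[w] for w in negs]))
```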

9 answers

How to get vector for a sentence from the word2vec of tokens in sentence

I have generated vectors for a list of tokens from a large document using word2vec. Given a sentence, is it possible to get a vector for the sentence from the vectors of the tokens in it?

word2vec

asked Apr 21 '15 at 00:46 by trialcritic

10 answers

Convert word2vec bin file to text

From the word2vec site I can download GoogleNews-vectors-negative300.bin.gz. The .bin file (about 3.4GB) is a binary format not useful to me. Tomas Mikolov assures us that "It should be fairly straightforward to convert the binary format to text…

python c gensim word2vec

asked Dec 05 '14 at 20:39 by Glenn
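
If gensim is available, the usual route is to load the file with `KeyedVectors.load_word2vec_format(path, binary=True)` and write it back with `save_word2vec_format(out_path, binary=False)`. The dependency-free sketch below parses the binary layout directly instead. It assumes the format written by the original C tool: an ASCII header line with the vocabulary size and dimension, then for each word its bytes, a space, `dim` little-endian float32 values, and usually a newline. The function name and the tiny demo file are illustrative. On the full 3.4GB file, this byte-at-a-time loop will be slow.

```python
import gzip
import os
import struct
import tempfile


def _open(path, mode):
    """Open plain or .gz files transparently."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def bin_to_text(bin_path, txt_path, encoding="utf-8"):
    """Convert a word2vec binary vector file to the word2vec text format."""
    with _open(bin_path, "rb") as fin, open(txt_path, "w", encoding=encoding) as fout:
        vocab_size, dim = map(int, fin.readline().split())  # header line
        fout.write(f"{vocab_size} {dim}\n")
        record = struct.Struct(f"<{dim}f")  # assumed little-endian float32
        for _ in range(vocab_size):
            word = bytearray()
            while True:
                ch = fin.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise ValueError("unexpected end of file while reading a word")
                if ch != b"\n":  # skip the newline ending the previous record
                    word += ch
            values = record.unpack(fin.read(record.size))
            text = word.decode(encoding, errors="replace")
            fout.write(text + " " + " ".join(f"{x:.6f}" for x in values) + "\n")


# Demo on a tiny hand-written binary file with 2 words of dimension 3.
demo_dir = tempfile.mkdtemp()
demo_bin = os.path.join(demo_dir, "toy.bin")
demo_txt = os.path.join(demo_dir, "toy.txt")
with open(demo_bin, "wb") as f:
    f.write(b"2 3\n")
    f.write(b"king " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n")
    f.write(b"queen " + struct.pack("<3f", 0.25, 0.0, 1.5) + b"\n")
bin_to_text(demo_bin, demo_txt)
with open(demo_txt, encoding="utf-8") as f:
    print(f.read())
```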

3 answers

What is a projection layer in the context of neural networks?

I am currently trying to understand the architecture behind the word2vec neural net learning algorithm, for representing words as vectors based on their context. After reading Tomas Mikolov's paper, I came across what he defines as a projection layer.…

machine-learning nlp neural-network word2vec

asked Jun 17 '16 at 20:30 by Roger
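
In Mikolov et al.'s description, the projection layer has no nonlinearity. Multiplying a one-hot input by the V×N input weight matrix just selects that word's row. In CBOW, all context words are projected to the same place, meaning their rows are averaged. The sketch below checks that equivalence on an arbitrary 4×3 matrix. The function names and values are illustrative assumptions.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_times_matrix(x, W):
    """Compute x^T W for a length-V vector x and a V x N matrix W (list of rows)."""
    return [sum(x[v] * W[v][j] for v in range(len(W))) for j in range(len(W[0]))]


def project(word_ids, W):
    """Projection layer: look up each word's row of W and average (CBOW-style)."""
    rows = [W[i] for i in word_ids]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Vocabulary of 4 words, embedding size 3; values are arbitrary.
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

# A one-hot product only selects a row; no activation function is applied.
print(vec_times_matrix(one_hot(2, 4), W))  # same as W[2]
print(project([2], W))                     # same as W[2]
print(project([0, 3], W))                  # average of rows 0 and 3
```

In practice, implementations do the row lookup directly rather than the matrix product, because the one-hot multiplication wastes V×N operations to pick out N numbers.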

4 answers

Doc2vec: How to get document vectors

How to get document vectors of two text documents using Doc2vec? I am new to this, so it would be helpful if someone could point me in the right direction or to a tutorial. I am using gensim. doc1=["This is a sentence","This is another…

python gensim word2vec

asked Jul 09 '15 at 14:57 by bee2502

3 answers

CBOW vs. skip-gram: why invert context and target words?

On this page, it is said that: [...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...] However, looking at the training dataset it produces, the content of the X and Y pair seems to be…

nlp tensorflow deep-learning word2vec word-embedding

asked Jul 10 '16 at 01:21 by Guillaume Chevalier

5 answers

gensim word2vec: Find number of words in vocabulary

After training a word2vec model using python gensim, how do you find the number of words in the model's vocabulary?

python neural-network nlp gensim word2vec

asked Feb 24 '16 at 07:39 by hlin117

5 answers

How can a sentence or a document be converted to a vector?

We have models for converting words to vectors (for example the word2vec model). Do similar models exist which convert sentences/documents into vectors, using perhaps the vectors learnt for the individual words?

vector nlp word2vec

asked Jun 12 '15 at 05:36 by Sahil

5 answers

How to use word2vec to calculate the similarity distance by giving 2 words?

Word2vec is an open-source tool provided by Google to calculate the distance between words. It can be used by inputting a word and getting a list of words ranked by similarity. E.g. Input: france Output: Word Cosine distance …

word2vec

asked Feb 24 '14 at 05:58 by zhfkt

18 answers

gensim error: ImportError: No module named 'gensim'

I am trying to import gensim with import gensim but get the following error: ImportError Traceback (most recent call last) in () ----> 1 import gensim 2 model =…

python gensim word2vec

asked Sep 12 '17 at 05:33 by woojung

4 answers

How to use Gensim doc2vec with pre-trained word vectors?

I recently came across the doc2vec addition to Gensim. How can I use pre-trained word vectors (e.g. those on the original word2vec website) with doc2vec? Or is doc2vec getting the word vectors from the same sentences it uses for paragraph-vector…

python nlp gensim word2vec doc2vec

asked Dec 14 '14 at 15:13 by Stergios

8 answers

How to check if a key exists in a word2vec trained model or not

I have trained a word2vec model using a corpus of documents with Gensim. Once the model is trained, I am writing the following piece of code to get the raw feature vector of a word, say "view": myModel["view"] However, I get a KeyError for the word…

python gensim word2vec

asked May 18 '15 at 11:24 by London guy

4 answers

TensorFlow 'module' object has no attribute 'global_variables_initializer'

I'm new to TensorFlow. I'm running a deep learning assignment from Udacity in an IPython notebook (link), and it raises an error: AttributeError Traceback (most recent call last) in () …

python tensorflow deep-learning word2vec

asked Nov 09 '16 at 16:20 by Le D. Thang

6 answers

Update gensim word2vec model

I have a word2vec model in gensim trained over 98892 documents. For any given sentence that is not present in the sentences array (i.e. the set over which I trained the model), I need to update the model with that sentence so that querying it next…

gensim word2vec

asked Mar 01 '14 at 22:08 by user2480542
