Questions tagged [word2vec]

This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.

Word2vec uses distributed representations of text to capture similarities among concepts. For example, its vectors encode that Paris relates to France the same way Berlin relates to Germany (capital and country), and not the way Madrid relates to Italy.
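
The capital-and-country relation can be illustrated with vector arithmetic. The following is a minimal sketch, not the word2vec tool itself: the three-dimensional vectors are invented for illustration (real embeddings are learned from a corpus and usually have hundreds of dimensions), and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Toy vectors, made up for this example only. Real word2vec vectors
# are learned from text and are not hand-written like this.
vectors = {
    "paris":   [0.9, 0.1, 0.8],
    "france":  [0.9, 0.1, 0.1],
    "berlin":  [0.1, 0.9, 0.8],
    "germany": [0.1, 0.9, 0.1],
    "madrid":  [0.5, 0.5, 0.8],
    "italy":   [0.3, 0.2, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by finding the word whose vector
    is closest (by cosine similarity) to b - a + c, excluding a, b and c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Paris is to France as Berlin is to ...?
print(analogy("paris", "france", "berlin", vectors))
```

With these toy vectors the offset from "paris" to "france" equals the offset from "berlin" to "germany", so the query picks "germany" rather than "italy" or "madrid".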

This has a very broad range of potential applications: knowledge representation and extraction; machine translation; question answering; conversational systems; and many others.

The original paper by Mikolov et al. can be found on arXiv.

2274 questions
145 votes, 14 answers

How to calculate the sentence similarity using word2vec model of gensim with python

According to the Gensim Word2Vec, I can use the word2vec model in the gensim package to calculate the similarity between two words, e.g. trained_model.similarity('woman', 'man') gives 0.73723527. However, the word2vec model fails to predict the sentence…
zhfkt
117 votes, 3 answers

word2vec: negative sampling (in layman's terms)?

I'm reading the paper below and I have some trouble understanding the concept of negative sampling. http://arxiv.org/pdf/1402.3722v1.pdf Can anyone help, please?
Andy K
95 votes, 9 answers

How to get vector for a sentence from the word2vec of tokens in sentence

I have generated the vectors for a list of tokens from a large document using word2vec. Given a sentence, is it possible to get the vector of the sentence from the vectors of the tokens in the sentence?
trialcritic
68 votes, 10 answers

Convert word2vec bin file to text

From the word2vec site I can download GoogleNews-vectors-negative300.bin.gz. The .bin file (about 3.4GB) is a binary format not useful to me. Tomas Mikolov assures us that "It should be fairly straightforward to convert the binary format to text…
Glenn
62 votes, 3 answers

What is a projection layer in the context of neural networks?

I am currently trying to understand the architecture behind the word2vec neural net learning algorithm, for representing words as vectors based on their context. After reading Tomas Mikolov's paper I came across what he defines as a projection layer.…
Roger
58 votes, 4 answers

Doc2vec: How to get document vectors

How to get document vectors of two text documents using Doc2vec? I am new to this, so it would be helpful if someone could point me in the right direction or help me with a tutorial. I am using gensim. doc1=["This is a sentence","This is another…
bee2502
54 votes, 3 answers

CBOW vs. skip-gram: why invert context and target words?

In this page, it is said that: [...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...] However, looking at the training dataset it produces, the content of the X and Y pair seems to be…
Guillaume Chevalier
54 votes, 5 answers

gensim word2vec: Find number of words in vocabulary

After training a word2vec model using python gensim, how do you find the number of words in the model's vocabulary?
hlin117
53 votes, 5 answers

How can a sentence or a document be converted to a vector?

We have models for converting words to vectors (for example the word2vec model). Do similar models exist which convert sentences/documents into vectors, using perhaps the vectors learnt for the individual words?
Sahil
50 votes, 5 answers

How to use word2vec to calculate the similarity distance by giving 2 words?

Word2vec is an open-source tool provided by Google for calculating the distance between words. Given an input word, it outputs a list of words ranked by similarity. E.g. Input: france Output: Word Cosine distance …
zhfkt
49 votes, 18 answers

gensim error: ImportError: No module named 'gensim'

I am trying to import gensim with import gensim but get the following error: ImportError Traceback (most recent call last) in () ----> 1 import gensim 2 model =…
woojung
46 votes, 4 answers

How to use Gensim doc2vec with pre-trained word vectors?

I recently came across the doc2vec addition to Gensim. How can I use pre-trained word vectors (e.g. those found on the original word2vec website) with doc2vec? Or is doc2vec getting the word vectors from the same sentences it uses for paragraph-vector…
Stergios
45 votes, 8 answers

How to check if a key exists in a word2vec trained model or not

I have trained a word2vec model using a corpus of documents with Gensim. Once the model is trained, I use the following code to get the raw feature vector of a word, say "view": myModel["view"]. However, I get a KeyError for the word…
London guy
39 votes, 4 answers

TensorFlow 'module' object has no attribute 'global_variables_initializer'

I'm new to TensorFlow. I'm running a Deep Learning assignment from Udacity in an IPython notebook, and it raises an error: AttributeError Traceback (most recent call last) `` in ``() …
Le D. Thang
39 votes, 6 answers

Update gensim word2vec model

I have a word2vec model in gensim trained over 98892 documents. For any given sentence that is not present in the sentences array (i.e. the set over which I trained the model), I need to update the model with that sentence so that querying it next…
user2480542