Questions tagged [word-embedding]

For questions about word embeddings, a representation technique in natural language processing. Questions can concern particular methods, such as Word2Vec, GloVe, FastText, etc., or word embeddings and their use in machine learning libraries in general.

Word embeddings are numeric representations of words in a vector space of fixed dimensionality, constructed so that words with similar meanings have similar representations. They are used to capture the context of a word within a document, including its semantics, its similarity to other words, and its relations to them.

Example frameworks: Word2Vec, GloVe, FastText.
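A minimal sketch of the idea, assuming the gensim package and a toy corpus (both illustrative, not part of the tag description):

    from gensim.models import Word2Vec

    # Toy corpus; real embeddings need far more text to be meaningful.
    sentences = [["the", "cat", "sat", "on", "the", "mat"],
                 ["the", "dog", "sat", "on", "the", "rug"],
                 ["dogs", "and", "cats", "are", "pets"]]

    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

    print(model.wv["cat"].shape)              # (50,): a dense numeric vector
    print(model.wv.similarity("cat", "dog"))  # cosine similarity of the two vectors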

1089 questions
173 votes · 9 answers

What does tf.nn.embedding_lookup function do?

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None) I cannot understand what this function does. Is it like a lookup table? That is, does it return the parameters corresponding to each id (in ids)? For instance, in the…
Poorya Pzm
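A short sketch (TensorFlow 2.x, not taken from the question) suggesting that the call behaves like row selection from params, i.e. like tf.gather:

    import tensorflow as tf

    params = tf.constant([[0.0, 0.1],    # row 0
                          [1.0, 1.1],    # row 1
                          [2.0, 2.1]])   # row 2
    ids = tf.constant([2, 0, 2])

    # result[i] == params[ids[i]]
    print(tf.nn.embedding_lookup(params, ids).numpy())
    # [[2.  2.1]
    #  [0.  0.1]
    #  [2.  2.1]]
    print(tf.gather(params, ids).numpy())  # same result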
74 votes · 4 answers

Embedding in pytorch

Does Embedding make similar words closer to each other? Do I just need to give it all the sentences, or is it just a lookup table for which I need to code the model myself?
user1927468
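A minimal sketch, assuming PyTorch, illustrating that nn.Embedding is essentially a trainable lookup table; any notion of similarity only arises from training the surrounding model:

    import torch
    import torch.nn as nn

    emb = nn.Embedding(num_embeddings=10, embedding_dim=3)  # 10 rows, 3-dim each
    ids = torch.tensor([1, 4, 4, 2])

    out = emb(ids)
    print(out.shape)                    # torch.Size([4, 3])
    print(torch.equal(out[1], out[2]))  # True: id 4 always maps to the same row
    # The rows start as random values; they only become "meaningful" once the
    # model containing this layer is trained on some objective.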
56 votes · 2 answers

How is WordPiece tokenization helpful to effectively deal with rare words problem in NLP?

I have seen that NLP models such as BERT utilize WordPiece for tokenization. In WordPiece, we split tokens like playing into play and ##ing. It is mentioned that it covers a wider spectrum of Out-Of-Vocabulary (OOV) words. Can someone please help…
Harman
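A hedged sketch, assuming the Hugging Face transformers package (not mentioned in the question), showing how a rare word is broken into subword pieces instead of becoming a single [UNK] token; the exact split depends on the vocabulary:

    from transformers import BertTokenizer

    tok = BertTokenizer.from_pretrained("bert-base-uncased")

    print(tok.tokenize("playing"))
    print(tok.tokenize("electroencephalography"))
    # The rare word comes back as several pieces, e.g. ['electro', '##ence', ...],
    # so it can still be represented from known subwords rather than as [UNK].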
54 votes · 3 answers

CBOW vs. skip-gram: why invert context and target words?

On this page, it is said that: [...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...] However, looking at the training dataset it produces, the content of the X and Y pair seems to be…
Guillaume Chevalier
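A small sketch with gensim (an assumption; the question itself is framework-agnostic): the sg flag switches between the two objectives, which differ only in which side of the (target, context) pair is predicted:

    from gensim.models import Word2Vec

    sentences = [["the", "quick", "brown", "fox", "jumps"],
                 ["the", "lazy", "dog", "sleeps"]]

    # CBOW (sg=0): predict the target word from its (averaged) context words.
    cbow = Word2Vec(sentences, vector_size=50, window=2, sg=0, min_count=1)

    # Skip-gram (sg=1): predict each context word from the target word.
    skipgram = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1)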
51 votes · 6 answers

PyTorch / Gensim - How do I load pre-trained word embeddings?

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer. How do I get the embedding weights loaded by gensim into the PyTorch embedding layer?
MBT
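One common pattern (a sketch only; the file path and the word "king" are placeholders) is to copy the gensim KeyedVectors matrix into nn.Embedding.from_pretrained:

    import torch
    import torch.nn as nn
    from gensim.models import KeyedVectors

    # "vectors.txt" stands in for a word2vec file in text format.
    kv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)

    weights = torch.FloatTensor(kv.vectors)              # (vocab_size, dim)
    emb = nn.Embedding.from_pretrained(weights, freeze=False)

    # Rows are indexed the same way as in gensim:
    idx = kv.key_to_index["king"]     # assumes "king" is in the vocabulary
    vec = emb(torch.tensor([idx]))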
37 votes · 2 answers

How does mask_zero in Keras Embedding layer work?

I thought mask_zero=True would output 0's when the input value is 0, so the following layers could skip computation or something. How does mask_zero work? Example: data_in = np.array([ [1, 2, 0, 0] ]) data_in.shape >>> (1, 4) # model x =…
crazytomcat
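A quick check (TensorFlow/Keras assumed) suggests mask_zero does not zero the outputs; it produces a boolean mask that mask-aware downstream layers (e.g. LSTM) use to skip padded timesteps:

    import numpy as np
    import tensorflow as tf

    data_in = np.array([[1, 2, 0, 0]])   # 0 is reserved as the padding id

    emb = tf.keras.layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)
    out = emb(data_in)                   # shape (1, 4, 4); NOT zeroed at the padding
    mask = emb.compute_mask(data_in)

    print(mask.numpy())                  # [[ True  True False False]]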
30 votes · 6 answers

How to cluster similar sentences using BERT

For ELMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences. A good example of the implementation can be seen in this short article:…
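A compact sketch of the approach, assuming the sentence-transformers package and the model name "all-MiniLM-L6-v2" (both are assumptions, not from the question):

    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    sentences = ["the cat sits outside",
                 "a man is playing guitar",
                 "the new movie is awesome",
                 "the dog plays in the garden"]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(sentences)              # one vector per sentence

    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
    print(labels)                                     # cluster id per sentence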
25 votes · 2 answers

What's the major difference between glove and word2vec?

What is the difference between word2vec and GloVe? Are both ways to train a word embedding? If yes, then how can we use both?
Hrithik Puri
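Both end up as the same kind of artifact, a word-to-vector table, so in gensim they can be used through the same KeyedVectors interface. A sketch (the model names are gensim-downloader identifiers, and the downloads are large):

    import gensim.downloader as api

    w2v   = api.load("word2vec-google-news-300")   # predictive, local context windows
    glove = api.load("glove-wiki-gigaword-100")    # count-based, global co-occurrence

    print(w2v.most_similar("king", topn=3))
    print(glove.most_similar("king", topn=3))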
24 votes · 1 answer

How to get word vectors from Keras Embedding Layer

I'm currently working with a Keras model which has an embedding layer as its first layer. In order to visualize the relationships and similarity of words between each other, I need a function that returns the mapping of words and vectors of every element…
philszalay
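A self-contained sketch (toy model; the Tokenizer-style word_index mapping is an assumption) showing that the per-word vectors are just rows of the Embedding layer's weight matrix:

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=1000, output_dim=32),  # first layer
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    # One forward pass on dummy ids so the layer weights are created.
    _ = model(np.random.randint(0, 1000, size=(2, 20)))

    weights = model.layers[0].get_weights()[0]    # shape: (1000, 32)

    # With a Tokenizer's word_index (word -> integer id), the vector for a word
    # is simply the corresponding row, e.g. weights[word_index["good"]].
    print(weights.shape)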
23 votes · 3 answers

Gensim 3.8.0 to Gensim 4.0.0

I have trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model with Gensim 4.0.0 on GCP. I used the following code: model = KeyedVectors.load_word2vec_format(wv_path, binary= False) words =…
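The usual stumbling block in this migration is the removed vocab attribute; a sketch (file path hypothetical) of the Gensim 4.x replacements:

    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format("word2vec.txt", binary=False)  # placeholder path

    # Gensim 4.x removed kv.vocab; the replacements are:
    words = kv.index_to_key        # list of words (was: list(kv.vocab.keys()))
    ids   = kv.key_to_index        # dict mapping word -> integer index
    vecs  = kv.vectors             # the (len(words), dim) weight matrix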
23 votes · 2 answers

What is "unk" in the pretrained GloVe vector files (e.g. glove.6B.50d.txt)?

I found an "unk" token in the GloVe vector file glove.6B.50d.txt downloaded from https://nlp.stanford.edu/projects/glove/. Its value is as follows: unk -0.79149 0.86617 0.11998 0.00092287 0.2776 -0.49185 0.50195 0.00060792 -0.25845 0.17865 0.2535…
Abhay Gupta
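For inspecting the file itself, a small loader sketch (assuming the downloaded glove.6B.50d.txt sits in the working directory) makes it easy to check which tokens, including unk, are present:

    import numpy as np

    glove = {}
    with open("glove.6B.50d.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

    print("unk" in glove)          # the token asked about in the question
    print(len(glove))              # vocabulary size of the file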
23 votes · 2 answers

Explain with example: how embedding layers in keras works

I don't understand the Embedding layer of Keras. Although there are lots of articles explaining it, I am still confused. For example, the code below is from IMDB sentiment analysis: top_words = 5000 max_review_length = 500 embedding_vecor_length = 32…
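A shape-focused sketch (toy data, not the IMDB example from the question) of what the layer does: it replaces each integer word id with a trainable dense vector:

    import numpy as np
    import tensorflow as tf

    top_words = 5000                 # vocabulary: integer ids 0..4999
    embedding_vector_length = 32
    max_review_length = 500

    emb = tf.keras.layers.Embedding(top_words, embedding_vector_length)

    # A batch of 2 padded "reviews", each a sequence of 500 integer word ids.
    batch = np.random.randint(0, top_words, size=(2, max_review_length))
    print(emb(batch).shape)          # (2, 500, 32): each id became a 32-dim vector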
22 votes · 4 answers

Use LSTM tutorial code to predict next word in a sentence?

I've been trying to understand the sample code at https://www.tensorflow.org/tutorials/recurrent, which you can find at https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py (using TensorFlow 1.3.0). I've summarized…
Darren Cook
21 votes · 2 answers

What is the preferred ratio between the vocabulary size and embedding dimension?

When using, for example, gensim word2vec or a similar method to train your embedding vectors, I was wondering: what is a good ratio, or is there a preferred ratio, between the embedding dimension and the vocabulary size? Also, how does that change with more…
Gabriel Bercea
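There is no hard rule; one heuristic sometimes quoted (e.g. in TensorFlow's documentation on embedding columns) is roughly the fourth root of the vocabulary size, treated as a starting point rather than a law. A tiny sketch of that rule of thumb:

    def rule_of_thumb_dim(vocab_size: int) -> int:
        """Heuristic only: embedding_dim ~ vocab_size ** 0.25, rounded."""
        return max(2, round(vocab_size ** 0.25))

    for v in (1_000, 10_000, 100_000, 1_000_000):
        print(v, rule_of_thumb_dim(v))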
21 votes · 4 answers

word2vec - what is best? add, concatenate or average word vectors?

I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the…
Lemon
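For experimenting with the two sets of vectors in gensim (an assumption; wv holds the input vectors and syn1neg is gensim's internal output-vector matrix when negative sampling is used), a sketch of the three combinations:

    import numpy as np
    from gensim.models import Word2Vec

    sentences = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]]
    model = Word2Vec(sentences, vector_size=50, min_count=1, sg=1, negative=5)

    word  = "fox"
    w_in  = model.wv[word]                                # input / "target" vector
    w_out = model.syn1neg[model.wv.key_to_index[word]]    # output / "context" vector

    summed   = w_in + w_out
    averaged = (w_in + w_out) / 2.0
    concat   = np.concatenate([w_in, w_out])              # doubles the dimensionality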