According to the PyTorch docs:
A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
In short, nn.Embedding embeds a sequence of vocabulary indices into a new embedding space. You can indeed roughly think of this as a word2vec-style mechanism.
As a dummy example, let's create an embedding layer for a vocabulary of size 10 (i.e. the input data contains only 10 unique tokens) that returns embedded word vectors living in 5-dimensional space. In other words, each word is represented as a 5-dimensional vector. The dummy data is a sequence of 3 words with indices 1, 2, and 3, in that order.
>>> import torch
>>> import torch.nn as nn
>>> embedding = nn.Embedding(10, 5)
>>> embedding(torch.tensor([1, 2, 3]))
tensor([[-0.7077, -1.0708, -0.9729,  0.5726,  1.0309],
        [ 0.2056, -1.3278,  0.6368, -1.9261,  1.0972],
        [ 0.8409, -0.5524, -0.1357,  0.6838,  3.0991]],
       grad_fn=<EmbeddingBackward>)
You can see that each of the three words is now represented as a 5-dimensional vector. We also see that the output carries a grad_fn, which means that the weights of this layer will be adjusted through backprop. This answers your question of whether embedding layers are trainable: the answer is yes. And indeed this is the whole point of embedding: we expect the embedding layer to learn meaningful representations, the famous king - man + woman ≈ queen analogy being the classic example of what these embedding layers can learn.
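To make "trainable" more concrete, here is a minimal sketch continuing with the same embedding object; the sum() loss is an arbitrary choice, purely to have something to backpropagate through. After the backward pass, only the rows that were actually looked up (indices 1, 2, and 3) carry non-zero gradients:
>>> output = embedding(torch.tensor([1, 2, 3]))
>>> loss = output.sum()        # arbitrary scalar "loss", just for illustration
>>> loss.backward()
>>> embedding.weight.grad[:4]  # rows 1, 2, 3 were looked up; row 0 was not
tensor([[0., 0., 0., 0., 0.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
A subsequent optimizer step (for example with torch.optim.SGD(embedding.parameters(), lr=0.1)) would then nudge exactly those rows; I leave it out here so that the weight matrix shown below still matches the example above.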
Edit
The embedding layer is, as the documentation states, a simple lookup table into a weight matrix. You can see this by inspecting its weights:
>>> embedding.weight
Parameter containing:
tensor([[-1.1728, -0.1023,  0.2489, -1.6098,  1.0426],
        [-0.7077, -1.0708, -0.9729,  0.5726,  1.0309],
        [ 0.2056, -1.3278,  0.6368, -1.9261,  1.0972],
        [ 0.8409, -0.5524, -0.1357,  0.6838,  3.0991],
        [-0.4569, -1.9014, -0.0758, -0.6069, -1.2985],
        [ 0.4545,  0.3246, -0.7277,  0.7236, -0.8096],
        [ 1.2569,  1.2437, -1.0229, -0.2101, -0.2963],
        [-0.3394, -0.8099,  1.4016, -0.8018,  0.0156],
        [ 0.3253, -0.1863,  0.5746, -0.0672,  0.7865],
        [ 0.0176,  0.7090, -0.7630, -0.6564,  1.5690]], requires_grad=True)
You will see that the rows at indices 1, 2, and 3 of this matrix (i.e. its second, third, and fourth rows) correspond to the result that was returned in the example above. In other words, for a token with index n, the embedding layer simply "looks up" the row at index n of its weight matrix and returns that row vector; hence the lookup table.
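As a quick sanity check (reusing the embedding object from above), you can verify that calling the layer is equivalent to plain row indexing of its weight matrix; torch.equal compares shapes and values only:
>>> rows = embedding.weight[[1, 2, 3]]   # plain advanced indexing into the weight matrix
>>> torch.equal(embedding(torch.tensor([1, 2, 3])), rows)
True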