
I recently read a paper about embeddings.

In Eq. (3), `f` is a 4096×1 vector. The author compresses the vector into `theta` (a 20×1 vector) by using an embedding matrix `E`.

The equation is simply `theta = E * f`.

I was wondering whether this can be done in PyTorch, so that during training `E` is learned automatically.

How do I finish the rest? Thanks so much.

The demo code so far:

import torch
from torch import nn

f = torch.randn(4096, 1)  # the 4096x1 feature vector

1 Answer


Embedding layers are used when your input vectors are one-hot. If that is your case, you can directly use PyTorch's embedding layer, which does the above as well as some more things. `nn.Embedding` takes the non-zero index of the one-hot vector as input, as a long tensor. For example, if the feature vectors are

f = [[0,0,1], [1,0,0]]

then the input to `nn.Embedding` will be

input = [2, 0]
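
A minimal sketch of how `nn.Embedding` is used (the sizes here are illustrative, not from the paper):

import torch
from torch import nn

emb = nn.Embedding(num_embeddings=3, embedding_dim=20)  # 3 one-hot positions, 20-dim embeddings
input = torch.tensor([2, 0])  # non-zero indices of the one-hot rows, as a long tensor
theta = emb(input)            # shape (2, 20); each row is a learnable embedding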

However, what the OP asked in the question is getting embeddings by matrix multiplication, and I will address that below. You can define a module to do it as follows. Since `param` is an instance of `nn.Parameter`, it will be registered as a parameter of the module and will be optimized when you call Adam or any other optimizer.

class Embedding(nn.Module):
    def __init__(self, input_dim, embedding_dim):
        super().__init__()
        # the embedding matrix, stored as (input_dim, embedding_dim), i.e. E transposed
        self.param = torch.nn.Parameter(torch.randn(input_dim, embedding_dim))

    def forward(self, x):
        # x: (batch, input_dim) -> (batch, embedding_dim)
        return torch.mm(x, self.param)
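
A minimal usage sketch (the learning rate and the dummy loss are just for illustration; note that `f` must be passed as a row vector of shape `(1, 4096)` for `torch.mm` to work):

embedding = Embedding(input_dim=4096, embedding_dim=20)
optimizer = torch.optim.Adam(embedding.parameters(), lr=1e-3)

f = torch.randn(1, 4096)   # the feature vector as a row vector
theta = embedding(f)       # shape (1, 20)

loss = theta.pow(2).sum()  # dummy loss, just to show the update
loss.backward()
optimizer.step()           # Adam updates embedding.param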

If you look carefully, this is the same as a linear layer with no bias and a slightly different initialization. Therefore, you can achieve the same thing by using a linear layer, as below.

self.embedding = nn.Linear(4096, 20, bias=False)
# change the initial weights to normal(0, 1) or whatever is required
self.embedding.weight.data = torch.randn_like(self.embedding.weight)
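
With this layer, the forward pass is simply `theta = self.embedding(f)`. The same thing outside a module, as a minimal sketch (variable names are illustrative):

embedding = nn.Linear(4096, 20, bias=False)
embedding.weight.data = torch.randn_like(embedding.weight)

f = torch.randn(1, 4096)  # feature vector as a row vector
theta = embedding(f)      # shape (1, 20); the (20, 4096) weight plays the role of E
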
  • Thanks. Does your code work for any feature vector? It seems that `f` is a `4096X1` vector and it can contain any numbers as its elements. – jason Mar 30 '19 at 17:51
  • This code works for any real-valued feature vector, since it is just a matrix multiplication. `nn.Embedding`, though, expects one-hot features and expects you to pass the non-zero indices as a long tensor, if I remember correctly. I am not sure whether you are confused about that; if so, I can elaborate more in my answer – Umang Gupta Mar 30 '19 at 18:36
  • Will `self.param` update automatically if I use Adam? Thanks – jason Mar 30 '19 at 18:42
  • Hi @jason, I have updated my answer to reflect most of your queries – Umang Gupta Mar 30 '19 at 19:13