TensorFlow's Embedding layer (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) is easy to use, and there are many articles about "how to use" Embedding (https://machinelearningmastery.com/what-are-word-embeddings/, https://www.sciencedirect.com/topics/computer-science/embedding-method). However, I want to know the implementation of the "Embedding layer" itself in TensorFlow or PyTorch. Is it word2vec? Is it CBOW? Is it a special Dense layer?
1 Answer
Structure-wise, both the Dense layer and the Embedding layer are hidden layers with neurons in them. The difference lies in the way they operate on their inputs and on the weight matrix.

A Dense layer multiplies its inputs by the weight matrix, adds a bias, and applies an activation function, whereas an Embedding layer uses the weight matrix as a look-up dictionary.
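To make that difference concrete, here is a minimal NumPy sketch; the shapes and values are toy numbers invented for illustration, not the actual Keras implementation:

import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration (hypothetical, not from the question).
vocab_size, embed_dim = 10, 4
in_features, out_features = 4, 3

# Dense layer: output = activation(inputs @ weights + bias)
x = rng.normal(size=(1, in_features))
W_dense = rng.normal(size=(in_features, out_features))
b = np.zeros(out_features)
dense_out = np.maximum(0.0, x @ W_dense + b)   # ReLU as an example activation

# Embedding layer: the weight matrix is used as a look-up table.
W_embed = rng.normal(size=(vocab_size, embed_dim))
token_ids = np.array([2, 7, 7])                # integer word indices
embed_out = W_embed[token_ids]                 # row selection, no matmul, no bias

print(dense_out.shape)   # (1, 3)
print(embed_out.shape)   # (3, 4) -- one vector per input index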
The Embedding layer is best understood as a dictionary that maps integer indices (which stand for specific words) to dense vectors. It takes integers as input, it looks up these integers in an internal dictionary, and it returns the associated vectors. It’s effectively a dictionary lookup.
from keras.layers import Embedding
embedding_layer = Embedding(1000, 64)
Here 1000 is the number of words in the dictionary (the vocabulary size) and 64 is the dimensionality of the word vectors. Intuitively, the embedding layer, just like any other layer, will try to find a 64-dimensional vector of real numbers [n1, n2, ..., n64] for each word. This vector represents the semantic meaning of that particular word, and the layer learns it during training via backpropagation, just like any other layer.
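For example, here is roughly how such a layer is used; the word indices below are arbitrary, and the point is only the input and output shapes:

import numpy as np
from tensorflow.keras.layers import Embedding

# Same layer as above: vocabulary of 1000 words, 64-dimensional vectors.
embedding_layer = Embedding(input_dim=1000, output_dim=64)

# A batch of 2 "sentences", each a sequence of 5 integer word indices
# (the indices themselves are arbitrary examples).
word_ids = np.array([[4, 25, 7, 63, 1],
                     [2, 2, 900, 13, 0]])

vectors = embedding_layer(word_ids)
print(vectors.shape)   # (2, 5, 64): one 64-dimensional vector per word index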
When you instantiate an Embedding layer, its weights (its internal dictionary of token vectors) are initially random, just as with any other layer. During training, these word vectors are gradually adjusted via backpropagation, structuring the space into something the downstream model can exploit. Once fully trained, the embedding space will show a lot of structure—a kind of structure specialized for the specific problem for which you’re training your model.
-- Deep Learning with Python by F. Chollet
Edit - How "Backpropagation" is used to train the look-up matrix of the Embedding Layer
?
The Embedding layer is similar to a linear layer without an activation function. In theory, an Embedding layer also performs a matrix multiplication but doesn't add any non-linearity via an activation function, so backpropagation through an Embedding layer is the same as through any linear layer. In practice, however, we don't perform any matrix multiplication in the embedding layer, because the inputs are effectively one-hot encoded, and multiplying a one-hot vector by the weight matrix is as easy as looking up a single row.
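To see why the one-hot matrix multiplication reduces to a look-up, here is a small NumPy check; the weights are random and the word index is an arbitrary example, so this is only a sketch of the idea:

import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 64
W = rng.normal(size=(vocab_size, embed_dim))   # stands in for the embedding weights

word_id = 42                                   # arbitrary example index

# "Linear layer" view: multiply a one-hot row vector by the weight matrix.
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0
via_matmul = one_hot @ W

# What the Embedding layer effectively does: select the corresponding row.
via_lookup = W[word_id]

print(np.allclose(via_matmul, via_lookup))     # True

Because only one entry of the one-hot vector is non-zero, only the corresponding row of the weight matrix receives a gradient for that input, which is why training the look-up table works like training a (sparsely updated) linear layer.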

-
However, I want to know the network structure behind "from keras.layers import Embedding". Is it a 1000 x 64 unit Dense layer? – dogdog Jun 09 '21 at 03:37
-
@dogdog yes, you are somewhat correct. The `Embedding` layer is a 1000 x 64 layer, but don't call it a `Dense` layer. A Dense layer performs operations like matrix multiplication on its weight matrix, whereas an `Embedding` layer uses the weight matrix as a look-up dictionary. So structurally they are both layers with neurons in them; a `Dense` layer performs operations on its weights while an `Embedding` layer doesn't – coderina Jun 09 '21 at 07:04
-
Thank you! Can you be more specific? How can we use a backpropagation algorithm to train that look-up matrix? – dogdog Jun 09 '21 at 07:31
-
`Embedding` layer is similar to a linear layer without any activation function. Theoretically, the `Embedding` layer also performs a matrix multiplication but doesn't add any non-linearity to it through an activation function. So backpropagation in the `Embedding` layer is similar to that of any linear layer. But practically, we don't do any matrix multiplication in the embedding layer because the inputs are generally one-hot encoded, and multiplying the weights by a one-hot encoded vector is as easy as a look-up. – coderina Jun 09 '21 at 09:17
-
I am sorry, I actually forgot to mention the role of the activation function in the Dense layer in my answer, so I edited it. – coderina Jun 09 '21 at 09:18
-
I see! Thank you again! – dogdog Jun 09 '21 at 09:41
-
I'm confused by your claim that the input to an embedding layer is generally one-hot encoded, @coderina. [This answer](https://stats.stackexchange.com/a/305032/249133) suggests that the input is actually an index value? – starbeamrainbowlabs Jul 02 '21 at 15:58
-
@starbeamrainbowlabs please read the third-from-last paragraph of that answer... – coderina Jul 02 '21 at 18:51
-
@coderina If you mean the paragraph that starts `Here 1000 means the number...`, then no, that's not what I'm referring to. I'm talking about the *input* to the model, not the output. – starbeamrainbowlabs Jul 02 '21 at 22:20
-
@starbeamrainbowlabs No, I referred to the third-from-last paragraph of the answer that you linked. It starts with `For an intuition of how this table lookup is implemented...`. – coderina Jul 03 '21 at 08:50