
This is my simple reproducible code:

from keras.callbacks import ModelCheckpoint
from keras.models import Model
from keras.models import load_model
import keras
import numpy as np

SEQUENCE_LEN = 45
LATENT_SIZE = 20
VOCAB_SIZE = 100

inputs = keras.layers.Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(VOCAB_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()

x = np.random.randint(0, 90, size=(10, SEQUENCE_LEN, VOCAB_SIZE))
y = np.random.normal(size=(10, SEQUENCE_LEN, VOCAB_SIZE))
NUM_EPOCHS = 1
checkpoint = ModelCheckpoint(filepath='checkpoint/{epoch}.hdf5')
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS, callbacks=[checkpoint])

and here is my code to have a look at the weights in the encoder layer:

for epoch in range(1, NUM_EPOCHS + 1):
    file_name = "checkpoint/" + str(epoch) + ".hdf5"
    lstm_autoencoder = load_model(file_name)
    encoder = Model(lstm_autoencoder.input, lstm_autoencoder.get_layer('encoder_lstm').output)
    print(encoder.output_shape[1])
    weights = encoder.get_weights()[0]
    print(weights.shape)
    for idx in range(encoder.output_shape[1]):
        token_idx = np.argsort(weights[:, idx])[::-1]

Here, print(encoder.output_shape) gives (None, 20) and print(weights.shape) gives (100, 80).

I understand that get_weights() returns the weight matrices of the layer.

The part I do not get, given this architecture, is the 80. What does it represent?

Also, are the weights here the weights that connect the encoder layer to the decoder? I mean the connection between the encoder and the decoder.

I had a look at this question here. As it only uses simple Dense layers, I could not connect the concept to the seq2seq model.

Update 1

What is the difference between encoder.get_weights()[0] and encoder.get_weights()[1]? The first one has shape (100, 80) and the second one (20, 80). What do they represent conceptually?

Any help is appreciated :)


1 Answer


The encoder as you have defined it is a model, and it consists of two layers: an input layer and the 'encoder_lstm' layer, which is the bidirectional LSTM layer of the autoencoder. So its output shape is the output shape of the 'encoder_lstm' layer, which is (None, 20) (because you have set LATENT_SIZE = 20 and merge_mode="sum"). So the output shape is correct and clear.

However, since encoder is a model, encoder.get_weights() returns the weights of all the layers in the model as a list. The bidirectional LSTM consists of two separate LSTM layers, and each of those LSTM layers has 3 weight arrays: the kernel, the recurrent kernel and the bias. So encoder.get_weights() returns a list of 6 arrays, 3 for each of the LSTM layers. The first element of this list, which you have stored in weights and which is the subject of your question, is the kernel of one of the LSTM layers. The kernel of an LSTM layer has a shape of (input_dim, 4 * lstm_units). The input dimension of the 'encoder_lstm' layer is VOCAB_SIZE and its number of units is LATENT_SIZE, therefore the kernel has shape (VOCAB_SIZE, 4 * LATENT_SIZE) = (100, 80).
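As a quick sanity check, here is a minimal sketch that reuses the first checkpoint file saved by the question's training loop and prints all six shapes; the exact ordering of the arrays can vary with the Keras version, but it matches the shapes reported in the question:

from keras.models import Model, load_model

# Load the first checkpoint written by ModelCheckpoint and rebuild the encoder sub-model
lstm_autoencoder = load_model("checkpoint/1.hdf5")
encoder = Model(lstm_autoencoder.input, lstm_autoencoder.get_layer("encoder_lstm").output)

# The bidirectional wrapper holds two LSTMs; each contributes a kernel,
# a recurrent kernel and a bias, giving 6 arrays in total.
for i, w in enumerate(encoder.get_weights()):
    print(i, w.shape)

# Expected output:
# 0 (100, 80)  forward kernel            (VOCAB_SIZE, 4 * LATENT_SIZE)
# 1 (20, 80)   forward recurrent kernel  (LATENT_SIZE, 4 * LATENT_SIZE)
# 2 (80,)      forward bias
# 3 (100, 80)  backward kernel
# 4 (20, 80)   backward recurrent kernel
# 5 (80,)      backward bias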

  • Thank you so much for your detailed explanation. I have one more question regarding that; I hope you can help me clarify it. So you mention that we have 3 weights. I need to see the weights associated with those `20 neurons` of the `latent size`. With your explanation, I think I'm not doing it correctly and I should look at `encoder.get_weights()[1]` instead. Am I right? – sariii Jul 12 '19 at 19:52
  • @sariii I am not sure what you mean by "weights associated with those 20 neurons"?! All of these weights are associated with the neurons. Actually, the kernel consists of four sub-kernels of shape `(input_dim, lstm_units)`, and each has a purpose. The same thing applies to the recurrent kernel: four sub-kernels of shape `(lstm_units, lstm_units)`, which gives it a shape of `(lstm_units, 4 * lstm_units)`. So I am not sure which one of these you are interested in. Each has a separate purpose, but all are related to the neurons in the LSTM layer. – today Jul 12 '19 at 20:02
  • Thank you so much, I really appreciate it; it is clearer to me now. I have updated the question with one more question exactly related to your explanation in the comment. Would you please have a look and update your answer? Also, I would appreciate a link I can read to understand their differences. Now the only vague part is what their differences really are. – sariii Jul 12 '19 at 20:06
  • Let me share the part about LSTMs that confuses me: I knew that we have 4 kernels in an LSTM cell, but I used to think that, as we are extracting the output from the last layer, we are no longer interested in the weights of the first three kernels, so the weights are the ones connected to the last output! – sariii Jul 12 '19 at 20:14
  • @sariii Re-reading your comment, I think you are interested in the kernel weights (i.e. the ones returned by `.get_weights()[0]`), because those are actually the weights which are directly associated with the LSTM neurons. The recurrent kernel deals with the hidden state. [This image](https://adamtiger.github.io/NNSharp/recurrents/#lstm) probably helps you: the `W`s are the kernels and the `U`s are the recurrent kernels. Also see the [relevant part](https://github.com/keras-team/keras/blob/ed07472bc5fc985982db355135d37059a1f887a9/keras/layers/recurrent.py#L1913-L1935) in the Keras source code (a slicing sketch is shown after these comments). – today Jul 12 '19 at 20:16
  • Thank you so much, now I can see better :). Though, one last question (sorry in advance for the many questions I have asked): I literally have 20 neurons in the encoder layer, by which I mean that I want to have 20 vectors, or 20 clusters, ... How can I interpret the 80 here in terms of importance? If you want me to ask a new question, I can do that :) – sariii Jul 12 '19 at 20:25
  • @sariii It's probably better to ask a new question, because I am not sure I understand your comment (please ask it on [CrossValidated](https://stats.stackexchange.com/) or [DataScience](https://datascience.stackexchange.com/), not StackOverflow, because it is only for programming questions). You can view those neurons (without considering the recurrence and hidden state) as neurons in a Dense layer: given an input, they multiply it by a weight matrix, add a bias to the result, apply an activation function and return the result. But I am not sure for what and how you want to use them. – today Jul 12 '19 at 20:43
  • Sounds good, let me ask a new question on CrossValidated where I can provide more details about what I need it for, and I will link it here. I hope you can have a look at it, thank you again. – sariii Jul 12 '19 at 20:45
  • @sariii Also (if you haven't read it already) [this great article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) may help you to better understand LSTMs, as it has helped many. – today Jul 12 '19 at 20:46
  • Many thanks for the link, I actually read it before, but honestly I thought that when we call encoder.get_weights we only get access to the weights associated with Ot. So I just imagined `20 neurons` against the vocab size, i.e. `(vocab_size, latent_size)`; now the idea of `4 * latent_size` proved that I was totally wrong. – sariii Jul 12 '19 at 20:51
  • Again thank you for your help, I got the answer to the question I was looking for here: https://stackoverflow.com/questions/42861460/how-to-interpret-weights-in-a-lstm-layer-in-keras. Plus, the first link you suggested helped very much. Thanks :) – sariii Jul 13 '19 at 18:55
  • @sariii Ah, so you just wanted to find the sub-kernels?! I had already shared the link to the Keras source code with you in the comments, and I thought you had looked at it to find which slice corresponds to which sub-kernel! Anyways, I am glad you could find the answers to your questions at last :) – today Jul 13 '19 at 19:39
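Following up on the sub-kernel discussion in the comments, here is a minimal sketch of how the (100, 80) kernel could be split into the four gate sub-kernels. The input/forget/cell/output ordering follows the Keras source linked above; the variable names are only illustrative.

from keras.models import Model, load_model

# Rebuild the encoder sub-model from the saved checkpoint, as in the question
lstm_autoencoder = load_model("checkpoint/1.hdf5")
encoder = Model(lstm_autoencoder.input, lstm_autoencoder.get_layer("encoder_lstm").output)

kernel = encoder.get_weights()[0]   # shape (VOCAB_SIZE, 4 * LATENT_SIZE) = (100, 80)
units = kernel.shape[1] // 4        # LATENT_SIZE = 20

# Keras concatenates the gate kernels along the last axis
# in the order: input, forget, cell (candidate), output.
kernel_i = kernel[:, :units]
kernel_f = kernel[:, units:units * 2]
kernel_c = kernel[:, units * 2:units * 3]
kernel_o = kernel[:, units * 3:]

print(kernel_i.shape)  # (100, 20): weights feeding the input gate of the 20 latent units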