
My features/targets look like:

x[0] = [10, 15, 13]
y[0] = [1, 4]

The numbers are lookup indices for English and French words.
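For illustration, here is how a sentence becomes a row of x with a tiny hypothetical vocabulary (the word-to-index pairs below are made up, not my actual lookup table):

```python
# Hypothetical English vocabulary (illustrative only -- not my real lookup table)
english_vocab = {"new": 10, "jersey": 15, "is": 13}

sentence = ["new", "jersey", "is"]
x0 = [english_vocab[w] for w in sentence]
print(x0)  # -> [10, 15, 13]
```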

Here are the shapes of my input and target data:

input (x): (137861, 15)
targets (y): (137861, 21)

Here's the summary of my RNN:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_59 (Embedding)     (None, 15, 8)             1592      
_________________________________________________________________
lstm_52 (LSTM)               (None, 15)                1440      
_________________________________________________________________
dropout_39 (Dropout)         (None, 15)                0         
_________________________________________________________________
dense_69 (Dense)             (None, 21)                336       
=================================================================
Total params: 3,368
Trainable params: 3,368
Non-trainable params: 0
_________________________________________________________________
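As a sanity check, the parameter counts in the summary can be reproduced by hand. The only inferred quantity is the English vocabulary size (1592 embedding params / 8 dims = 199):

```python
vocab_size = 199          # inferred: 1592 embedding params / 8 dims per word
emb_dim, units, n_out = 8, 15, 21

emb_params = vocab_size * emb_dim                      # one 8-dim vector per word
lstm_params = 4 * (units * (emb_dim + units) + units)  # 4 gates, each with W, U, and bias
dense_params = units * n_out + n_out                   # weight matrix + bias

print(emb_params, lstm_params, dense_params)  # -> 1592 1440 336
assert emb_params + lstm_params + dense_params == 3368
```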

Here is my understanding:

The English sentences are encoded by the embedding layer into shape (15, 8), meaning 15 words, each represented by an 8-dimensional vector.

The LSTM layer steps through this encoded input and transforms it into an activation vector of length 15.

The final dense layer turns these 15 activations into 21 activations, which should correspond to the encodings of the French word indices.
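My understanding of the shape flow, written out as a sketch (None stands for the batch dimension; the sizes come from the summary above):

```python
# Trace the tensor shape layer by layer; None is the unspecified batch size.
shape = (None, 15)         # input: 15 word indices per sentence
shape = shape + (8,)       # Embedding: each index -> 8-dim vector, giving (None, 15, 8)
shape = shape[:1] + (15,)  # LSTM(15) returns only its final output: (None, 15)
shape = shape[:1] + (21,)  # Dense(21): (None, 21)
assert shape == (None, 21)
```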

My error when running model.fit:

Error when checking target: expected dense_69 to have shape (None, 1) but got array with shape (137861, 21)

Why does it expect dense_69 to have an input of (None, 1)? Doesn't the summary say that the previous layer's output shape is (None, 15)?

Why is it getting an array of shape (137861, 21)? I have the batch size set to 1024; shouldn't it be getting an array of shape (1024, 15)?

I also have a question regarding the output shape: can I predict the index value directly, or do I need to one-hot encode the output, where each one-hot vector maps to a word index?

If so, do I then need another post-processing step to convert the one-hot encoding back into an index? Is it possible to have the model learn the mapping between one-hot encodings and indices?
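To make the question concrete, here is a minimal sketch of the encode/decode round trip I mean, in pure Python (vocab_size and both function names are placeholders of mine, not part of my pipeline):

```python
def one_hot(index, vocab_size):
    """Encode a word index as a one-hot vector of length vocab_size."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec

def to_index(vec):
    """Decode a one-hot (or probability) vector back to an index via argmax."""
    return max(range(len(vec)), key=vec.__getitem__)

encoded = one_hot(4, 21)
print(to_index(encoded))  # -> 4
```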

Here is the code to build my RNN:

import keras
from keras import layers
from keras.optimizers import SGD

learning_rate = .1
model = keras.Sequential()

model.add(layers.Embedding(input_shape=(15,), input_dim=english_vocab_size, output_dim=8))

model.add(layers.LSTM(15, activation='relu'))

model.add(layers.Dropout(.2))

model.add(layers.Dense(21))

model.summary()

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=SGD(lr=learning_rate),
              metrics=['accuracy'])

model.fit(preproc_english_sentences, preproc_french_sentances, batch_size=1024, epochs=10, validation_split=0.2)