
I have built an encoder-decoder model using the Keras framework for a chatbot. I cannot find any issue with my model, yet during training the loss is nan from the very first epoch, and the accuracy remains zero.

I have tried the code with different batch sizes, different learning rates, and different optimizers, but the output values do not change at all. I even tried gradient clipping and regularization, with no sign of improvement. The output that the model gives is completely random.

The code takes up inputs of shape:

(BATCH, MAX_LENGTH) for encoder input -> Converted to (BATCH, MAX_LENGTH, EMB_SIZE) by embedding layer

(BATCH, MAX_LENGTH) for decoder input -> Converted to (BATCH, MAX_LENGTH, EMB_SIZE) by embedding layer

Output shape is:

(BATCH, MAX_LENGTH, 1) for the decoder target (hence the loss I use is 'sparse_categorical_crossentropy')
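For reference, the shape contract this implies (sketched here in plain NumPy with made-up sizes, not the actual training data): the decoder softmax emits (BATCH, MAX_LENGTH, VOCAB_SIZE) probability distributions, while sparse_categorical_crossentropy expects integer token indices of shape (BATCH, MAX_LENGTH, 1) or (BATCH, MAX_LENGTH).

```python
import numpy as np

BATCH, MAX_LENGTH, VOCAB_SIZE = 2, 3, 5  # made-up sizes for illustration

# What the decoder's softmax emits: one probability distribution over
# the vocabulary at every timestep.
preds = np.full((BATCH, MAX_LENGTH, VOCAB_SIZE), 1.0 / VOCAB_SIZE)

# What sparse_categorical_crossentropy expects as targets: integer token
# indices, one per timestep (a trailing axis of size 1 is also accepted).
targets = np.zeros((BATCH, MAX_LENGTH, 1), dtype=np.int64)

# Batch and time dimensions must line up between predictions and targets.
assert preds.shape[:2] == targets.shape[:2]
# Each timestep's prediction must be a valid distribution.
assert np.allclose(preds.sum(axis=-1), 1.0)
```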

Here is the code of my model:

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed
from keras.models import Model

# Define an input sequence and process it.
encoder_inputs = Input(name='encoder_input', shape=(None,))
encoder_embedding = Embedding(name='encoder_emb', input_dim=VOCAB_SIZE,
                              output_dim=EMB_SIZE,
                              weights=[embedding_matrix],
                              trainable=False,
                              input_length=MAX_LENGTH)(encoder_inputs)
encoder = LSTM(HIDDEN_DIM, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(name='decoder_input', shape=(None,))
decoder_embedding = Embedding(name='decoder_emb', input_dim=VOCAB_SIZE,
                              output_dim=EMB_SIZE,
                              weights=[embedding_matrix],
                              trainable=False,
                              input_length=MAX_LENGTH)(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the 
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(HIDDEN_DIM, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding,
                                     initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))

decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

The word embedding matrix (embedding_matrix) is built from GloVe embeddings.
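Since the embedding layers are frozen (trainable=False), a single NaN row in embedding_matrix will propagate straight through the LSTM and can by itself produce a nan loss. A quick sanity check is worth running (the matrix below is a random stand-in for illustration; the real one comes from the GloVe file):

```python
import numpy as np

VOCAB_SIZE, EMB_SIZE = 100, 50  # illustrative sizes only
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(VOCAB_SIZE, EMB_SIZE))  # stand-in

# The Embedding layer expects exactly one row per vocabulary index.
assert embedding_matrix.shape == (VOCAB_SIZE, EMB_SIZE)
# Any NaN here will poison every forward pass that looks up that token.
assert not np.isnan(embedding_matrix).any()
```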

This is how the results come up for the training...

Epoch 1/100 1329/1329 [==============================] - 1s 868us/step - loss: nan - accuracy: 4.7655e-04

Epoch 2/100 1329/1329 [==============================] - 0s 353us/step - loss: nan - accuracy: 4.7655e-04

Epoch 3/100 1329/1329 [==============================] - 0s 345us/step - loss: nan - accuracy: 4.7655e-04

Epoch 4/100 1329/1329 [==============================] - 0s 354us/step - loss: nan - accuracy: 4.7655e-04

Epoch 5/100 1329/1329 [==============================] - 0s 349us/step - loss: nan - accuracy: 4.7655e-04


1 Answer


The issue was in my data. The model is perfect!
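For anyone hitting the same symptom: with sparse_categorical_crossentropy, one common data problem that yields a nan loss while the model is correct is a target token index outside the range [0, VOCAB_SIZE). A minimal NumPy check (the arrays below are hypothetical stand-ins for the question's decoder_target_data, which is an assumption about the kind of data bug involved):

```python
import numpy as np

VOCAB_SIZE = 5  # illustrative size
# Stand-in for decoder_target_data, shape (BATCH, MAX_LENGTH, 1).
decoder_target_data = np.array([[[1], [4], [0]],
                                [[2], [3], [0]]])

# Targets for sparse_categorical_crossentropy must be integer class
# indices strictly inside [0, VOCAB_SIZE); anything outside that range
# indexes a nonexistent softmax output and corrupts the loss.
assert np.issubdtype(decoder_target_data.dtype, np.integer)
assert decoder_target_data.min() >= 0
assert decoder_target_data.max() < VOCAB_SIZE
```

Running this check against the real target array before training would have caught the bad rows immediately.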