I am coding a text-generation RNN with Keras, training it on the script of the play "Macbeth". The accuracy kept increasing very slowly and soon stopped increasing altogether. I assumed this was due to the vanishing gradient problem, so I added a few batch normalization layers and decreased the learning rate. After tweaking the batch size, number of epochs, and number of neurons in the network, I got comparatively better results. However, right when the accuracy is about to reach 10%, it suddenly drops to 0.001, and whenever a new epoch begins it drops back down to 0.001 again. Because of this, I am unable to train my RNN.
My main questions are:
- What caused this sudden drop in accuracy?
- What would be optimal values for the batch size, number of epochs, learning rate, and other hyperparameters in this situation, and why?
This isn't my whole code; it's just the model definition, compilation, and training:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # axis=1 normalizes along the timestep axis (per-timestep scale/offset)
    BatchNormalization(axis=1),
    LSTM(64, return_sequences=True, input_shape=(data.shape[1], data.shape[2]), activation='tanh'),
    BatchNormalization(axis=1),
    LSTM(64, return_sequences=True, activation='tanh'),
    LSTM(64, activation='tanh'),
    BatchNormalization(axis=1),
    # One softmax output per class in the one-hot labels
    Dense(labels.shape[1], activation='softmax')
])

model.compile(
    optimizer=Adam(learning_rate=0.00005),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(data, labels, batch_size=25, epochs=32)
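
For completeness, here is a minimal sketch of how I could log the accuracy Keras reports after every batch, to pinpoint exactly when the drop happens. The BatchAccuracyLogger class and its attribute names are my own additions, not part of my original code:

from tensorflow.keras.callbacks import Callback

class BatchAccuracyLogger(Callback):
    # Stores the running accuracy Keras reports after each training batch,
    # so the exact point where the metric collapses can be located.
    def __init__(self):
        super().__init__()
        self.batch_accuracies = []

    def on_train_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.batch_accuracies.append(logs.get('accuracy'))

logger = BatchAccuracyLogger()
model.fit(data, labels, batch_size=25, epochs=32, callbacks=[logger])

With this I can plot logger.batch_accuracies afterwards and check whether the drop lines up exactly with epoch boundaries or happens mid-epoch.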