I'm fairly new to neural networks and I'm doing my own "Hello World" with LSTMs instead of copying an existing example. I chose a simple piece of logic:
The input has 3 timesteps. The first is either 1 or 0; the other two are random numbers. The expected output is the same as the first timestep of the input. The data feed looks like this:
_X0 = [1, 5, 9]  _Y0 = [1]
_X1 = [0, 5, 9]  _Y1 = [0]
... 200 more records like this.
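(For reference, an in-memory equivalent of what my CSV/helper pipeline produces would look roughly like this; the actual data comes from LSTM_hello.csv via my helper module:)

import numpy as np

n_samples = 200
first = np.random.randint(0, 2, size=(n_samples, 1))    # informative timestep: 0 or 1
noise = np.random.uniform(0, 10, size=(n_samples, 2))   # two random "distractor" timesteps
x_ = np.concatenate([first, noise], axis=1).reshape(n_samples, 3, 1)  # (N, 3, 1), as the LSTM expects
y_ = first.ravel()                                      # label equals the first timestep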
This simple(?) logic can be trained to 100% accuracy. I ran many tests, and the most efficient model I found was 3 LSTM layers with 15 hidden units each. It reached 100% accuracy after 22 epochs.
However, I noticed something I struggle to understand: for the first 12 epochs the model makes no progress at all as measured by accuracy (it stays at 0.5), and only marginal progress as measured by categorical crossentropy (it goes from 0.69 to 0.65). Then, from epoch 12 through epoch 22, it trains very quickly to an accuracy of 1.0. My question is: why does training happen like this? Why do the first 12 epochs make no progress, and why are epochs 12-22 so much more efficient?
Here is my entire code:
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils.np_utils import to_categorical
import helper

# Load 3-timestep windows and the target column from the CSV
x_, y_ = helper.rnn_csv_toXY("LSTM_hello.csv", 3, "target")
y_binary = to_categorical(y_)  # one-hot encode: 0 -> [1, 0], 1 -> [0, 1]

model = Sequential()
model.add(LSTM(15, input_shape=(3, 1), return_sequences=True))
model.add(LSTM(15, return_sequences=True))
model.add(LSTM(15, return_sequences=False))  # last LSTM returns only the final output
model.add(Dense(2, activation='softmax', kernel_initializer='RandomUniform'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])

model.fit(x_, y_binary, epochs=100)
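For what it's worth, this is how I look at the per-epoch metrics (the plotting part is just a sketch; fit() returns a History object whose .history dict holds the 'loss' and 'acc' values printed during training):

import matplotlib.pyplot as plt

# Same fit call as above, but keeping the returned History object
history = model.fit(x_, y_binary, epochs=100)

# The flat stretch up to ~epoch 12 and the rapid improvement afterwards
# show up clearly in these curves.
plt.plot(history.history['loss'], label='categorical crossentropy')
plt.plot(history.history['acc'], label='accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()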