This is more of a theoretical question than anything. If I have a near-zero cross-entropy loss on a binary classification problem where the last layer is a softmax and the input layer is an LSTM, does it make sense that the accuracy tops out at 54% on the training set? I would have assumed the model would overfit the data, and that such a low loss would mean an extremely overfit function with high training accuracy.
I have also tried different learning rates (0.01, 0.001, 0.0001), all with exactly the same result, and I added a second LSTM layer under the first one to increase model complexity and overfit on purpose, but that didn't change anything either.
What theoretical concept am I missing?
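To make my assumption concrete, here is a tiny numpy sketch (toy numbers, not my data) of why I expected near-zero categorical cross entropy to force near-100% accuracy on the same targets:

import numpy as np

# If the average cross entropy over the SAME targets is near zero, the
# predicted probability of the true class must be ~1.0 everywhere, so
# argmax accuracy should be ~100% as well.
y_true = np.array([[1, 0], [0, 1], [1, 0]], dtype=np.float32)
y_pred = np.array([[0.999, 0.001], [0.002, 0.998], [0.997, 0.003]], dtype=np.float32)

eps = 1e-7  # Keras clips predictions roughly like this before taking the log
ce = -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1))
acc = np.mean(np.argmax(y_pred, axis=-1) == np.argmax(y_true, axis=-1))
print(ce, acc)  # ~0.002 and 1.0

My model: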
import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation
from keras.callbacks import ModelCheckpoint

model = Sequential()
# One input sequence of 100000 timesteps with 26 features each
model.add(LSTM(64, input_shape=(100000, 26), return_sequences=True, activation='relu'))
model.add(Dropout(0.3))
model.add(LSTM(32, return_sequences=True, activation='relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

opt1 = keras.optimizers.RMSprop(lr=0.01, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=opt1, metrics=['accuracy'])

# Checkpoint the weights whenever the training loss improves
filepath = "model_check-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model.fit(df_matrix2, y_matrix2, epochs=5, batch_size=2, verbose=2,
          callbacks=callbacks_list)
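One thing worth noting about the architecture: because the final LSTM also has return_sequences=True, the Dense(2) + softmax is applied independently at every timestep, so the loss and accuracy are averaged over all 100000 steps rather than computed once per sequence:

print(model.output_shape)  # (None, 100000, 2): one softmax per timestep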
And here is how I set up the matrices at the beginning; there is some overlap because I was testing other things earlier.
import numpy as np

# df_model holds the features, y the 0/1 labels (both pandas objects)
df_matrix = df_model.as_matrix()
df_matrix = np.reshape(df_matrix, (-1, 588425, 26))  # (samples, timesteps, features)
y_matrix = y.as_matrix()
y_matrix = np.reshape(y_matrix, (-1, 588425, 1))

# Keep only the first 100000 timesteps
df_matrix2 = df_matrix[:, 0:100000, :]
df_matrix2 = np.reshape(df_matrix2, (-1, 100000, 26))
y_matrix2 = y_matrix[:, 0:100000, :]
y_matrix2 = np.reshape(y_matrix2, (-1, 100000, 1))

# One-hot encode the 0/1 labels to match the 2-way softmax
y_matrix2 = keras.utils.np_utils.to_categorical(y_matrix2, 2)
y_matrix2 = np.reshape(y_matrix2, (-1, 100000, 2))
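A quick sanity check I run on those arrays (same variables as above), mainly to confirm the shapes and the label balance:

print(df_matrix2.shape)          # (1, 100000, 26) if df_model has 588425 rows
print(y_matrix2.shape)           # (1, 100000, 2): one one-hot label per timestep
print(y_matrix2[..., 1].mean())  # fraction of timesteps labeled 1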
This is stock data, but I turned it into a classification problem, so each timestep is just a 0 or 1 depending on whether the price is higher or lower 60 minutes later. So there is a lot of randomness in it to begin with. I just assumed the LSTM would overfit and I'd get a high training accuracy.
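For reference, the labels are built along these lines (a simplified sketch assuming one row per minute; 'close' is a stand-in for my actual price column):

# 1 if the price is higher 60 minutes from now, 0 otherwise
y = (df_model['close'].shift(-60) > df_model['close']).astype(int)

Here is the training output: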
Epoch 1/5
194s - loss: 0.0571 - acc: 0.5436
Epoch 2/5
193s - loss: 1.1921e-07 - acc: 0.5440
Epoch 3/5
192s - loss: 1.1921e-07 - acc: 0.5440
Epoch 4/5
Those are my losses and accuracies.
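One detail I noticed while staring at this: 1.1921e-07 is exactly float32 machine epsilon, which makes me suspect the loss has hit a numerical floor rather than a genuine near-zero value:

import numpy as np
print(np.finfo(np.float32).eps)  # 1.1920929e-07, same as the reported loss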