
More of a theoretical question than anything. If I have a near-zero cross-entropy loss in a binary classification problem where the last layer is a softmax and the input layer is an LSTM, does it make sense that the accuracy tops out at 54% on the training set? I would have assumed that the model would overfit the data, and that such a low loss would mean an extremely overfit function with high training accuracy.

I have also tried different learning rates (0.01, 0.001, 0.0001), all with exactly the same result. I also added a second LSTM layer after the first one to increase model complexity and overfit on purpose, but that didn't change anything either.

What theoretical concept am I missing?

# Imports and model construction assumed from context (not shown in the original post)
import keras
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation
from keras.callbacks import ModelCheckpoint

model = Sequential()
# Both LSTM layers return sequences, so Dense(2) + softmax gives a prediction
# at every one of the 100000 timesteps (per-timestep binary classification)
model.add(LSTM(64, input_shape=(100000,26), return_sequences = True, activation = 'relu'))
model.add(Dropout(0.3))
model.add(LSTM(32, return_sequences = True, activation = 'relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

opt1 = keras.optimizers.RMSprop(lr=0.01, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=opt1, metrics=['accuracy'])

# Save the model whenever the training loss improves
filepath = "model_check-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# Note: callbacks_list is defined but not passed to fit() here
model.fit(df_matrix2, y_matrix2, epochs=5, batch_size=2, verbose=2)

And here is how I set up the matrices at the beginning. There is some overlap because I was testing other things earlier.

# Reshape the full dataset into (batches, 588425 timesteps, 26 features)
df_matrix = df_model.as_matrix()
df_matrix = np.reshape(df_matrix,(-1,588425,26))
y_matrix = y.as_matrix()
y_matrix = np.reshape(y_matrix,(-1,588425,1))

# Keep only the first 100000 timesteps and one-hot encode the binary labels
df_matrix2 = df_matrix[:,0:100000,:]
df_matrix2 = np.reshape(df_matrix2, (-1,100000,26))
y_matrix2 = y_matrix[:,0:100000,:]
y_matrix2 = np.reshape(y_matrix2,(-1,100000,1))
y_matrix2 = keras.utils.np_utils.to_categorical(y_matrix2,2)
y_matrix2 = np.reshape(y_matrix2,(-1,100000,2))
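
As a sanity check on those labels, here is a minimal sketch (my addition, not from the original post) of the accuracy a model would get by always predicting the majority class; with roughly 54.4% zeros, as noted in the comments below, that baseline matches the ~0.544 accuracy in the training log:

# Majority-class baseline: accuracy of always predicting the most frequent label
labels = np.argmax(y_matrix2, axis=-1).ravel()         # back from one-hot to 0/1
baseline = max(np.mean(labels == 0), np.mean(labels == 1))
print('majority-class baseline accuracy:', baseline)   # ~0.544 here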

This is stock data, but I turned it into a classification problem, so the label is just a 0 or a 1 depending on whether the price is higher or lower 60 minutes later. So there is a lot of randomness in it to begin with. I just assumed the LSTM would overfit and I'd get a high training accuracy.
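
For context, here is one way such a 60-minutes-ahead label could be constructed. This is just an illustrative sketch; the 'close' column name is hypothetical (the post does not show this step), and it assumes df_model has one row per minute:

# Hypothetical: one row per minute, with a 'close' price column
future_close = df_model['close'].shift(-60)                # price 60 minutes later
y = (future_close > df_model['close']).astype(int)         # 1 if higher, else 0
# The last 60 rows have no 60-minute-ahead price and would need to be dropped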

Epoch 1/5
194s - loss: 0.0571 - acc: 0.5436
Epoch 2/5
193s - loss: 1.1921e-07 - acc: 0.5440
Epoch 3/5
192s - loss: 1.1921e-07 - acc: 0.5440
Epoch 4/5

Those are my losses and accuracy.

a1letterword
  • Actually looked at my data, and it clearly is being trained to just predict 0 every time, since the frequency of 0's is 0.54403 – a1letterword Sep 06 '17 at 17:59
  • I think I figured out the problem. I wasn't training it for nearly enough epochs. When I shrank the input size to 1000 timesteps instead and ran it for 2000 epochs I got accuracy scores over 90%. – a1letterword Sep 06 '17 at 21:36
  • Can't that be a case of overfitting? As I understand it, if you run for so many epochs your model will basically memorize the inputs rather than learn from them. Try validating the model with some other dataset. – Ilaya Raja S Jun 05 '18 at 05:33
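
Following up on the validation suggestion in the last comment, here is a minimal sketch (my addition) of one way to check this. It assumes the data is cut into shorter sequences as described two comments up, and that the model is rebuilt with a matching input_shape such as (1000, 26):

# Cut the single long sequence into many 1000-step sequences, dropping the remainder
# (assumes df_matrix and y_matrix hold one long sequence, as in the reshapes above)
n_steps = 1000
usable = (df_matrix.shape[1] // n_steps) * n_steps
df_short = df_matrix[:, :usable, :].reshape(-1, n_steps, 26)
y_short = keras.utils.np_utils.to_categorical(
    y_matrix[:, :usable, :], 2).reshape(-1, n_steps, 2)

# Hold out 20% of the sequences; high training accuracy with validation accuracy
# stuck near the ~0.544 baseline would indicate overfitting/memorization
model.fit(df_short, y_short, epochs=2000, batch_size=2, verbose=2,
          validation_split=0.2)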

1 Answer


Note that 50% accuracy in a binary problem means no predictive power at all, since it's as good as a coin toss. Generally speaking, an accuracy of around 54% still means the predictions are essentially random. It's very hard to get exact coin-toss randomness out of a neural network, which is where the extra 4% comes from; at least with Keras on the TensorFlow backend, binary prediction tends to behave this way.

Getting this result is a clear indicator that your model is not working. One of the common reasons is a problem with your features (either the input variables or the outcome variable). It will be easier for people to help you if you post your model, the code you use to transform your data, and the head of your data.
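
One quick diagnostic along these lines is to look at the distribution of the model's predicted classes: if the predictions have collapsed onto a single class (as the asker later confirmed in the comments), accuracy will simply equal that class's frequency. A minimal sketch, assuming the model and the df_matrix2 / y_matrix2 arrays from the question:

import numpy as np

# Predicted class at every timestep vs. the true class
probs = model.predict(df_matrix2)            # shape (num_sequences, 100000, 2)
pred = np.argmax(probs, axis=-1).ravel()
true = np.argmax(y_matrix2, axis=-1).ravel()

print('predicted class counts:', np.bincount(pred, minlength=2))
print('true class counts:     ', np.bincount(true, minlength=2))
print('accuracy:', np.mean(pred == true))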

mikkokotila