I would like to understand what could be responsible for this model loss behaviour. Training a CNN network, with 6 hidden-layers, the loss shoots up from around 1.8 to above 12 after the first epoch and remains constant for the remaining 99 epochs.
724504/724504 [==============================] - 358s 494us/step - loss: 1.8143 - acc: 0.7557 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 2/100
724504/724504 [==============================] - 355s 490us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 3/100
724504/724504 [==============================] - 354s 489us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 4/100
724504/724504 [==============================] - 348s 481us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 5/100
724504/724504 [==============================] - 355s 490us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
I cannot believe this got to do with the dataset I work with, because I tried this with a different, publicly available dataset, the performance is exactly the same (in fact exact figures for loss/accuracy).
I also tested this with a somehow show network having 2 hidden-layers, see the performance below:
724504/724504 [==============================] - 41s 56us/step - loss: 0.4974 - acc: 0.8236 - val_loss: 15.5007 - val_acc: 0.0330
Epoch 2/100
724504/724504 [==============================] - 40s 56us/step - loss: 0.5204 - acc: 0.8408 - val_loss: 15.5543 - val_acc: 0.0330
Epoch 3/100
724504/724504 [==============================] - 41s 56us/step - loss: 0.6646 - acc: 0.8439 - val_loss: 15.3904 - val_acc: 0.0330
Epoch 4/100
724504/724504 [==============================] - 41s 57us/step - loss: 8.8982 - acc: 0.4342 - val_loss: 15.5867 - val_acc: 0.0330
Epoch 5/100
724504/724504 [==============================] - 41s 57us/step - loss: 0.5627 - acc: 0.8444 - val_loss: 15.5449 - val_acc: 0.0330
Can someone points the probable cause of this behaviour? What parameter / configuration needs be adjusted?
EDIT
Model creation
model = Sequential()
activ = 'relu'
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ, input_shape=(1, n_points, 4)))
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
#model.add(Dropout(.5))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
#model.add(Dropout(.5))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
model.add(Dropout(.5))
model.add(Flatten())
A = model.output_shape
model.add(Dense(int(A[1] * 1/4.), activation=activ))
model.add(Dropout(.5))
model.add(Dense(NoClass, activation='softmax'))
optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_reample, Y_resample, epochs=100, batch_size=64, shuffle=False,
validation_data=(Test_X, Test_Y))
Changing the learning rate to lr=0.0001
here's the result after 100 epochs
.
72090/72090 [==============================] - 29s 397us/step - loss: 0.5040 - acc: 0.8347 - val_loss: 4.3529 - val_acc: 0.2072
Epoch 99/100
72090/72090 [==============================] - 28s 395us/step - loss: 0.4958 - acc: 0.8382 - val_loss: 6.3422 - val_acc: 0.1806
Epoch 100/100
72090/72090 [==============================] - 28s 393us/step - loss: 0.5084 - acc: 0.8342 - val_loss: 4.3781 - val_acc: 0.1925
the optimal epoch size: 97, the value of high accuracy 0.20716827656581954
EDIT 2
Apparently, SMOTE isn't good for sampling all but majority class in a multiclassification, see below the trian/test plot: