I trained an ANN text classifier with labels 0 = negative and 1 = positive; my positive class has almost 3 times as many samples as the negative class. I ran the experiment twice, once with SMOTE oversampling and once without oversampling. I don't understand why the loss and accuracy plots of the two runs look completely different. What does that mean?
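For context, this is roughly how I check the class counts before and after SMOTE (a minimal sketch; Counter is from the standard library, and Train_Y2 / Train_Y2_Smote are the label arrays from the code at the end of the question):

from collections import Counter

# Class counts before oversampling (positive ~3x negative) and after SMOTE (balanced)
print(Counter(Train_Y2))
print(Counter(Train_Y2_Smote))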
ANN without oversampling:

[plot: loss and accuracy curves, ANN without oversampling]

Best epoch:
Epoch 6/1000
72/72 [==============================] - 0s 6ms/step - loss: 0.1337 - accuracy: 0.9577 - val_loss: 0.2202 - val_accuracy: 0.9007
ANN with SMOTE oversampling:

[plot: loss and accuracy curves, ANN with SMOTE oversampling]

Best epoch:
Epoch 29/1000
106/106 [==============================] - 0s 3ms/step - loss: 0.0798 - accuracy: 0.9659 - val_loss: 0.0767 - val_accuracy: 0.9941
This is my code:
# With SMOTE
import random as python_random
import numpy as np
import tensorflow as tf
from imblearn.over_sampling import SMOTE
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Oversample the minority (negative) class on the TF-IDF features
sm = SMOTE(random_state=42)
Train_X2_Smote, Train_Y2_Smote = sm.fit_resample(Train_X2_Tfidf, Train_Y2)

# Fix the seeds so the two runs are comparable
def reset_seeds():
    np.random.seed(0)
    python_random.seed(0)
    tf.random.set_seed(0)

reset_seeds()

model2 = Sequential()
model2.add(Dense(20, input_dim=Train_X2_Smote.shape[1], activation='sigmoid'))
model2.add(Dense(1, activation='sigmoid'))

opt = Adam(learning_rate=0.01)
model2.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model2.summary()

es = EarlyStopping(monitor='val_loss', mode='min', patience=10)
history2 = model2.fit(Train_X2_Smote, Train_Y2_Smote, epochs=1000, verbose=1,
                      validation_split=0.2, batch_size=32, callbacks=[es])
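The plots above are drawn from the Keras History object, roughly like this (a minimal sketch, exact styling may differ; the run without SMOTE uses the same code with its own history object):

import matplotlib.pyplot as plt

# Training vs. validation loss and accuracy over the epochs
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history2.history['loss'], label='loss')
ax1.plot(history2.history['val_loss'], label='val_loss')
ax1.set_xlabel('epoch')
ax1.legend()
ax2.plot(history2.history['accuracy'], label='accuracy')
ax2.plot(history2.history['val_accuracy'], label='val_accuracy')
ax2.set_xlabel('epoch')
ax2.legend()
plt.show()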