I trained an ANN for text classification with labels 0 = negative and 1 = positive, where I have almost 3 times more positive data than negative data. I ran the experiment twice, once with SMOTE oversampling and once without any oversampling, and I get completely different loss and accuracy plots. I don't understand why; what does that mean?

ANN without oversampling:
[plot: ANN loss and accuracy without oversampling]

  • Best epoch: Epoch 6/1000 - 72/72 [==============================] - 0s 6ms/step - loss: 0.1337 - accuracy: 0.9577 - val_loss: 0.2202 - val_accuracy: 0.9007

ANN with SMOTE oversampling:
[plot: ANN loss and accuracy with SMOTE oversampling]

  • Best epoch: Epoch 29/1000 - 106/106 [==============================] - 0s 3ms/step - loss: 0.0798 - accuracy: 0.9659 - val_loss: 0.0767 - val_accuracy: 0.9941
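
For context, the class counts before and after resampling can be verified with a quick count over the labels. This is a minimal sketch, assuming the Train_Y2 and Train_Y2_Smote variables from the code below:

from collections import Counter

# Class distribution before and after SMOTE
print(Counter(Train_Y2))        # imbalanced: roughly 3x more positives (1) than negatives (0)
print(Counter(Train_Y2_Smote))  # after SMOTE: both classes have the same count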

This is my code:

# With SMOTE
import random as python_random

import numpy as np
import tensorflow as tf
from imblearn.over_sampling import SMOTE
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Oversample the minority class on the TF-IDF features
sm = SMOTE(random_state=42)
Train_X2_Smote, Train_Y2_Smote = sm.fit_resample(Train_X2_Tfidf, Train_Y2)

def reset_seeds():
    # Fix all sources of randomness for reproducibility
    np.random.seed(0)
    python_random.seed(0)
    tf.random.set_seed(0)

reset_seeds()

# Small feed-forward network: one hidden layer, sigmoid output for binary labels
model2 = Sequential()
model2.add(Dense(20, input_dim=Train_X2_Smote.shape[1], activation='sigmoid'))
model2.add(Dense(1, activation='sigmoid'))
opt = Adam(learning_rate=0.01)
model2.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model2.summary()

# Stop training once validation loss has not improved for 10 epochs
es = EarlyStopping(monitor="val_loss", mode='min', patience=10)
history2 = model2.fit(Train_X2_Smote, Train_Y2_Smote, epochs=1000, verbose=1,
                      validation_split=0.2, batch_size=32, callbacks=[es])
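
The loss and accuracy curves in the plots above come from the History object returned by fit. A minimal plotting sketch, assuming matplotlib (the exact plotting code is not shown in the original post):

import matplotlib.pyplot as plt

# Plot training vs. validation curves from the Keras History object
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history2.history['loss'], label='loss')
ax1.plot(history2.history['val_loss'], label='val_loss')
ax1.set_title('loss')
ax1.set_xlabel('epoch')
ax1.legend()
ax2.plot(history2.history['accuracy'], label='accuracy')
ax2.plot(history2.history['val_accuracy'], label='val_accuracy')
ax2.set_title('accuracy')
ax2.set_xlabel('epoch')
ax2.legend()
plt.show()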