My task is to detect defective items in a factory, that is, to classify goods as either defective or fine. Because defective items are very rare, the data is heavily imbalanced: one class makes up 99.7% of the data. Training accuracy is 0.9971 and validation accuracy is 0.9970, which sounds amazing. The problem is that the model predicts class 0 (fine goods) for everything, so it fails to identify any defective goods. How can I solve this? I have checked other questions and tried their suggestions, but the problem remains. The dataset has 122,400 rows and 5 features.

In the end, the confusion matrix on the test set looks like this:

array([[30508,     0],
       [   92,     0]], dtype=int64)

This is a terrible result: all 92 defective items in the test set are classified as fine.
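To make the gap between accuracy and recall concrete, here is a minimal sketch that recomputes both from the matrix above. It assumes rows are true classes and columns are predicted classes (the scikit-learn convention) and that class 1 is the defective class.

```python
# Confusion matrix from the test set: rows = true class, columns = predicted class.
# Assumption: class 0 = fine goods, class 1 = defective goods.
cm = [[30508, 0],
      [92, 0]]

tn, fp = cm[0]
fn, tp = cm[1]

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Recall on the defective class: share of defective items that were caught.
recall_defective = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy:.4f}")
print(f"recall (defective) = {recall_defective:.4f}")
```

The accuracy matches the reported validation accuracy even though recall on the defective class is zero, which is why accuracy alone is misleading here.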

My code is as below:

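# Imports assumed; the original snippet omitted them (tf.keras is a guess at the setup)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# x: feature matrix with 5 columns, y: class labels (loaded earlier)

# Encode the labels as integers, then one-hot encode them for the 2-unit softmax output
le = LabelEncoder()
y = le.fit_transform(y)

ohe = OneHotEncoder(sparse=False)  # newer scikit-learn versions name this argument sparse_output
y = y.reshape(-1, 1)
y = ohe.fit_transform(y)

# Standardize the features
scaler = StandardScaler()
x = scaler.fit_transform(x)

# Hold out 25% of the data for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=777)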

# DNN modelling

epochs = 15
batch_size = 128
Learning_rate_optimizer = 0.001

# Small fully connected network: 5 inputs -> 5 -> 5 -> 8 -> 2-unit softmax
model = Sequential()

model.add(Dense(5,
                kernel_initializer='glorot_uniform',
                activation='relu',
                input_shape=(5,)))

model.add(Dense(5,
                kernel_initializer='glorot_uniform',
                activation='relu'))

model.add(Dense(8,
                kernel_initializer='glorot_uniform',
                activation='relu'))

# One output unit per class; y is one-hot encoded
model.add(Dense(2,
                kernel_initializer='glorot_uniform',
                activation='softmax'))

# `learning_rate` replaces the deprecated `lr` argument of Adam
model.compile(loss='binary_crossentropy',
              optimizer=Adam(learning_rate=Learning_rate_optimizer),
              metrics=['accuracy'])

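# Train, using the held-out test split as validation data
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

# Convert one-hot labels and softmax outputs back to class indices
y_pred = model.predict(x_test)

confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))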

Thank you

junmouse

2 Answers

It sounds like you have a highly imbalanced dataset, so the model only learns to classify fine goods. You can try one of the approaches listed here: https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
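One common tactic for imbalanced data is to weight the loss so that mistakes on the rare class cost more. The sketch below computes "balanced" weights as n_samples / (n_classes * class_count). The helper name `balanced_class_weights` and the toy labels are made up for illustration; passing the result to Keras through the `class_weight` argument of `model.fit` is an assumption about how it would be wired into the question's code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels with roughly the 99.7% / 0.3% ratio from the question (illustrative only).
toy_labels = [0] * 997 + [1] * 3
weights = balanced_class_weights(toy_labels)
print(weights)

# Assumed usage with the question's model (weights computed from the training labels):
# model.fit(x_train, y_train, class_weight=weights, ...)
```

With these weights, a misclassified defective item contributes far more to the loss than a misclassified fine item, so predicting class 0 for everything is no longer the cheapest solution.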

Silver

A good first attempt is to take roughly equal portions of both classes (that is, undersample the majority class), split them into train, validation, and test sets, train the classifier, and then test it thoroughly on your complete dataset. You can also apply data augmentation to the minority class to generate more samples from the data you have. Keep iterating, and consider changing your loss function to suit the imbalance.
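Taking equal portions of both classes could be sketched as follows. This is only an illustration: `undersample_indices` is a hypothetical helper, the toy labels stand in for the real ones, and random undersampling is just one way to balance the classes.

```python
import random


def undersample_indices(labels, seed=0):
    """Return shuffled indices with every class cut down to the rarest class's size."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    # Size of the rarest class decides how many samples each class keeps.
    n_min = min(len(indices) for indices in by_class.values())

    rng = random.Random(seed)
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, n_min))
    rng.shuffle(chosen)
    return chosen


# Toy labels (illustrative only): 997 fine items, 3 defective items.
toy_labels = [0] * 997 + [1] * 3
idx = undersample_indices(toy_labels)
balanced = [toy_labels[i] for i in idx]
print(len(idx), sum(balanced))
```

The returned indices can be used to select rows from the feature matrix and labels for training, while evaluation is still done on the full, imbalanced data.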

Aman pradhan