I am trying to do classification with machine learning. I have "good" and "bad" classes in my dataset.
Dataset shape: (248857, 12)
Due to some constraints, I am not able to collect more "good" samples: there are around 40k good and 210k bad results. Is this class imbalance a problem for the models?
I trained the model as follows (Naive Bayes is shown here as an example, but I also use KNN, SVM, MLP, Random Forest, and Decision Tree):
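To put rough numbers on the imbalance, here is a small sketch using the approximate counts above (40k and 210k are rounded, so the results are approximate too). It computes each class's share and the accuracy that a model would reach by always predicting "bad", which is the baseline any accuracy score should be compared against. The variable names are only illustrative.

```python
# Approximate class counts taken from the description above (rounded values).
n_good = 40_000
n_bad = 210_000
total = n_good + n_bad

good_share = n_good / total          # fraction of "good" rows
bad_share = n_bad / total            # fraction of "bad" rows
imbalance_ratio = n_bad / n_good     # "bad" rows per "good" row

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(good_share, bad_share)

print(f"good share: {good_share:.2%}")
print(f"bad share: {bad_share:.2%}")
print(f"bad:good ratio: {imbalance_ratio:.2f}")
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

With these counts, an accuracy close to the majority-class baseline would not by itself show that a model has learned to recognise the "good" class.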
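from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Separate the features from the target column
X = df.drop(['Label'], axis=1)
y = df['Label']
# Hold out 33% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_predNaive = classifier.predict(X_test)
# scikit-learn metrics take (y_true, y_pred) in that order
print(f'Test score {accuracy_score(y_test, y_predNaive)}')
plot_confusionmatrix(y_predNaive,y_test,dom='Test')
print('Classification Report for Naive Bayes\n\n', classification_report(y_test, y_predNaive))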
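For comparison, here is a minimal sketch of one way to evaluate on imbalanced data: a stratified split, so that train and test keep the same class ratio, plus a metric that weights both classes equally. The data is synthetic (generated with `make_classification` at roughly the 84/16 ratio described above), and names such as `X_demo`, `y_tr`, and `bal_acc` are placeholders, not part of the original code. It illustrates the idea under these assumptions and is not a recommendation tuned to the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data: about 84% class 0 ("bad")
# and 16% class 1 ("good"). These numbers are assumptions for the demo.
X_demo, y_demo = make_classification(
    n_samples=5000,
    n_features=11,
    n_informative=5,
    weights=[0.84, 0.16],
    random_state=42,
)

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42, stratify=y_demo
)

clf = GaussianNB()
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Balanced accuracy is the mean of per-class recall, so the majority
# class cannot dominate the score the way it can with plain accuracy.
bal_acc = balanced_accuracy_score(y_te, y_pred)

# Rows are true labels, columns are predicted labels, in the order [0, 1].
cm = confusion_matrix(y_te, y_pred, labels=[0, 1])

print(f"minority share in train: {y_tr.mean():.3f}, in test: {y_te.mean():.3f}")
print(f"balanced accuracy: {bal_acc:.3f}")
print(cm)
```

The same pattern should carry over to the other classifiers mentioned (KNN, SVM, MLP, Random Forest, Decision Tree) by swapping out `GaussianNB()`.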