0

I'm working with breast cancer dataset with 2 classes 0-1 and the training and accuracy was great, but I have changed the number of classes to 8 classes 0-7 and I'm getting low accuracy wit Ml algorithms but meanwhile the accuracy with ANN 97% maybe I made a mistake but I don't know where

y_pred :

[5 0 3 0 3 6 1 0 2 1 7 6 7 3 0 3 6 3 7 0 7 1 5 2 5 0 3 6 5 5 7 2 0 6 6 6 3
 6 5 0 0 6 6 5 3 0 5 1 6 4 0 7 6 0 5 5 5 0 0 5 7 1 6 6 7 6 0 1 7 5 6 0 6 0
 3 3 6 7 7 1 0 7 0 5 5 0 6 0 0 6 1 6 5 0 0 7 0 1 6 1 0 6 0 7 0 6 0 5 0 6 3
 6 7 0 6 6 0 0 0 5 7 4 6 6 2 3 5 6 0 7 7 0 5 6 0 0 0 6 1 5 0 7 4 6 0 7 3 6
 5 6 6 0 2 0 1 0 7 0 1 7 0 7 7 6 6 6 7 6 6 0 6 5 1 1 7 6 6 7 0 7 0 1 6 0]

y_test:

[1 0 1 6 4 6 1 0 1 3 0 2 6 3 0 1 0 7 0 0 6 6 5 6 2 6 3 6 5 6 7 6 5 7 0 2 3
 6 5 0 7 2 6 4 0 0 2 6 3 7 7 1 3 6 5 0 2 7 0 7 6 0 1 7 6 6 0 4 7 0 0 0 6 0
 3 5 0 0 7 6 0 0 7 0 6 7 7 2 7 1 1 5 5 3 7 4 7 2 2 4 0 0 0 7 0 2 0 6 0 6 1
 7 6 0 6 0 0 1 0 6 6 7 6 6 7 0 6 1 0 0 7 0 5 7 0 0 7 7 6 5 0 0 1 6 0 7 6 6
 5 2 6 0 2 0 6 0 5 0 2 7 0 7 7 6 7 6 6 6 0 6 6 0 1 1 7 6 2 7 6 0 0 6 5 0]

I have replaced multilabel_confusion_matrix with confusion_matrix but still I'm getting the same results the accuracy between 40% to 50%.

and I'm getting results with : cv_results.mean() *100

K-Nearest Neighbours: 39.62 %
Support Vector Machine: 48.09 %
Naive Bayes: 30.46 %
Decision Tree: 30.46 %
Randoom Forest: 52.32 %
Logistic Regression: 44.26 %

here is Ml part :

# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = np.argmax(y_pred, axis=1)
cm = multilabel_confusion_matrix(y_test, y_pred)
models = []
models.append(('K-Nearest Neighbours', KNeighborsClassifier(n_neighbors = 5)))
models.append(('Support Vector Machine', SVC()))
models.append(('Naive Bayes', GaussianNB()))
models.append(('Decision Tree', DecisionTreeClassifier()))
models.append(('Randoom Forest', RandomForestClassifier(n_estimators=100)))
models.append(('Logistic Regression', LogisticRegression()))

results = []
names = []
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state = 8)
    cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
Mahmoud Odeh
  • 942
  • 1
  • 7
  • 19
  • 1
    How exactly your get this accuracy? With the `cross_val_score` shown? – desertnaut May 03 '20 at 23:30
  • im doing something like this : `cv_results.mean() * 100` and i round it with 2 float digits – Mahmoud Odeh May 03 '20 at 23:37
  • 1
    Can't recall how large exactly is the breast cancer dataset, but I guess it's not that big; with a 10-fold CV you may seriously do not end up with enough training data per fold, and the fact that you have converted it to an 8-class problem probably makes things worse. Try c 3/5-fold CV, and make sure it is stratified (i.e. all 8 classes are present in each fold). – desertnaut May 04 '20 at 00:01

0 Answers0