Decision Tree

I have computed the misclassification rate for each leaf node of the tree:

  1. samples = 3635 + 1101 = 4736, class = Cash, misclassification rate = 1101 / 4736 = 0.232.

  2. samples = 47436 + 44556 = 91992, class = Cash, misclassification rate = 44556 / 91992 = 0.484.

  3. samples = 7072 + 15252 = 22324, class = Credit Card, misclassification rate = 7072 / 22324 = 0.317.

  4. samples = 1294 + 1456 = 2750, class = Credit Card, misclassification rate = 1294 / 2750 = 0.471.

  5. samples = 7238 + 22295 = 29533, class = Credit Card, misclassification rate = 7238 / 29533 = 0.245.
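For reference, the arithmetic above can be reproduced with a short sketch. It assumes the first count in each pair is the number of Cash samples and the second the number of Credit Card samples (inferred from the class labels and numerators above), and that each leaf predicts its majority class. The names `LEAF_COUNTS` and `leaf_summary` are made up for illustration.

```python
# (Cash count, Credit Card count) for each leaf, in the order listed above.
# The ordering is an assumption inferred from the misclassification numerators.
LEAF_COUNTS = [(3635, 1101), (47436, 44556), (7072, 15252), (1294, 1456), (7238, 22295)]


def leaf_summary(n_cash, n_credit):
    """Return (samples, predicted class, misclassification rate) for one leaf."""
    total = n_cash + n_credit
    # The leaf predicts its majority class, so the minority count is misclassified.
    predicted = "Cash" if n_cash >= n_credit else "Credit Card"
    rate = min(n_cash, n_credit) / total
    return total, predicted, rate


for i, (n_cash, n_credit) in enumerate(LEAF_COUNTS, start=1):
    total, predicted, rate = leaf_summary(n_cash, n_credit)
    print(f"{i}. samples = {total}, class = {predicted}, "
          f"misclassification rate = {rate:.3f}")
```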

I'm having difficulty computing the AUC value from these leaf counts. Please help me out with this. I would be grateful.

  • Do you want to calculate the AUC score for the training dataset? – nithish08 Apr 09 '23 at 18:53
  • @nithish08, Yes, based on the decision tree I have attached. I have also calculated the RMSE, where the predicted event probability is Prob(class = Credit Card). The RMSE value is 0.4974. – Aman Rangapur Apr 09 '23 at 19:52

1 Answer

from sklearn.metrics import roc_auc_score

def create_actual_prediction_arrays(n_pos, n_neg):
    # Every sample in a leaf gets the same score: the leaf's positive-class proportion.
    prob = n_pos / (n_pos + n_neg)
    y_true = [1] * n_pos + [0] * n_neg
    y_score = [prob] * (n_pos + n_neg)
    return y_true, y_score

# (n_pos, n_neg) for each leaf node, taken from the tree in the question.
leaf_counts = [(3635, 1101), (47436, 44556), (7072, 15252), (1294, 1456), (7238, 22295)]

total_y_true = []
total_y_score = []
for n_pos, n_neg in leaf_counts:
    y_true, y_score = create_actual_prediction_arrays(n_pos, n_neg)
    total_y_true.extend(y_true)
    total_y_score.extend(y_score)

print("auc_score = ", roc_auc_score(y_true=total_y_true, y_score=total_y_score))

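Explanation - Each sample is scored with the positive-class proportion of the leaf it falls into (the first count of each pair is treated as the positive class). The code gathers the true labels and these scores across all leaf nodes and computes the AUC score from them.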
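As a cross-check that avoids building the full label and score lists, the same quantity can be sketched by counting pairs directly from the leaf counts: AUC is the probability that a random positive sample scores higher than a random negative one, with ties counted as one half. This is only a sketch under two assumptions: the first count in each pair is the positive class (as in the code above), and no two leaves share the same positive-class proportion (true for these five leaves). The function name `auc_from_leaf_counts` is hypothetical.

```python
# (n_pos, n_neg) per leaf; treating the first count as the positive class
# is an assumption carried over from the answer's code.
LEAVES = [(3635, 1101), (47436, 44556), (7072, 15252), (1294, 1456), (7238, 22295)]


def auc_from_leaf_counts(leaves):
    """AUC by pair counting: P(score_pos > score_neg) + 0.5 * P(tie).

    Assumes every leaf has a distinct positive-class proportion.
    """
    total_pos = sum(n_pos for n_pos, _ in leaves)
    total_neg = sum(n_neg for _, n_neg in leaves)
    # Rank leaves by score (positive-class proportion), lowest first.
    ranked = sorted(leaves, key=lambda leaf: leaf[0] / (leaf[0] + leaf[1]))
    concordant = 0.0
    neg_below = 0  # negatives sitting in leaves with a strictly lower score
    for n_pos, n_neg in ranked:
        # Positives here beat all lower-scored negatives and tie with
        # the negatives in their own leaf (each tie counts 0.5).
        concordant += n_pos * neg_below + 0.5 * n_pos * n_neg
        neg_below += n_neg
    return concordant / (total_pos * total_neg)


print("auc_score = ", auc_from_leaf_counts(LEAVES))
```

Because all samples in a leaf share one score, the ROC curve has only one point per leaf, so this pair count should agree with `roc_auc_score` on the expanded arrays.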

nithish08