I have trained a model on my imbalanced dataset (binary classification) using CatBoostClassifier. Now I am trying to interpret the model using the SHAP library. Below is the code to fit the model and calculate the SHAP values:

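from catboost import CatBoostClassifier, Pool
import shap

# Ratio of negative to positive samples, used to up-weight the minority class
weights = y.value_counts()[0] / y.value_counts()[1]
catboost_clf = CatBoostClassifier(loss_function='Logloss', iterations=100, verbose=True,
                                  l2_leaf_reg=6, scale_pos_weight=weights, eval_metric="MCC")
catboost_clf.fit(X, y)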

trainx_preds = catboost_clf.predict(X_test)

explainer = shap.TreeExplainer(catboost_clf)
shap_values = explainer.shap_values(Pool(X,y))

# Class 0 samples: 1625125
# Class 1 samples: 122235

The shape of the SHAP values is (1747360, 13), i.e. (number of instances, number of features). I was expecting a 3D array of shape (number of classes, number of instances, number of features), with SHAP values for each of the positive and negative classes. How do I achieve that? How do I extract class-wise Shapley values to better understand the model?

Also, explainer.expected_value shows one base value instead of two.

Is there anything missing or incorrect in the code?

Thanks in advance!

Dhvani Shah

1 Answer

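Setting loss_function to 'MultiClass' solved the problem. With 'Logloss' the model has a single raw output (the log-odds of the positive class), so SHAP returns one set of values and one base value. I referred to the documentation: CatBoost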

model = CatBoostClassifier(loss_function = 'MultiClass')
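
If you would rather keep the binary 'Logloss' model, a class-wise view can also be derived from the 2D output. The raw score is the log-odds of class 1, and the log-odds of class 0 is its negation, so the class-0 SHAP values are the negated class-1 values. Below is a minimal sketch of that idea using only NumPy. The helper name to_classwise and the demo arrays are hypothetical stand-ins for explainer.shap_values(...) and explainer.expected_value, and it assumes the explainer is explaining the raw log-odds output.

```python
import numpy as np

def to_classwise(shap_values, expected_value):
    """Stack binary SHAP values into shape (2, n_samples, n_features).

    Assumes shap_values explain the log-odds of class 1 (raw model output).
    """
    pos = np.asarray(shap_values, dtype=float)  # contributions to log-odds of class 1
    neg = -pos                                  # log-odds of class 0 is the negation
    classwise = np.stack([neg, pos], axis=0)    # index 0 -> class 0, index 1 -> class 1
    base_values = np.array([-expected_value, expected_value], dtype=float)
    return classwise, base_values

# Hypothetical stand-ins for explainer.shap_values(...) and explainer.expected_value
demo_shap = np.array([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.0]])  # 3 rows, 2 features
demo_base = -1.2

classwise, base_values = to_classwise(demo_shap, demo_base)
print(classwise.shape)  # (n_classes, n_samples, n_features)
```

Note that the numbers from a 'MultiClass' model will generally differ from this, since that model is trained with a different loss and has one raw score per class.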
Dhvani Shah