My model uses LGBMClassifier. I'd like to use SHAP (Shapley values) to interpret its features. However, SHAP raises an error on categorical features. For example, I have a feature "Smoker" whose values are "Yes" and "No", and I get this error from SHAP:
ValueError: could not convert string to float: 'Yes'.
Am I missing any settings?
BTW, I know that I could use one-hot encoding to convert categorical features, but I don't want to, since LGBMClassifier can handle categorical features without one-hot encoding.
Here's the sample code (shap version 0.40.0, lightgbm version 3.3.2):
import pandas as pd
from lightgbm import LGBMClassifier #My version is 3.3.2
import shap #My version is 0.40.0
#The training data
X_train = pd.DataFrame()
X_train["Age"] = [50, 20, 60, 30]
X_train["Smoker"] = ["Yes", "No", "No", "Yes"]
#Target: whether the person had a certain disease
y_train = [1, 0, 0, 0]
#I did convert categorical features to the Category data type.
X_train["Smoker"] = X_train["Smoker"].astype("category")
#The test data
X_test = pd.DataFrame()
X_test["Age"] = [50]
X_test["Smoker"] = ["Yes"]
X_test["Smoker"] = X_test["Smoker"].astype("category")
#the classifier
clf = LGBMClassifier()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
#shap
explainer = shap.TreeExplainer(clf)
#I found this setting through a Google search, but it did not help
explainer.model.original_model.params = {"categorical_feature":["Smoker"]}
shap_values = explainer(X_train) #the error came out here: ValueError: could not convert string to float: 'Yes'
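For what it's worth, the message is the same one plain Python raises when a string is cast to float. My guess (an assumption, I have not traced SHAP's source) is that the DataFrame gets converted to a float array somewhere before any categorical handling applies. Here is a minimal sketch that reproduces the same message without SHAP or LightGBM; `to_float_matrix` is a hypothetical helper I made up for illustration, not part of either library:

```python
# Hypothetical helper, NOT part of shap or lightgbm. It only mimics a naive
# "cast every cell to float" step to show where such a message can come from.
def to_float_matrix(rows):
    return [[float(value) for value in row] for row in rows]


# Same values as X_train above, written as plain rows of [Age, Smoker]
rows = [[50, "Yes"], [20, "No"], [60, "No"], [30, "Yes"]]

try:
    to_float_matrix(rows)
    error_message = None
except ValueError as exc:
    # The string column cannot be cast, so the first "Yes" triggers the error
    error_message = str(exc)

print(error_message)

# Purely numeric rows pass through the same helper without any error
numeric_rows = to_float_matrix([[50, 1], [20, 0]])
print(numeric_rows)
```

If that guess is right, the "category" dtype on the column is being lost at the point where the data becomes a plain array, which would explain why setting categorical_feature on the model params did not change anything.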