I'm trying to extract ID-level drivers from my XGBoost classification model using LIME, and I'm running into some odd errors. I'm using this link as a reference.
Here is the overall code that I'm using:
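import lime.lime_tabular
# Build the explainer from the raw training array (no feature names are passed here)
explainer = lime.lime_tabular.LimeTabularExplainer(Xs_train.values, class_names = [1.0, 0.0], kernel_width = 3)
# LIME calls this function with a NumPy array of perturbed samples
predict_fn_xgb = lambda x: trained_model.predict_proba(x).astype(float)
data_point = Xs_val.values[5]  # one validation row as a 1-D array
exp = explainer.explain_instance(data_point, predict_fn_xgb, num_features = 10)
exp.show_in_notebook(show_all = False)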
Key:
- trained_model: trained xgboost classification model
- class_names: This is a binary classification model
- Xs_train: This is a (73548, 84) dimension training set. This was used to build the trained_model
- Xs_val: This is a (4910, 84) dimension validation set. The columns are the same in the training and validation sets.
- data_point: one specific validation point
Now, when I run this code, I get the following error:
ValueError: expected res_time, email_views...training data did not have the following fields: f6, f49, f34, f21,...
I don't know where the f# column names are coming from. Seems really bizarre and I believe I'm following the example correctly.
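One unverified guess: the f# names look like the default positional names (f0, f1, ...) that XGBoost falls back to when it receives an array without column names, so f6 would be the column at index 6. LIME passes plain NumPy arrays to the predict function, while the model may have been fit on a DataFrame with named columns. Here is a minimal sketch for mapping those names back to real columns. The helper map_default_names and the toy column list are hypothetical and only for illustration:

```python
import re


def map_default_names(default_names, columns):
    """Map positional names such as 'f6' back to the column at that index.

    Hypothetical helper for diagnosis only; it is not part of XGBoost or LIME.
    """
    mapping = {}
    for name in default_names:
        match = re.fullmatch(r"f(\d+)", name)
        if match is None:
            continue  # not a positional name, so skip it
        index = int(match.group(1))
        if index < len(columns):
            mapping[name] = columns[index]
    return mapping


# Toy stand-in for list(Xs_train.columns); the names and their order are made up.
toy_columns = ["res_time", "email_views", "clicks", "tenure"]
print(map_default_names(["f0", "f3", "f9", "foo"], toy_columns))
```

If the mapping lines up with my real columns, rebuilding a DataFrame with Xs_train.columns inside predict_fn_xgb before calling predict_proba might be worth trying, though I haven't tested that.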
Any help would be much appreciated. Let me know if any additional information is required.