Python XGBoost Regressor Error: Feature_names mismatch

Question

I'm trying to use XGBoost Regressor to predict revenue, given some input features. However, I get a feature_names mismatch error when I run it. The features are all numerical features and there are no missing values.

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

cols_to_use = ['Product Visitors', 'Product Pageviews', 'Rating']
X = df[cols_to_use]
y = df['Revenue']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

xgboostmodel = XGBRegressor(n_estimators=1000, max_depth=7)
xgboostmodel.fit(X_train, y_train)
y_pred = xgboostmodel.predict(X_test)

xgboostmodel.score(y_test, y_pred)

Error:

ValueError: feature_names mismatch: ['Product Visitors', 'Product Pageviews', 'Rating']['f0', 'f1', 'f2']
expected Product Pageviews, Product Visitors, Ratings in input data
training data did not have the following fields f34, f5, f11, f7

Did you try to write the names directly, like `X = df['Product Visitors', 'Product Pageviews', 'Rating']`? It says that it cannot find the feature_names. – JAdel, Jul 11 '22 at 12:57

score 0 · Answer 1 · answered Jul 11 '22 at 20:14

The score method of xgboost models (or any sklearn-compatible model) has the signature score(X, y): it takes the features and the true targets, here (X_test, y_test), not the true targets and the predictions; see the docs.

The error arises because xgboost assumes that the y_test you pass as the first argument is the 2D feature data, and it treats the index values of that Series as the column names it is expecting; those index values are just numbers, hence f0, f23, f5, etc.

Ben Reiniger

Thank you Ben that is helpful! Based on your expertise, how would I fix it? Pass in xgboostmodel.score(X_test, y_test) ? Is there a way to get the score of predicted data vs the labeled data? – Nekojell Jul 12 '22 at 01:34
@Nekojell `xgboostmodel.score(X_test, y_test)` will work, sure. You can compute the metrics directly from the predictions, e.g. with the functions in `sklearn.metrics`. – Ben Reiniger Jul 14 '22 at 03:10
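
A minimal sketch of that suggestion, with small made-up lists standing in for `y_test` and `y_pred`. The functions in `sklearn.metrics` take the true values first and the predictions second:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values standing in for y_test and y_pred.
y_true = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]

# Argument order is (true values, predictions) for all three metrics.
r2 = r2_score(y_true, y_hat)
mae = mean_absolute_error(y_true, y_hat)
mse = mean_squared_error(y_true, y_hat)

print(f"R^2: {r2:.3f}, MAE: {mae:.3f}, MSE: {mse:.3f}")
```

With the model from the question, the same calls would take `y_test` and `xgboostmodel.predict(X_test)` in place of the two lists.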
