
I'm trying to use XGBoost Regressor to predict revenue, given some input features. However, I get a feature_names mismatch error when I run it. The features are all numerical features and there are no missing values.

cols_to_use = ['Product Visitors', 'Product Pageviews', 'Rating']
X = df[cols_to_use]
y = df['Revenue']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

xgboostmodel = XGBRegressor(n_estimators=1000, max_depth=7)
xgboostmodel.fit(X_train, y_train)
y_pred = xgboostmodel.predict(X_test)

xgboostmodel.score(y_test, y_pred)

Error:

ValueError: feature_names mismatch: ['Product Visitors', 'Product Pageviews', 'Rating']['f0', 'f1', 'f2']
expected Product Pageviews, Product Visitors, Ratings in input data
training data did not have the following fields f34, f5, f11, f7
Nekojell
  • Did you try to write the names directly, like `X = df['Product Visitors', 'Product Pageviews', 'Rating']`? It says that it cannot find the feature_names. – JAdel Jul 11 '22 at 12:57
  • I just tried that but the same error appears – Nekojell Jul 11 '22 at 13:01

1 Answer

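The `score` method of xgboost models (like that of any sklearn-compatible estimator) has the signature `score(X, y)`: it takes the test features and the true targets, i.e. `(X_test, y_test)`, and computes the predictions itself; see the docs. For a regressor it returns the R² of those predictions.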

The error arises because xgboost treats the `y_test` you pass as the first argument as if it were the 2D feature data, and it takes the index labels of that series to be the column names it's expecting; those labels are just numbers, hence `f0`, `f23`, `f5`, etc.

Ben Reiniger
  • Thank you Ben that is helpful! Based on your expertise, how would I fix it? Pass in xgboostmodel.score(X_test, y_test) ? Is there a way to get the score of predicted data vs the labeled data? – Nekojell Jul 12 '22 at 01:34
  • @Nekojell `xgboostmodel.score(X_test, y_test)` will work, sure. You can compute the metrics directly from the predictions, e.g. with the functions in `sklearn.metrics`. – Ben Reiniger Jul 14 '22 at 03:10
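To illustrate the `sklearn.metrics` route from the last comment, here is a minimal sketch that scores predictions against labels directly. The two short lists are made-up placeholder values standing in for `y_test` and `y_pred`; in the question's code they would be the held-out targets and the output of `xgboostmodel.predict(X_test)`.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Placeholder values (assumption): stand-ins for y_test and y_pred.
y_test = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# All three functions take the true values first, then the predictions.
r2 = r2_score(y_test, y_pred)              # same metric that score() returns for a regressor
mae = mean_absolute_error(y_test, y_pred)  # average absolute error
mse = mean_squared_error(y_test, y_pred)   # average squared error

print(r2, mae, mse)
```

The argument order matters for `r2_score`: swapping the true values and the predictions generally gives a different number.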