
I am trying to find the best parameters for a multioutput regressor problem using grid search. My code is below:

import xgboost as xgb
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier, MultiOutputRegressor
from sklearn.metrics import accuracy_score

parameters = {'nthread': [4], # xgboost may run slower with hyperthreading
              'objective': ['reg:linear'],
              'learning_rate': [0.03, 0.05, 0.07], # the so-called `eta` value
              'max_depth': [5, 6, 7],
              'min_child_weight': [4],
              'silent': [1],
              'subsample': [0.7],
              'colsample_bytree':[0.7],
              'n_estimators': [500]}
model = xgb.XGBRegressor()
xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 2,
                        n_jobs = 5,
                        verbose=True)
multilabel_model = MultiOutputRegressor(xgb_grid)
multilabel_model.fit(X_train, y_train)
print(multilabel_model.best_score_)
print(multilabel_model.best_params_)

The model seems to fit correctly, since it ran for some time after executing. However, when it finished, calling best_score_ and best_params_ both gave me errors. How would I go about finding the best parameters from the multioutput regressor? Thank you for the help in advance!

EDIT: I've tried doing the following as well:

print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

But that led to the following errors: AttributeError: 'GridSearchCV' object has no attribute 'best_score_' and AttributeError: 'GridSearchCV' object has no attribute 'best_params_'.

This led me to try the following:

parameters = {'nthread': [4], # xgboost may run slower with hyperthreading
              'objective': ['reg:linear'],
              'learning_rate': [0.03], #, 0.05, 0.07], # the so-called `eta` value
              'max_depth': [1],#,5, 6, 7],
              'min_child_weight': [4],
              'silent': [1],
              'subsample': [0.7],
              'colsample_bytree':[0.7],
              'n_estimators': [100]}
model = xgb.XGBRegressor()
multilabel_model = MultiOutputRegressor(model)

xgb_grid = GridSearchCV(multilabel_model,
                        parameters,
                        cv = 2,
                        n_jobs = 5,
                        verbose=True)

# fit the model
xgb_grid.fit(X_for_training, y_train)

But then I get this error: ValueError: Invalid parameter colsample_bytree for estimator MultiOutputRegressor. I'm not sure why, since the name is spelled the same in the parameters dictionary and in the regressor's parameter input, and the value is a float. Am I defining my parameters variable incorrectly somehow?

cloud77

2 Answers


The MultiOutputRegressor class doesn't have such attributes. Maybe you are looking for this:

    print(xgb_grid.best_score_)
    print(xgb_grid.best_params_)
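For completeness: in the question's first attempt, GridSearchCV is wrapped *inside* MultiOutputRegressor, which fits a separate clone of the search per output column; the outer xgb_grid object itself is never fitted, hence the AttributeError. The fitted searches can instead be read from the wrapper's estimators_ attribute. A minimal sketch of that pattern, with Ridge standing in for xgb.XGBRegressor and a tiny synthetic dataset to keep it light:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

# Small synthetic multi-output dataset (2 target columns).
X, y = make_regression(n_samples=100, n_features=5, n_targets=2, random_state=0)

# GridSearchCV nested inside MultiOutputRegressor: one search per output.
grid = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0]}, cv=2)
multi = MultiOutputRegressor(grid)
multi.fit(X, y)

# The original `grid` object is never fitted; the fitted clones live here,
# one per target column, each with its own best_score_ / best_params_.
for i, est in enumerate(multi.estimators_):
    print(i, est.best_score_, est.best_params_)
```

Note this gives per-output best parameters, which may differ between targets; the approach in the question's EDIT (grid-searching over the whole MultiOutputRegressor) yields a single parameter set instead.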
S. Ali Mirferdos
  • I've tried doing that as well which led to the following errors: AttributeError: 'GridSearchCV' object has no attribute 'best_score_'. AttributeError: 'GridSearchCV' object has no attribute 'best_params_'. I've updated my post to now reflect this. – cloud77 Jul 28 '21 at 14:24
    @cloud77 I guess you can find out a working example [here](https://www.kaggle.com/jayatou/xgbregressor-with-gridsearchcv) – S. Ali Mirferdos Jul 28 '21 at 14:31
  • This link: https://stackoverflow.com/questions/43532811/gridsearch-over-multioutputregressor fixed the problem. It was the way I was defining my parameters variable. Thank you! – cloud77 Jul 28 '21 at 14:44

It turns out I was defining my parameters dictionary wrong. The working code is below:


parameters = {'estimator__nthread': [4], # xgboost may run slower with hyperthreading
              'estimator__objective': ['reg:linear'],
              'estimator__learning_rate': [0.03, 0.05, 0.07], # the so-called `eta` value
              'estimator__max_depth': [5, 6, 7],
              'estimator__min_child_weight': [4],
              'estimator__silent': [1],
              'estimator__subsample': [0.7],
              'estimator__colsample_bytree': [0.7],
              'estimator__n_estimators': [500]}
model = xgb.XGBRegressor()
multilabel_model = MultiOutputRegressor(model)

xgb_grid = GridSearchCV(multilabel_model,
                        parameters,
                        cv = 2,
                        n_jobs = 5,
                        verbose=True)

# fit the model
xgb_grid.fit(X_for_training, y_train)
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

best_model = xgb_grid.best_estimator_

The `estimator__` prefix before each parameter name in the dictionary is what did the trick!
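If in doubt about which prefixed names GridSearchCV will accept, get_params() on the wrapper lists them all. A minimal sketch (Ridge stands in for xgb.XGBRegressor here; its parameters get the same estimator__ prefix):

```python
from sklearn.linear_model import Ridge  # stand-in for xgb.XGBRegressor
from sklearn.multioutput import MultiOutputRegressor

multi = MultiOutputRegressor(Ridge())

# Parameters of the wrapped estimator are exposed with the estimator__
# prefix; these are the names a GridSearchCV param grid must use.
print([name for name in multi.get_params() if name.startswith('estimator__')])
```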

cloud77