I am trying to find the best parameters for a multi-output regression problem using grid search. My code is below:
import xgboost as xgb
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier, MultiOutputRegressor
from sklearn.metrics import accuracy_score
parameters = {'nthread': [4],  # with hyperthreading, xgboost may become slower
              'objective': ['reg:linear'],
              'learning_rate': [.03, 0.05, .07],  # the so-called `eta` value
              'max_depth': [5, 6, 7],
              'min_child_weight': [4],
              'silent': [1],
              'subsample': [0.7],
              'colsample_bytree': [0.7],
              'n_estimators': [500]}
model = xgb.XGBRegressor()
xgb_grid = GridSearchCV(model,
                        parameters,
                        cv=2,
                        n_jobs=5,
                        verbose=True)
multilabel_model = MultiOutputRegressor(xgb_grid)
multilabel_model.fit(X_train, y_train)
print(multilabel_model.best_score_)
print(multilabel_model.best_params_)
The model seems to fit correctly, since it ran for some time. However, once it finished, I tried to access best_score_ and best_params_ on the fitted MultiOutputRegressor, but both raised errors. How can I find the best parameters from the multi-output regressor? Thank you in advance for the help!
EDIT: I've tried doing the following as well:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)
But that raised the following errors: AttributeError: 'GridSearchCV' object has no attribute 'best_score_' and AttributeError: 'GridSearchCV' object has no attribute 'best_params_'.
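For context on these AttributeErrors: MultiOutputRegressor fits a clone of the wrapped estimator for each target column and stores the fitted clones in its estimators_ attribute, so the original xgb_grid object is never fitted itself. A minimal sketch of reading the per-target results follows. It assumes synthetic data and uses scikit-learn's Ridge as a lightweight stand-in for xgb.XGBRegressor; the names grid, wrapped, X and y are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data (assumption): 40 samples, 3 features, 2 target columns.
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = rng.rand(40, 2)

# Ridge is a stand-in for xgb.XGBRegressor to keep the sketch small.
grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=2)
wrapped = MultiOutputRegressor(grid).fit(X, y)

# The object passed in is cloned before fitting, so it stays unfitted.
original_is_fitted = hasattr(grid, "best_params_")

# Each fitted clone holds the search results for one target column.
per_target_params = [est.best_params_ for est in wrapped.estimators_]
per_target_scores = [est.best_score_ for est in wrapped.estimators_]

print(original_is_fitted)
print(per_target_params)
```

With this layout, each target column gets its own independent grid search, so the best parameters may differ from one target to the next.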
This led me to try the following:
parameters = {'nthread': [4],  # with hyperthreading, xgboost may become slower
              'objective': ['reg:linear'],
              'learning_rate': [0.03],  # , 0.05, 0.07], the so-called `eta` value
              'max_depth': [1],  # , 5, 6, 7],
              'min_child_weight': [4],
              'silent': [1],
              'subsample': [0.7],
              'colsample_bytree': [0.7],
              'n_estimators': [100]}
model = xgb.XGBRegressor()
multilabel_model = MultiOutputRegressor(model)
xgb_grid = GridSearchCV(multilabel_model,
                        parameters,
                        cv=2,
                        n_jobs=5,
                        verbose=True)
# fit the model
xgb_grid.fit(X_for_training, y_train)
But then I get this error: ValueError: Invalid parameter colsample_bytree for estimator MultiOutputRegressor. I'm not sure why, since the key in the parameters dictionary is spelled exactly like the XGBRegressor argument and its value is a float. Am I defining the parameter grid incorrectly somehow?
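A likely cause of this ValueError: when GridSearchCV wraps a MultiOutputRegressor, the grid keys are matched against the wrapper's own parameters, and scikit-learn addresses the inner model's parameters with an estimator__ prefix (for example estimator__colsample_bytree). The sketch below illustrates that naming convention. It assumes synthetic data and again uses Ridge as a stand-in for xgb.XGBRegressor; multi, search, X and y are illustrative names.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data (assumption): 40 samples, 3 features, 2 target columns.
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = rng.rand(40, 2)

# Ridge is a stand-in for xgb.XGBRegressor to keep the sketch small.
multi = MultiOutputRegressor(Ridge())

# get_params() lists every key that a parameter grid may use.
valid_keys = sorted(multi.get_params().keys())
print(valid_keys)

# Inner-model parameters are addressed through the "estimator__" prefix.
param_grid = {"estimator__alpha": [0.1, 1.0, 10.0]}
search = GridSearchCV(multi, param_grid, cv=2).fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Here a single parameter combination is selected for all targets together, scored with the wrapper's default score method, and best_params_ and best_score_ are available directly on the fitted search object.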