I need to track the progress of training an XGBoost model with cross-validation, which depends on the number of parameter combinations the cross-validation considers. Is there any way to do this? I do not need to know how long it is going to take; I just want to see the progress, so I can tell how many iterations there are in total and which one is currently running...

import multiprocessing

import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedKFold, train_test_split
from xgboost import XGBRegressor


def train_model_xgboost(dataframe, variables, respuesta, mono_constraints):
    # TODO: try a time-series train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        dataframe[variables],
        dataframe[respuesta],
        random_state=2021
    )
    param_grid = {'max_depth': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15],
                  'subsample': [0.5, 1],
                  'learning_rate': [0.001, 0.01, 0.1],
                  'booster': ['gbtree'],  # 'dart'
                  #'sample_type': ['weighted'],
                  #'normalize_type': ['forest'],
                  #'skip_drop': [0.3],
                  'monotone_constraints': [mono_constraints]
                  #'tree_method': ['gpu_hist'],  # auto, hist, gpu_hist
                  #'predictor': ['gpu_predictor']
                  }

    # Hold out 10% of the training rows as a validation set for early stopping
    np.random.seed(2021)
    idx_validacion = np.random.choice(
        X_train.shape[0],
        size=int(X_train.shape[0] * 0.1),
        replace=False
    )
    X_val = X_train.iloc[idx_validacion, :].copy()
    y_val = y_train.iloc[idx_validacion].copy()

    X_train_grid = X_train.reset_index(drop=True).drop(idx_validacion, axis=0).copy()
    y_train_grid = y_train.reset_index(drop=True).drop(idx_validacion, axis=0).copy()

    # XGBoost needs the training-specific parameters to be passed when calling
    # the .fit() method
    fit_params = {"early_stopping_rounds": 5,
                  "eval_metric": "rmse",  # rmse, mae, logloss, error, merror, mlogloss, auc
                  "eval_set": [(X_val, y_val)],
                  "verbose": 0
                  }

    # Cross-validation
    # Other scoring options: explained_variance, neg_mean_absolute_error,
    # neg_mean_squared_error, neg_mean_squared_log_error, neg_median_absolute_error,
    # r2, neg_mean_poisson_deviance, neg_mean_gamma_deviance,
    # neg_mean_absolute_percentage_error
    grid = GridSearchCV(
        estimator=XGBRegressor(
            n_estimators=1000,
            random_state=2021
        ),
        param_grid=param_grid,
        scoring='neg_root_mean_squared_error',
        n_jobs=multiprocessing.cpu_count(),
        cv=RepeatedKFold(n_splits=5, n_repeats=2, random_state=2021),
        refit=True,
        verbose=0,
        return_train_score=True
    )

    grid.fit(X=X_train_grid, y=y_train_grid, **fit_params)

What I need is to have an idea of how many iterations are left...

Rafa
Elias Urra
  • Doesn't [this](https://stackoverflow.com/questions/24121018/sklearn-gridsearch-how-to-print-out-progress-during-the-execution/61083259#61083259) do the trick? – Rafa Jun 17 '21 at 13:52

1 Answer

You can change the verbosity of GridSearchCV using the verbose parameter:

  • 0 : no verbosity

  • >1 : the computation time for each fold and parameter candidate is displayed

  • >2 : the score is also displayed

  • >3 : the fold and candidate parameter indexes are also displayed together with the starting time of the computation.

If you are using Jupyter Notebook, the output may be displayed in the terminal window where the notebook server was started, rather than in the notebook itself.
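If the verbose log is not enough and a plain "k of N" counter is wanted, one option is to loop over the candidates yourself instead of handing the whole grid to GridSearchCV. The following is only a minimal sketch under stated assumptions: `iter_candidates`, `search_with_progress`, and the `evaluate` callable are hypothetical names, not part of scikit-learn or XGBoost, and the loop is sequential, so it gives up the parallelism of `n_jobs`. In practice `evaluate` would run the cross-validation for one parameter combination and return its mean score.

```python
from itertools import product


def iter_candidates(param_grid):
    """Yield every parameter combination as a dict (the same idea as ParameterGrid)."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def search_with_progress(param_grid, evaluate, log=print):
    """Evaluate each candidate in turn, logging 'candidate k of N' before each one.

    `evaluate` is a hypothetical callable: params dict -> score (higher is better).
    """
    candidates = list(iter_candidates(param_grid))
    total = len(candidates)
    best_params, best_score = None, None
    for k, params in enumerate(candidates, start=1):
        log(f"candidate {k} of {total}: {params}")
        score = evaluate(params)
        if best_score is None or score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for cross-validation, so the sketch runs without any ML library.
demo_grid = {"max_depth": [1, 2, 3], "subsample": [0.5, 1]}
messages = []
best_params, best_score = search_with_progress(
    demo_grid,
    evaluate=lambda p: -abs(p["max_depth"] - 2) - abs(p["subsample"] - 1),
    log=messages.append,
)
```

Passing `log=print` (the default) writes the counter to the console as each candidate starts; collecting the messages in a list, as above, is just a convenient way to inspect them.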


EDIT

If you want an estimate of the total duration, you can calculate the number of combinations and then multiply it by the duration of one iteration and by the number of cross-validation splits.

You can get the number of combinations to be tested using ParameterGrid.

from sklearn.model_selection import ParameterGrid
param_grid = {'max_depth': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15],
              'subsample': [0.5, 1],
              'learning_rate': [0.001, 0.01, 0.1],
              'booster': ['gbtree'],
              'monotone_constraints': [mono_constraints]
}

pg = ParameterGrid(param_grid)
len(pg)

In your case this gives 66 combinations (11 × 2 × 3 × 1 × 1). Multiply that by t, the duration of one iteration, and by 10, the number of cross-validation splits (n_splits * n_repeats), which comes to 660 fits in total.
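The arithmetic above can be checked with the standard library alone. This is a rough sketch, not a precise timing model: `estimate_seconds` and `seconds_per_fit` are hypothetical names, the `monotone_constraints` value is a placeholder for the single value in the question, dividing by `n_jobs` assumes ideal parallel scaling, and the final refit on the full training set is ignored.

```python
from itertools import product

param_grid = {
    "max_depth": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15],
    "subsample": [0.5, 1],
    "learning_rate": [0.001, 0.01, 0.1],
    "booster": ["gbtree"],
    "monotone_constraints": ["<mono_constraints>"],  # placeholder: one value only
}

# Number of parameter combinations (what len(ParameterGrid(param_grid)) reports)
n_candidates = len(list(product(*param_grid.values())))

# RepeatedKFold(n_splits=5, n_repeats=2) yields 5 * 2 = 10 splits per candidate
n_splits, n_repeats = 5, 2
total_fits = n_candidates * n_splits * n_repeats


def estimate_seconds(seconds_per_fit, n_jobs=1):
    """Rough total duration, assuming every fit takes the same time
    and that work is spread perfectly across n_jobs workers."""
    return total_fits * seconds_per_fit / n_jobs


print(n_candidates, total_fits)
```

For example, if one fit were measured at 2 seconds, `estimate_seconds(2.0)` would give 1320 seconds on a single worker.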

Antoine Dubuis
  • what I actually need is to have an idea of how long should this be taking... changing verbose does not tell me how many iterations are left... – Elias Urra Jun 16 '21 at 13:59
  • I updated my answer with a method to estimate the total duration of the `GridSearchCV`. – Antoine Dubuis Jun 16 '21 at 14:08
  • Is there a way to know which iteration I am on... for example, 1 of 66, 2 of 66, etc.? I am not that interested in duration or time. – Elias Urra Jun 16 '21 at 14:18
  • I do not think that there is one but I would suggest you to use `verbose=2` to have a log at the end of each fold computation. – Antoine Dubuis Jun 16 '21 at 14:34