
I'm trying to use Optuna to tune the hyperparameters of XGBoost, but because of a memory restriction I can't set n_trials too high, otherwise it reports a MemoryError. So I'm wondering: if I set n_trials=5 and run the program 4 times, would the result be similar to setting n_trials=20 and running the program once?

2 Answers


Yes, if you use the same database to store the study across the different runs.
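
For example, a rough sketch of what that could look like, assuming a local SQLite file as the storage backend and placeholder names for the study and the objective function:

    import optuna

    # Sketch only: keep the study in a local SQLite file so that every run of the
    # script adds its trials to the same study instead of starting from scratch.
    # The study name, file name and `objective` function are placeholders.
    study = optuna.create_study(
        study_name="xgb_tuning",
        storage="sqlite:///xgb_tuning.db",  # the same file is reused on every run
        direction="minimize",
        load_if_exists=True,                # resume the existing study if present
    )
    study.optimize(objective, n_trials=5)   # four such runs give ~20 trials in total

Each run then picks up the trials of the previous runs, so four runs with n_trials=5 build up roughly the same study as one run with n_trials=20.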

nzw0301
  • Do you mean that Optuna will tune the parameters according to former trials? It seems that every time I rerun the program, the results of the former runs aren't stored on the computer. – Laughingtree Oct 05 '21 at 07:30
  • I think https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.load_study.html#optuna.study.load_study is a helpful example of how to save and load a study. – nzw0301 Oct 05 '21 at 08:19
I changed my earlier answer to this:

It won't be similar; it's like playing a game a quarter of the way through and then resetting to the start each time.

XGBoost's fit() method has an xgb_model parameter that can be used to train the model incrementally.

Basically, n_trials remains at 20, for example; instead of lowering it, the dataset is read in chunks.

The model must be saved after fitting the first chunk. The second chunk then continues from this saved model, and if there are further chunks, the model is saved again for the next chunk to use, and so on.

In addition, it is worth checking for memory leaks, which can also cause this problem. Ideally, n_estimators should not be too high; 1000 and below is fine, as higher values are slower and use more memory. The same goes for max_depth; I only use 6 to 13.

Here is a code snippet of the Optuna objective function showing how the xgb_model parameter is used.

    # Imports assumed by this snippet; the objective is a method of a tuning class,
    # and final_csv, savepath, tree_method and clear_gpu are defined elsewhere.
    import gc
    from time import sleep

    import optuna
    import pandas as pd
    from sklearn import metrics
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # for tuning incrementally in chunks
    def objective_chunk(self, trial, n_chunksize):
        nn_estimators = 500
        # early_stopping_rounds expects an int (10% of n_estimators here)
        nn_early_stopping_rounds = int(nn_estimators * 0.1)
        param = {
            # tree_method would ideally be gpu_hist for faster speed
            'tree_method':trial.suggest_categorical('tree_method', [tree_method]), 
            # L2 regularization weight, Increasing this value will make model more conservative
            'lambda': trial.suggest_loguniform('lambda', 1e-3, 10.0),
            # L1 regularization weight, Increasing this value will make model more conservative
            'alpha': trial.suggest_loguniform('alpha', 1e-3, 10.0),
            # Min loss reduction for further partition on a leaf node. larger,the more conservative
            'gamma':trial.suggest_categorical('gamma', [0,3,6]),
            # sampling according to each tree
            'colsample_bytree': trial.suggest_categorical('colsample_bytree',
                            [0.3,0.4,0.5,0.6,0.7,0.8,0.9, 1.0]),
            # sampling ratio for training data
            'subsample': trial.suggest_categorical('subsample', [0.4,0.5,0.6,0.7,0.8,1.0]),
            'learning_rate': trial.suggest_categorical('learning_rate',
                            [0.008,0.009,0.01,0.012,0.014,0.016,0.018, 0.02,0.05]),
            'n_estimators': trial.suggest_categorical('n_estimators',[nn_estimators]),
            # maximum depth of the tree, signifies complexity of the tree
            'max_depth': trial.suggest_categorical('max_depth', [6,9,11,13]),
            'random_state': trial.suggest_categorical('random_state', [48]),
            # minimum child weight, larger the term more conservative the tree
            'min_child_weight': trial.suggest_int('min_child_weight', 1, 10)
        }
  
        model_xgbc = XGBClassifier(**param, use_label_encoder=False)
        
        # Fit Model
        for i, X in enumerate(pd.read_csv(final_csv, chunksize=n_chunksize), start=1):
            y = X.pop('target')   # split the target column off the current chunk
            X_train, X_valid, y_train, y_valid = train_test_split(X, y,
                                                        train_size = 0.7, random_state=48)   
            X, y = None, None
            gc.collect()
    
            if i == 1:            
                print(f'Running Trial {trial.number} Chunk: {i}',end = ' | ')
                model_xgbc.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
                        verbose=False, eval_metric = ['logloss'],
                        early_stopping_rounds = nn_early_stopping_rounds)
            else:
                print(f'{i}',end = ' | ')
                # re-create the estimator with the same trial parameters, then
                # continue training from the model saved after the previous chunk
                model_xgbc = XGBClassifier(**param, use_label_encoder=False)
                model_xgbc.load_model(f'{savepath}model_xgbc.json')
                model_xgbc.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
                        verbose=False, eval_metric = ['logloss'],
                        early_stopping_rounds = nn_early_stopping_rounds, 
                        xgb_model = f'{savepath}model_xgbc.json'
                        )

            '''Auxiliary attributes of the Python Booster object (such as feature_names) will 
            not be saved when using binary format. To save those attributes, use JSON instead.'''
            model_xgbc.save_model(f'{savepath}model_xgbc.json')

            preds = model_xgbc.predict(X_valid)
    
            rmse = metrics.mean_squared_error(y_valid, preds,squared=False)
            trial.report(rmse, i)
            
            if trial.should_prune():
                del param, model_xgbc, preds
                X_train, y_train = None, None
                X_valid, y_valid = None, None
                gc.collect()
                sleep(3)
                raise optuna.TrialPruned()
            else:
                del model_xgbc
                X_train, y_train = None, None
                X_valid, y_valid = None, None
                gc.collect()
                sleep(3)
                clear_gpu()
            
        del param, preds
        X_train, y_train = None, None
        X_valid, y_valid = None, None
        gc.collect()
        sleep(3)
        
        return rmse

This objective is then called by code like the example below:

    # otb is an instance of the class containing objective_chunk
    study = optuna.create_study(direction='minimize')

    nn_trials = 20
    nn_chunksize = 10000        # number of rows per chunk
    study.optimize(lambda trial: otb.objective_chunk(trial, nn_chunksize),
                   n_trials=nn_trials,
                   gc_after_trial=True)
J R